🔬 Temporal State Prediction

A frozen V-JEPA-2 ViT-L encoder with a small attentive-pool head predicts per-cell cell-cycle state straight from a tracked clip, with no separate segment, track, and classify steps.
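
The attentive-pool head can be sketched as follows. A single learnable query attends over the frozen encoder's tokens, and the pooled vector is mapped to three class logits. This is a minimal NumPy sketch under stated assumptions, not the released head. The class name `AttentivePoolHead`, the single query and single attention head, and the random weights (standing in for trained parameters) are all hypothetical. The random `tokens` array stands in for V-JEPA-2 features: a ViT-L produces 1024-dimensional tokens, and 64 is used here only to keep the example small.

```python
import numpy as np

CLASSES = ["interphase", "pre-mitosis", "mitosis"]

class AttentivePoolHead:
    """Hypothetical single-query attentive-pool classifier over frozen tokens.

    Random weights stand in for trained parameters; only these weights would
    be trained, while the encoder that produced the tokens stays frozen.
    """

    def __init__(self, dim, num_classes=3, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(dim)
        self.query = rng.normal(0.0, scale, size=dim)          # learnable query
        self.w_k = rng.normal(0.0, scale, size=(dim, dim))      # key projection
        self.w_v = rng.normal(0.0, scale, size=(dim, dim))      # value projection
        self.w_out = rng.normal(0.0, scale, size=(dim, num_classes))
        self.b_out = np.zeros(num_classes)

    def attend(self, tokens):
        """Softmax attention weights of the query over tokens, shape (num_tokens,)."""
        keys = tokens @ self.w_k
        scores = keys @ self.query / np.sqrt(tokens.shape[1])
        scores = scores - scores.max()   # subtract max for numerical stability
        exp = np.exp(scores)
        return exp / exp.sum()

    def __call__(self, tokens):
        """Map (num_tokens, dim) frozen features to (num_classes,) logits."""
        values = tokens @ self.w_v
        pooled = self.attend(tokens) @ values   # attention-weighted sum, shape (dim,)
        return pooled @ self.w_out + self.b_out

# Stand-in for the frozen encoder's output on one clip: 128 tokens of width 64.
tokens = np.random.default_rng(1).normal(size=(128, 64))
head = AttentivePoolHead(dim=64)
logits = head(tokens)
print(CLASSES[int(np.argmax(logits))])
```

Because the encoder is frozen, only `query`, `w_k`, `w_v`, `w_out` and `b_out` would receive gradients, which keeps the trainable part small.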

Finding: a frozen video foundation model outperforms a purpose-built morphology and temporal baseline on every reported metric (macro-F1 0.678 vs 0.629, mitosis F1 0.681 vs 0.599), and the binding constraint is data scale, not model capacity.

Select a single-cell clip below. The model assigns that clip one of three cell-cycle states: interphase, pre-mitosis, or mitosis.

Predicted: interphase

Actual: interphase

✅ correct

model	macro-F1	mitosis F1	mitosis event P / R (±3 frames)
U-Net+BiLSTM baseline (3.8M params)	0.629	0.599	0.65 / 0.62
frozen V-JEPA-2 ViT-L, head only	0.678	0.681	0.69 / 0.74
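
One plausible reading of the mitosis event precision/recall with a ±3-frame tolerance: a predicted mitosis event is a hit if it lies within 3 frames of a still-unmatched true event on the same track. The sketch below assumes greedy nearest-neighbour, one-to-one matching per track. The function names and toy events are hypothetical, and the matching rule actually used for the table is not stated in the source.

```python
from collections import defaultdict

def match_events(pred, true, tol=3):
    """Greedily match (track_id, frame) events within ±tol frames on the same track.

    Returns (true_positives, num_predicted, num_true) so counts can be summed
    across sequences before computing precision and recall.
    """
    true_by_track = defaultdict(list)
    for track, frame in true:
        true_by_track[track].append(frame)
    tp = 0
    for track, frame in sorted(pred):
        candidates = [t for t in true_by_track[track] if abs(t - frame) <= tol]
        if candidates:
            best = min(candidates, key=lambda t: abs(t - frame))  # nearest true event
            true_by_track[track].remove(best)                     # one-to-one matching
            tp += 1
    return tp, len(pred), len(true)

def precision_recall(tp, n_pred, n_true):
    return (tp / n_pred if n_pred else 0.0, tp / n_true if n_true else 0.0)

# Toy events (hypothetical): (track_id, frame of predicted/true mitosis).
pred = [(1, 40), (1, 90), (2, 15), (3, 60)]
true = [(1, 42), (2, 20), (3, 58), (4, 33)]
tp, n_pred, n_true = match_events(pred, true)
p, r = precision_recall(tp, n_pred, n_true)
print(f"P={p:.2f} R={r:.2f}")  # P=0.50 R=0.50
```

Greedy matching is simple but not guaranteed optimal when events crowd together; an assignment solver would be the stricter alternative.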

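Confusion matrix, frozen V-JEPA-2 head (rows: true state; columns: predicted state; recall = diagonal ÷ row total):
true ⧵ pred	interphase	pre-mitosis	mitosis	recall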
interphase	4550	211	52	0.95
pre-mitosis	173	136	14	0.42
mitosis	44	7	125	0.71
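
The reported scores can be cross-checked against the confusion matrix. Treating rows as true states and columns as predictions, the sketch below recomputes per-class precision, recall and F1. It reproduces the recall column, the total of 5312 samples, and the frozen V-JEPA-2 scores in the first table (mitosis F1 0.681, macro-F1 0.6776), which indicates the matrix belongs to that model.

```python
import numpy as np

CLASSES = ["interphase", "pre-mitosis", "mitosis"]
# Rows = true state, columns = predicted state (counts copied from the table).
cm = np.array([
    [4550, 211, 52],
    [173, 136, 14],
    [44, 7, 125],
])

def per_class_metrics(cm):
    tp = np.diag(cm).astype(float)
    precision = tp / cm.sum(axis=0)   # column sums = number predicted as each class
    recall = tp / cm.sum(axis=1)      # row sums = number truly in each class
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

precision, recall, f1 = per_class_metrics(cm)
for name, p, r, f in zip(CLASSES, precision, recall, f1):
    print(f"{name:12s} P={p:.3f} R={r:.3f} F1={f:.3f}")
print(f"n = {cm.sum()}, macro-F1 = {f1.mean():.4f}")  # n = 5312, macro-F1 = 0.6776
```

Macro-F1 is the unweighted mean of the three per-class F1 scores, so the weak pre-mitosis class (F1 about 0.40) pulls it well below the interphase score despite interphase dominating the sample count.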

Models: DnaRnaProteins/vjepa2-cell-cycle-vit-l, DnaRnaProteins/unet-bilstm-cell-cycle-baseline · Data: MICCAI Cell Tracking Challenge (Fluo-N2DL-HeLa). Labels derived from lineage trees (no manual annotation).
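
Deriving labels from lineage trees can be sketched from a Cell Tracking Challenge track file, where each line reads `L B E P`: track label, first frame, last frame, and parent label (0 for no parent). A track that has daughters ends in a division, so its final frames can be labelled mitosis and a window before them pre-mitosis. The window lengths `MITOSIS_FRAMES` and `PRE_MITOSIS_FRAMES` and the helper names are hypothetical; the source does not say how the actual labels were windowed.

```python
from collections import Counter

# Hypothetical window lengths in frames; the real values are not stated.
MITOSIS_FRAMES = 2
PRE_MITOSIS_FRAMES = 10

def parse_tracks(text):
    """Parse CTC-style 'L B E P' lines into {label: (begin, end, parent)}."""
    tracks = {}
    for line in text.strip().splitlines():
        label, begin, end, parent = map(int, line.split())
        tracks[label] = (begin, end, parent)
    return tracks

def frame_labels(tracks):
    """Assign a cell-cycle state to every (track, frame) using the lineage tree."""
    # A track divides if some other track names it as parent.
    divides = {parent for (_, _, parent) in tracks.values() if parent != 0}
    labels = {}
    for label, (begin, end, _) in tracks.items():
        for frame in range(begin, end + 1):
            frames_to_end = end - frame
            if label in divides and frames_to_end < MITOSIS_FRAMES:
                state = "mitosis"
            elif label in divides and frames_to_end < MITOSIS_FRAMES + PRE_MITOSIS_FRAMES:
                state = "pre-mitosis"
            else:
                state = "interphase"
            labels[(label, frame)] = state
    return labels

# Toy lineage: track 1 divides at frame 20 into tracks 2 and 3.
tracks = parse_tracks("1 0 20 0\n2 21 40 1\n3 21 35 1")
labels = frame_labels(tracks)
print(Counter(labels.values()))
```

One caveat of this scheme: a track that leaves the field of view or reaches the last frame without dividing is labelled interphase throughout, even if it was about to divide.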

Evaluation set: held-out HeLa sequence 02 (n=5312).