Temporal State Prediction
A frozen V-JEPA-2 ViT-L encoder with a small attentive-pool head predicts per-cell cell-cycle state straight from a tracked clip, with no separate segment, track, and classify steps.
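To make the head concrete, here is a minimal sketch of attentive pooling over frozen encoder tokens followed by a linear classifier. It is pure Python with toy dimensions. The single learned query, the function names (`attentive_pool`, `classify`), and the random weights are assumptions for illustration only, not the released head's actual architecture.

```python
import math
import random

CLASSES = ["interphase", "pre-mitosis", "mitosis"]

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attentive_pool(tokens, query):
    """Pool token vectors into one vector, weighting each token by
    its scaled dot-product similarity to a (hypothetical) learned query."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, t) / scale for t in tokens])
    dim = len(tokens[0])
    pooled = [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(dim)]
    return pooled, weights

def classify(tokens, query, W, b):
    """Linear classifier on the pooled vector; only query, W, b would be trained."""
    pooled, _ = attentive_pool(tokens, query)
    logits = [dot(row, pooled) + bias for row, bias in zip(W, b)]
    probs = softmax(logits)
    return CLASSES[probs.index(max(probs))], probs

# Toy stand-in for the frozen encoder's output: 16 tokens of width 8.
rng = random.Random(0)
DIM, N_TOKENS = 8, 16
tokens = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(N_TOKENS)]
query = [rng.gauss(0, 1) for _ in range(DIM)]
W = [[rng.gauss(0, 0.1) for _ in range(DIM)] for _ in CLASSES]
b = [0.0] * len(CLASSES)

label, probs = classify(tokens, query, W, b)
print(label, [round(p, 3) for p in probs])
```

Because the encoder stays frozen, only the small set of head parameters (here `query`, `W`, `b`) would receive gradients during training.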
Finding: a frozen video foundation model holds its own against a purpose-built baseline that combines morphology and temporal modelling (U-Net+BiLSTM); the binding constraint is data scale, not model capacity.
Select a single-cell clip below. The model classifies that one clip into its cell-cycle state.
Held-out HeLa (sequence 02, n=5312)
| model | macro-F1 | mitosis F1 | mitosis event precision / recall (±3 frames) |
|---|---|---|---|
| U-Net+BiLSTM baseline (3.8M) | 0.629 | 0.599 | 0.65 / 0.62 |
| frozen V-JEPA-2 head-only | 0.6776 | 0.681 | 0.69 / 0.74 |
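
The event-level column can be read as matching each predicted mitosis event to a ground-truth event within a 3-frame tolerance. The sketch below shows one way to compute that. The greedy one-to-one matching rule and the function name `event_precision_recall` are assumptions, since the exact matching procedure is not stated here.

```python
def event_precision_recall(true_frames, pred_frames, tol=3):
    """Greedy one-to-one matching of predicted to true event frames.

    A prediction counts as a true positive if the closest still-unmatched
    true event lies within +/- tol frames. Assumed rule, for illustration.
    """
    unmatched_true = sorted(true_frames)
    tp = 0
    for p in sorted(pred_frames):
        # Closest true event that has not been claimed by an earlier prediction.
        best = min(unmatched_true, key=lambda t: abs(t - p), default=None)
        if best is not None and abs(best - p) <= tol:
            unmatched_true.remove(best)
            tp += 1
    precision = tp / len(pred_frames) if pred_frames else 0.0
    recall = tp / len(true_frames) if true_frames else 0.0
    return precision, recall

# Toy example: frame indices of mitosis events in one track set.
true_events = [10, 40, 75]
pred_events = [12, 44, 76, 90]
p, r = event_precision_recall(true_events, pred_events)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

In the toy example the prediction at frame 44 misses the true event at frame 40 by four frames, so it counts as a false positive and the true event as a miss.
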
Data scaling (GOWT1 to HeLa) lifts the same baseline by +0.186 macro-F1; a roughly 80× larger model adds only +0.046. Seed band: 0.635 ± 0.098 (3 seeds). Single-seed gaps below 0.08 are not significant.
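The reasoning behind that comparison can be spelled out as a short check of each quoted gain against the 0.08 significance threshold. This is only a sketch of the arithmetic; the helper name `exceeds_noise` is hypothetical and the numbers are copied from the paragraph above.

```python
NOISE_FLOOR = 0.08  # single-seed gaps below this are treated as not significant

# Macro-F1 gains quoted in the text.
gains = {
    "data scaling (GOWT1 to HeLa)": 0.186,
    "roughly 80x larger model": 0.046,
}

def exceeds_noise(gain, floor=NOISE_FLOOR):
    """True if a macro-F1 gain is at least as large as the noise floor."""
    return abs(gain) >= floor

for name, gain in gains.items():
    verdict = "above" if exceeds_noise(gain) else "within"
    print(f"{name}: +{gain:.3f} macro-F1, {verdict} seed noise")
```

Only the data-scaling gain clears the threshold, which is the basis for calling data scale the binding constraint.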
Confusion matrix (rows = true, cols = pred):
| true ⧵ pred | interphase | pre-mitosis | mitosis | recall |
|---|---|---|---|---|
| interphase | 4550 | 211 | 52 | 0.95 |
| pre-mitosis | 173 | 136 | 14 | 0.42 |
| mitosis | 44 | 7 | 125 | 0.71 |
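
As a cross-check, per-class precision, recall, and F1 can be recomputed directly from the counts in the confusion matrix. The sketch below uses only the numbers in the table; the helper name `per_class_metrics` is hypothetical.

```python
LABELS = ["interphase", "pre-mitosis", "mitosis"]

# Rows = true class, columns = predicted class, copied from the table above.
CM = [
    [4550, 211, 52],
    [173, 136, 14],
    [44, 7, 125],
]

def per_class_metrics(cm):
    """Return a (precision, recall, F1) tuple for each class of a square matrix."""
    n = len(cm)
    out = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                         # true k, predicted other
        fp = sum(cm[i][k] for i in range(n)) - tp    # predicted k, true other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        out.append((precision, recall, f1))
    return out

metrics = per_class_metrics(CM)
for label, (p, r, f1) in zip(LABELS, metrics):
    print(f"{label:12s} precision={p:.3f} recall={r:.3f} F1={f1:.3f}")

macro_f1 = sum(f1 for _, _, f1 in metrics) / len(metrics)
total = sum(map(sum, CM))
print(f"macro-F1={macro_f1:.4f} n={total}")
```

Run as written, this gives a macro-F1 of about 0.6776 and a mitosis F1 of about 0.681 over 5312 clips, matching the frozen V-JEPA-2 row of the results table, which suggests the matrix belongs to that model. Pre-mitosis is the weakest class, with an F1 of about 0.40.
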
Dominant error: pre-mitosis read as interphase (173 of 323 pre-mitosis clips). The pre-mitosis class is a soft, lineage-defined 8-frame window with no sharp morphological boundary.
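To illustrate why the pre-mitosis boundary is soft, here is a sketch of how frame labels might be derived from a lineage record, where each track has a first frame, a last frame, and a flag for whether it ends in a division. Only the 8-frame pre-mitosis window comes from the text. The `label_track` function, the `mitosis_frames=3` width, and the placement of both windows relative to the division are assumptions, not the actual labelling code.

```python
def label_track(first_frame, last_frame, divides, pre_window=8, mitosis_frames=3):
    """Label every frame of one track from its lineage record.

    divides: True if the track ends in a division (it has daughter tracks).
    mitosis_frames is a placeholder width chosen for illustration only.
    """
    labels = {}
    for f in range(first_frame, last_frame + 1):
        frames_to_end = last_frame - f
        if divides and frames_to_end < mitosis_frames:
            labels[f] = "mitosis"
        elif divides and frames_to_end < mitosis_frames + pre_window:
            # Defined purely by distance to the division, not by appearance.
            labels[f] = "pre-mitosis"
        else:
            labels[f] = "interphase"
    return labels

labels = label_track(first_frame=0, last_frame=19, divides=True)
for name in ("interphase", "pre-mitosis", "mitosis"):
    frames = [f for f, lab in labels.items() if lab == name]
    print(name, frames)
```

Under this kind of rule, the last interphase frame and the first pre-mitosis frame differ only by their distance to the division, so adjacent clips on either side of the boundary can look nearly identical.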
Models: DnaRnaProteins/vjepa2-cell-cycle-vit-l, DnaRnaProteins/unet-bilstm-cell-cycle-baseline · Data: MICCAI Cell Tracking Challenge (Fluo-N2DL-HeLa). Labels derived from lineage trees (no manual annotation).