ADR-0249: Tiny-AI Wave 1 baseline C1 - fr_regressor_v1 on Netflix Public
- Status: Accepted
- Date: 2026-04-29
- Deciders: Lusoris, Claude (Anthropic)
- Tags: tiny-ai, training, onnx, netflix-public, c1, fork-local
Context
ADR-0168 shipped the C2 (nr_metric_v1) and C3 (learned_filter_v1) Wave-1 baselines but deferred C1 (fr_regressor_v1) because the Netflix Public Dataset was access-gated (Google Drive folder requiring a manual request to Netflix; "cannot be downloaded programmatically"). C1's defining target — match or beat vmaf_v0.6.1 PLCC on Netflix Public — would be incomparable on a substitute corpus, so the Wave-1 roadmap row stayed Deferred and the deferral was tracked in docs/state.md.
On 2026-04-27, Lawrence dropped the dataset locally at .workingdir2/netflix/ (9 reference + 70 distorted YUVs at 1920×1080 yuv420p 8-bit, ~37 GB). The drop unblocks BACKLOG row T6-1a. Two of the three pieces C1 needs already existed. The 21-feature parquet runs/full_features_netflix.parquet (11 040 rows × 25 cols) had already been produced by ai/scripts/extract_full_features.py for prior research work (Research-0026 / 0027 / 0030), and the vmaf_train.models.FRRegressor Lightning module (2-layer GELU MLP, hidden=64, dropout=0.1) had already shipped as Wave-1 scaffolding under ADR-0168. What was missing was a runnable C1 trainer that consumes the existing parquet, validates against vmaf_v0.6.1 reference scores, and emits the ONNX checkpoint + sidecar + registry row.
Decision
We will train and ship fr_regressor_v1 from the locally-available Netflix Public Dataset using the canonical-6 feature subset (adm2, vif_scale0–3, motion2 — the same input the production vmaf_v0.6.1 SVR consumes). The training script is ai/scripts/train_fr_regressor.py; held-out generalisation is reported as 9-fold leave-one-source-out (LOSO) mean PLCC against the vmaf_v0.6.1 per-frame teacher score, and the shipping checkpoint is re-trained on all 9 sources after the LOSO gate passes (mean PLCC ≥ 0.95). The vmaf_v0.6.1 teacher is DMOS-aligned by construction (Netflix's published SVR was trained against the Netflix-Public DMOS), so a high PLCC-vs-teacher transitively implies a high PLCC-vs-DMOS without re-fetching the (separately access-gated) DMOS sidecar CSV.
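The LOSO gate described above can be illustrated with a minimal, self-contained sketch. It is not the code in ai/scripts/train_fr_regressor.py: the names `loso_mean_plcc`, `fit_linear`, `predict_linear`, and the toy `demo_rows` are hypothetical, and a one-feature least-squares fit stands in for the FRRegressor MLP. Only the fold structure (hold out one source, train on the rest, score PLCC against the teacher) and the 0.95 threshold come from the text.

```python
import math
from collections import defaultdict

PLCC_GATE = 0.95  # threshold quoted in the Decision text


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC) of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def loso_mean_plcc(rows, fit, predict):
    """Leave-one-source-out evaluation.

    rows: iterable of (source_id, features, teacher_score).
    fit / predict: hypothetical stand-ins for the real trainer.
    Returns (mean PLCC over folds, {held_out_source: PLCC}).
    """
    by_source = defaultdict(list)
    for src, feats, score in rows:
        by_source[src].append((feats, score))

    fold_plcc = {}
    for held_out, test in by_source.items():
        # Train on every frame that does NOT come from the held-out source.
        train = [r for s, rs in by_source.items() if s != held_out for r in rs]
        model = fit(train)
        preds = [predict(model, feats) for feats, _ in test]
        fold_plcc[held_out] = pearson(preds, [score for _, score in test])
    return sum(fold_plcc.values()) / len(fold_plcc), fold_plcc


def fit_linear(train):
    """Toy stand-in model: least-squares line on the first feature only."""
    xs = [feats[0] for feats, _ in train]
    ys = [score for _, score in train]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


def predict_linear(model, feats):
    slope, intercept = model
    return slope * feats[0] + intercept


# Synthetic data: 3 sources x 5 frames, teacher score linear in the feature.
demo_rows = [
    (f"src{s}", [0.1 * k + 0.05 * s], 100 * (0.1 * k + 0.05 * s))
    for s in range(3)
    for k in range(1, 6)
]
mean_plcc, per_fold = loso_mean_plcc(demo_rows, fit_linear, predict_linear)
gate_passed = mean_plcc >= PLCC_GATE
```

With the real corpus there would be 9 folds, one per reference source, so no frame from a held-out source ever appears in that fold's training split. The shipping checkpoint is trained on all sources only after the gate passes, which means the reported LOSO number describes the recipe rather than the shipped weights themselves.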
Alternatives considered
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Use the local Netflix Public drop (chosen) | Same corpus and same per-frame vmaf_v0.6.1 teacher Netflix used; comparable to upstream baselines; no external dependency | Lawrence's drop is not redistributable (Netflix license); training is local-only by design | Best fit for C1's "match vmaf_v0.6.1 PLCC on Netflix Public" target; ADR-0242 already accepted local-only training corpora |
| Wait for Netflix-Public-via-pip / public mirror | Reproducible by anyone with pip install | None published; ADR-0168 audit confirmed no public mirror; indefinite wait | Ships the gun-without-bullets state indefinitely |
| Substitute KoNViD-1k | Already extracted (runs/full_features_konvid.parquet); CC BY 4.0 | C1's target metric is NFLX-Public PLCC against vmaf_v0.6.1 — KoNViD-1k uses different content + different MOS scale; result would be incomparable to the published target | Defeats the purpose of the C1 row |
| MLP-medium (hidden=32, depth=2, ~801 params) | Simpler graph | Phase-3 sweep already showed canonical-6 mean PLCC ≈ 0.984 with the FRRegressor recipe; no headroom | Sticking with the published Wave-1 architecture keeps the recipe identical to the spec'd FRRegressor |
| Optimise primary on SROCC (rank correlation) | Robust to monotone non-linear miscalibration | C1's stated target is "match vmaf_v0.6.1 PLCC"; SROCC is reported alongside but not gated | Aligns with the roadmap target verbatim |
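The difference between the two correlation measures in the last row can be shown with a toy example. This is an illustrative sketch on made-up numbers, not output from the trainer: a student score that is a monotone but non-linear function of the teacher keeps a perfect rank correlation (SROCC) while its linear correlation (PLCC) drops.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; assumes no ties, which holds for the toy data below."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        out[idx] = rank
    return out


def spearman(x, y):
    """Spearman rank correlation (SROCC): Pearson computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up scores: the student is a cubic (monotone, non-linear) map of the teacher.
teacher = [10.0, 30.0, 50.0, 70.0, 90.0]
student = [t ** 3 for t in teacher]

plcc = pearson(student, teacher)
srocc = spearman(student, teacher)
```

Here `srocc` is 1 up to floating-point rounding, while `plcc` falls below the 0.95 gate. Gating on PLCC therefore also penalises miscalibration of the score scale, which is consistent with a target stated in terms of matching vmaf_v0.6.1 PLCC.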
Consequences
- Positive:
  - Wave-1 ship list complete: model/tiny/ now carries C1 + C2 + C3.
  - The --tiny-model fr_regressor_v1 CLI path becomes a real metric, not a stub.
  - Future C1 retraining (different feature subsets, larger MLPs, QAT/PTQ) can branch off this trainer + parquet.
- Negative:
  - Netflix Public Dataset is not redistributable. Reproducing the training run requires the local YUV drop; the parquet at runs/full_features_netflix.parquet is gitignored. CI cannot retrain end-to-end; it can only smoke-test the pipeline (--epochs 3 --no-export).
- Neutral / follow-ups:
  - The synthetic ai/scripts/build_bisect_cache.py placeholder (T6-1a sub-bullet) is not replaced in this PR; the bisect-cache fixture is byte-stable per ADR-0109, and switching it to the real Netflix-Public DMOS-aligned cache requires a separate PR with fresh seeds + an ADR-0109 amendment. Tracked separately.
  - docs/state.md flips the C1 deferral row from "Deferred" to "Closed (shipped 2026-04-29)" (done by sister agent in same wave).
  - Roadmap §2.1 row flips from Deferred to Shipped 2026-04-29.
References
- ADR-0168 — Wave-1 C2 + C3 shipped, C1 deferred.
- ADR-0242 — Netflix corpus loader.
- ADR-0203 — feature extractor + scores plumbing.
- docs/ai/roadmap.md §2.1 — Wave-1 ship-baselines table.
- BACKLOG row T6-1a (.workingdir2/BACKLOG.md).
- Source: req (user direction this session): "unblock T6-1a, dataset is locally available at .workingdir2/netflix/."