ADR-0640: Tiny-AI training on the original Netflix VMAF training corpus (2026-05-20 scaffold iteration)
- Status: Proposed
- Date: 2026-05-20
- Deciders: Lusoris / Claude (Anthropic)
- Tags: ai, docs, workspace, mcp
Context
The fork's tiny-AI surface (ai/, core/src/dnn/) targets a lightweight full-reference (FR) regressor that can be distilled from the public vmaf_v0.6.1 SVM model or trained directly against subjective scores. The user holds the original Netflix VMAF training corpus locally at .workingdir2/netflix/ (gitignored, approximately 37 GB):
File names follow the Netflix encoding-ladder convention:
where <quality_label> = 0 conventionally identifies the lossless or near-lossless reference encode. The corpus was used to train the original vmaf_v0.6.1 SVM; reproducing that training pipeline on the fork enables direct comparisons between the classic SVM and candidate tiny-AI architectures, and provides a controlled distillation dataset.
Prior scaffold PRs (#153, #418, #759, #920, #1414) and ADRs (ADR-0242, ADR-0417, ADR-0612) established docs/ai/training-data.md, mcp-server/vmaf-mcp/tests/test_smoke_e2e.py, the --data-root loader API in ai/src/vmaf_train/data/datasets.py, and the companion research digests (0019, 0099, 0607). This ADR is the 2026-05-20 daily scaffold iteration. It updates the literature survey (Research Digest 0615), extends the architecture alternatives table with the feature-reweighting option added in the 2024–2026 IQA distillation literature, and cross-links the updated MCP smoke test documentation.
The unpublished-Netflix-models search (referenced in user memory) may surface additional training targets once resolved. This ADR treats that as an out-of-scope follow-up.
Decision
This is a scaffold-only PR. We commit:
- This ADR (ADR-0640): updates the architecture alternatives table and reinforces the --data-root API contract so subsequent training PRs can cite a single current source.
- Research Digest 0615: a further update of the literature survey, covering EfficientVMAF (CVPR 2024 workshop), IQA-PyTorch distillation pipelines, and ONNX Runtime 1.20 improvements published since Digest 0607 (2026-05-19).
- An updated cross-reference in docs/ai/training-data.md pointing to this ADR as the current scaffold iteration.
- CHANGELOG fragment and rebase-notes entry per ADR-0108.
We do not run training, download data, or modify Netflix golden-data assertions in this PR. Architecture selection is deferred to the follow-up training PR, pending user confirmation.
Alternatives considered
A. Model architecture
| Option | Description | Pros | Cons | Status |
|---|---|---|---|---|
| A1 — MLP (2×64, ReLU) | Two-layer MLP on the 6-element VMAF feature vector | Tiny (≈8 KB ONNX), CPU-fast, matches vmaf_tiny_fr_v1 baseline | Limited capacity for temporal texture | Default candidate |
| A2 — MLP (3×128, ReLU) | Deeper MLP; same input | Higher capacity, still <50 KB | Needs regularisation to avoid overfit on 79-clip corpus | Runner-up |
| A3 — Attention-pooled FR | Per-frame features → self-attention → score | Handles temporal variation better | 10× larger; needs GPU inference for real-time | Deferred |
| A4 — Learned feature-reweighting | 12-param layer over VMAF sub-scores (Madhusudana et al. ICCV 2024) | Interpretable, near-identical compute to SVM | Research-grade; production readiness unproven | Under review |
B. Training target
| Option | Description | Pros | Cons | Status |
|---|---|---|---|---|
| B1 — Distill from vmaf_v0.6.1 | Teacher is the public SVM; student minimises MSE on clip-mean scores | Reproducible without MOS data; aligns with published baseline | Does not exceed teacher ceiling | Recommended |
| B2 — Train from scratch on Netflix MOS | Use published per-clip quality labels from Netflix Blog posts | Could exceed teacher ceiling; directly models human preference | Netflix MOS labels are not fully public; corpus overlap uncertain | Possible follow-up |
| B3 — Hybrid (distill + MOS fine-tune) | Pre-train with B1, fine-tune on whatever MOS labels are available | Best of both if MOS labels exist | Two-stage pipeline; more fragile | Deferred |
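
The B1 objective (student minimises MSE against the teacher's clip-mean scores) can be sketched in a few lines. This is a minimal illustration, not the fork's training loop: the function names, the dict-based data layout, and the toy values are all hypothetical, and a real run would compute the same quantity inside whatever framework the training PR selects.

```python
from statistics import fmean


def clip_mean(frame_scores):
    """Collapse per-frame scores into one clip-level score."""
    if not frame_scores:
        raise ValueError("clip has no frame scores")
    return fmean(frame_scores)


def distillation_mse(teacher_frames, student_clip_scores):
    """MSE between student predictions and teacher clip-mean scores.

    teacher_frames: dict of clip id -> list of per-frame teacher scores
    student_clip_scores: dict of clip id -> student clip-level prediction
    """
    if teacher_frames.keys() != student_clip_scores.keys():
        raise ValueError("teacher and student must cover the same clips")
    squared_errors = [
        (student_clip_scores[clip] - clip_mean(frames)) ** 2
        for clip, frames in teacher_frames.items()
    ]
    return fmean(squared_errors)


# Hypothetical toy values, not measurements from the corpus.
teacher = {"clip_a": [90.0, 92.0, 94.0], "clip_b": [40.0, 44.0]}
student = {"clip_a": 91.0, "clip_b": 43.0}
loss = distillation_mse(teacher, student)
```

Because the target is the teacher's own output, the loss reaches zero only when the student reproduces the SVM exactly, which is the "teacher ceiling" noted in the Cons column.
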
C. Model size
| Option (target size) | Architecture | Inference FPS (CPU, 1080p) | Status |
|---|---|---|---|
| C1 — "nano" < 10 KB ONNX | A1 | >200 FPS | Primary target |
| C2 — "tiny" < 100 KB ONNX | A2 or A4 | 60–200 FPS | Secondary target |
| C3 — "small" < 1 MB ONNX | A3 | <60 FPS | Out of scope for this workstream |
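
The size tiers above lend themselves to a simple check on an exported model file. The sketch below is illustrative only: `size_tier` and `TIERS` are hypothetical names, and reading "KB" as 1024 bytes is an assumption, since the table does not say which convention applies.

```python
KB = 1024  # assumption: the table's "KB" is read as 1024 bytes

# (tier name, exclusive upper bound in bytes), smallest tier first
TIERS = [("nano", 10 * KB), ("tiny", 100 * KB), ("small", 1024 * KB)]


def size_tier(n_bytes):
    """Return the smallest tier whose limit exceeds n_bytes, else None."""
    if n_bytes < 0:
        raise ValueError("file size cannot be negative")
    for name, limit in TIERS:
        if n_bytes < limit:
            return name
    return None  # larger than every tier in the table
```

A training script could call `size_tier(os.path.getsize(path))` on the exported ONNX file and fail the run when the result is not the tier the chosen option targets.
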
D. Evaluation scope
| Option | Evaluation dataset | Gate | Status |
|---|---|---|---|
| D1 — Netflix golden gate only | src01_hrc00/01_576x324.yuv (CPU reference, places=4) | Bit-exact CPU parity | Mandatory |
| D2 — Cross-backend deltas | All enabled backends via /cross-backend-diff | ULP ≤ 2 | Mandatory for GPU inference |
| D3 — VMAF v0.6.1 correlation | Netflix corpus | PLCC/SROCC ≥ 0.97 (quality gate) | Added in training PR |
| D4 — External corpus (BVI-DVC, LSVQ) | Existing fork ingestion scripts | Generalisation check | Optional follow-up |
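
For the D3 gate, PLCC is the Pearson correlation between teacher and student scores, and SROCC is the same statistic computed on their ranks. The following is a dependency-free sketch under two assumptions: that the gate requires both coefficients to reach 0.97, and that ties receive average ranks. The function names are hypothetical, and a real evaluation would more likely use an established statistics library.

```python
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank-order correlation (SROCC): PLCC of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def passes_quality_gate(teacher, student, threshold=0.97):
    """Assumed reading of the gate: both PLCC and SROCC must reach it."""
    return (pearson(teacher, student) >= threshold
            and spearman(teacher, student) >= threshold)
```

SROCC only rewards correct ordering, so a student that is monotone in the teacher score but badly scaled can pass SROCC and still fail PLCC; checking both guards against that case.
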
Consequences
- Positive: Provides the current canonical ADR for training PRs to cite; refreshes the literature survey; ensures the MCP smoke test and the --data-root API contract remain documented alongside the training harness.
- Negative: Does not train anything; the actual model improvement is deferred to a follow-up PR.
- Neutral / follow-ups:
  - Architecture selection must be confirmed before the training PR opens.
  - Training run is multi-day and requires GPU access and the local Netflix corpus at .workingdir2/netflix/.
  - The --data-root flag is the mandatory CLI interface; the VMAF_DATA_ROOT env var is the fallback. Training scripts must not hard-code the corpus path.
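
The lookup order in the last follow-up (flag first, environment variable second, no built-in default) could look roughly like the sketch below. It is an illustration, not the loader in ai/src/vmaf_train/data/datasets.py: `resolve_data_root` is a hypothetical helper, and only the --data-root flag and the VMAF_DATA_ROOT variable come from this ADR.

```python
import argparse
import os
from pathlib import Path


def resolve_data_root(argv=None, environ=None):
    """Return the corpus root from --data-root, else VMAF_DATA_ROOT."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(description="hypothetical training entry point")
    parser.add_argument("--data-root", type=Path, default=None,
                        help="directory holding the training corpus")
    args = parser.parse_args(argv)

    if args.data_root is not None:
        return args.data_root  # an explicit flag always wins

    fallback = environ.get("VMAF_DATA_ROOT")
    if fallback:
        return Path(fallback)

    # No default on purpose: the corpus path is never hard-coded.
    parser.error("pass --data-root or set VMAF_DATA_ROOT")
```

Exiting with an error when neither source is set keeps a machine-specific path such as .workingdir2/netflix/ out of the scripts themselves.
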
References
- ADR-0612: docs/adr/0612-tiny-ai-netflix-training-scaffold-2026-05-19.md (previous, 2026-05-19 scaffold iteration).
- ADR-0242: docs/adr/0242-tiny-ai-netflix-training-corpus.md (originating architecture and loader-API decision).
- ADR-0042: docs/adr/0042-tinyai-docs-required-per-pr.md (doc-substance rule for tiny-AI PRs).
- ADR-0108: docs/adr/0108-deep-dive-deliverables-rule.md (six deep-dive deliverables requirement).
- Research Digest 0615: docs/research/0615-tiny-ai-netflix-training-2026-05-20.md
- Research Digest 0607: docs/research/0612-tiny-ai-netflix-training-scaffold-2026-05-19.md
- docs/ai/training-data.md (corpus path convention, loader API).
- Source: user memory entry project_netflix_training_corpus_local.md (paraphrase: the corpus lives at .workingdir2/netflix/; the Lusoris / Claude collaboration ADR documents the decision to scaffold before running).
- Related PRs: #153, #418, #759, #920, #1414.