ADR-0244: vmaf_tiny_v2 — canonical-6 + StandardScaler tiny VMAF MLP¶
- Status: Accepted
- Date: 2026-04-29
- Deciders: Lusoris, Claude (Anthropic)
- Tags: ai, dnn, tiny-ai, model, registry, fork-local
Context¶
The shipped vmaf_tiny_v1.onnx was trained on the Netflix corpus alone with a single train/val split (val=Tennis); it had no scaler stats baked in and no end-to-end provenance to the FULL-feature parquet that the Phase-3 research chain produced. The Phase-3 chain (Research-0027 → 0028 → 0029 → 0030) validated a concrete configuration on Netflix LOSO + KoNViD 5-fold:
- Architecture:
mlp_small(6 → 16 → 8 → 1, ~257 params). Phase-3d's arch sweep was inconclusive againstmlp_medium, so the small variant remains the v2 baseline. - Features:
canonical-6= (adm2,vif_scale0,vif_scale1,vif_scale2,vif_scale3,motion2). - Preprocessing: per-fold StandardScaler (fit on train, applied to val).
- Optimiser: Adam @
lr=1e-3, MSE loss, 90 epochs, batch_size 256. - Validated PLCC:
0.9978 ± 0.0021Netflix LOSO,0.9998KoNViD 5-fold; +0.005–0.018 PLCC over the prior Subset-B baseline.
The v2 model needs to ship these gains, and it needs to ship them in a way the runtime can consume without requiring the caller to know the scaler stats out-of-band.
Decision¶
We ship vmaf_tiny_v2.onnx with the validated Phase-3 configuration and bake the StandardScaler (mean, std) directly into the ONNX graph as Sub + Div Constant nodes that run before the MLP. The runtime feeds raw canonical-6 feature values; the trust-root sha256 covers the calibration values too. opset 17 (matches learned_filter_v1, nr_metric_v1, fastdvdnet_pre). For the production export we fit the scaler on the full 3-corpus parquet (Netflix + KoNViD + BVI-DVC D+C) — Phase-3 LOSO + 5-fold are the validation methodology; the shipped weights are trained on the union for maximum corpus coverage. Registered as vmaf_tiny_v2 (kind fr) in model/tiny/registry.json with quant_mode: fp32 and smoke: false.
Alternatives considered¶
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Bundled scaler in the ONNX graph (chosen) | Single-file deploy; trust-root sha256 covers calibration; runtime does no preprocessing | One-time export-time wrapping cost | Correct — matches the bundled-scaler design rule; runtime contract stays "feed features, read VMAF" |
Sidecar JSON input_mean / input_std | Smaller ONNX; reuses the existing sidecar layout | Two-file trust contract; runtime needs preprocessing path; drift risk between ONNX and sidecar | Rejected — splits the trust root, adds runtime complexity for no win |
mlp_medium (Phase-3d) | Slightly higher capacity ceiling | Phase-3d arch sweep was inconclusive; 4× param count for no measurable PLCC gain on the validation folds | Rejected — Phase-3d did not produce a positive signal |
| LOSO-final fold weights | Bit-equal to one of the validation folds | Drops 11 % of training data per fold; production deploy benefits from the union | Rejected — LOSO is a methodology, not a deployment recipe |
| Subset-B feature set | Different feature mix tested in Phase-3 | Failed Gate 2 corpus portability (KoNViD + BVI-DVC) — canonical-6 ships positive across all three corpora | Rejected — corpus portability is non-negotiable |
| Re-train on Netflix-only | Bit-equal to v1 methodology | Throws away KoNViD + BVI-DVC coverage | Rejected — the multi-corpus parquet is precisely the asset Phase-2 produced |
Consequences¶
- Positive: ships a tiny VMAF estimator (~1 KB ONNX) with validated +0.005–0.018 PLCC over the prior Subset-B baseline; runtime needs no out-of-band calibration.
- Negative: introduces a second tiny-AI scoring model in the registry — the registry trust contract grows by one entry, and the docs / state matrix track v1 + v2 in parallel until v1 is retired.
- Neutral / follow-ups:
--tiny-modeldefault indocs/ai/inference.mdflips tovmaf_tiny_v2; v1 stays on disk as a regression baseline. PTQ for v2 is deferred (the model is already <2 KB; 8-bit quantisation has no shipping payoff).
References¶
- Research-0027 — feature importance (Phase-2)
- Research-0028 — Phase-3 subset sweep (canonical6 vs A vs B vs C)
- Research-0029 — Phase-3b StandardScaler results
- Research-0030 — Phase-3b multi-seed validation
- Trainer:
ai/scripts/train_vmaf_tiny_v2.py - Exporter:
ai/scripts/export_vmaf_tiny_v2.py - Source:
req(user-provided spec — paraphrased: "Build + shipvmaf_tiny_v2.onnxend-to-end on the validated Phase-3 chain configuration; bundled scaler stats in the graph per the documented design rule.")