Skip to content

ADR-0244: vmaf_tiny_v2 — canonical-6 + StandardScaler tiny VMAF MLP

  • Status: Accepted
  • Date: 2026-04-29
  • Deciders: Lusoris, Claude (Anthropic)
  • Tags: ai, dnn, tiny-ai, model, registry, fork-local

Context

The shipped vmaf_tiny_v1.onnx was trained on the Netflix corpus alone with a single train/val split (val=Tennis); it had no scaler stats baked in and no end-to-end provenance to the FULL-feature parquet that the Phase-3 research chain produced. The Phase-3 chain (Research-0027 → 0028 → 0029 → 0030) validated a concrete configuration on Netflix LOSO + KoNViD 5-fold:

  • Architecture: mlp_small (6 → 16 → 8 → 1, ~257 params). Phase-3d's arch sweep was inconclusive against mlp_medium, so the small variant remains the v2 baseline.
  • Features: canonical-6 = (adm2, vif_scale0, vif_scale1, vif_scale2, vif_scale3, motion2).
  • Preprocessing: per-fold StandardScaler (fit on train, applied to val).
  • Optimiser: Adam @ lr=1e-3, MSE loss, 90 epochs, batch_size 256.
  • Validated PLCC: 0.9978 ± 0.0021 Netflix LOSO, 0.9998 KoNViD 5-fold; +0.005–0.018 PLCC over the prior Subset-B baseline.

The v2 model needs to ship these gains, and it needs to ship them in a way the runtime can consume without requiring the caller to know the scaler stats out-of-band.

Decision

We ship vmaf_tiny_v2.onnx with the validated Phase-3 configuration and bake the StandardScaler (mean, std) directly into the ONNX graph as Sub + Div Constant nodes that run before the MLP. The runtime feeds raw canonical-6 feature values; the trust-root sha256 covers the calibration values too. opset 17 (matches learned_filter_v1, nr_metric_v1, fastdvdnet_pre). For the production export we fit the scaler on the full 3-corpus parquet (Netflix + KoNViD + BVI-DVC D+C) — Phase-3 LOSO + 5-fold are the validation methodology; the shipped weights are trained on the union for maximum corpus coverage. Registered as vmaf_tiny_v2 (kind fr) in model/tiny/registry.json with quant_mode: fp32 and smoke: false.

Alternatives considered

Option Pros Cons Why not chosen
Bundled scaler in the ONNX graph (chosen) Single-file deploy; trust-root sha256 covers calibration; runtime does no preprocessing One-time export-time wrapping cost Correct — matches the bundled-scaler design rule; runtime contract stays "feed features, read VMAF"
Sidecar JSON input_mean / input_std Smaller ONNX; reuses the existing sidecar layout Two-file trust contract; runtime needs preprocessing path; drift risk between ONNX and sidecar Rejected — splits the trust root, adds runtime complexity for no win
mlp_medium (Phase-3d) Slightly higher capacity ceiling Phase-3d arch sweep was inconclusive; 4× param count for no measurable PLCC gain on the validation folds Rejected — Phase-3d did not produce a positive signal
LOSO-final fold weights Bit-equal to one of the validation folds Drops 11 % of training data per fold; production deploy benefits from the union Rejected — LOSO is a methodology, not a deployment recipe
Subset-B feature set Different feature mix tested in Phase-3 Failed Gate 2 corpus portability (KoNViD + BVI-DVC) — canonical-6 ships positive across all three corpora Rejected — corpus portability is non-negotiable
Re-train on Netflix-only Bit-equal to v1 methodology Throws away KoNViD + BVI-DVC coverage Rejected — the multi-corpus parquet is precisely the asset Phase-2 produced

Consequences

  • Positive: ships a tiny VMAF estimator (~1 KB ONNX) with validated +0.005–0.018 PLCC over the prior Subset-B baseline; runtime needs no out-of-band calibration.
  • Negative: introduces a second tiny-AI scoring model in the registry — the registry trust contract grows by one entry, and the docs / state matrix track v1 + v2 in parallel until v1 is retired.
  • Neutral / follow-ups: --tiny-model default in docs/ai/inference.md flips to vmaf_tiny_v2; v1 stays on disk as a regression baseline. PTQ for v2 is deferred (the model is already <2 KB; 8-bit quantisation has no shipping payoff).

References

  • Research-0027 — feature importance (Phase-2)
  • Research-0028 — Phase-3 subset sweep (canonical6 vs A vs B vs C)
  • Research-0029 — Phase-3b StandardScaler results
  • Research-0030 — Phase-3b multi-seed validation
  • Trainer: ai/scripts/train_vmaf_tiny_v2.py
  • Exporter: ai/scripts/export_vmaf_tiny_v2.py
  • Source: req (user-provided spec — paraphrased: "Build + ship vmaf_tiny_v2.onnx end-to-end on the validated Phase-3 chain configuration; bundled scaler stats in the graph per the documented design rule.")