ADR-0395: predictor stub-models policy¶
- Status: Accepted
- Date: 2026-05-08
- Deciders: Lusoris
- Tags: ai, vmaf-tune, predictor, models, fork-local
Context¶
The per-shot VMAF predictor (tools/vmaf-tune/src/vmaftune/predictor.py, PR #430) loads one ONNX model per codec adapter at runtime. Without shipped artefacts, every fresh checkout falls back to the analytical curve and Predictor(model_path=...) either raises FileNotFoundError or downstream tests skip the ONNX path entirely. CI cannot pin "the ONNX model loads" without a shipped binary; documentation cannot demonstrate the predict-then-verify flow without a model file to point at.
A "real" per-codec predictor needs a corpus of ~thousands of (features, codec, crf, real_vmaf) rows captured from a representative source set — probe encodes, signalstats, saliency, real VMAF measured with libvmaf — and that corpus is generated on the operator's machine via vmaftune.corpus. It is not something CI can produce on demand.
Decision¶
We will ship a synthetic-stub ONNX model for each of the 14 codec adapters under model/predictor_<codec>.onnx, trained from a deterministic 100-row synthetic corpus seeded by the codec name. The trainer (tools/vmaf-tune/src/vmaftune/predictor_train.py) accepts a real Phase A corpus via --corpus path/to/file.jsonl and falls back to the synthetic generator on a per-codec basis when the corpus is absent or contains zero rows for that codec. Every shipped model card carries corpus.kind: synthetic-stub-N=100 and a prominent "do-not-use-in-production" warning. The runtime predictor surface is unchanged; consumers do not need to know which kind of model they loaded.
Alternatives considered¶
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Ship synthetic stub per codec (chosen) | Reproducible, byte-stable; CI can pin the load + monotonicity; one-line operator override (--corpus real.jsonl). | Predictions track the analytical fallback, not learned signal. Card warning mandatory. | Best trade-off: CI gets a real ONNX file to load, predictor PR's model_path= branch becomes testable, and the cost when the operator wants real weights is exactly the corpus run they were going to do anyway. |
| Ship one shared stub for all 14 codecs | One file to sign, smaller repo footprint. | Hides per-codec coefficient differences (codec-specific _DEFAULT_COEFFS); operator gets identical predictions across codecs and a misleading impression of fitted weights. | Erases the per-codec contract the runtime predictor depends on. |
| Ship no model files; require operators to train | Zero risk of stub being mistaken for production. | Predictor(model_path=...) is dead code on a fresh checkout; CI can never test the ONNX branch; documentation has no working example. | Drops the predict-then-verify integration test surface. |
| Ship a real corpus + real models | Honest predictions out of the box. | Real corpus run = multi-day encode sweep on the operator's hardware; not feasible for a fork-local PR. Production weights flip is a separate workstream gated on real corpus generation. | Out of scope for this PR; tracked as the production-weights follow-up. |
Consequences¶
- Positive:
Predictor(model_path=...)andpick_crfONNX paths become unit-testable; CI gates the load + clamping + monotonicity per shipped model. The trainer pipeline (synthetic-corpus → ONNX → op-allowlist check → model card) is exercised on every PR. Operators with a real corpus run one command to flip every codec to real weights. - Negative: Future contributors must read the model card before trusting predictions. Stubs add ~300 KB to the repo (14 × ~22 KB). PLCC / SROCC reported in stub cards are artificially high (the synthetic target is the analytical fallback, so the network smooths itself); the warning section flags this.
- Neutral / follow-ups: When a real corpus lands, re-run the trainer and bump the registry / commit the new ONNX bytes. The card writer drops the synthetic-stub warning automatically based on the
corpus.kindfield.
References¶
docs/ai/predictor.md— runtime + training documentation (5-point bar per ADR-0042).- ADR-0042 — per-PR doc bar for tiny-AI surfaces.
- ADR-0237 — vmaf-tune Phase A corpus.
- ADR-0276 — per-shot CRF scaffold the predictor feeds.
tools/vmaf-tune/src/vmaftune/predictor_train.py— trainer.tools/vmaf-tune/src/vmaftune/predictor.py— runtime consumer.- Source:
req— task brief specifies "synthetic 100-row training corpus per codec, deterministic seed".