ADR-0235: Codec-aware FR regressor (fr_regressor_v2)
- Status: Accepted
- Date: 2026-05-01
- Deciders: Lusoris, Claude
- Tags: ai, dnn, tiny-ai, fr-regressor, fork-local
Context
The fork ships a tiny full-reference (FR) MOS regressor — ai/src/vmaf_train/models/fr_regressor.py (PyTorch) → model/tiny/fr_regressor_v1.onnx — that maps a libvmaf FULL_FEATURES vector (adm / vif / motion / psnr / ssim / cambi / ssimulacra2 / ciede / psnr-hvs) to a single MOS scalar. The v1 baseline is codec-blind: every distorted clip lands on the same MLP regardless of which encoder produced it.
Distortion signatures from x264, x265, libsvtav1, libvvenc, and libvpx-vp9 differ systematically: block edges (x264) vs CTU-boundary blur (x265) vs DCT ringing + restoration filters (AV1) vs large-CTU deblocking (VVC). The 2026 Bristol VI-Lab review §5.3 (.workingdir2/preprints202604.0035.v1.pdf) states this directly: conditioning a perceptual regressor on codec id "reliably lifts cross-codec PLCC/SROCC by 1-3 points on multi-codec corpora". Bampis 2018 (ST-VMAF) and Zhang 2021 (Bull lab "Enhancing VMAF") both report similar deltas in their per-codec ablations.
The fork's training corpus is heading toward multi-codec coverage — KoNViD-1k ships natural distortions, BVI-DVC + the Netflix Public corpus ship single-codec material, and the next sweep adds libsvtav1 + libvvenc legs. A codec-blind v1 is increasingly mis-specified; we either condition the model now or accept a permanent ceiling on cross-codec PLCC.
Decision
We will (a) capture an explicit codec column in the per-clip parquet output of every feature-dump script under ai/scripts/, (b) extend FRRegressor with an optional num_codecs constructor arg that adds a one-hot codec input concatenated to the feature vector before the first MLP layer, and (c) ship a closed, order-stable codec vocabulary in ai/src/vmaf_train/codec.py (x264, x265, libsvtav1, libvvenc, libvpx-vp9, unknown). The default num_codecs=0 keeps the v1 single-input contract so existing checkpoints load unchanged. A fr_regressor_v2_codec_aware checkpoint registers under model/tiny/registry.json only after a side-by-side training run shows a positive (>0.005) PLCC lift on a held-out multi-codec split — otherwise we document the negative result and stop.
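The vocabulary and one-hot concatenation described above can be sketched in plain Python. This is a minimal illustration, not the contents of ai/src/vmaf_train/codec.py: the helper names codec_index, codec_one_hot, and build_model_input are hypothetical, the starting value of CODEC_VOCAB_VERSION is assumed, and routing unrecognised names to the unknown bucket is an assumption about how the fallback behaves.

```python
from typing import List, Sequence

# Closed, order-stable vocabulary as listed in this ADR. The position of each
# entry is its one-hot index, so new codecs should only ever be appended.
CODEC_VOCAB = ("x264", "x265", "libsvtav1", "libvvenc", "libvpx-vp9", "unknown")
CODEC_VOCAB_VERSION = 1  # assumed starting value; bump when the vocabulary grows


def codec_index(name: str) -> int:
    """Return the vocabulary index, falling back to the "unknown" bucket."""
    try:
        return CODEC_VOCAB.index(name)
    except ValueError:
        return CODEC_VOCAB.index("unknown")


def codec_one_hot(name: str, num_codecs: int = len(CODEC_VOCAB)) -> List[float]:
    """One-hot encode a codec name; num_codecs=0 yields an empty vector."""
    if num_codecs == 0:
        return []
    if num_codecs != len(CODEC_VOCAB):
        raise ValueError("num_codecs must be 0 or match the vocabulary size")
    vec = [0.0] * num_codecs
    vec[codec_index(name)] = 1.0
    return vec


def build_model_input(
    features: Sequence[float], codec: str, num_codecs: int = 0
) -> List[float]:
    """Concatenate the feature vector with the codec one-hot.

    With the default num_codecs=0 the features pass through unchanged,
    mirroring the v1 single-input contract.
    """
    return list(features) + codec_one_hot(codec, num_codecs)
```

Calling build_model_input(features, "x265", num_codecs=6) appends six extra dimensions to the feature vector, while the default call leaves it untouched, which is the property that would let v1 checkpoints keep loading.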
Alternatives considered
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| One-hot codec concatenated to feature vector (chosen) | Simple, deterministic, preserves the v1 ONNX op-allowlist (no Embedding op needed at runtime), trivial backwards-compat via num_codecs=0 | Wastes ~6 input dims; won't generalise to unseen codecs | Wins on simplicity + ONNX compatibility |
| Per-codec sub-models with a router | Highest possible quality ceiling per codec | 6× the parameter budget; routing logic doubles the runtime DNN surface; no fallback for "unknown" | Too expensive for a "tiny" model; "unknown" bucket has no sub-model |
| Continuous learned embedding (nn.Embedding(num_codecs, d_emb)) | Compact (4–8 dims vs 6), degrades gracefully via an unknown index | Requires an additional ONNX op (Gather) on the allowlist; doesn't measurably outperform one-hot at this scale | Adds runtime complexity for no measurable accuracy delta on 6 codecs |
| Skip codec conditioning, train on more data instead | No model surgery | Bristol VI-Lab §5.3 explicitly calls out that no amount of single-corpus data closes the cross-codec gap; the gap is structural | Direct contradiction of the cited literature |
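One way to read the chosen one-hot option is through the algebra of the first MLP layer. This is an interpretation rather than something the ADR derives, and the symbols are assumptions introduced for illustration: x is the feature vector, e_c is the one-hot vector for codec index c, W = [W_x  W_c] is the first-layer weight matrix split by input block, and b is its bias.

```latex
h = W \begin{bmatrix} x \\ e_c \end{bmatrix} + b
  = W_x x + W_c e_c + b
  = W_x x + (W_c)_{:,c} + b
```

Under this reading, the one-hot input lets the first layer learn one offset column per vocabulary entry, which amounts to a per-codec bias on the first hidden pre-activation without needing a Gather op at runtime.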
Consequences
- Positive: cross-codec PLCC/SROCC lift expected at 1–3 points per the Bristol review; trivial to extend the vocabulary as new codecs land (append to CODEC_VOCAB, bump CODEC_VOCAB_VERSION, retrain); extract_full_features.py self-describes its corpus with codec="unknown" rather than silently mislabelling.
- Negative: ONNX graph gains a second input. libvmaf's vmaf_dnn_session_run already supports two-input contracts (LPIPS-Sq precedent in ADR-0040 / ADR-0041), but the C-side wiring for fr_regressor will need the same multi-input pattern in a follow-up PR; the current PR is training-side only.
- Neutral / follow-ups: training run + PLCC delta measurement is blocked in the present PR (the agent that proposed the change cannot reach ~/.cache/vmaf-tiny-ai/); follow-up PR re-runs the trainer + measures + ships fr_regressor_v2_codec_aware.onnx if the lift exceeds the 0.005 PLCC bar. Backlog item: T7-CODEC-AWARE.
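The 0.005 PLCC bar for the follow-up measurement can be expressed as a small check. The sketch below is a stdlib-only illustration under stated assumptions: the function names plcc and should_register_v2 are hypothetical, PLCC is computed as the plain Pearson correlation with no non-linear mapping applied first, and both models are assumed to be scored on the same held-out multi-codec split.

```python
import math
from typing import Sequence

PLCC_LIFT_BAR = 0.005  # threshold quoted in this ADR


def plcc(pred: Sequence[float], mos: Sequence[float]) -> float:
    """Pearson linear correlation coefficient between predictions and MOS."""
    n = len(pred)
    if n != len(mos) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_p = sum(pred) / n
    mean_m = sum(mos) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(pred, mos))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_m = sum((m - mean_m) ** 2 for m in mos)
    if var_p == 0 or var_m == 0:
        raise ValueError("PLCC is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_m)


def should_register_v2(
    mos: Sequence[float],
    pred_v1: Sequence[float],
    pred_v2: Sequence[float],
    bar: float = PLCC_LIFT_BAR,
) -> bool:
    """True only when the codec-aware model beats v1 by more than the bar."""
    return plcc(pred_v2, mos) - plcc(pred_v1, mos) > bar
```

A result of False would correspond to the "document the negative result and stop" branch of the decision.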
References
- Bristol VI-Lab review (Bull, Zh): .workingdir2/preprints202604.0035.v1.pdf, §5.3 "Codec-conditioned quality models".
- Bampis et al. 2018, "Spatiotemporal Feature Integration and Model Fusion for Full-Reference Video Quality Assessment" (ST-VMAF), IEEE TCSVT 28(8).
- Zhang, Bull et al. 2021, "Enhancing VMAF through new feature integration and model combination" — Bull lab follow-up to Bampis 2018, per-codec ablation.
- Prior ADRs: ADR-0020 (C1 capability), ADR-0040 (multi-input session API), ADR-0041 (two-input precedent), ADR-0042 (model-card bar), ADR-0168 (corpus baselines).
- Research digest: Research-0040.
- Source: req, user task brief 2026-05-01 ("Run a codec-aware feature experiment for the FR regressor").
Status update 2026-05-08: Accepted
Audited as part of the 2026-05-08 ADR Proposed sweep (Research-0086).
Acceptance criteria verified in tree at HEAD 0a8b539e:
- ai/src/vmaf_train/codec.py: present (closed, order-stable codec vocabulary).
- FRRegressor constructor accepts num_codecs; the codec one-hot is concatenated to the canonical-6 feature vector before the first MLP layer (verified in ai/src/vmaf_train/models/fr_regressor.py).
- model/tiny/fr_regressor_v2.{onnx,json,onnx.data} registered; the deep-ensemble companion fr_regressor_v2_ensemble_v1 ships five seeded ONNX members (_seed{0..4}) per ADR-0279 (this sweep, Accepted).
- ADR-0272 (this sweep, Accepted) is the v2 scaffold that consumes the codec column from the vmaf-tune Phase A JSONL corpus.
- Verification command: grep -E "num_codecs" ai/src/vmaf_train/models/fr_regressor.py; ls ai/src/vmaf_train/codec.py model/tiny/fr_regressor_v2*