Skip to content

ADR-0235: Codec-aware FR regressor (fr_regressor_v2)

  • Status: Accepted
  • Date: 2026-05-01
  • Deciders: Lusoris, Claude
  • Tags: ai, dnn, tiny-ai, fr-regressor, fork-local

Context

The fork ships a tiny full-reference (FR) MOS regressor — ai/src/vmaf_train/models/fr_regressor.py (PyTorch) → model/tiny/fr_regressor_v1.onnx — that maps a libvmaf FULL_FEATURES vector (adm / vif / motion / psnr / ssim / cambi / ssimulacra2 / ciede / psnr-hvs) to a single MOS scalar. The v1 baseline is codec-blind: every distorted clip lands on the same MLP regardless of which encoder produced it.

Distortion signatures from x264, x265, libsvtav1, libvvenc, and libvpx-vp9 differ systematically — block edges (x264) vs CTU-boundary blur (x265) vs DCT ringing + restoration filters (AV1) vs large-CTU deblocking (VVC). The 2026 Bristol VI-Lab review §5.3 (.workingdir2/preprints202604.0035.v1.pdf) calls this directly: conditioning a perceptual regressor on codec id "reliably lifts cross-codec PLCC/SROCC by 1-3 points on multi-codec corpora". Bampis 2018 (ST-VMAF) and Zhang 2021 (Bull lab "Enhancing VMAF") both report similar deltas in their per-codec ablations.

The fork's training corpus is heading toward multi-codec coverage — KoNViD-1k ships natural distortions, BVI-DVC + the Netflix Public corpus ship single-codec material, and the next sweep adds libsvtav1 + libvvenc legs. A codec-blind v1 is increasingly mis-specified; we either condition the model now or accept a permanent ceiling on cross-codec PLCC.

Decision

We will (a) capture an explicit codec column in the per-clip parquet output of every feature-dump script under ai/scripts/, (b) extend FRRegressor with an optional num_codecs constructor arg that adds a one-hot codec input concatenated to the feature vector before the first MLP layer, and (c) ship a closed, order-stable codec vocabulary in ai/src/vmaf_train/codec.py (x264, x265, libsvtav1, libvvenc, libvpx-vp9, unknown). The default num_codecs=0 keeps the v1 single-input contract so existing checkpoints load unchanged. A fr_regressor_v2_codec_aware checkpoint registers under model/tiny/registry.json only after a side-by-side training run shows a positive (>0.005) PLCC lift on a held-out multi-codec split — otherwise we document the negative result and stop.

Alternatives considered

Option Pros Cons Why not chosen
One-hot codec concatenated to feature vector (chosen) Simple, deterministic, preserves the v1 ONNX op-allowlist (no Embedding op needed at runtime), trivial backwards-compat via num_codecs=0 Wastes ~6 input dims; won't generalise to unseen codecs Wins on simplicity + ONNX compatibility
Per-codec sub-models with a router Highest possible quality ceiling per codec 6× the parameter budget; routing logic doubles the runtime DNN surface; no fallback for "unknown" Too expensive for a "tiny" model; "unknown" bucket has no sub-model
Continuous learned embedding (nn.Embedding(num_codecs, d_emb)) Compact (4–8 dims vs 6), graceful via unknown index Requires an additional ONNX op (Gather) on the allowlist; doesn't measurably outperform one-hot at this scale Adds runtime complexity for no measurable accuracy delta on 6 codecs
Skip codec conditioning, train on more data instead No model surgery Bristol VI-Lab §5.3 explicitly calls out that no amount of single-corpus data closes the cross-codec gap; the gap is structural Direct contradiction of the cited literature

Consequences

  • Positive: cross-codec PLCC/SROCC lift expected at 1–3 points per the Bristol review; trivial to extend the vocabulary as new codecs land (append to CODEC_VOCAB, bump CODEC_VOCAB_VERSION, retrain); extract_full_features.py self-describes its corpus with codec="unknown" rather than silently mislabelling.
  • Negative: ONNX graph gains a second input — libvmaf's vmaf_dnn_session_run already supports two-input contracts (LPIPS-Sq precedent in ADR-0040 / ADR-0041), but the C-side wiring for fr_regressor will need the same multi-input pattern in a follow-up PR; the current PR is training-side only.
  • Neutral / follow-ups: training run + PLCC delta measurement is blocked in the present PR (the agent that proposed the change cannot reach ~/.cache/vmaf-tiny-ai/); follow-up PR re-runs the trainer + measures + ships fr_regressor_v2_codec_aware.onnx if the lift exceeds the 0.005 PLCC bar. Backlog item: T7-CODEC-AWARE.

References

  • Bristol VI-Lab review (Bull, Zhang) — .workingdir2/preprints202604.0035.v1.pdf, §5.3 "Codec-conditioned quality models".
  • Bampis et al. 2018, "Spatiotemporal Feature Integration and Model Fusion for Full-Reference Video Quality Assessment" (ST-VMAF), IEEE TCSVT 28(8).
  • Zhang, Bull et al. 2021, "Enhancing VMAF through new feature integration and model combination" — Bull lab follow-up to Bampis 2018, per-codec ablation.
  • Prior ADRs: ADR-0020 (C1 capability), ADR-0040 (multi-input session API), ADR-0041 (two-input precedent), ADR-0042 (model-card bar), ADR-0168 (corpus baselines).
  • Research digest: Research-0040.
  • Source: req — user task brief 2026-05-01 ("Run a codec-aware feature experiment for the FR regressor").

Status update 2026-05-08: Accepted

Audited as part of the 2026-05-08 ADR Proposed sweep (Research-0086).

Acceptance criteria verified in tree at HEAD 0a8b539e:

  • ai/src/vmaf_train/codec.py — present (closed, order-stable codec vocabulary).
  • FRRegressor constructor accepts num_codecs; the codec one-hot is concatenated to the canonical-6 feature vector before the first MLP layer (verified in ai/src/vmaf_train/models/fr_regressor.py).
  • model/tiny/fr_regressor_v2.{onnx,json,onnx.data} registered; the deep-ensemble companion fr_regressor_v2_ensemble_v1 ships five seeded ONNX members (_seed{0..4}) per ADR-0279 (this sweep, Accepted).
  • ADR-0272 (this sweep, Accepted) is the v2 scaffold that consumes the codec column from the vmaf-tune Phase A JSONL corpus.
  • Verification command: grep -E "num_codecs" ai/src/vmaf_train/models/fr_regressor.py; ls ai/src/vmaf_train/codec.py model/tiny/fr_regressor_v2*.