ADR-0323: fr_regressor_v3 — train + register on ENCODER_VOCAB v3 (16-slot)¶
- Status: Accepted
- Date: 2026-05-06
- Deciders: Lusoris, Claude (Anthropic)
- Tags: ai, fr-regressor, codec-aware, encoder-vocab, loso, fork-local
- Related: ADR-0302 (v3 16-slot schema scaffold + ship gate), ADR-0291 (v2 production-flip; defines the 0.95 LOSO PLCC ship gate), ADR-0235 (codec-aware FR regressor decision; ≥+0.005 PLCC multi-codec lift floor), ADR-0319 (LOSO trainer pattern reused), Research-0078 (v3 schema expansion plan + retrain checklist)
Context¶
PR #401 (ADR-0302) shipped ENCODER_VOCAB_V3 as a parallel constant in ai/scripts/train_fr_regressor_v2.py — 16 slots, append-only extension of the production v2 vocab to cover three vmaf-tune codec adapters that landed since fr_regressor_v2 flipped to production (libsvtav1, h264_videotoolbox, hevc_videotoolbox). The live ENCODER_VOCAB_VERSION = 2 remained authoritative.
The Phase A canonical-6 corpus (runs/phase_a/full_grid/per_frame_canonical6.jsonl, 5,640 rows, NVENC-only) is now generated locally via scripts/dev/hw_encoder_corpus.py. The LOSO trainer pattern that PR #422 (ADR-0319) implemented for the v2 ensemble is the natural recipe to reuse for a v3 single-checkpoint training run gated on the ADR-0302 / ADR-0291 ship floor (mean LOSO PLCC ≥ 0.95).
This PR closes the v3 retrain deferral by training a fr_regressor_v3.onnx checkpoint and registering it as a parallel model alongside fr_regressor_v2.
Decision¶
Train fr_regressor_v3 on the existing NVENC-only Phase A corpus using the v2 LOSO recipe; ship the model only if mean LOSO PLCC ≥ 0.95. Do not flip ENCODER_VOCAB_VERSION from 2 to 3 in the v2 trainer — that's a separate "promote v3 to authoritative" PR.
Concretely:
-
Add
ai/scripts/train_fr_regressor_v3.py. The trainer importsENCODER_VOCAB_V3from the v2 trainer (the constant landed in PR #401) and consumes the same Phase A canonical-6 JSONL the v2 ensemble LOSO trainer (ADR-0319) consumes. Pipeline mirrorstrain_fr_regressor_v2_ensemble_loso._load_corpus/_train_one_seed/_predict_fold, with three structural changes: -
16-slot one-hot (
ENCODER_VOCAB_V3) instead of v2's 12-slot. - 18-D codec block (
16 + preset_norm + crf_norm) — the--num-codecsdefault isCODEC_BLOCK_DIM = 18. -
Output tensor renamed
score → vmafto match the corpus teacher-score column. -
Run the 9-fold LOSO gate at training time. If mean LOSO PLCC < 0.95, the trainer exits non-zero and writes the sidecar / registry row with
gate_passed: false+smoke: trueso the PR ships the scaffold + audit trail without claiming production status. On gate-pass, the trainer fits a final full-corpus checkpoint and exports it tomodel/tiny/fr_regressor_v3.onnx(opset 17, two-inputfeatures:[N,6]+codec_block:[N,18]→vmaf:[N]). -
Sidecar JSON
model/tiny/fr_regressor_v3.jsonmirrors thefr_regressor_v2.jsonshape with three additions:encoder_vocab_version: 3,corpus,corpus_sha256,loso_mean_plcc,gate_passed, full per-fold LOSO trace. -
Registry entry
fr_regressor_v3lands as a new row keyed alphabetically betweenfr_regressor_v2_ensemble_v1_seed4andlearned_filter_v1.smoke: falseon gate-pass;smoke: trueon gate-fail with an explicit "v3 retrain pending" note. -
Tests under
ai/tests/test_train_fr_regressor_v3.pycover the loader contract on a 16-row synthetic fixture exercising 4 of the 16 vocab slots (libx264,h264_nvenc,libsvtav1,hevc_videotoolbox), a 1-epoch full-corpus fit + ONNX export round-trip, and the LOSO summary schema. CPU-only, sub-second. -
Doc card
docs/ai/models/fr_regressor_v3.mddocuments the 18-D codec block, the headline LOSO results, and — honestly — the NVENC-only corpus caveat (the 15 non-NVENC slots receive zero training rows and degrade to v1-baseline behaviour at inference time). -
The ADR-0235 multi-codec lift floor (≥+0.005 PLCC over v1) is deferred as not-yet-measurable on this corpus drop: the NVENC-only corpus reduces v3-vs-v1 lift to a v1-vs-v1 comparison on a single codec. The follow-up PR that gates production-flip of
fr_regressor_v3.onnxoverfr_regressor_v2.onnxwill measure the lift on a multi-codec corpus drop.
Headline result¶
Gate PASS. Mean LOSO PLCC = 0.9975 ± 0.0018 across the 9 Netflix sources, every source clears 0.99. Min/max PLCC spread 0.9945 → 0.9996 (well under ADR-0303's 0.005 ensemble-spread bound). Full per-fold trace lives in docs/ai/models/fr_regressor_v3.md §Headline results and in the sidecar JSON training.loso_folds. Registry row lands with smoke: false.
Alternatives considered¶
1. Train now on the NVENC-only corpus (chosen)¶
Pros: closes ADR-0302 deferral; demonstrates the v3 schema works end-to-end with real-weight ONNX; unblocks downstream work that wants a v3 sidecar to test against; sidecar format is the forward-compatible shape future multi-codec retrains will adopt. Cons: 15 of 16 vocab slots receive zero training data, so v3's behaviour on non-NVENC codecs is essentially v1-baseline. The ADR-0235 multi-codec lift floor is unmeasurable on this corpus.
2. Wait for a multi-codec corpus drop¶
Pros: would let v3 train against ≥3 codec families and measure the multi-codec lift floor honestly; matches the spirit of ADR-0235's "never silently default to a codec that doesn't match what the script actually encoded" rule. Cons: the multi-codec corpus drop has been blocking on Phase A runner expansion since PR #283 landed (~3 months); v3 has been in scaffold-only limbo since PR #401. The corpus blocker is independent of the v3 schema scaffold; gating one on the other compounds backlog. Mitigation chosen instead: ship v3 now with explicit caveats; defer the production-flip (v2.onnx → v3.onnx slot reassignment) until a multi-codec corpus exists.
3. Scaffold-only with smoke: true¶
Pros: zero risk of shipping a model that misleads consumers about its multi-codec coverage. Cons: identical to PR #401's status quo; ships no new artefact for downstream PRs to consume; no LOSO gate result captured. Rejected because the trainer's gate gives us a real PLCC number to anchor future retrain comparisons against, even on a single-codec corpus.
Consequences¶
- A new
fr_regressor_v3.onnxships undermodel/tiny/, registered withsmoke: false. The libvmaf DNN runtime can load it via the registry but has no caller wired to consume it yet — the production consumer (vmaf-tune fast) still consumesfr_regressor_v2.onnxper ADR-0304. fr_regressor_v2.onnxandfr_regressor_v2.jsonare untouched (per the task brief constraint). The append-only vocab invariant is honoured.- The v3 16-slot schema is now exercised end-to-end against a real ONNX export + ORT round-trip. Any downstream PR that wants to validate v3-shape inference can use this checkpoint.
- The NVENC-only caveat is documented in the model card, sidecar notes, and this ADR. The follow-up retrain PR (gated on a multi-codec corpus) will lift the caveat.
Reproducer¶
# Real corpus (the Phase A NVENC-only drop):
python ai/scripts/train_fr_regressor_v3.py \
--corpus runs/phase_a/full_grid/per_frame_canonical6.jsonl
# Smoke (synthetic, sub-second):
python ai/scripts/train_fr_regressor_v3.py --smoke
# Tests + registry verification:
pytest ai/tests/test_train_fr_regressor_v3.py -v
bash core/test/dnn/test_registry.sh
References¶
req2026-05-06: "Open a draft PR titledfeat(ai): fr_regressor_v3 — train + register on ENCODER_VOCAB v3 (16-slot)againstVMAFx/vmafxmaster." (paraphrased; this PR's task brief).- ADR-0302 §Retrain ship gate — re-uses ADR-0291's mean LOSO PLCC ≥ 0.95 floor.
- ADR-0291 §Decision — baseline v2 production-flip cleared at 0.9681.
- ADR-0235 §References — append-only vocab invariant + +0.005 PLCC multi-codec lift floor.
- ADR-0319 §Decision — LOSO recipe (Adam(lr=5e-4, wd=1e-5), 200 epochs, bs=32, fold-local StandardScaler).
- Research-0078 §Production-flip checklist — items this PR's follow-up retrain PR is responsible for (the v2 → v3 in-place swap).