ADR-0302: ENCODER_VOCAB v3 — 16-slot schema expansion + retrain plan
- Status: Accepted
- Date: 2026-05-05
- Companion research digest: Research-0078
- Related: ADR-0235 (codec-aware FRRegressor + 0.95 LOSO PLCC ship gate), ADR-0272 (fr_regressor_v2 smoke scaffold), ADR-0291 (fr_regressor_v2 flip from smoke to production)
- Re-scope of: PR #373 (deferred VT-adapters-plus-vocab change; the VT adapters landed via a separate PR; see tools/vmaf-tune/src/vmaftune/codec_adapters/h264_videotoolbox.py / hevc_videotoolbox.py on master via ADR-0283)
Context
fr_regressor_v2 shipped to production in ADR-0291 against ENCODER_VOCAB v2 (13 slots: libx264, libaom-av1, libx265, h264_nvenc, hevc_nvenc, av1_nvenc, h264_amf, hevc_amf, av1_amf, h264_qsv, hevc_qsv, av1_qsv, libvvenc). Three vmaf-tune codec adapters have landed since:
- libsvtav1 (PR #294-series, ADR-0294): software AV1 alongside libaom-av1, with materially different rate-distortion behaviour at matched CQ + preset.
- h264_videotoolbox and hevc_videotoolbox (ADR-0283): Apple hardware-accelerated H.264 / HEVC adapters; first VT family on the corpus side.
The Phase A corpus runner can already emit canonical-6 features for those three encoders, but fr_regressor_v2 has never seen them: the inference path silently maps every unrecognised encoder string to the unknown one-hot column and returns a low-confidence prediction. The cleanest fix is a vocab bump (v2 → v3) plus a fresh LOSO retrain that clears the same 0.95 mean-LOSO-PLCC ship gate ADR-0291 cleared.
This ADR documents the schema expansion as a scaffold-only change. The production ONNX swap is gated on a follow-up retrain PR landing the new checkpoint and clearing the LOSO PLCC ship gate; the in-tree v2 ONNX continues to serve until then. ADR-0235's append-only invariant is preserved — every v2 index keeps its column position; the three new slots append at indices 13/14/15 (one-based: 14/15/16).
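The append-only invariant can be stated as a small check: every v2 slot keeps its index, and new slots only ever appear after the last v2 slot. The following is a minimal sketch, not the code in ai/scripts/train_fr_regressor_v2.py. The names ENCODER_VOCAB_V2, V3_NEW_SLOTS, and is_append_only are hypothetical and chosen for illustration; the slot order is copied from the v2 list above.

```python
# Hypothetical names for illustration; the live constant in the training
# script is ENCODER_VOCAB (v2), with ENCODER_VOCAB_V3 as the parallel scaffold.
ENCODER_VOCAB_V2 = (
    "libx264", "libaom-av1", "libx265",
    "h264_nvenc", "hevc_nvenc", "av1_nvenc",
    "h264_amf", "hevc_amf", "av1_amf",
    "h264_qsv", "hevc_qsv", "av1_qsv",
    "libvvenc",
)

# The three slots this ADR appends, in index order 13, 14, 15 (zero-based).
V3_NEW_SLOTS = ("libsvtav1", "h264_videotoolbox", "hevc_videotoolbox")

ENCODER_VOCAB_V3 = ENCODER_VOCAB_V2 + V3_NEW_SLOTS


def is_append_only(old, new):
    """Return True if `new` keeps every `old` slot at its original index
    and contains no duplicate slot names."""
    old, new = tuple(old), tuple(new)
    keeps_prefix = new[: len(old)] == old
    no_duplicates = len(set(new)) == len(new)
    return keeps_prefix and no_duplicates
```

A check of this shape would reject a vocab that inserts or reorders slots, which is the failure mode the invariant exists to prevent: a one-hot column changing meaning under an already-frozen checkpoint.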
Decision
Land a 16-slot ENCODER_VOCAB v3 schema scaffold in ai/scripts/train_fr_regressor_v2.py as a parallel constant (ENCODER_VOCAB_V3), without wiring it into the active training pipeline. The live ENCODER_VOCAB and ENCODER_VOCAB_VERSION = 2 remain the source of truth for any retraining run shipping today — this PR ships only the schema definition + the documentation contract that future v3 retrains MUST satisfy.
v3 schema (16 slots, append-only over the user-facing v2 layout documented in ADR-0291):
| idx | slot | family | new in v3 |
|---|---|---|---|
| 0 | libx264 | SW H.264 | — |
| 1 | libaom-av1 | SW AV1 | — |
| 2 | libx265 | SW HEVC | — |
| 3 | h264_nvenc | NVENC H.264 | — |
| 4 | hevc_nvenc | NVENC HEVC | — |
| 5 | av1_nvenc | NVENC AV1 | — |
| 6 | h264_amf | AMF H.264 | — |
| 7 | hevc_amf | AMF HEVC | — |
| 8 | av1_amf | AMF AV1 | — |
| 9 | h264_qsv | QSV H.264 | — |
| 10 | hevc_qsv | QSV HEVC | — |
| 11 | av1_qsv | QSV AV1 | — |
| 12 | libvvenc | SW VVC | — |
| 13 | libsvtav1 | SW AV1 (SVT) | new |
| 14 | h264_videotoolbox | VT H.264 | new |
| 15 | hevc_videotoolbox | VT HEVC | new |
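To make the table concrete, here is a hedged sketch of how a one-hot encoder row could be derived from the 16-slot layout. SLOT_INDEX and encoder_one_hot are hypothetical helpers, not functions from the training script. The sketch raises on an unlisted encoder rather than reproducing the runtime's unknown-column fallback, because the table above does not show where that column sits.

```python
# Slot order copied from the v3 table; the index in the tuple is the
# one-hot column index.
ENCODER_VOCAB_V3 = (
    "libx264", "libaom-av1", "libx265",
    "h264_nvenc", "hevc_nvenc", "av1_nvenc",
    "h264_amf", "hevc_amf", "av1_amf",
    "h264_qsv", "hevc_qsv", "av1_qsv",
    "libvvenc",
    "libsvtav1", "h264_videotoolbox", "hevc_videotoolbox",
)

# Hypothetical lookup table: encoder name -> column index.
SLOT_INDEX = {name: idx for idx, name in enumerate(ENCODER_VOCAB_V3)}


def encoder_one_hot(encoder):
    """Return a 16-element one-hot row for `encoder`.

    Assumption: unlisted encoders raise KeyError here; the real inference
    path instead routes them to an unknown column.
    """
    if encoder not in SLOT_INDEX:
        raise KeyError(f"{encoder!r} has no slot in ENCODER_VOCAB_V3")
    row = [0.0] * len(ENCODER_VOCAB_V3)
    row[SLOT_INDEX[encoder]] = 1.0
    return row
```

Under this layout, a v2 encoder such as libx264 produces the same leading 13 columns it did before, which is what lets the three new slots land without disturbing existing rows.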
Backwards-compat strategy. Until the v3 ONNX ships and clears the LOSO PLCC ship gate, the runtime continues to load the v2 13-slot ONNX. The v3 schema constant is information-only; no inference path consumes it yet. Once a follow-up retrain PR clears the ship gate, that PR (not this one) flips ENCODER_VOCAB_VERSION from 2 to 3, replaces the live ENCODER_VOCAB tuple, registers the new ONNX in model/tiny/registry.json, and documents the v2 → v3 fallback shim removal.
Ship gate. Mean LOSO PLCC ≥ 0.95 across all 9 Netflix sources, matching the gate ADR-0291 cleared. Per ADR-0235, the multi-codec lift over the v1 single-input regressor must remain ≥ +0.005 PLCC; that floor was already cleared by v2 and is preserved as the v3 acceptance criterion.
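The two gate conditions reduce to a short arithmetic check. The sketch below is illustrative only and makes two assumptions that the ADR does not spell out: per-fold PLCC arrives as a mapping keyed by held-out source, and the lift over v1 is the difference between the two mean LOSO PLCC values. The function names plcc and clears_ship_gate are hypothetical.

```python
from math import sqrt
from statistics import fmean

PLCC_GATE = 0.95          # mean LOSO PLCC floor re-used from ADR-0291
MIN_LIFT_OVER_V1 = 0.005  # multi-codec lift floor from ADR-0235
N_SOURCES = 9             # one LOSO fold per Netflix source


def plcc(pred, target):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    if len(pred) != len(target) or len(pred) < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    mean_p, mean_t = fmean(pred), fmean(target)
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in target)
    if var_p == 0 or var_t == 0:
        raise ValueError("PLCC is undefined for a constant sequence")
    return cov / sqrt(var_p * var_t)


def clears_ship_gate(fold_plcc, v1_mean_plcc):
    """Return True if the mean over all folds meets both floors.

    fold_plcc: mapping of held-out source name -> PLCC on that fold.
    v1_mean_plcc: mean LOSO PLCC of the v1 single-input regressor
                  (assumed baseline for the lift comparison).
    """
    if len(fold_plcc) != N_SOURCES:
        raise ValueError(f"expected {N_SOURCES} folds, got {len(fold_plcc)}")
    mean_plcc = fmean(list(fold_plcc.values()))
    lift = mean_plcc - v1_mean_plcc
    return mean_plcc >= PLCC_GATE and lift >= MIN_LIFT_OVER_V1
```

Because the gate is on the mean, a single weak fold can be offset by strong ones; whether a per-fold floor is also wanted is a design question this sketch leaves open.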
Alternatives considered
| Option | Pros | Cons | Verdict |
|---|---|---|---|
| 16-slot retrain (chosen) | Single LOSO run covers all three new codecs; preserves append-only invariant; matches ADR-0291 ship-gate cadence | Requires Phase A corpus coverage for SVT-AV1 + VT (the corpus runner already supports them, no blocker) | Selected — clears the ship gate in one pass, no schema churn |
| Incremental per-PR retrains (one new slot per PR) | Smallest blast-radius per change; easier bisect if a single codec drags PLCC | 3× the LOSO wall-time + 3× the PR overhead; vocab churn invalidates intermediate ONNX checkpoints; users running fr_regressor_v2 would see three back-to-back schema bumps | Rejected — cost-of-PR overhead dominates; no real bisect benefit since LOSO already attributes drag per source × encoder cell |
| Deprecate v2 + retrain from scratch (open vocab, no append-only) | Frees the column ordering; lets us drop unused slots | Breaks ADR-0235's append-only invariant; invalidates every shipped fr_regressor_v2_*.onnx consumer; forces a v3 major version bump on every downstream caller | Rejected — append-only is the contract that lets the ONNX checkpoint freeze across vocab edits; abandoning it for a one-time cleanup costs more than the slot waste saves |
| Defer until a "real" multi-corpus lands | Avoids the risk of OldTownCross-style outliers on the new codecs | Holds back vmaf-tune Phase B consumers that already encode SVT-AV1 + VT material; the Phase A corpus runner can already produce canonical-6 rows for these encoders today | Rejected — the corpus is not the bottleneck, the vocab is; deferring blocks usable predictions on shipped adapters |
Consequences
- Visible behaviour (this PR): zero. The schema scaffold lands as a parallel constant; existing v2 inference paths are unaffected.
- Visible behaviour (follow-up retrain PR, gated on ship gate): fr_regressor_v2 predictions for SVT-AV1, VT-H.264, and VT-HEVC encodes stop falling through to the unknown one-hot and start receiving codec-aware lift.
- Backlog opened: T-FR-V2-VOCAB-V3-RETRAIN (produce Phase A corpus rows for libsvtav1 + the two VT encoders, run LOSO, retrain, ship if ≥ 0.95 mean LOSO PLCC).
- No upstream interaction. ai/scripts/train_fr_regressor_v2.py is fork-introduced (ADR-0272); upstream Netflix/vmaf has no equivalent surface.
References
- req (2026-05-05, popup re-scope): drop the VT adapters from PR #373 (already landed via ADR-0283), keep only the 13 → 16 vocab expansion + retrain plan; ship as scaffold under a new ADR.
- ADR-0235: codec-aware FR regressor + LOSO PLCC ship gate + append-only CODEC_VOCAB invariant.
- ADR-0272: fr_regressor_v2 codec-aware smoke scaffold (smoke checkpoint shipped pending a real Phase A corpus).
- ADR-0291: flip from smoke to production; documents the v2 13-slot vocab and the 0.95 LOSO PLCC ship gate this ADR re-uses.
- ADR-0283: VT codec adapters that motivate slots 14/15.
- ADR-0294: libsvtav1 adapter that motivates slot 13.
- Research-0078: companion research digest with retrain plan, ship gate, reproducer.
Status update 2026-05-09: namespace collision resolved
Two parallel agent reports (abd6ed552ac8cae60, abda108c8263491da) surfaced a name collision: a future "feature-set v3" workstream (canonical-6 + encoder_internal + shot-boundary + hwcap) was unintentionally referring to itself as fr_regressor_v3 — the same id this ADR's retrain checkpoint already claims. The collision is resolved per ADR-0349: this ADR's fr_regressor_v3 registry row stays bit-identical (sha256 eaa16d23…, smoke: false) and the future feature-set work claims the reserved name fr_regressor_v3plus_features. No code change in this ADR; this appendix lands per ADR-0028 (immutability of Accepted-ADR bodies — append-only status updates only).