ADR-0522: --tiny-codec / --tiny-preset / --tiny-crf populate codec one-hot block
- Status: Accepted
- Date: 2026-05-18
- Deciders: lusoris, Claude
- Tags: cli, ai, dnn, tiny-model

Context
ADR-0518 (PR #1278) unblocked the rank-2 feature-vector tiny models shipped under model/tiny/, but stopped short of wiring the codec identity into fr_regressor_v2's second input. The loader pre-seeds the 14-slot codec block to the "unknown" encoder one-hot (index 11 in the v2 vocab, with preset_norm=0 and crf_norm=0) so inference still produces a finite score, but the model never sees the real encoder context. The Consequences section of ADR-0518 explicitly defers the user-facing API to "a future PR".
That follow-up is this ADR. Without it, every vmaf --tiny-model fr_regressor_v2.onnx … invocation returns the same VMAF=42.97 score on the canonical Netflix 576x324 pair regardless of the actual codec used to produce the distorted YUV — the conditioning vector is constant. Downstream tooling (vmaf-tune, the MCP run_tiny_inference probe, the vf_libvmaf filter) cannot benefit from the codec lift documented in ADR-0235 / Research-0040.
The trainer pins the codec-block contract in ai/scripts/train_fr_regressor_v2.py:
- ENCODER_VOCAB: closed, append-only, 12 slots today (libx264, libx265, libsvtav1, libvvenc, libvpx-vp9, h264_nvenc, hevc_nvenc, av1_nvenc, h264_qsv, hevc_qsv, av1_qsv, unknown).
- PRESET_ORDINAL: per-encoder dict mapping preset string to an integer in [0, 9]; the feature is normalised by PRESET_MAX_ORDINAL = 9.0.
- CRF_MAX = 63: union upper bound across all encoders; the feature is crf / 63.
- Layout: [encoder_onehot(N_ENCODERS), preset_norm, crf_norm].
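The layout above can be sketched in a few lines of Python. This is a minimal illustration, not the trainer's code: the vocabulary order, PRESET_MAX_ORDINAL and CRF_MAX come from this ADR, while the preset ordinals, the fallback to ordinal 0 for an unlisted preset, and the function name codec_block are assumptions made for the example.

```python
# Sketch of the [encoder_onehot(N_ENCODERS), preset_norm, crf_norm] layout.
ENCODER_VOCAB = (
    "libx264", "libx265", "libsvtav1", "libvvenc", "libvpx-vp9",
    "h264_nvenc", "hevc_nvenc", "av1_nvenc",
    "h264_qsv", "hevc_qsv", "av1_qsv", "unknown",
)
PRESET_MAX_ORDINAL = 9.0
CRF_MAX = 63

# ASSUMPTION: placeholder ordinals for illustration only. The real
# per-encoder table lives in ai/scripts/train_fr_regressor_v2.py.
PRESET_ORDINAL = {"libx264": {"ultrafast": 0, "medium": 5, "placebo": 9}}


def codec_block(encoder, preset, crf):
    """Build the one-hot + preset_norm + crf_norm vector for one stream."""
    onehot = [0.0] * len(ENCODER_VOCAB)
    # tuple.index raises ValueError when the encoder is not in the vocab.
    onehot[ENCODER_VOCAB.index(encoder)] = 1.0
    # ASSUMPTION: an unlisted preset falls back to ordinal 0.
    ordinal = PRESET_ORDINAL.get(encoder, {}).get(preset, 0)
    return onehot + [ordinal / PRESET_MAX_ORDINAL, crf / CRF_MAX]


block = codec_block("libx264", "medium", 28)
baseline = codec_block("unknown", None, 0)  # the pre-seeded "unknown" vector
print(len(block), block[-2:], baseline.index(1.0))
```

With 12 encoders the vector has 14 slots, which matches the 14-slot codec block the loader pre-seeds.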
The sidecar model/tiny/fr_regressor_v2.json carries encoder_vocab, encoder_vocab_version, and codec_block_layout so the C-side validation can refuse unknown encoder names instead of silently bucketing them into the "unknown" slot.
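A rough sketch of sidecar-driven validation follows. Only the three key names (encoder_vocab, encoder_vocab_version, codec_block_layout) come from this ADR; the values given for the version and layout keys, and the helper names load_vocab and encoder_slot, are hypothetical placeholders.

```python
import json

# ASSUMPTION: illustrative sidecar content. The vocab order follows this ADR;
# the version and layout values are placeholders, not the shipped file.
SIDECAR_TEXT = json.dumps({
    "encoder_vocab": [
        "libx264", "libx265", "libsvtav1", "libvvenc", "libvpx-vp9",
        "h264_nvenc", "hevc_nvenc", "av1_nvenc",
        "h264_qsv", "hevc_qsv", "av1_qsv", "unknown",
    ],
    "encoder_vocab_version": 2,
    "codec_block_layout": ["encoder_onehot", "preset_norm", "crf_norm"],
})


def load_vocab(sidecar_text):
    """Return the encoder vocab, or None when the model is not codec-aware."""
    vocab = json.loads(sidecar_text).get("encoder_vocab")
    return list(vocab) if vocab else None


def encoder_slot(vocab, name):
    """Return the one-hot index for name, refusing names outside the vocab."""
    if name not in vocab:
        raise LookupError(f"encoder {name!r} is not in the sidecar vocab")
    return vocab.index(name)


vocab = load_vocab(SIDECAR_TEXT)
print(encoder_slot(vocab, "unknown"))
```

Raising on an unlisted name, rather than returning the "unknown" index, is the point of the check: a typo surfaces as an error instead of silently changing the conditioning vector.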
Decision
Add a public vmaf_dnn_set_codec_context(VmafContext *, const char *codec, const char *preset, int crf) entry point in libvmaf/dnn.h and three CLI flags (--tiny-codec, --tiny-preset, --tiny-crf) that call it after vmaf_use_tiny_model. The sidecar loader gains an encoder_vocab[] field parsed from the same JSON, and the loader ships a vmaf_dnn_codec_block_fill helper that builds the one-hot + preset_norm + crf_norm block from the user-supplied parameters.
Concrete pieces:
- Sidecar (model_loader.h/.c):
    - VmafModelSidecar gains n_encoder_vocab, encoder_vocab[], codec_aware. Parser reads encoder_vocab JSON array using the existing extract_string_array helper; missing = not codec-aware (silent no-op for non-v2 models).
    - vmaf_dnn_codec_block_fill(buf, buf_len, vocab, n_vocab, codec, preset, crf) writes the canonical layout; case-insensitive matching with the same h264 → libx264 / hevc → libx265 etc. aliases the trainer uses; per-encoder PRESET_ORDINAL table mirroring train_fr_regressor_v2.py lines 169..234.
- Public API (include/libvmaf/dnn.h, src/dnn/dnn_attach_api.c):
    - vmaf_dnn_set_codec_context(ctx, codec, preset, crf) looks up the attached sidecar's encoder_vocab, calls vmaf_dnn_codec_block_fill into ctx->dnn.extra_in_buf, returns -ENOENT when the codec name is non-NULL but unknown, -ENOTSUP when the attached model has no codec block (rank-4 image or rank-2 single-input), -EINVAL on bad args.
- CLI (tools/cli_parse.{c,h}, tools/vmaf.c):
    - Three new long options. --tiny-crf accepts an unsigned integer; the runtime clamps to [0, 63].
    - configure_tiny_codec_context runs immediately after the tiny model loads; when the model has a codec block but the user supplied no flags, the pre-seeded "unknown" baseline stays in place (no behaviour change vs ADR-0518).
- Tests (core/test/dnn/test_codec_block.c + test_cli.sh extension):
    - C unit test: vmaf_dnn_codec_block_fill produces the right layout for (libx264, medium, 28) and returns -ENOENT for unknown_enc.
    - CLI smoke test: --tiny-codec libx264 --tiny-preset medium --tiny-crf 28 against fr_regressor_v2.onnx produces a VMAF score that differs from the "unknown"-default 42.97.
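The return-code contract described for vmaf_dnn_set_codec_context can be modelled in Python to make the cases explicit. This is a sketch under stated assumptions, not the C implementation: the order of the checks, treating a missing context as the "bad args" case, mapping a NULL codec to the "unknown" slot, and the dict-based context are all hypothetical. Only the two aliases named in this ADR are included.

```python
import errno

V2_VOCAB = (
    "libx264", "libx265", "libsvtav1", "libvvenc", "libvpx-vp9",
    "h264_nvenc", "hevc_nvenc", "av1_nvenc",
    "h264_qsv", "hevc_qsv", "av1_qsv", "unknown",
)
CRF_MAX = 63
# Only the aliases this ADR names; the trainer's full alias list is not shown here.
ALIASES = {"h264": "libx264", "hevc": "libx265"}


def set_codec_context(ctx, codec, preset, crf):
    """Return 0 on success or a negative errno, mirroring the described API."""
    if ctx is None:
        return -errno.EINVAL      # ASSUMPTION: a missing context is a bad arg
    vocab = ctx.get("encoder_vocab")
    if not vocab:
        return -errno.ENOTSUP     # attached model has no codec block
    if codec is None:
        name = "unknown"          # ASSUMPTION: no name keeps the "unknown" slot
    else:
        name = codec.lower()      # case-insensitive matching
        name = ALIASES.get(name, name)
        if name not in vocab:
            return -errno.ENOENT  # a name was given but is not in the vocab
    clamped_crf = min(max(crf, 0), CRF_MAX)
    ctx["codec_context"] = (vocab.index(name), preset, clamped_crf)
    return 0


ctx = {"encoder_vocab": list(V2_VOCAB)}
print(set_codec_context(ctx, "H264", "medium", 99), ctx["codec_context"])
print(set_codec_context(ctx, "unknown_enc", "medium", 28))
```

Keeping -ENOENT distinct from -ENOTSUP lets a caller tell "this name is wrong" apart from "this model ignores codec context", which is what allows the CLI to hard-error only when the user actually passed a name.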
Alternatives considered
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Auto-probe the codec from the distorted YUV via ffprobe | Zero-config UX | YUV streams carry no codec metadata; the distorted file is raw decoded planes by the time libvmaf sees it; requires a sibling container file path the CLI does not own | Cannot infer codec from the data libvmaf actually receives |
| Single combined --tiny-codec-context libx264:medium:28 flag | Fewer flags to remember | Encoding "no preset, just codec+CRF" needs sentinels (e.g. libx264::28) that are hostile to scripting; conflicts with users who already script --tiny-codec as a positional | Three orthogonal flags compose better with shell tooling |
| Make the codec context per-frame (variable-bitrate streams) | Models per-frame codec switches | No shipped model is trained on per-frame codec switches; per-frame would change extra_in_buf lifetime + require buffering | Out of scope; the v2 trainer treats codec as stream-constant |
| Read --tiny-codec from an env var (e.g. VMAF_TINY_CODEC) | One less flag in command lines | Hides the parameter in the environment, defeats the documentation goal | CLI surface is the canonical place for inference-time configuration |
| Reject unknown codecs hard instead of bucketing to "unknown" | Catches typos | Breaks valid corpus-tagging use cases where the encoder is genuinely unknown (Netflix Public corpus); contradicts trainer behaviour | Return -ENOENT from the API so callers can choose; the CLI hard-errors only when the user passed a name, never on absence |
Consequences
- Positive:
    - fr_regressor_v2.onnx produces meaningful per-codec score differentiation for the first time in this fork.
    - The codec-block layout is documented and validated against the sidecar; typos in encoder names fail with a clear error at attach time, not at training data time.
    - vmaf-tune and the MCP run_tiny_inference probe can now propagate the ladder's c:v argument straight through to inference without any per-tool plumbing.
    - The helper vmaf_dnn_codec_block_fill lives in model_loader.c next to the sidecar parser; future tiny models with the same layout (v3 16-slot vocab per ADR-0302) only need to bump the encoder_vocab JSON array and ship a new sidecar, with no C re-compile.
- Negative:
    - The PRESET_ORDINAL table is duplicated between Python (train_fr_regressor_v2.py) and C (model_loader.c). The AGENTS.md note under core/src/dnn/ flags both sites as a co-edit pair; a future refactor could move the table into the sidecar JSON itself so the C side reads it instead of duplicating.
    - The "unknown" pre-seed remains the default for callers that omit the flags; this preserves ADR-0518 behaviour for backwards compatibility but means vmaf --tiny-model fr_regressor_v2.onnx still returns 42.97 unless the user opts in.
- Neutral / follow-ups:
    - When ENCODER_VOCAB bumps to v3 (16 slots, ADR-0302), the sidecar-driven validation surface here works unchanged; only the unknown slot index moves and the model_loader.c resolution follows the sidecar.
    - The same API powers future vmaf-tune and vf_libvmaf plumbing by exposing one stable entry point.
References
- req: agent dispatch brief "Add --tiny-codec, --tiny-preset, --tiny-crf CLI flags to vmaf so codec-aware tiny models (e.g. fr_regressor_v2) get their codec one-hot block populated instead of being pre-seeded to 'unknown' at attach time (per ADR-0518 Consequences)." (2026-05-18).
- ADR-0518: Tiny-model loader accepts external-data and feature-vector ONNX (the predecessor that introduced the pre-seeded "unknown" baseline this ADR replaces).
- ADR-0235: Codec-aware FR regressor (motivation).
- ADR-0302: ENCODER_VOCAB v3 16-slot schema expansion (forward compatibility).
- ADR-0291 / ADR-0272: fr_regressor_v2 model card.
- ADR-0042: Tiny-AI docs required per PR (drives the docs/ai/inference.md update in this PR).
- Trainer source of truth: ai/scripts/train_fr_regressor_v2.py (ENCODER_VOCAB, PRESET_ORDINAL, CRF_MAX, _row_to_features).
- Sidecar: model/tiny/fr_regressor_v2.json (encoder_vocab, encoder_vocab_version, codec_block_layout).