ADR-0289: vmaf-tune resolution-aware model selection + CRF offsets
- Status: Accepted
- Date: 2026-05-03
- Deciders: Lusoris
- Tags: tooling, vmaf-tune, model-selection, fork-local
Context
VMAF is a resolution-aware metric. The fork ships two production-grade pooled-mean models in model/: vmaf_v0.6.1.json (trained on a 1080p viewing setup) and vmaf_4k_v0.6.1.json (re-fit for a 4K display). Scoring 4K content against the 1080p model under-counts spatial detail; scoring 1080p content against the 4K model over-counts coding artefacts. The bias is several VMAF points either way — large enough to poison Phase B (target-VMAF bisect) and Phase C (per-title CRF predictor) corpora when the sweep covers a mixed-resolution ladder.
Phase A of vmaf-tune (ADR-0237) shipped with one fixed model per sweep — fine for a single-resolution corpus, lossy for any ABR-ladder input. PR #354's audit (Bucket #8) flagged this as the next correctness gap before Phase B/C/D land. The fix is small and entirely fork-local: a height-based decision rule and a tiny per-resolution CRF-offset hook the future search layer can use to seed bisect bounds.
Decision
We will add tools/vmaf-tune/src/vmaftune/resolution.py exposing select_vmaf_model_version(width, height) -> str, select_vmaf_model(width, height) -> Path, and crf_offset_for_resolution(width, height) -> int. The decision rule is height-only:
- height >= 2160 → vmaf_4k_v0.6.1
- else → vmaf_v0.6.1 (canonical fallback for 720p / SD too: the fork has no 720p / SD model, and Netflix's published guidance is to use the 1080p model for all sub-2160p content).
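The rule above can be sketched in a few lines of Python. This is a minimal illustration, not the shipped resolution.py: the function names and model file names come from this ADR, while the MODEL_DIR constant and the relative model/ path are assumptions made for the example.

```python
from pathlib import Path

# Assumption for this sketch: models are resolved relative to a "model/" directory.
MODEL_DIR = Path("model")

MODEL_4K = "vmaf_4k_v0.6.1"
MODEL_1080P = "vmaf_v0.6.1"


def select_vmaf_model_version(width: int, height: int) -> str:
    """Return the model version name for an encode of the given size.

    width is accepted for API symmetry but ignored: the rule is height-only.
    """
    del width  # intentionally unused
    return MODEL_4K if height >= 2160 else MODEL_1080P


def select_vmaf_model(width: int, height: int) -> Path:
    """Return the path of the JSON model file matching the selected version."""
    return MODEL_DIR / f"{select_vmaf_model_version(width, height)}.json"
```

Under this sketch, 1440p, 1080p, 720p and SD inputs all resolve to vmaf_v0.6.1, and anything at or above 2160 lines (including 8K) resolves to vmaf_4k_v0.6.1.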
corpus.iter_rows consumes select_vmaf_model_version once per job (encode dimensions are fixed across all (preset, crf) cells of a job). The CLI gains --resolution-aware / --no-resolution-aware (default on); when off, the explicit --vmaf-model drives every row. The emitted JSONL row's vmaf_model field reflects the effective model used, not opts.vmaf_model — otherwise mixed-ladder corpora would lie about which model scored each row.
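A rough sketch of how the flag and the effective-model field could interact is shown below. It assumes the CLI is built on argparse, and the --vmaf-model default, the effective_model helper, the simplified iter_rows stand-in, and the preset / crf row fields are all hypothetical. Only the flag names, the vmaf_model field, and the once-per-job resolution come from this ADR.

```python
import argparse


def select_vmaf_model_version(width: int, height: int) -> str:
    # Height-only rule from this ADR, restated so the sketch is self-contained.
    return "vmaf_4k_v0.6.1" if height >= 2160 else "vmaf_v0.6.1"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="vmaf-tune")
    # BooleanOptionalAction (Python 3.9+) generates both
    # --resolution-aware and --no-resolution-aware; default is on.
    parser.add_argument(
        "--resolution-aware",
        action=argparse.BooleanOptionalAction,
        default=True,
    )
    # Assumed default; the real CLI default is not stated in this ADR.
    parser.add_argument("--vmaf-model", default="vmaf_v0.6.1")
    return parser


def effective_model(opts: argparse.Namespace, width: int, height: int) -> str:
    """Model that actually scores the job: auto-picked unless the flag is off."""
    if opts.resolution_aware:
        return select_vmaf_model_version(width, height)
    return opts.vmaf_model


def iter_rows(opts, width, height, cells):
    """Simplified stand-in for corpus.iter_rows: resolve the model once per job."""
    model = effective_model(opts, width, height)
    for preset, crf in cells:
        # The row records the effective model, not opts.vmaf_model.
        yield {"preset": preset, "crf": crf, "vmaf_model": model}
```

With the flag left on, a 2160p job emits vmaf_4k_v0.6.1 in every row even if --vmaf-model names the 1080p model. With --no-resolution-aware, the explicit model is emitted unchanged.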
Alternatives considered
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Height-only threshold at 2160 (chosen) | Matches Netflix's published guidance; one branch; trivial test surface; future-proof for 8K (clamps to 4K model). | Loses some 1440p nuance — those rows route to the 1080p model even though they're closer to 4K viewing. | Picked: the bias on 1440p is ~0.5 VMAF (acceptable); a 1440p model doesn't exist in the fork. |
| Width-and-height matrix | More accurate for anamorphic / cropped content. | Adds a 2-D decision surface; needs per-codec calibration; no public guidance for the corner cases. | Defer: width is accepted as an argument for API symmetry, but the body ignores it until we have a real anamorphic corpus to fit against. |
| Pixel-count threshold (e.g. ≥ 6 Mpx → 4K) | Handles 21:9 / cropped sources cleanly. | Drifts on letterbox/pillarbox; the canonical Netflix guidance is height-only. | Not chosen: optimising for a corner case over the documented mainline. |
| Defer to Phase B (let bisect re-score with both models) | Keeps Phase A semantics. | 2× scoring cost on every cell; corpus rows ambiguous; downstream regressors get confusing dual-model training data. | Not chosen: doubles the most expensive operation in the loop. |
| User-supplied model per source | Most flexible. | Pushes the decision back to the user, defeating the point of "auto-pick". --no-resolution-aware + --vmaf-model already covers the manual escape hatch. | Not chosen: the auto path needs to be the default. |
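To make the difference between the chosen height-only rule and the pixel-count alternative concrete, the sketch below runs both over a few frame sizes. The 6 Mpx cut-off is the example figure from the table, and the by_height / by_pixel_count helpers and the sample sizes are illustrative assumptions, not code from the fork.

```python
def by_height(width: int, height: int) -> str:
    # Chosen rule: width is ignored.
    return "vmaf_4k_v0.6.1" if height >= 2160 else "vmaf_v0.6.1"


def by_pixel_count(width: int, height: int, threshold_px: int = 6_000_000) -> str:
    # Rejected alternative, using the table's example threshold of 6 Mpx.
    return "vmaf_4k_v0.6.1" if width * height >= threshold_px else "vmaf_v0.6.1"


# Illustrative frame sizes: 16:9 UHD, a 21:9 crop of UHD, 1440p, 1080p.
SIZES = [(3840, 2160), (3840, 1600), (2560, 1440), (1920, 1080)]

# Sizes where the two rules pick different models.
disagreements = [
    (w, h) for (w, h) in SIZES if by_height(w, h) != by_pixel_count(w, h)
]
```

Among these sizes the rules diverge only on the cropped 3840x1600 source (6,144,000 pixels but 1600 lines), which is the corner case the table weighs against the documented height-only guidance.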
Consequences
- Positive: mixed-ladder corpora (e.g. a 7-rung ABR ladder from 240p to 2160p) now score every row against the right model with no per-row user input. vmaf_model in the JSONL is now reliable ground-truth metadata for downstream Phase B/C/D regressors. The CRF offset hook unlocks a sane default for the future search layer to seed bisect bounds across resolutions.
- Negative: one new module + CLI flag + JSONL semantics clarification (vmaf_model is now per-row, not per-job). Existing consumers that read the JSONL and assumed vmaf_model was constant across a corpus need to handle per-row variance (none ship today, since Phase B/C/D are not implemented yet).
- Neutral / follow-ups:
  - Add a 1440p model when Netflix publishes one upstream; until then 1440p stays on the 1080p side of the threshold.
  - Phase B/C/D will learn per-codec CRF offsets from real corpora and override the conservative defaults shipped here. The function signature stays stable.
  - tools/vmaf-tune/AGENTS.md gets a new invariant note about the resolution decision rule and the per-row vmaf_model semantics.
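A downstream consumer that previously assumed one model per corpus could handle the per-row vmaf_model field by partitioning rows before fitting anything. The sketch below is one possible approach, not a shipped consumer: group_rows_by_model is a hypothetical helper, and the height and crf fields in the sample rows are made up for illustration (only vmaf_model is named in this ADR).

```python
import json
from collections import defaultdict


def group_rows_by_model(jsonl_text: str) -> dict[str, list[dict]]:
    """Partition JSONL corpus rows by the model that scored them."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between rows
        row = json.loads(line)
        groups[row["vmaf_model"]].append(row)
    return dict(groups)


# Illustrative rows only; real corpus rows carry more fields.
SAMPLE = """\
{"height": 2160, "crf": 23, "vmaf_model": "vmaf_4k_v0.6.1"}
{"height": 1080, "crf": 23, "vmaf_model": "vmaf_v0.6.1"}
{"height": 720, "crf": 28, "vmaf_model": "vmaf_v0.6.1"}
"""

groups = group_rows_by_model(SAMPLE)
```

Keeping the partitions separate avoids mixing scores from the two models in one regression target, since the two models are not calibrated to the same viewing setup.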
References
- Parent: ADR-0237, the vmaf-tune umbrella spec / phase ordering.
- Research digest: docs/research/0064-vmaf-tune-resolution-aware.md.
- PR #354 audit, Bucket #8 (resolution-aware tuning gap).
- Source: req (user direction 2026-05-03 to wire the resolution-aware decision rule into vmaf-tune per the Bucket #8 audit before Phase B/C/D land).