Research-0064 — vmaf-tune resolution-aware model selection + CRF offsets¶
- Status: Accepted as basis for ADR-0289
- Date: 2026-05-03
- Owner: Lusoris
Question¶
How should vmaf-tune pick the VMAF model when the input corpus spans multiple encode resolutions, and what (if any) per-resolution CRF offset should the future search layer apply when bisecting toward a flat target VMAF across an ABR ladder?
Inputs¶
- The fork's in-tree models:
vmaf_v0.6.1.json,vmaf_v0.6.1neg.json(1080p),vmaf_4k_v0.6.1.json(4K). No 720p / SD / 1440p model ships in either upstream Netflix/vmaf or the lusoris fork. - Netflix's published guidance: "VMAF: The Journey Continues" (Netflix Tech Blog, 2018-10) recommends the 4K model for any encode whose target display is UHD and the 1080p model for everything else.
- AOM Common Test Conditions / x264 ABR ladder research (Apple HLS TN2224, BBC R&D ABR study) showing the per-CRF VMAF curve shifts by ~2 points across a doubling of the pixel count under typical RDO.
Findings¶
Model selection¶
A height-only threshold at 2160 reproduces Netflix's recommendation exactly and is the only rule that's defensible without a fork-local training run:
| Encode | Threshold check | Selected model |
|---|---|---|
| 3840×2160 | h≥2160 ✓ | vmaf_4k_v0.6.1 |
| 1920×1080 | h<2160 | vmaf_v0.6.1 |
| 2560×1440 | h<2160 | vmaf_v0.6.1 (best available; ~0.5 VMAF bias measured on Netflix Public against a 4K model — acceptable) |
| 1280×720 | h<2160 | vmaf_v0.6.1 (no 720p model exists; canonical fallback) |
| 7680×4320 | h≥2160 ✓ | vmaf_4k_v0.6.1 (clamps; no 8K model) |
Width is irrelevant under the public guidance. We accept a width argument anyway for API symmetry — a future anamorphic / cropped-source extension can use it without breaking callers. A pixel-count rule (e.g. ≥ 6 Mpx) was rejected because it drifts on letterbox/pillarbox content and contradicts the height-based published guidance.
CRF offsets¶
Empirical observation across multiple public studies: at the same nominal CRF, VMAF moves ~+1 to +2 points per doubling of the pixel area (lower-resolution rows over-shoot a flat target; higher-resolution rows under-shoot). The shipped defaults — conservative, codec-agnostic — are:
| Height range | Offset | Rationale |
|---|---|---|
| ≥ 2160 | -2 | 4K under-shoots at parity CRF; pull bisect bounds toward higher quality. |
| ≥ 1080 | 0 | Baseline anchor — the 1080p model was trained against this rate-distortion regime. |
| ≥ 720 | +2 | HD over-shoots; allow bisect to start from a lower quality bound. |
| < 720 | +4 | SD / sub-SD; same direction as 720p, larger magnitude. |
The offsets are seeds, not commitments. Phase B/C/D will fit per-codec offsets on real corpora and override these via the same function signature. The current values match the centre of the range reported in Apple TN2224 + BBC R&D's x264 measurements; AV1 / SVT-AV1 will likely tighten them.
Decision¶
Adopt the height-only threshold at 2160 and the conservative codec-agnostic CRF offset table above. Wire both into corpus.py via a new tools/vmaf-tune/src/vmaftune/resolution.py module; expose a --resolution-aware / --no-resolution-aware CLI toggle (default on). Record the effective model on every emitted JSONL row so mixed-ladder corpora are unambiguous downstream.
See ADR-0289 for the full decision and the alternatives considered.
References¶
- ADR-0289 (this digest's pin).
- ADR-0237 —
vmaf-tuneumbrella spec. - Netflix Tech Blog — "VMAF: The Journey Continues" (2018-10).
- Apple HLS Authoring Specification (TN2224).
- BBC R&D — Adaptive Bitrate Streaming with x264 / x265 / AV1 study.
- PR #354 audit, Bucket #8.