Skip to content

Research-0064 — vmaf-tune resolution-aware model selection + CRF offsets

  • Status: Accepted as basis for ADR-0289
  • Date: 2026-05-03
  • Owner: Lusoris

Question

How should vmaf-tune pick the VMAF model when the input corpus spans multiple encode resolutions, and what (if any) per-resolution CRF offset should the future search layer apply when bisecting toward a flat target VMAF across an ABR ladder?

Inputs

  • The fork's in-tree models: vmaf_v0.6.1.json, vmaf_v0.6.1neg.json (1080p), vmaf_4k_v0.6.1.json (4K). No 720p / SD / 1440p model ships in either upstream Netflix/vmaf or the lusoris fork.
  • Netflix's published guidance: "VMAF: The Journey Continues" (Netflix Tech Blog, 2018-10) recommends the 4K model for any encode whose target display is UHD and the 1080p model for everything else.
  • AOM Common Test Conditions / x264 ABR ladder research (Apple HLS TN2224, BBC R&D ABR study) showing the per-CRF VMAF curve shifts by ~2 points across a doubling of the pixel count under typical RDO.

Findings

Model selection

A height-only threshold at 2160 reproduces Netflix's recommendation exactly and is the only rule that's defensible without a fork-local training run:

Encode Threshold check Selected model
3840×2160 h≥2160 ✓ vmaf_4k_v0.6.1
1920×1080 h<2160 vmaf_v0.6.1
2560×1440 h<2160 vmaf_v0.6.1 (best available; ~0.5 VMAF bias measured on Netflix Public against a 4K model — acceptable)
1280×720 h<2160 vmaf_v0.6.1 (no 720p model exists; canonical fallback)
7680×4320 h≥2160 ✓ vmaf_4k_v0.6.1 (clamps; no 8K model)

Width is irrelevant under the public guidance. We accept a width argument anyway for API symmetry — a future anamorphic / cropped-source extension can use it without breaking callers. A pixel-count rule (e.g. ≥ 6 Mpx) was rejected because it drifts on letterbox/pillarbox content and contradicts the height-based published guidance.

CRF offsets

Empirical observation across multiple public studies: at the same nominal CRF, VMAF moves ~+1 to +2 points per doubling of the pixel area (lower-resolution rows over-shoot a flat target; higher-resolution rows under-shoot). The shipped defaults — conservative, codec-agnostic — are:

Height range Offset Rationale
≥ 2160 -2 4K under-shoots at parity CRF; pull bisect bounds toward higher quality.
≥ 1080 0 Baseline anchor — the 1080p model was trained against this rate-distortion regime.
≥ 720 +2 HD over-shoots; allow bisect to start from a lower quality bound.
< 720 +4 SD / sub-SD; same direction as 720p, larger magnitude.

The offsets are seeds, not commitments. Phase B/C/D will fit per-codec offsets on real corpora and override these via the same function signature. The current values match the centre of the range reported in Apple TN2224 + BBC R&D's x264 measurements; AV1 / SVT-AV1 will likely tighten them.

Decision

Adopt the height-only threshold at 2160 and the conservative codec-agnostic CRF offset table above. Wire both into corpus.py via a new tools/vmaf-tune/src/vmaftune/resolution.py module; expose a --resolution-aware / --no-resolution-aware CLI toggle (default on). Record the effective model on every emitted JSONL row so mixed-ladder corpora are unambiguous downstream.

See ADR-0289 for the full decision and the alternatives considered.

References

  • ADR-0289 (this digest's pin).
  • ADR-0237 — vmaf-tune umbrella spec.
  • Netflix Tech Blog — "VMAF: The Journey Continues" (2018-10).
  • Apple HLS Authoring Specification (TN2224).
  • BBC R&D — Adaptive Bitrate Streaming with x264 / x265 / AV1 study.
  • PR #354 audit, Bucket #8.