Models (v1)
As of June 2026, this repository ships a new generation of VMAF models, referred to as VMAF v1. These models have demonstrated higher accuracy than the previous generation (VMAF v0), which is documented in overview.md.
The v1 models are ported verbatim from Netflix upstream (commit 4718b4f5f). They live under model/vmaf_v1.0.16/ (and model/vmaf_v1.0.16_hfr/ for the high-frame-rate variants) and are selected via the --model option, in the same way as the v0 models.
Overview of VMAF v1 models
VMAF v1 supports the following models:
- Standard 1080p Model: Calibrated for 1080p video viewed at a standard 3H distance. It uses an operating range of [0, 100].
- Phone Model: Derived by setting the normalized viewing distance to 5H (based on experimental data), this model adjusts the DLM, AIM, and chroma feature calculations to reflect reduced artifact visibility on smaller screens viewed from a greater relative distance. It retains the standard [0, 100] range.
- 4K Model: Two v1 4K models are released:
  - A 1.5H variant, based on a discerning 4K@1.5H viewing condition. This variant is conceptually similar to its v0 4K counterpart and operates on a [0, 100] range. For most users, this variant is the default choice.
  - A 3H variant, based on a consumer-like 4K@3H viewing condition. This variant operates on a [0, 110] range, which helps to quantify the additional perceptual benefit of 4K resolution over 1080p when both are viewed at 3H.
VMAF v1 should ideally be applied at 10-bit precision for SDR, which helps more accurately capture the presence of banding. Even if the encoded video is 8-bit, VMAF can still be measured at 10 bits by appropriately preprocessing both video inputs.
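The text above does not say which preprocessing to use. As one possible sketch, assuming ffmpeg is installed, an 8-bit source can be decoded to raw 10-bit 4:2:0 YUV before it is handed to vmaf with --bitdepth 10. The helper name to_yuv420p10 is hypothetical, and the file names in the usage comment are placeholders.

```shell
# Hypothetical helper: decode any ffmpeg-readable video to raw 10-bit YUV.
# Assumes ffmpeg is on PATH; this is not part of the vmaf tool itself.
to_yuv420p10() {
  input=$1
  output=$2
  # yuv420p10le = planar 4:2:0, 10 bits per sample, little-endian.
  # -f rawvideo writes headerless frames, which is what a .yuv input expects.
  ffmpeg -y -i "$input" -pix_fmt yuv420p10le -f rawvideo "$output"
}

# Usage (apply the same conversion to both inputs):
#   to_yuv420p10 ref.mp4 ref.yuv
#   to_yuv420p10 dis.mp4 dis.yuv
```

Both the reference and the distorted video should go through the same conversion so that they reach VMAF at the same bit depth and pixel format.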
Which model file should I use?
| Scenario | Display | Normalized viewing distance | Model file | Score range |
|---|---|---|---|---|
| Standard 1080p | 1080p | 3H | model/vmaf_v1.0.16/vmaf_v1.0.16_3d0h.json | [0, 100] |
| Phone | 1080p | 5H | model/vmaf_v1.0.16/vmaf_v1.0.16_5d0h.json | [0, 100] |
| 4K default | 2160p | 1.5H | model/vmaf_v1.0.16/vmaf_v1.0.16_1d5h_2160.json | [0, 100] |
| 4K consumer TV | 2160p | 3H | model/vmaf_v1.0.16/vmaf_v1.0.16_3d0h_2160.json | [0, 110] |
Each model is also embedded as a built-in (-Dbuilt_in_models=true, the default), so it can be selected by version= without a file path:
```shell
vmaf \
  --reference ref.yuv \
  --distorted dis.yuv \
  --width 1920 --height 1080 --pixel_format 420 --bitdepth 10 \
  --model version=vmaf_v1.0.16_3d0h \
  --output output.xml
```
Example invocation using an explicit model path:
```shell
vmaf \
  --reference ref.yuv \
  --distorted dis.yuv \
  --width 1920 --height 1080 --pixel_format 420 --bitdepth 10 \
  --model path=model/vmaf_v1.0.16/vmaf_v1.0.16_3d0h.json \
  --output output.xml
```
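For scripted runs, the scenario table can be encoded as a small lookup. The sketch below maps a display and normalized viewing distance to the built-in model version; the function name vmaf_v1_model_version is hypothetical, while the version strings are the ones listed in this document.

```shell
# Hypothetical helper: pick the built-in v1 model version for a
# (display, normalized viewing distance) pair, following the table above.
vmaf_v1_model_version() {
  display=$1    # 1080p or 2160p
  distance=$2   # 1.5H, 3H, or 5H
  case "$display:$distance" in
    1080p:3H)   echo "vmaf_v1.0.16_3d0h" ;;
    1080p:5H)   echo "vmaf_v1.0.16_5d0h" ;;
    2160p:1.5H) echo "vmaf_v1.0.16_1d5h_2160" ;;
    2160p:3H)   echo "vmaf_v1.0.16_3d0h_2160" ;;
    *)
      # Combinations outside the table have no v1 model.
      echo "no v1 model for $display at $distance" >&2
      return 1
      ;;
  esac
}

# Prints the version for the 4K consumer TV scenario (score range [0, 110]).
vmaf_v1_model_version 2160p 3H
```

The result can be passed to vmaf as --model version=$(vmaf_v1_model_version 2160p 3H). Keep in mind that the 4K 3H model reports scores on [0, 110], so its output should not be compared directly against scores from the [0, 100] models.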
Specifying encode-side parameters
If the encode-side width, height, and bit depth of the distorted video are known (i.e., the dimensions and bit depth at which the video was actually encoded, before any rescaling for display), they can be passed to the CAMBI feature used by VMAF as additional --model parameters. For example, if we have a 1280x720 8-bit encoded video that is displayed on a 1080p display, we measure VMAF at a 1920x1080 resolution as follows:
```shell
vmaf \
  --reference ref.yuv \
  --distorted dis.yuv \
  --width 1920 --height 1080 --pixel_format 420 --bitdepth 10 \
  --model path=model/vmaf_v1.0.16/vmaf_v1.0.16_3d0h.json:cambi.enc_width=1280:cambi.enc_height=720:cambi.enc_bitdepth=8 \
  --output output.xml
```
These overrides are merged into the model's CAMBI feature options, so the CAMBI instance that VMAF evaluates is the one that uses them; no separate CAMBI instance is registered. When enc_width, enc_height, or enc_bitdepth is not provided, CAMBI falls back to the input width, height, or bit depth respectively.
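When the encode-side parameters vary per title, the --model value can be assembled programmatically. The following is a minimal sketch: the helper name vmaf_model_arg is hypothetical, and only the cambi.enc_width, cambi.enc_height, and cambi.enc_bitdepth option names come from the example above.

```shell
# Hypothetical helper: build a --model value with optional CAMBI overrides.
# $1 = model path, $2 = encode width, $3 = encode height, $4 = encode bit depth.
vmaf_model_arg() {
  arg="path=$1"
  # Append an override only when a value was supplied, so that CAMBI
  # falls back to the input geometry for anything left empty.
  if [ -n "${2:-}" ]; then arg="$arg:cambi.enc_width=$2"; fi
  if [ -n "${3:-}" ]; then arg="$arg:cambi.enc_height=$3"; fi
  if [ -n "${4:-}" ]; then arg="$arg:cambi.enc_bitdepth=$4"; fi
  printf '%s\n' "$arg"
}

# Prints the same --model value used in the 720p 8-bit example above.
vmaf_model_arg model/vmaf_v1.0.16/vmaf_v1.0.16_3d0h.json 1280 720 8

# Usage with vmaf (not executed here):
#   vmaf ... --model "$(vmaf_model_arg "$model" "$enc_w" "$enc_h" "$enc_bits")" ...
```

Calling the helper with only a model path yields a plain path= value, which matches the fallback behaviour described above.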
High-frame-rate (HFR) content
For high-frame-rate (HFR) content, VMAF v1 provides _hfr model variants under model/vmaf_v1.0.16_hfr/. The model-selection logic above carries over unchanged: substitute the _hfr directory and filename.
In the context of these models, HFR refers to frame rates roughly double the common 24/30 fps cases (e.g., 60 fps); the _hfr variants are calibrated for the ~50/60 fps regime. In some other applications, "HFR" refers to still higher frame rates, which these variants are not calibrated for. Compared to the standard models, the _hfr variants use a wider, five-frame temporal motion window (differencing over frames i-2, i, i+2) with moving-average smoothing. This reduces the quality under-prediction that v0 showed at higher frame rates without inflating scores from the denser inter-frame signal.
HFR handling remains an area of active improvement — the current variants don't fully capture the perceptual impact of high frame rates, and Netflix expects to refine it in future releases.
Fork status (VMAFX-specific)
The four SDR (non-HFR) v1.0.16 models — vmaf_v1.0.16_3d0h, vmaf_v1.0.16_3d0h_2160, vmaf_v1.0.16_5d0h, and vmaf_v1.0.16_1d5h_2160 — are fully supported on the fork's CPU path and score correctly (the 1080p 3H model reproduces the upstream golden VMAF of 82.816059 on the Netflix src01 reference pair).
The four _hfr variants embed and register, but cannot yet be scored on this fork. They require motion_five_frame_window=true + motion_moving_average=true on the motion feature, and the prev_prev_ref 5-frame plumbing that this mode needs is not yet implemented — see ADR-0337. Attempting to run an _hfr model currently fails with:
The HFR models are shipped now so that the model data is in place; enabling them is tracked as follow-up work (see docs/state.md, T-UPSTREAM-V1.0.16-MODELS-2026-06-20).
GPU and SIMD acceleration (fork-specific)
The supported (non-HFR) v1 models run on the fork's CUDA and SYCL backends and on AVX2 / AVX-512 / NEON SIMD. The usual caveat applies: only the CPU scalar fixed-point path is the archival reference that the Netflix golden-data checkpoints assert against. See overview.md for the full backend-parity discussion.