# vmaf — command-line reference
vmaf is the main CLI binary shipped with this fork. It takes a reference / distorted video pair, runs one or more VMAF models (plus any additional feature extractors), and writes per-frame + pooled scores to an XML / JSON / CSV / subtitle log.
**Scope.** This page is the canonical flag reference for the `vmaf` binary in the VMAFx fork. It supersedes the abbreviated help string in `core/tools/README.md`. The code's `--help` is authoritative for the set of flags at any given commit; this page adds defaults, interactions, and runnable examples per ADR-0100.

For the `vmaf_bench` micro-benchmark binary, see bench.md. For FFmpeg integration (the `libvmaf` filter), see ffmpeg.md. For the Python bindings, see python.md.
## Quick start
```shell
# .y4m pair — no geometry flags needed
vmaf --reference ref.y4m --distorted dist.y4m

# .yuv pair — geometry is mandatory
vmaf \
  --reference ref.yuv \
  --distorted dist.yuv \
  --width 1920 --height 1080 \
  --pixel_format 420 --bitdepth 8 \
  --output scores.xml
```
Default behaviour when no --model is passed: the built-in vmaf_v0.6.1 model is loaded automatically. Default output format when no --xml|--json|--csv|--sub is passed: XML.
## Required input flags
| Flag | Short | Argument | Required | Notes |
|---|---|---|---|---|
--reference | -r | path | yes | .y4m or .yuv path. |
--distorted | -d | path | yes | .y4m or .yuv path. |
--width | -w | unsigned | yes for .yuv | Ignored for .y4m (embedded). |
--height | -h | unsigned | yes for .yuv | Ignored for .y4m. |
--pixel_format | -p | 420, 422, or 444 | yes for .yuv | 420 covers the overwhelming majority of streamable content. |
--bitdepth | -b | 8, 10, 12, or 16 | yes for .yuv | 10 and 12 bit require a 10-/12-bit aware model (e.g. vmaf_b_v0.6.3 for banding sensitivity). |
If any of --width, --height, --pixel_format, --bitdepth is supplied the input is treated as raw YUV and all four become mandatory.
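A `.yuv` file has no header, so the four geometry flags are the only way to locate frame boundaries, and a wrong value typically yields garbage scores rather than an error. The sketch below defines two hypothetical helpers (`frame_bytes` and `frame_count`; neither ships with vmaf) that check a raw file against the flags you intend to pass. It assumes planar layout, one byte per sample at 8 bit, two bytes per sample above 8 bit, and chroma dimensions rounded up for odd sizes.

```shell
# Hypothetical helpers, not shipped with vmaf. Assumes planar YUV,
# 1 byte/sample at 8 bit, 2 bytes/sample above 8 bit, and chroma
# dimensions rounded up for odd widths/heights.
frame_bytes() {
  local w=$1 h=$2 fmt=$3 bpc=$4 bps=1 chroma
  if [ "$bpc" -gt 8 ]; then bps=2; fi
  case "$fmt" in
    420) chroma=$(( 2 * ((w + 1) / 2) * ((h + 1) / 2) )) ;;  # two quarter-size planes
    422) chroma=$(( 2 * ((w + 1) / 2) * h )) ;;              # two half-width planes
    444) chroma=$(( 2 * w * h )) ;;                          # two full-size planes
    *) echo "unsupported pixel format: $fmt" >&2; return 1 ;;
  esac
  echo $(( (w * h + chroma) * bps ))
}

# Frame count implied by the file size; a remainder means the geometry
# flags do not describe this file.
frame_count() {
  local file=$1 size fb
  size=$(( $(wc -c < "$file") ))
  fb=$(frame_bytes "$2" "$3" "$4" "$5") || return 1
  if [ $(( size % fb )) -ne 0 ]; then
    echo "$file: $size bytes is not a multiple of $fb" >&2
    return 1
  fi
  echo $(( size / fb ))
}
```

For the 576x324, 8-bit 4:2:0 pair in the worked example below, `frame_bytes 576 324 420 8` prints `279936`, and `frame_count src01_hrc00_576x324.yuv 576 324 420 8` should print a whole number of frames.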
## Models
The --model / -m flag takes a colon-delimited key/value string:
```shell
--model path=<file>                   # load a .json model from disk
--model version=<builtin>             # load a built-in model by name
--model path=...:name=<str>           # rename the metric in the output log
--model version=...:disable_clip      # disable score clipping to [0, 100]
--model version=...:enable_transform  # apply the model's score transform, if it defines one
```
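A `--model` string splits on `:` into tokens: a token containing `=` is a key/value pair, and a bare token such as `disable_clip` acts as a switch. The hypothetical shell function below only illustrates that decomposition, for example when generating or auditing model strings in scripts. It is not libvmaf's parser, and it assumes no key or value contains a `:`.

```shell
# Hypothetical illustration, not libvmaf's parser: list the options in a
# --model string, one per line. Assumes keys and values contain no ':'.
split_model_opts() {
  local IFS=':' tok
  for tok in $1; do
    case "$tok" in
      *=*) printf '%s = %s\n' "${tok%%=*}" "${tok#*=}" ;;  # key=value pair
      *)   printf '%s (switch)\n' "$tok" ;;                # bare switch
    esac
  done
}
```

For example, `split_model_opts 'version=vmaf_v0.6.1neg:name=vmaf_neg:disable_clip'` lists `version`, `name`, and the `disable_clip` switch on separate lines.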
Built-in model versions (compiled into libvmaf via -Dbuilt_in_models=true, default true):
| Version | Purpose |
|---|---|
vmaf_v0.6.1 | Default. 1080p training set, classic release. |
vmaf_v0.6.1neg | Negative-gain (NEG) — non-enhancing; recommended for encoder A/B where one encoder may artificially sharpen. |
vmaf_b_v0.6.3 | Banding-aware variant (used with CAMBI). |
vmaf_4k_v0.6.1 | 4K training set. |
vmaf_4k_v0.6.1neg | 4K + NEG. |
Float-precision variants (vmaf_float_v0.6.1, vmaf_float_v0.6.1neg, vmaf_float_b_v0.6.3, vmaf_float_4k_v0.6.1) are also resolvable but are legacy: prefer the integer versions for performance, and use the float versions only for bit-exact comparison with older reports.
--model can be passed multiple times to run several models in one pass; each model must have a unique name= (or the CLI errors out). Example running VMAF + VMAF-NEG side-by-side:
```shell
vmaf -r ref.y4m -d dist.y4m \
  --model version=vmaf_v0.6.1:name=vmaf \
  --model version=vmaf_v0.6.1neg:name=vmaf_neg \
  --output scores.json --json
```
## Additional features
The --feature flag enables extra metrics (beyond whatever the model already consumes). Syntax is the same colon-delimited key/value form as --model:
```shell
--feature psnr
--feature psnr=enable_chroma=true:enable_apsnr=true
--feature float_ssim=enable_db=true:clip_db=true
--feature cambi
--feature ciede
--feature psnr_hvs
--feature brisque
--feature brisque=model_path=/path/to/brisque_live.model
```
The brisque no-reference metric ships its trained model embedded in the binary, so it needs no extra arguments; model_path overrides it with an on-disk libsvm model (and is required only for builds with built_in_models disabled). See ../metrics/brisque.md.
See ../metrics/features.md for the full list of feature identifiers and per-feature options.
## Output
| Flag | Default | Notes |
|---|---|---|
--output / -o <path> | none (no log file is written) | Writes the per-frame + pooled log to <path>. |
--xml | default | XML report (upstream-compatible). |
--json | | JSON report. |
--csv | | One row per frame. |
--sub | | SubRip subtitle format, useful for overlaying scores during playback. |
Stderr always carries a short progress line plus the final pooled-mean VMAF score, regardless of --output.
## Score precision (fork-added)
```shell
--precision N          # printf "%.<N>g", N in 1..17
--precision max|full   # printf "%.17g" — IEEE-754 round-trip lossless (opt-in)
--precision legacy     # printf "%.6f" — synonym for the default
```
The fork's default is %.6f (see ADR-0119, which supersedes ADR-0006), matching upstream Netflix output byte-for-byte so the CPU golden gate passes without explicit flags. Pass --precision=max whenever you need IEEE-754 round-trip lossless output (cross-backend numeric diff, archival reports, any consumer that re-parses scores into doubles and compares them). Affects XML, JSON, CSV, SUB, and stderr consistently.
See precision.md for the full table of when to pick each mode.
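The shell's `printf` applies the same C conversions, so the effect of each mode is easy to reproduce. The two values below are hypothetical scores that differ by 3e-10, for example the same frame scored on two backends.

```shell
a=76.6689050001   # hypothetical score from backend A
b=76.6689050004   # hypothetical score from backend B

# Default / --precision legacy: both lines print 76.668905, so a text
# diff of the two reports shows no difference.
printf '%.6f\n%.6f\n' "$a" "$b"

# --precision max: 17 significant digits keep the two values apart.
printf '%.17g\n%.17g\n' "$a" "$b"
```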
## Backend selection
Build-time: each backend is opt-in via a meson flag. At runtime, backend selection is per-invocation through flags on vmaf — there is no environment variable that overrides it.
| Flag | Default | Effect |
|---|---|---|
--no_cuda | off | Forbid CUDA dispatch even if the CUDA backend is built in. |
--no_sycl | off | Forbid SYCL dispatch even if the SYCL backend is built in. |
--sycl_device <N> | auto (first GPU) | Pick SYCL device by ordinal from the oneAPI device list. |
--no_hip | off | Forbid HIP/ROCm dispatch even if the HIP backend is built in. |
--hip_device <N> | disabled (opt-in) | Pick HIP/ROCm device by ordinal. Pass 0 for the first AMD GPU. Without this flag the HIP backend is never used, even when the binary was built with -Denable_hip=true. See ../backends/hip/overview.md. |
--no_metal | off | Forbid Metal dispatch even if the Metal backend is built in (macOS only). |
--metal_device <N> | disabled (opt-in) | Pick Metal GPU by ordinal (macOS only). Pass 0 for the first Metal device (typically the integrated Apple GPU on Apple Silicon). Without this flag the Metal backend is never used, even on macOS builds. See ../backends/metal/index.md. |
--backend <name> | auto | Exclusive backend selector — auto (default; whichever backends are built compete by registry order), cpu, cuda, sycl, hip, metal. Setting a specific backend disables the others via the matching --no_X flags BEFORE dispatch and pins the device index for the chosen backend (gpumask=0 for CUDA, sycl_device=0 for SYCL, hip_device=0 for HIP, metal_device=0 for Metal). (The vulkan token and --no_vulkan / --vulkan_device flags were removed in ADR-0726.) |
--cpumask <bitmask> (-c) | all ISAs enabled | Mask out specific CPU ISAs (e.g. force scalar, disable AVX-512). Values are fork-internal — see core/src/cpu.h. |
--gpumask <bitmask> | all GPU ops enabled | Mask out specific GPU ops. |
--threads <N> | host nproc | CPU-side worker thread count. |
**Vulkan backend removed (ADR-0726):** the `--no_vulkan`, `--vulkan_device`, and `--backend vulkan` flags no longer exist. Passing them produces an unrecognised-option error. See ../backends/vulkan/overview.md for historical context.
If no GPU backend is built in, the GPU-related flags are silently inert. See ../backends/index.md for the runtime-dispatch rules and which features have GPU/SIMD twins.
**Future work: `--gpu-calibrated` (proposed, ADR-0234).** A future flag will opt into a per-arch ULP calibration head that maps raw GPU scores to their CPU-equivalent values, closing the ~1e-4 cross-backend divergence currently within `places=4` tolerance. It is not shipped in this release: the flag does not exist yet, and the calibration model has not been trained yet. See the ADR for the measurement gates that have to clear first.
## Frame range
```shell
--frame_cnt <N>         # stop after N frames (both streams)
--frame_skip_ref <N>    # skip the first N frames of the reference
--frame_skip_dist <N>   # skip the first N frames of the distorted
--subsample <N>         # compute scores only every Nth frame (default 1 = all frames)
```
--subsample trades precision for speed — pooled scores are still computed over the sampled subset, so keep it at 1 for final reports.
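To see how much data a given `--subsample` value leaves, the hypothetical helper below counts the frames that feed the pooled score. It assumes frames are scored at indices 0, N, 2N, ... of the decoded range; check the fork's source before relying on the exact count.

```shell
# Hypothetical helper: frames contributing to the pooled score when
# --subsample N is applied to F decoded frames, assuming indices
# 0, N, 2N, ... are the ones scored.
scored_frames() {
  local frames=$1 n=${2:-1}
  echo $(( (frames + n - 1) / n ))   # ceil(F / N)
}
```

Under that assumption, `scored_frames 300 10` prints `30`: a tenth of the clip decides the pooled score.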
## Preset bundles
These expand to a canonical model + feature list for AOM and Netflix common-test-conditions reports. For example, --aom_ctc v7.0 is equivalent to:
```shell
--model version=vmaf_v0.6.1:name=vmaf
--model version=vmaf_v0.6.1neg:name=vmaf_neg
--feature psnr=reduced_hbd_peak=true:enable_apsnr=true:min_sse=0.5
--feature ciede
--feature float_ssim=scale=1:enable_db=true:clip_db=true
--feature float_ms_ssim=enable_db=true:clip_db=true
--feature psnr_hvs
--feature cambi
# plus common_bitdepth=on (forces reference + distorted to the same bitdepth)
```
--aom_ctc proposed is deprecated (errors out with an explanation).
## Tiny-AI flags (fork-added)
```shell
--tiny-model <path>   # load a .onnx tiny model alongside classic models
--tiny-device auto|cpu|cuda|openvino|coreml|coreml-ane|coreml-gpu|coreml-cpu|openvino-npu|openvino-cpu|openvino-gpu|rocm
                      # ORT execution provider (default: auto)
--dnn-ep auto|cpu|cuda|openvino|coreml|coreml-ane|coreml-gpu|coreml-cpu|rocm
                      # alias for --tiny-device; uses ORT terminology
                      # openvino-npu pins device_type=NPU (Intel AI-PC);
                      # openvino-cpu / openvino-gpu pin the OpenVINO
                      # CPU / GPU plugin with no fallback. See
                      # docs/ai/inference.md for the full matrix.
--tiny-threads <N>    # CPU EP intra-op threads (0 = ORT default)
--tiny-fp16           # request fp16 I/O where the EP supports it
--no-reference        # NR mode; requires a no-reference tiny model
```
--dnn-ep and --tiny-device are equivalent — they select the ONNX Runtime execution provider and write to the same internal setting. Use whichever name is more natural for your script; --dnn-ep follows the ORT "execution provider" terminology, while --tiny-device predates the alias.
Underscore aliases (--tiny_model, --tiny_device, --tiny_threads, --tiny_fp16, --no_reference, --dnn_ep) are accepted for scripting symmetry with the underscore flags upstream uses.
`--no-reference` puts the CLI into no-reference (NR) mode (ADR-0520):

- `--reference` / `-r` is no longer required. The CLI opens the distorted source twice (two `video_input` handles backed by the same file), and the rank-4 tiny-model dispatch reads picture bytes from the slot that would have held the reference, so the model sees the distorted frame.
- `--tiny-model` is now mandatory, because no classic NR scorer exists in the fork. Omitting it returns the diagnostic `--no-reference requires --tiny-model; no classic NR scorer exists`.
- The built-in `vmaf_v0.6.1` SVM is auto-suppressed (NR mode forces `--no_prediction`): all classic SVM scorers consume FR feature columns (`vif_*`, `adm2`, `motion2`) that cannot be computed without a reference. To pin a tiny model on top of the SVM you need a reference.
- The tiny model must accept a rank-4 single-luma input (`[1, 1, H, W]` with fully resolved spatial dims matching your distorted source). Rank-2 feature-vector tiny models (ADR-0518) load but always score `0.0` in NR mode, because their input features are derived from the reference.
- The JSON / XML / CSV report contains only the tiny-AI feature column the model wrote through; no `pooled_metrics` block exists when `--no_prediction` is active.
See ../ai/inference.md for the full tiny-AI CLI walkthrough and the per-model registry (model/tiny/registry.json, sha256 pins, known limitations).
## Codec-context flags (fork-added)
```shell
--tiny-codec <name>    # encoder identity for codec-conditioned tiny models
                       # (libx264, libx265, libsvtav1, libvpx-vp9, h264_nvenc, ...)
--tiny-preset <name>   # encoder preset string (medium, slow, p4, 5, ...)
--tiny-crf <0..63>     # CRF / QP integer; values above 63 clamp at 63
--tiny-resize <mode>   # bilinear | nearest | bicubic | disabled
```
These flags drive vmaf_dnn_set_codec_context() and vmaf_dnn_set_resize_mode() on the tiny model — see api/dnn.md.
Codec block. Codec-conditioned tiny models (e.g. the v2 ladder regressor) accept a small categorical block alongside the per-frame features: encoder identity, preset ordinal, and CRF / QP. The CLI sets this block once at model-load time. Unknown encoder names fall back to the "unknown" bucket with a stderr diagnostic; missing --tiny-codec on a model that requires codec context is permitted and routes through the "unknown" bucket silently. Set --tiny-codec, --tiny-preset, or --tiny-crf to any non-default value to enable the path. See ADR-0522 for the categorical encoding rationale.
Resize mode (ADR-0550). Required when the source frame size (--width / --height) differs from the tiny model's declared input shape:
--tiny-resize | Filter | Score-stable? |
|---|---|---|
disabled | None — size mismatch fails with -ERANGE (the default) | Strict |
bilinear | OpenCV INTER_LINEAR / torchvision BILINEAR | Yes — convention used by every shipped NR / image-input model |
nearest | OpenCV INTER_NEAREST | Yes — deterministic floor of source coord |
bicubic | Separable Catmull-Rom (a = -0.5); torchvision BICUBIC | Yes — exporter parity |
The three filter modes produce scores that differ by approximately 2% on the same input — treat filter choice as a model hyperparameter and pin it alongside the model checkpoint. A typo in the resize keyword surfaces at parse time (--tiny-resize must be one of: bilinear, nearest, bicubic, disabled) rather than after model load. Underscore aliases (--tiny_codec, --tiny_preset, --tiny_crf, --tiny_resize) are accepted for scripting symmetry.
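The `bicubic` row names a separable Catmull-Rom filter, i.e. Keys' cubic-convolution kernel with a = -0.5. The awk sketch below evaluates that kernel along one axis; it illustrates the filter family and is not the fork's resampler. For a sample halfway between two source pixels the four taps come out as -0.0625, 0.5625, 0.5625, -0.0625: they sum to 1, but the negative outer lobes let bicubic overshoot at edges where bilinear's two weights (0.5 and 0.5 at the same position) cannot. That is one plausible contributor to the score spread noted above, not a measured explanation.

```shell
# Keys cubic-convolution kernel W(x) with a = -0.5 (Catmull-Rom).
# Illustration only, not the fork's resampler.
cubic_weight() {
  awk -v x="$1" -v a=-0.5 'BEGIN {
    if (x < 0) x = -x
    if (x <= 1)     w = (a + 2) * x^3 - (a + 3) * x^2 + 1
    else if (x < 2) w = a * x^3 - 5 * a * x^2 + 8 * a * x - 4 * a
    else            w = 0
    printf "%.4f\n", w
  }'
}

# Taps for a sample halfway between two source pixels: distances
# 1.5, 0.5, 0.5 and 1.5 to the four nearest neighbours.
for d in 1.5 0.5 0.5 1.5; do cubic_weight "$d"; done
```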
## Sigstore bundle verification (fork-added)
`--tiny-model-verify` is a boolean flag (no argument). It enables `cosign verify-blob` verification of the Sigstore bundle attached to the tiny-AI ONNX model before the model is loaded into ORT. The model path comes from `--tiny-model`, and the bundle path is derived from it: the bundle is expected at `<model-path>.sigstore` alongside the model file. Verification shells out to the `cosign` binary on the host's `PATH`; on success the loader proceeds normally, and on failure the process exits non-zero with a diagnostic on stderr.
When to use it: production inference pipelines that need supply-chain verification of model integrity — e.g. a release runner that pulls a fork-signed .onnx from an artifact store and refuses to score with an unsigned or tampered model. For local development against an unsigned checkpoint, omit the flag.
Failure modes (all exit non-zero before any inference runs):
- `cosign` binary not on `PATH`.
- Bundle path missing, unreadable, or not a valid Sigstore bundle.
- `cosign verify-blob` reports an invalid signature, mismatched digest, or rejected certificate identity.
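The first two failure modes can be caught before vmaf starts, which gives a clearer error in a pipeline. The function below is a hypothetical pre-flight check (`preflight_verify` is not part of vmaf) that assumes the `<model-path>.sigstore` naming described above. It deliberately leaves signature checking to `--tiny-model-verify`, which runs `cosign verify-blob` itself.

```shell
# Hypothetical pre-flight check, not part of vmaf. Mirrors the first two
# failure modes of --tiny-model-verify; the signature itself is still
# verified by cosign when vmaf runs with --tiny-model-verify.
preflight_verify() {
  local model=$1
  if ! command -v cosign >/dev/null 2>&1; then
    echo "cosign not found on PATH" >&2
    return 1
  fi
  # Bundle naming taken from this page: <model-path>.sigstore
  if [ ! -r "$model.sigstore" ]; then
    echo "missing or unreadable bundle: $model.sigstore" >&2
    return 1
  fi
}
```

A pipeline would run `preflight_verify model.onnx` and only then invoke `vmaf` with `--tiny-model model.onnx --tiny-model-verify`.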
See ADR-0211 for the model-registry schema and the Sigstore-bundle integration that this flag consumes, and ../ai/inference.md for the end-to-end signed-model workflow.
## Logging and misc
| Flag | Short | Effect |
|---|---|---|
--help | | Print the flag reference to stdout and exit 0. |
--quiet | -q | Disable the FPS meter when run in a TTY. |
--no_prediction | -n | Skip final model prediction; extract features only. Useful for feeding raw features into a custom pool. |
--version | -v | Print libvmaf version + git SHA and exit. |
## Exit codes
| Code | Meaning |
|---|---|
| 0 | Success, or --help / --version invocation. |
| 1 | Any parse / I/O / runtime error. vmaf writes a diagnostic to stderr before exiting. |
| 100 | Explicit --backend <name> requested but the backend failed to initialise (ADR-0543). |
| 101 | No frames were decoded (empty or too-short input, or a `--frame_skip_*` value past end-of-stream). vmaf writes `no frames decoded ...` to stderr. |
A failed output-file write (bad path, full disk, permission denied) also exits non-zero: instead of exiting 0 over a stale or partial file, vmaf writes `problem writing output to <path> (err=<n>)` to stderr, where `<n>` is the negative `VMAF_ERR_*` code.
Apart from the dedicated codes above, libvmaf does not surface granular error codes at the process boundary; the specific VMAF_ERR_* code from the C API is logged to stderr but collapsed to exit 1 from the CLI.
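Since only 0, 100, and 101 carry a specific meaning and every other failure collapses to a generic non-zero status, a wrapper script only needs to branch on those three. A minimal sketch with a hypothetical `describe_exit` function:

```shell
# Hypothetical wrapper logic: translate a vmaf exit status into a short
# message, following the exit-code table above.
describe_exit() {
  case "$1" in
    0)   echo "ok" ;;
    100) echo "requested backend failed to initialise" ;;
    101) echo "no frames decoded" ;;
    *)   echo "error (see stderr for the VMAF_ERR_* detail)" ;;
  esac
}
```

Typical use is `vmaf -r ref.y4m -d dist.y4m -o scores.xml; describe_exit $?`.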
## Worked example — reproducing the upstream golden pair
Download the canonical Netflix test pair from upstream:
```shell
curl -sSLO https://github.com/Netflix/vmaf_resource/raw/master/python/test/resource/yuv/src01_hrc00_576x324.yuv
curl -sSLO https://github.com/Netflix/vmaf_resource/raw/master/python/test/resource/yuv/src01_hrc01_576x324.yuv
```
Run VMAF plus PSNR:
```shell
./build/core/tools/vmaf \
  --reference src01_hrc00_576x324.yuv \
  --distorted src01_hrc01_576x324.yuv \
  --width 576 --height 324 --pixel_format 420 --bitdepth 8 \
  --model version=vmaf_v0.6.1 \
  --feature psnr \
  --output scores.xml
```
Expected stderr (CPU path):
Expected scores.xml head:
```xml
<VMAF version="3.x.y-lusoris.N">
  <params qualityWidth="576" qualityHeight="324" />
  <fyi fps="41.98" />
  <frames>
    <frame frameNum="0" integer_adm2="0.96208412..." ... psnr_y="34.76077932..." vmaf="83.8562851..." />
    ...
  </frames>
  <pooled_metrics>
    <metric name="vmaf" min="71.17655..." max="87.18142..." mean="76.66890501..." harmonic_mean="76.51000634..." />
    ...
  </pooled_metrics>
</VMAF>
```
Pooled-mean VMAF for this pair is 76.668905…. This is one of the three Netflix CPU goldens preserved verbatim as a required CI gate — see ADR-0024 and python/test/quality_runner_test.py.
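To pull a pooled mean out of the XML log in a script, a `sed` one-liner is enough for the layout shown above, where each `<metric>` element sits on its own line. `pooled_mean` is a hypothetical helper, and the approach is deliberately fragile: for anything beyond quick checks, use a real XML parser.

```shell
# Hypothetical helper: print the pooled mean of one metric from a vmaf
# XML log. Assumes one <metric .../> element per line; the leading space
# in ' mean=' keeps it from matching harmonic_mean.
pooled_mean() {
  sed -n "s/.*<metric name=\"$2\"[^>]* mean=\"\([^\"]*\)\".*/\1/p" "$1"
}
```

`pooled_mean scores.xml vmaf` prints the `mean` attribute of the `vmaf` row.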
## Flag interactions and pitfalls
- **`.yuv` without geometry.** Passing `--reference foo.yuv` without `--width` / `--height` / `--pixel_format` / `--bitdepth` errors out. `.y4m` carries geometry in the header; `.yuv` does not.
- **Duplicate model names.** Each `--model` must have a unique `name=`. If the same built-in version is loaded twice, set `name=` explicitly on at least one.
- **`--no_prediction` with `--model`.** `--no_prediction` skips model prediction but does not skip loading: the model is still used to select which features to extract. Omit `--model` entirely and pass `--no_prediction` to extract only the features listed via `--feature`.
- **Default `%.6f` rounding.** The default (and `--precision legacy`) rounds scores to six decimal places, so differences at or below the 1e-6 level, which `--precision=max` would expose, disappear. Use `max` whenever you need to compare scores numerically (cross-backend diff, archival reports). The default mode exists for byte-for-byte agreement with pre-fork Netflix output, which the CPU golden gate depends on.
- **`--tiny-model` vs `--model`.** These compose: tiny-AI models are additional scores layered on top of the classic SVM/XGBoost prediction, not a replacement for it. Use `--no_prediction` if you want tiny scores alone. See ADR-0023.
- **`--no_cuda` + `--no_sycl` together.** Forces CPU-only even on a build with both GPU backends compiled in. Useful for cross-backend diff sessions.
## Related
- bench.md: `vmaf_bench` micro-benchmark harness.
- vmaf-perShot.md: per-shot CRF predictor sidecar (T6-3b / ADR-0222).
- ffmpeg.md: using the VMAF filter inside `ffmpeg`.
- python.md: Python bindings for the CLI.
- precision.md: dedicated `--precision` flag walkthrough.
- ../backends/index.md: runtime backend dispatch rules.
- ../metrics/features.md: per-feature identifiers and options.
- ../ai/inference.md: tiny-AI inference walkthrough.
- ADR-0119 (current precision default; supersedes ADR-0006), ADR-0023, ADR-0024, ADR-0100.