Backends

libvmaf supports multiple compute backends for hardware-accelerated quality assessment. Backends are opt-in at build time via meson options and selected per-invocation at runtime through vmaf CLI flags or the C API.

| Backend | Meson option | Default-on? | Runtime opt-out | Status |
|---|---|---|---|---|
| CPU scalar | always on | yes | n/a | stable |
| x86 AVX2 | auto-detected | yes, when the host supports it | --cpumask | stable |
| x86 AVX-512 | -Denable_avx512=true | build-time opt-in | --cpumask | stable |
| ARM NEON | auto-detected on aarch64 | yes | --cpumask | stable; see arm/overview.md |
| CUDA | -Denable_cuda=true | no | --no_cuda | stable; see cuda/overview.md |
| SYCL / oneAPI | -Denable_sycl=true | no | --no_sycl / --sycl_device N | stable; see sycl/overview.md |
| HIP (AMD) | -Denable_hip=true | no | --no_hip / --hip_device N | --backend hip end-to-end working on AMD ROCm hosts (ADR-0519); 19/22 feature kernels real, 3 legacy stubs (adm, vif, motion). float_ansnr_hip was removed in commit 70ed8b3ce3 (PR #38). Dispatch currently routes through CPU twins (HIP scores match CPU bit-exactly). See hip/overview.md. |
| Metal (Apple Silicon) | -Denable_metal=auto/enabled | auto on macOS | n/a | Runtime + 17 wired, registered, parity-tested feature kernels live on Apple Silicon (incl. VIF, ADM, CIEDE, CAMBI, SSIMULACRA2); see metal/index.md. The SpEED family is the one remaining Metal-twin gap. |

Runtime selection

Backend selection is not controlled by environment variables in this fork. Backends are selected via CLI flags on vmaf (see ../usage/cli.md — "Backend selection") or programmatically through VmafConfiguration fields in the C API (gpu_enable, cuda_state, sycl_state).

VMAF_FORCE_BACKEND is not read by libvmaf — it appeared in earlier drafts of this page as a planned selection mechanism, but the implemented surface is CLI-flag-based. If you are scripting alternate backends, set --no_cuda / --no_sycl / --sycl_device <N> on the vmaf command line.

Dispatch precedence inside libvmaf (highest first):

  1. User-disabled backends are removed from the candidate list (--no_cuda / --no_sycl / --cpumask ISA bits).
  2. If a feature has a GPU kernel and a GPU backend survives the filter, the GPU path runs.
  3. Otherwise the best available CPU SIMD twin runs; scalar C is the universal fallback.
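
The three precedence steps can be modelled as a small shell function. This is a hypothetical illustration only, not libvmaf's dispatch code: the name pick_backend and its argument layout are assumptions made for the sketch.

```shell
# Hypothetical model of the precedence list above; NOT libvmaf code.
# Usage: pick_backend HAS_GPU_KERNEL "GPU_CANDIDATES" "DISABLED" [SIMD_TWIN]
pick_backend() {
    has_gpu_kernel=$1   # "yes" if the feature has a GPU kernel
    candidates=$2       # space-separated GPU backends, highest priority first
    disabled=$3         # space-separated backends the user disabled
    simd=$4             # best CPU SIMD twin left after --cpumask, or empty

    # Step 1: remove user-disabled backends from the candidate list.
    survivors=""
    for b in $candidates; do
        case " $disabled " in
            *" $b "*) ;;                       # disabled by the user: skip
            *) survivors="$survivors $b" ;;
        esac
    done

    # Step 2: the first surviving GPU backend wins if a GPU kernel exists.
    if [ "$has_gpu_kernel" = "yes" ]; then
        for b in $survivors; do
            echo "$b"
            return 0
        done
    fi

    # Step 3: best CPU SIMD twin, with scalar C as the universal fallback.
    echo "${simd:-scalar}"
}
```

For example, pick_backend yes "cuda sycl" "cuda" avx2 prints sycl, because CUDA is filtered out in step 1 and SYCL survives into step 2.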

Explicit-backend semantics (--backend NAME)

The --backend exclusive selector accepts auto | cpu | cuda | sycl | hip | metal. (The vulkan token was accepted prior to ADR-0726; it now returns exit 1 with an unsupported-backend error.) Per ADR-0498 (2026-05-18):

  • --backend auto (default) keeps the soft-fallback chain: if the priority backend fails to initialise, the run demotes to CPU without failing and logs a line to stderr.
  • --backend NAME for any explicit GPU backend turns init failure into a non-zero exit with a clear stderr error. CI gates that depend on backend-specific scoring no longer silently regress when, e.g., a GPU ICD fails to load in a container.
  • Per ADR-0543 (extends ADR-0498), the exit code for an explicit-backend init failure is a dedicated 100 (VMAF_EXIT_BACKEND_INIT_FAILED) rather than the generic non-zero 255 (int -1 truncated to uint8_t). CI gates can match [[ $rc -eq 100 ]] to distinguish backend failures from other errors without parsing stderr.
  • When --output X.json is also passed, the libvmaf CLI overwrites the file at that path with a single-line structured JSON descriptor carrying the keys "error", "backend_requested", "errno", "adr" (always "ADR-0498"), and "exit_code", so downstream wrappers can decode the failure structurally instead of falling back to stderr parsing (ADR-0543).
  • Per-feature symmetry (ADR-0543): a feature name ending in _cuda / _sycl / _hip / _metal is a GPU-pinned variant. (The _vulkan suffix was retired with ADR-0726 — any remaining _vulkan feature names are effectively dead code with no backing extractor.) If the matching backend isn't active in this run (not compiled in, not requested, or failed to init), the CLI hard-fails with the same exit 100 + JSON descriptor instead of silently registering the CPU twin.
  • The JSON output gains a top-level "backend_used": "NAME" key echoing what actually ran (cpu / cuda / sycl / hip / metal). Downstream consumers can confirm dispatch independently of stderr; mirrors the MCP-layer echo added by PR #1251.
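
A CI wrapper could turn the exit-code contract described above into a coarse category. The sketch below is an assumption-laden illustration: classify_vmaf_exit is a hypothetical helper name, and only the codes 0 and 100 are taken from this page; every other status is lumped together.

```shell
# Hypothetical CI helper; the function name is illustrative, not part of vmaf.
# Maps a vmaf exit status to a coarse category.
classify_vmaf_exit() {
    case "$1" in
        0)   echo "ok" ;;
        100) echo "backend_init_failed" ;;   # VMAF_EXIT_BACKEND_INIT_FAILED (ADR-0543)
        *)   echo "other_error" ;;           # e.g. the generic 255
    esac
}

# Usage sketch (needs a vmaf binary, so it is left commented out):
#   vmaf <your usual arguments> --backend hip --json --output /tmp/s.json
#   rc=$?
#   if [ "$(classify_vmaf_exit "$rc")" = "backend_init_failed" ]; then
#       echo "GPU backend did not initialise on this runner" >&2
#   fi
```

Keeping the mapping in one function means a gate can treat a broken runner differently from a genuine scoring failure without parsing stderr.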

Example:

# Explicit HIP; errors out hard if no AMD GPU is available.
vmaf --reference ref.yuv --distorted dist.yuv \
     --width 1920 --height 1080 --pixel_format 420 --bitdepth 8 \
     --model version=vmaf_v0.6.1 --backend hip \
     --json --output /tmp/s.json
# stdout silent on success; /tmp/s.json carries:
#   { ..., "backend_used": "hip" }
# On init failure: exit = 100 (ADR-0543), stderr:
#   vmaf: --backend hip requested but init failed; refusing to
#   silently fall back to CPU (ADR-0498)
# AND /tmp/s.json is overwritten with a structured error descriptor:
#   {"error": "vmaf_hip_state_init failed",
#    "backend_requested": "hip", "errno": -19,
#    "adr": "ADR-0498", "exit_code": 100}

Note: --backend vulkan was removed in ADR-0726. Passing it returns a non-zero exit with an unsupported-backend error.
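
To read a string key such as "backend_used" or "backend_requested" from the JSON shown in the example above, a wrapper might use something like the following. This is a minimal sketch, assuming the key and its value sit on the same line; json_string_field is a hypothetical helper, and a real JSON parser is the more robust choice where one is available.

```shell
# Hypothetical helper: print the first string value stored under KEY in FILE.
# Assumes '"KEY": "value"' appears on a single line; this is not a JSON parser.
# Usage: json_string_field FILE KEY
json_string_field() {
    sed -n 's/.*"'"$2"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Usage sketch (the path comes from the example above):
#   used=$(json_string_field /tmp/s.json backend_used)
#   [ "$used" = "hip" ] || echo "unexpected backend: $used" >&2
```

Numeric keys such as "errno" and "exit_code" are not quoted in the descriptor, so this helper deliberately does not match them.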

Not every feature has every twin — the coverage matrix is in ../metrics/features.md per feature and in each per-backend page below.

Guides

  • x86 SIMD (AVX2 / AVX-512) — SIMD optimisation notes
  • ARM NEON — aarch64 backend + build / runtime / per-feature coverage
  • CUDA — NVIDIA GPU backend + build / invocation
  • NVTX profiling — profiling CUDA kernels with NVIDIA Nsight
  • SYCL / oneAPI — Intel GPU backend + build / invocation
  • SYCL bundling — self-contained deployment without oneAPI runtime
  • Vulkan: removed in ADR-0726; historical reference only
  • HIP / AMD ROCm — opt-in backend; 19 registered feature extractors real (see hip/overview.md for the full table); 3 legacy API stubs (adm_hip, vif_hip, motion_hip) are not registered and return -ENOSYS. float_ansnr_hip was removed in commit 70ed8b3ce3 (PR #38).
  • Metal / Apple Silicon — auto-on-macOS; runtime + 17 wired, registered, parity-tested feature kernels live; the SpEED family is the one remaining Metal-twin gap

Cross-backend parity

Every backend pair is gated on every PR by the GPU-parity matrix gate (T6-8 / ADR-0214). The gate diffs per-frame metrics with a feature-specific absolute tolerance and emits one JSON / Markdown report per CI run. See ../development/cross-backend-gate.md for the tolerance table, how to read failure output, and how to add a new feature to the matrix.
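
An absolute-tolerance comparison of the kind the gate applies per frame can be sketched as follows. This is illustrative only: within_abs_tolerance is a hypothetical helper, the numbers in the usage comment are made up, and the real per-feature tolerances live in ../development/cross-backend-gate.md.

```shell
# Hypothetical helper: succeed (exit 0) when |A - B| <= TOL, fail otherwise.
# Usage: within_abs_tolerance A B TOL
within_abs_tolerance() {
    awk -v a="$1" -v b="$2" -v tol="$3" \
        'BEGIN { d = a - b; if (d < 0) d = -d; exit !(d <= tol) }'
}

# Usage sketch with made-up scores and a made-up tolerance:
#   if within_abs_tolerance "$cpu_score" "$gpu_score" 0.001; then
#       echo "frame within tolerance"
#   else
#       echo "frame outside tolerance" >&2
#   fi
```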

  • ../usage/cli.md: --no_cuda / --no_sycl / --sycl_device / --cpumask / --gpumask flags.
  • ADR-0022 — tiny-AI runtime (separate from classic VMAF backend dispatch; tiny-AI uses ONNX Runtime execution providers).
  • ADR-0027 — base-image / toolchain pins for GPU CI.