Backends
libvmaf supports multiple compute backends for hardware-accelerated quality assessment. Backends are opt-in at build time via meson options and selected per-invocation at runtime through vmaf CLI flags or the C API.
| Backend | Meson option | Default-on? | Runtime opt-out | Status |
|---|---|---|---|---|
| CPU scalar | always on | yes | n/a | stable |
| x86 AVX2 | auto-detected | yes, when host supports | --cpumask | stable |
| x86 AVX-512 | -Denable_avx512=true | build-time opt-in | --cpumask | stable |
| ARM NEON | auto-detected on aarch64 | yes | --cpumask | stable — see arm/overview.md |
| CUDA | -Denable_cuda=true | no | --no_cuda | stable — see cuda/overview.md |
| SYCL / oneAPI | -Denable_sycl=true | no | --no_sycl / --sycl_device N | stable — see sycl/overview.md |
| HIP (AMD) | -Denable_hip=true | no | --no_hip / --hip_device N | --backend hip end-to-end working on AMD ROCm hosts (ADR-0519); 19/22 feature kernels real, 3 legacy stubs (adm, vif, motion). float_ansnr_hip was removed in commit 70ed8b3ce3 (PR #38). Dispatch currently routes through CPU twins (HIP scores match CPU bit-exactly). See hip/overview.md. |
| Metal (Apple Silicon) | -Denable_metal=auto/enabled | auto on macOS | n/a | Runtime + 17 wired, registered, parity-tested feature kernels live on Apple Silicon (incl. VIF, ADM, CIEDE, CAMBI, SSIMULACRA2) — see metal/index.md; the SpEED family is the one remaining Metal-twin gap |
Runtime selection
Backend selection is not controlled by environment variables in this fork. Backends are selected via CLI flags on vmaf (see ../usage/cli.md — "Backend selection") or programmatically through VmafConfiguration fields in the C API (gpu_enable, cuda_state, sycl_state).
VMAF_FORCE_BACKEND is not read by libvmaf. It appeared in earlier drafts of this page as a planned selection mechanism, but the implemented surface is CLI-flag-based. If you are scripting alternate backends, set --no_cuda / --no_sycl / --sycl_device <N> on the vmaf command line.
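For scripts that used to rely on a single backend variable, one option is a small wrapper that turns a backend name into the flags listed above. This is a minimal sketch: the function name `backend_flags` and the exact mapping are assumptions made for illustration (for example, that selecting SYCL should also pass --no_cuda), not behaviour defined by libvmaf.

```shell
# Hypothetical helper: map a desired backend to the CLI flags named on this page.
# The mapping is an illustrative assumption, not part of libvmaf.
backend_flags() {
    want="$1"          # cpu | cuda | sycl
    device="${2:-0}"   # SYCL device index, only used for sycl
    case "$want" in
        cpu)  echo "--no_cuda --no_sycl" ;;
        cuda) echo "--no_sycl" ;;
        sycl) echo "--no_cuda --sycl_device $device" ;;
        *)    echo "unknown backend: $want" >&2; return 1 ;;
    esac
}

# Usage (not executed here):
#   vmaf --reference ref.yuv --distorted dist.yuv ... $(backend_flags sycl 1)
```

The helper only prints flags, so it can be checked without a GPU or a vmaf binary on the host.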
Dispatch precedence inside libvmaf (highest first):
- User-disabled backends are removed from the candidate list (--no_cuda / --no_sycl / --cpumask ISA bits).
- If a feature has a GPU kernel and a GPU backend survives the filter, the GPU path runs.
- Otherwise the best available CPU SIMD twin runs; scalar C is the universal fallback.
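The three precedence rules above can be modelled as a short decision function. This is a toy sketch, not the libvmaf dispatcher: `choose_path` and its arguments are hypothetical, and picking the first surviving GPU backend is an assumption, since this page does not state how ties between GPU backends are resolved.

```shell
# Toy model of the dispatch precedence; all names are hypothetical.
#   $1: 1 if the feature has a GPU kernel, else 0
#   $2: space-separated GPU backends left after the user filter (may be empty)
#   $3: best CPU SIMD twin available for the feature (may be empty)
choose_path() {
    has_gpu_kernel="$1"
    gpu_candidates="$2"
    simd_twin="$3"
    if [ "$has_gpu_kernel" = "1" ] && [ -n "$gpu_candidates" ]; then
        # Assumption: take the first surviving GPU backend.
        echo "gpu:${gpu_candidates%% *}"
    elif [ -n "$simd_twin" ]; then
        echo "simd:$simd_twin"
    else
        # Scalar C is the universal fallback.
        echo "scalar"
    fi
}
```

For instance, `choose_path 1 "" avx2` models a run where --no_cuda and --no_sycl removed every GPU candidate, so the AVX2 twin is chosen.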
Explicit-backend semantics (--backend NAME)
The --backend exclusive selector accepts auto | cpu | cuda | sycl | hip | metal. (The vulkan token was accepted prior to ADR-0726; it now returns exit 1 with an unsupported-backend error.) Per ADR-0498 (2026-05-18):
- --backend auto (default) keeps the soft-fallback chain: an init failure for the priority backend silently demotes to CPU with a stderr log line.
- --backend NAME for any explicit GPU backend turns init failure into a non-zero exit with a clear stderr error. CI gates that depend on backend-specific scoring no longer silently regress when, e.g., a GPU ICD fails to load in a container.
- Per ADR-0543 (extends ADR-0498), the exit code for an explicit-backend init failure is a dedicated 100 (VMAF_EXIT_BACKEND_INIT_FAILED) rather than the generic non-zero 255 (int -1 truncated to uint8_t). CI gates can match [[ $rc -eq 100 ]] to distinguish backend failures from other errors without parsing stderr.
- When --output X.json is also passed, the libvmaf CLI overwrites the output path with a single-line structured JSON descriptor carrying "error", "backend_requested", "errno", "adr" (always "ADR-0498"), and "exit_code" keys. Downstream wrappers can decode the failure structurally instead of falling back to stderr parsing (ADR-0543).
- Per-feature symmetry (ADR-0543): a feature name ending in _cuda / _sycl / _hip / _metal is a GPU-pinned variant. (The _vulkan suffix was retired with ADR-0726; any remaining _vulkan feature names are effectively dead code with no backing extractor.) If the matching backend isn't active in this run (not compiled in, not requested, or failed to init), the CLI hard-fails with the same exit 100 + JSON descriptor instead of silently registering the CPU twin.
- The JSON output gains a top-level "backend_used": "NAME" key echoing what actually ran (cpu / cuda / sycl / hip / metal). Downstream consumers can confirm dispatch independently of stderr; mirrors the MCP-layer echo added by PR #1251.
Example:
# Explicit HIP; errors out hard if no AMD GPU is available.
vmaf --reference ref.yuv --distorted dist.yuv \
--width 1920 --height 1080 --pixel_format 420 --bitdepth 8 \
--model version=vmaf_v0.6.1 --backend hip \
--json --output /tmp/s.json
# stdout silent on success; /tmp/s.json carries:
# { ..., "backend_used": "hip" }
# On init failure: exit = 100 (ADR-0543), stderr:
# vmaf: --backend hip requested but init failed; refusing to
# silently fall back to CPU (ADR-0498)
# AND /tmp/s.json is overwritten with a structured error descriptor:
# {"error": "vmaf_hip_state_init failed",
# "backend_requested": "hip", "errno": -19,
# "adr": "ADR-0498", "exit_code": 100}
Note: --backend vulkan was removed in ADR-0726. Passing it returns a non-zero exit with an unsupported-backend error.
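A CI wrapper can act on the exit code and the error descriptor described in this section without parsing stderr. The sketch below is one possible shape, under stated assumptions: `classify_vmaf_rc` and `descriptor_field` are hypothetical helper names, and the sed expression assumes the descriptor is a single line with string values quoted as in the example above. A real JSON parser would be more robust.

```shell
# Classify a vmaf exit code following the semantics described above.
classify_vmaf_rc() {
    case "$1" in
        0)   echo "ok" ;;
        100) echo "backend_init_failed" ;;   # VMAF_EXIT_BACKEND_INIT_FAILED
        *)   echo "other_failure" ;;
    esac
}

# Extract one string-valued key from the single-line error descriptor.
# Deliberately naive: it handles only "key": "value" pairs on one line.
#   $1: path to the JSON file, $2: key name
descriptor_field() {
    sed -n "s/.*\"$2\": *\"\([^\"]*\)\".*/\1/p" "$1"
}

# Usage (not executed here):
#   vmaf ... --backend hip --json --output /tmp/s.json; rc=$?
#   if [ "$(classify_vmaf_rc "$rc")" = "backend_init_failed" ]; then
#       echo "backend: $(descriptor_field /tmp/s.json backend_requested)"
#   fi
```

The same extraction could be pointed at the "backend_used" key of a successful run, assuming that key and its value sit on one line of the output file.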
Not every feature has every twin — the coverage matrix is in ../metrics/features.md per feature and in each per-backend page below.
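Because a GPU-pinned feature name hard-fails when its backend is not active, a script may want to check the suffix before invoking vmaf. The following is a rough sketch of the suffix rule from the per-feature symmetry bullet above; `pinned_backend` is a hypothetical name, and the feature names in the usage comment are placeholders.

```shell
# Report which backend a feature name is pinned to, based on its suffix.
# Prints "none" for unpinned names, including the retired _vulkan suffix.
pinned_backend() {
    case "$1" in
        *_cuda)  echo "cuda" ;;
        *_sycl)  echo "sycl" ;;
        *_hip)   echo "hip" ;;
        *_metal) echo "metal" ;;
        *)       echo "none" ;;
    esac
}

# Usage (placeholder feature names):
#   pinned_backend some_feature_cuda   # prints: cuda
#   pinned_backend some_feature        # prints: none
```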
Guides
- x86 SIMD (AVX2 / AVX-512): SIMD optimisation notes
- ARM NEON: aarch64 backend + build / runtime / per-feature coverage
- CUDA: NVIDIA GPU backend + build / invocation
- NVTX profiling: profiling CUDA kernels with NVIDIA Nsight
- SYCL / oneAPI: Intel GPU backend + build / invocation
- SYCL bundling: self-contained deployment without oneAPI runtime
- Vulkan: removed in ADR-0726; historical reference only
- HIP / AMD ROCm: opt-in backend; 19 registered feature extractors real (see hip/overview.md for the full table); 3 legacy API stubs (adm_hip, vif_hip, motion_hip) are not registered and return -ENOSYS. float_ansnr_hip was removed in commit 70ed8b3ce3 (PR #38).
- Metal / Apple Silicon: auto-on-macOS; runtime + 17 wired, registered, parity-tested feature kernels live; the SpEED family is the one remaining Metal-twin gap
Cross-backend parity
Every backend pair is gated on every PR by the GPU-parity matrix gate (T6-8 / ADR-0214). The gate diffs per-frame metrics with a feature-specific absolute tolerance and emits one JSON / Markdown report per CI run. See ../development/cross-backend-gate.md for the tolerance table, how to read failure output, and how to add a new feature to the matrix.
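The core comparison in such a gate is an absolute-tolerance check on a pair of per-frame scores. The sketch below illustrates only that check, not the gate itself: `within_tolerance` is a hypothetical helper, and any tolerance passed to it here is a made-up value. The real per-feature tolerances live in ../development/cross-backend-gate.md.

```shell
# Succeed when |a - b| <= tol; awk does the floating-point arithmetic.
#   $1, $2: the two scores to compare, $3: absolute tolerance
within_tolerance() {
    awk -v a="$1" -v b="$2" -v tol="$3" 'BEGIN {
        d = a - b
        if (d < 0) d = -d
        if (d <= tol) exit 0
        exit 1
    }'
}

# Usage (illustrative numbers only):
#   within_tolerance "$cpu_score" "$gpu_score" 0.02 || echo "parity mismatch"
```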
Related
- ../usage/cli.md: --no_cuda / --no_sycl / --sycl_device / --cpumask / --gpumask flags.
- ADR-0022: tiny-AI runtime (separate from classic VMAF backend dispatch; tiny-AI uses ONNX Runtime execution providers).
- ADR-0027: base-image / toolchain pins for GPU CI.