ADR-0408: FFmpeg libvmaf filter - CUDA backend selector
- Status: Accepted
- Date: 2026-05-09
- Deciders: lusoris
- Tags: ffmpeg-patches, cuda, integration
Context
FFmpeg's stock `libvmaf` filter consumes software AVFrames and runs the fork's CPU feature kernels. The fork has had a CUDA backend in libvmaf since the upstream Netflix import; it is reachable from the libvmaf CLI via `--cuda` and from the dedicated `libvmaf_cuda` filter (CUDA hwaccel frames in). It has not been reachable from the regular `libvmaf` filter on software input, even though the SYCL and Vulkan backends shipped per-context selectors for exactly that case in patches 0003 and 0004. A user with a CUDA-only build of libvmaf, software-decoded input, and no desire to wire up the CUDA hwaccel path on FFmpeg's side has no way to ask the `libvmaf` filter for CUDA acceleration.
The configure surface mirrors the asymmetry: `--enable-libvmaf-sycl` and `--enable-libvmaf-vulkan` exist as user-facing flags (added by the same 0003/0004 patches), but `--enable-libvmaf-cuda` does not. Instead, `libvmaf_cuda` is auto-detected at configure time alongside the dedicated `libvmaf_cuda` filter and is not in `EXTERNAL_LIBRARY_LIST`.
Decision
We will add a new patch `0010-libvmaf-wire-cuda-backend-selector.patch` to the `ffmpeg-patches/` series that:
- Exposes `--enable-libvmaf-cuda` as an `EXTERNAL_LIBRARY_LIST` entry, matching `--enable-libvmaf-sycl` / `--enable-libvmaf-vulkan`.
- Adds a `cuda` boolean AVOption (default `0`) on the `libvmaf` filter. When `cuda=1` the filter inits a `VmafCudaState` against the CUDA primary context on the default device (selected by `CUDA_VISIBLE_DEVICES` at process scope, matching the libvmaf CLI `--cuda` flag; the upstream C-API's `VmafCudaConfiguration` has no device-index field, unlike SYCL/Vulkan), imports the state into the `VmafContext`, preallocates a `HOST_PINNED` `VmafPicture` pool on first frame, and dispenses pictures from the pool so the existing copy loop fills pinned-host memory.
- Coexists with the upstream dedicated `libvmaf_cuda` filter via the `CONFIG_LIBVMAF_CUDA && !CONFIG_LIBVMAF_CUDA_FILTER` guard. When both are configured the libvmaf-filter selector logs an error and refuses init rather than fighting the dedicated filter's `cu_state`.
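
The "preallocate on first frame, then dispense from the pool" flow in the second bullet can be sketched in plain C. This is a minimal illustration, not the patch itself: `SketchFilter`, `SketchPicture`, `sketch_pool_init`, `sketch_fetch_picture`, and `sketch_pool_free` are hypothetical names, `malloc` stands in for the pinned-host allocation, and the round-robin cursor is a simplification of whatever reference tracking the real libvmaf picture pool performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins: the real filter would hold VmafPicture objects
 * backed by pinned host memory rather than plain malloc'd buffers. */
enum { SKETCH_POOL_SIZE = 2 };

typedef struct SketchPicture {
    unsigned char *data;  /* stands in for a pinned-host plane buffer */
    size_t size;          /* bytes allocated for this picture */
} SketchPicture;

typedef struct SketchFilter {
    int cuda;             /* mirrors the cuda AVOption (0 or 1) */
    int pool_ready;       /* becomes 1 once the first frame has arrived */
    unsigned next;        /* round-robin cursor into the pool */
    SketchPicture pool[SKETCH_POOL_SIZE];
} SketchFilter;

/* Allocate every pool slot; undo partial work if one allocation fails. */
static int sketch_pool_init(SketchFilter *s, size_t frame_bytes)
{
    for (unsigned i = 0; i < SKETCH_POOL_SIZE; i++) {
        s->pool[i].data = malloc(frame_bytes);
        if (!s->pool[i].data) {
            while (i--) {
                free(s->pool[i].data);
                s->pool[i].data = NULL;
            }
            return -1;
        }
        s->pool[i].size = frame_bytes;
    }
    s->pool_ready = 1;
    return 0;
}

/* Hand out the next picture, creating the pool lazily on first use
 * because the frame geometry is not known before the first frame. */
static SketchPicture *sketch_fetch_picture(SketchFilter *s, size_t frame_bytes)
{
    assert(s->cuda);  /* only the cuda=1 path uses the pool */
    if (!s->pool_ready && sketch_pool_init(s, frame_bytes) < 0)
        return NULL;
    SketchPicture *pic = &s->pool[s->next];
    s->next = (s->next + 1) % SKETCH_POOL_SIZE;
    return pic;
}

/* Release the pool at filter teardown. */
static void sketch_pool_free(SketchFilter *s)
{
    for (unsigned i = 0; i < SKETCH_POOL_SIZE; i++) {
        free(s->pool[i].data);
        s->pool[i].data = NULL;
        s->pool[i].size = 0;
    }
    s->pool_ready = 0;
    s->next = 0;
}
```

Deferring allocation to the first frame keeps filter init independent of frame size, which is the same lazy-init shape the ADR attributes to ADR-0238.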
Alternatives considered
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Re-use the dedicated `libvmaf_cuda` filter only | Zero new surface in FFmpeg | Forces users onto FFmpeg's CUDA hwaccel decode path; misses software-input → CUDA-feature use case that SYCL/Vulkan already handle on the `libvmaf` filter | Asymmetric with SYCL/Vulkan; user request explicitly cites closing this gap |
| Add an `int cuda_device` integer instead of a boolean | Mirrors the SYCL/Vulkan `_device` shape | `VmafCudaConfiguration` has no `device_index` field; the value would be ignored or require a fork-only API extension | Boolean honestly represents the upstream C-API; `CUDA_VISIBLE_DEVICES` is the documented selector at process scope |
| Use `DEVICE` preallocation instead of `HOST_PINNED` | Avoids any host-side staging | The existing copy loop fills `dst->data[i]` with `memcpy`; `DEVICE` buffers would force a staging round-trip before kernel launch, defeating the purpose | `HOST_PINNED` matches Vulkan's `HOST` and lets the copy loop fill pinned memory directly; downstream CUDA kernels DMA without an extra hop |
| Keep `libvmaf_cuda` as auto-detected (no `EXTERNAL_LIBRARY_LIST` entry) | Smaller configure diff | Asymmetric with SYCL/Vulkan; user explicitly asked for `--enable-libvmaf-cuda` | The user-facing flag is the deliverable |
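
To make the `HOST_PINNED` versus `DEVICE` row concrete, the following is a rough sketch of a per-plane, row-by-row `memcpy` loop of the kind the table describes. `SketchPlanes` and `sketch_copy_planes` are hypothetical names and the layout (three planes, byte strides) is an assumption for illustration; the point is only that the destination must be CPU-addressable for this loop to write into it directly, which holds for pinned host memory and not for device memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical three-plane picture layout used only for this sketch. */
typedef struct SketchPlanes {
    uint8_t *data[3];      /* plane base pointers (must be host-addressable) */
    ptrdiff_t stride[3];   /* bytes per row, including any padding */
    unsigned w[3], h[3];   /* plane dimensions in pixels */
} SketchPlanes;

/* Copy each plane row by row so source and destination strides may differ. */
static void sketch_copy_planes(SketchPlanes *dst, const SketchPlanes *src,
                               unsigned bytes_per_px)
{
    for (int i = 0; i < 3; i++) {
        assert(dst->w[i] == src->w[i] && dst->h[i] == src->h[i]);
        const uint8_t *s = src->data[i];
        uint8_t *d = dst->data[i];
        size_t row_bytes = (size_t)src->w[i] * bytes_per_px;
        for (unsigned y = 0; y < src->h[i]; y++) {
            memcpy(d, s, row_bytes);  /* plain CPU write into dst memory */
            s += src->stride[i];
            d += dst->stride[i];
        }
    }
}
```

With a `DEVICE` destination the `memcpy` above could not target the buffer, so the frame would first land in a host staging buffer and then be transferred, which is the extra hop the table rejects.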
Consequences
- Positive:
  - Closes the FFmpeg integration gap; CUDA is now a peer of SYCL and Vulkan on the regular `libvmaf` filter.
  - `--enable-libvmaf-cuda` exists alongside its siblings; configure `--help` advertises all three symmetrically.
  - Fail-soft: when libvmaf was built without CUDA the configure probe silently disables `CONFIG_LIBVMAF_CUDA`; the filter still builds, and `cuda=1` then errors at filter-init time with `AVERROR(ENOSYS)`.
- Negative:
  - Promoting `libvmaf_cuda` from blanket-autodetect to `EXTERNAL_LIBRARY_LIST` means users who previously got the in-filter CUDA selector "for free" with `--enable-libvmaf` now need `--enable-libvmaf-cuda` explicitly. Acceptable for symmetry; the dedicated `libvmaf_cuda` filter is unaffected (its `_filter_deps="libvmaf libvmaf_cuda ffnvcodec"` chain still auto-resolves when all three are present).
  - The dedicated-filter coexistence guard (`error: cuda=1 set on the libvmaf filter, but FFmpeg was built with the dedicated libvmaf_cuda filter`) reads as a footgun until the user discovers they should pick one or the other.
- Neutral / follow-ups:
  - When `VmafCudaConfiguration` ever grows a `device_index` field upstream, swap the boolean for an integer `cuda_device` mirroring SYCL/Vulkan; track this as a follow-up rebase note.
  - Consider hoisting the picture-pool wiring into a shared helper once a third backend (HIP, Metal) lands a similar selector; at that point the `if (s->vulkan_state) … else if (s->cu_state) …` chain in `copy_picture_data` becomes worth refactoring.
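
The fail-soft and coexistence behaviours listed above amount to a small decision at filter-init time, sketched here as a pure function. The function name, the runtime parameters standing in for the `CONFIG_LIBVMAF_CUDA` and `CONFIG_LIBVMAF_CUDA_FILTER` build-time switches, and the `EINVAL` code for the coexistence refusal are all assumptions for illustration; only `AVERROR(ENOSYS)` for a build without CUDA comes from this ADR. `SKETCH_AVERROR` mirrors how FFmpeg's `AVERROR()` negates errno values on platforms where they are positive.

```c
#include <assert.h>
#include <errno.h>

/* Local stand-in for FFmpeg's AVERROR() on platforms with positive errno. */
#define SKETCH_AVERROR(e) (-(e))

/*
 * Decide whether the cuda=1 selector may proceed.
 *   cuda_opt              value of the cuda AVOption (0 or 1)
 *   have_cuda             stands in for CONFIG_LIBVMAF_CUDA
 *   have_dedicated_filter stands in for CONFIG_LIBVMAF_CUDA_FILTER
 * Returns 0 to continue init, or a negative error code to refuse.
 */
static int sketch_cuda_selector_check(int cuda_opt, int have_cuda,
                                      int have_dedicated_filter)
{
    assert(cuda_opt == 0 || cuda_opt == 1);

    if (!cuda_opt)
        return 0;                       /* CPU path: nothing to validate */
    if (!have_cuda)
        return SKETCH_AVERROR(ENOSYS);  /* libvmaf built without CUDA */
    if (have_dedicated_filter)
        return SKETCH_AVERROR(EINVAL);  /* assumed code: pick one filter */
    return 0;                           /* safe to init the CUDA state */
}
```

In the actual patch these conditions would presumably be resolved by the preprocessor rather than at run time; passing them as arguments here just makes every branch easy to exercise.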
References
- Patch series: `ffmpeg-patches/series.txt` and `ffmpeg-patches/README.md`.
- Sibling backends: ADR-0118 (series replay gate), ADR-0186 (Vulkan import with per-frame pool), ADR-0238 (lazy pool init pattern this ADR copies).
- libvmaf CUDA C-API: `core/include/libvmaf/libvmaf_cuda.h`, ownership contract documented in `core/src/cuda/AGENTS.md`.
- Source: `req` (user task: close the FFmpeg integration gap so `--enable-libvmaf-cuda` is exposed alongside the existing `--enable-libvmaf-sycl` / `--enable-libvmaf-vulkan`).