ADR-0150: Port Netflix #1472 — CUDA feature extraction on Windows (MSYS2/MinGW)¶
- Status: Accepted
- Date: 2026-04-24
- Deciders: Lusoris, Claude (Anthropic)
- Tags: upstream-port, cuda, windows, mingw, build
Context¶
Pre-port, the fork's CUDA backend built cleanly on Linux + nvcc + gcc but failed on Windows + nvcc + MinGW-GCC. Two categories of friction:
- Source portability. On Windows, nvcc requires MSVC's
cl.exeas its host compiler for preprocessing.cufiles — even when the rest of libvmaf is built with MinGW-GCC. The.cuTUs transitively pull<pthread.h>(POSIX-only, absent in MSVC's CRT),<ffnvcodec/dynlink_*.h>(installed in MinGW paths thatcl.exedoesn't search), and C99 designated initializers (not accepted by nvcc in C++ mode with MSVC). Each of these is silently fine on Linux with GCC-as-host. - Build-system plumbing.
meson'sadd_languages('cuda')helper requires MSVC to be the default C compiler on Windows, which is incompatible with the rest of the fork (which needs MinGW-GCC for the host-side libvmaf TUs). Adding MSVC toPATHfor nvcc's sake would cause meson to pick it as the default C compiler and break the CPU build.
Netflix upstream PR #1472 — "cuda: enable CUDA feature extraction on Windows (MSYS2/MinGW)" (birkdev, Mar 2026, OPEN) — addresses both categories. PR is two commits:
15745cdf— source-portability guards in CUDA headers +.cufiles.b7b65e64— meson build plumbing:vswhere-basedcl.exediscovery withoutPATHpollution, Windows SDK + MSVC include path injection via-Iflags to nvcc, CUDA version detection vianvcc --versioninstead ofmeson.get_compiler('cuda').
Decision¶
Port both upstream commits, applied in order, with three fork-specific conflict resolutions:
Conflict 1: integer_adm.h designated initializers¶
Upstream's patch wraps the dwt_7_9_YCbCr_threshold table in #ifndef __CUDACC__, hiding it from .cu TUs entirely.
The fork (commit pre-dates upstream #1472) already solves the same problem differently — by rewriting the initializer in positional form ({0.495, 0.466, 0.401, {…}} instead of {.a = 0.495, .k = 0.466, .f0 = 0.401, …}), which is C++-portable across MSVC/GCC/clang.
Fork keeps positional form. It's strictly more useful — the table stays visible to any future .cu TU that might want to use it. The #ifndef __CUDACC__ approach is the minimal change if a .cu consumer never appears; the positional form works for all cases. Comment updated to note the divergence.
Conflict 2: meson.build gencode coverage + nvcc plumbing¶
Upstream's patch rewrites the if get_option('enable_nvcc') block to add MSVC discovery, Windows SDK include path injection, and CUDA version detection via nvcc --version. It also changes cuda_compiler.version() to cuda_version (a new local string).
The fork already has its own extension of the same block: an explicit gencode list that cubins for sm_75 / sm_80 / sm_86 / sm_89 plus a compute_80 PTX fallback, with a long comment explaining the Netflix coverage hole ADR-0122 documented.
Merged shape: upstream's MSVC/Windows detection + CUDA version detection runs first (host-independent cubin generation needs cuda_version regardless of platform), then the fork's gencode coverage comment + list follows. Both now reference cuda_version instead of the dropped cuda_compiler.version().
Conflict 3: cuda_static_lib dependencies¶
Upstream's patch drops the dependencies : [pthread_dependency] line from cuda_static_lib, on the assumption that commit 1 removed the only pthread.h pull (cuda/common.h).
Fork keeps the dependency: core/src/cuda/ring_buffer.c is part of cuda_static_lib and directly #includes <pthread.h>. Removing the explicit pthread_dependency would relink-fail on hosts where pthread is not transitively available via another dep in the TU graph. Comment updated to explain.
Drive-by lint cleanups (ADR-0141 touched-file rule)¶
The upstream patch touches cuda/common.h and picture.h, which carried the reserved-identifier header guards __VMAF_SRC_CUDA_COMMON_H__ and __VMAF_SRC_PICTURE_H__ (leading-double-underscore = reserved at file scope). clang-tidy flags both as bugprone-reserved-identifier / cert-dcl37-c / cert-dcl51-cpp. Renamed to VMAF_SRC_CUDA_COMMON_INCLUDED / VMAF_SRC_PICTURE_INCLUDED — same pattern ADR-0148 used for the IQA header guards.
Alternatives considered¶
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
Cherry-pick upstream PR verbatim via git cherry-pick | One-shot; traceable to upstream | The meson.build and integer_adm.h conflicts need resolution; the cuda_static_lib pthread drop is actively wrong for the fork | Rejected — 3-way git am --3way with explicit conflict handling is cleaner |
Use #ifndef __CUDACC__ for the ADM table (upstream shape) | Matches upstream byte-for-byte; smallest delta on future rebase | Makes the table unavailable to any hypothetical .cu consumer | Rejected — fork's positional form is strictly more useful; .cu TUs that want the table don't have to special-case |
| Drop pthread_dependency from cuda_static_lib (upstream shape) | Matches upstream byte-for-byte | ring_buffer.c uses pthread directly; link can fail depending on transitive deps | Rejected — explicit pthread_dependency is the correct build description |
| Skip Windows CUDA support entirely | Zero port cost | Blocks Windows + CUDA users; widens the fork's platform-coverage gap vs upstream | Rejected — the port is small and mechanical once the conflicts are handled |
Consequences¶
- Positive:
- Windows + MSYS2 + MinGW + nvcc + MSVC Build Tools now builds libvmaf with CUDA enabled.
vswhere-basedcl.exedetection avoids pollutingPATH(which would break the MinGW-GCC CPU build). - CUDA version detection via
nvcc --versionremoves the Windows-hostileadd_languages('cuda')meson helper. - Linux + GCC build is a strict no-op (all new code paths are guarded by
host_machine.system() == 'windows'). - Two header-guard reserved-identifier warnings cleared as a drive-by.
- Negative:
- Upstream PR #1472 is still OPEN. Future sync will need to re-diff the three conflict-resolved hunks (see rebase-notes entry 0043). Keep the fork's version on rebase for
integer_adm.hpositional initializers andcuda_static_libpthread_dependency; adopt upstream's shape only if it changes shape meaningfully. - The fork is now slightly ahead of upstream on ADR-0122 gencode coverage AND Windows support at the same time; a future rebase that touches either one will stretch across both features. Noted in rebase-notes 0043.
- Neutral / follow-ups:
- CI does not yet exercise the Windows CUDA path — no runner with MSYS2 + MSVC Build Tools + CUDA toolkit is currently enrolled. The port is provably complete on Linux (CPU build DA build both pass all tests); Windows validation is operator-driven. Tracked as T7-3 in
.workingdir2/OPEN.md(self-hosted GPU runner enrollment). - nv-codec-headers on MinGW needs to be built from
876af32or later — then13.0.19.0release tag is missingcuMemFreeHost,cuStreamCreateWithPriority,cuLaunchHostFunc, and other CudaFunctions members libvmaf uses. Pre-existing, not this PR's scope; documented in upstream PR body.
Verification¶
meson setup libvmaf core/build-cuda-port -Denable_cuda=true -Denable_sycl=false -Denable_nvcc=true→ OK.ninja -C core/build-cuda-port→ 6.fatbinfiles produced (adm_cm,adm_csf,adm_csf_den,adm_dwt2,filter1d,motion_score),tools/vmafCLI linked.meson test -C core/build-cuda-port→ 35/35 pass.meson test -C build(CPU-only) → 32/32 pass.clang-tidy -p build core/src/cuda/common.h core/src/picture.h core/src/feature/integer_adm.h→ zero warnings.- Windows build: requires a Windows + MSYS2 + MSVC BuildTools + CUDA runner; not validated in this PR.
References¶
- Upstream PR: Netflix/vmaf#1472, commits
15745cdf+b7b65e64(birkdev, 2026-03-16, OPEN). - Backlog:
.workingdir2/BACKLOG.mdT4-2. - ADR-0122 — fork's existing gencode coverage extension (Context for conflict 2).
- ADR-0141 — touched-file lint-clean rule (drove the header-guard rename).
- ADR-0148 — precedent for header-guard
_INCLUDEDrenames. - User direction 2026-04-24 popup: "Port Netflix#1472 CUDA-on-Windows MinGW".