ADR-0121: Windows GPU build-only matrix legs (MSVC + CUDA, MSVC + oneAPI SYCL)
- Status: Accepted
- Date: 2026-04-19
- Deciders: Lusoris, Claude (Anthropic)
- Tags: ci, build, cuda, sycl, github-actions
Context
The fork's GPU backends (CUDA + SYCL) are exercised in CI only on Linux. The existing GPU matrix entries in libvmaf-build-matrix.yml all run on ubuntu-latest with gcc as the C/C++ host compiler:
- `Build — Ubuntu CUDA` (gcc + nvcc)
- `Build — Ubuntu SYCL` (gcc host + icpx for SYCL `.cpp`)
- `Build — Ubuntu SYCL + CUDA`
- `Build — Ubuntu CUDA Static`, `Build — Ubuntu SYCL Static`
The single existing Windows job is Build — Windows MinGW64 (CPU) — MSYS2 / MinGW-w64 with -Denable_cuda=false -Denable_sycl=false. So no CI leg currently verifies that the CUDA host code or the SYCL .cpp sources even compile against the MSVC ABI.
This is a real coverage hole:
- Windows is a tier-1 platform for the libvmaf binary distribution (vmaf.exe artifact is uploaded by the MinGW job and consumed by downstream users).
- The CUDA backend's host code (`core/src/cuda/*.c`) uses POSIX patterns (e.g. `pthread_*` aliases, file descriptors for dma-buf import paths) that need conditional `#ifdef _WIN32` guards. A Linux-only CI cannot catch a regression that breaks the MSVC build.
- The SYCL backend's `vmaf_sycl_*` C-API entry points (`core/src/sycl/`) similarly need to compile cleanly under `icx-cl` (the Windows DPC++ driver), which has subtly different `__declspec(dllexport)` and CRT-linkage requirements than the Linux `icpx` driver.
- Downstream Windows users who try `meson setup -Denable_cuda=true` on Windows currently hit unguarded build breakage that we discover only via downstream issue reports.
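The kind of guard the bullets above refer to can be sketched as follows. This is a minimal illustration under stated assumptions, not code from the fork: `EXAMPLE_EXPORT`, `EXAMPLE_HAVE_DMABUF_IMPORT`, `example_dmabuf_import_supported`, `example_release_fd` and `example_self_check` are all hypothetical names chosen for the sketch.

```c
#include <assert.h>

/* Hypothetical export macro: MSVC-ABI compilers want __declspec(dllexport)
 * on public entry points, gcc/clang use a visibility attribute instead. */
#if defined(_WIN32)
#define EXAMPLE_EXPORT __declspec(dllexport)
#elif defined(__GNUC__)
#define EXAMPLE_EXPORT __attribute__((visibility("default")))
#else
#define EXAMPLE_EXPORT
#endif

/* Hypothetical feature switch: fd-based dma-buf import is a Linux concept,
 * so the POSIX header is only pulled in where it exists. */
#if defined(__linux__)
#include <unistd.h> /* close() */
#define EXAMPLE_HAVE_DMABUF_IMPORT 1
#else
#define EXAMPLE_HAVE_DMABUF_IMPORT 0
#endif

/* Reports whether the fd-based import path was compiled in. */
EXAMPLE_EXPORT int example_dmabuf_import_supported(void)
{
    return EXAMPLE_HAVE_DMABUF_IMPORT;
}

/* Releases an imported descriptor. Negative descriptors are ignored, and the
 * whole call is a no-op where the import path is compiled out.
 * Returns 0 on success. */
EXAMPLE_EXPORT int example_release_fd(int fd)
{
#if EXAMPLE_HAVE_DMABUF_IMPORT
    return fd >= 0 ? close(fd) : 0;
#else
    (void)fd;
    return 0;
#endif
}

/* Tiny self-check a caller could run at start-up. */
void example_self_check(void)
{
    assert(example_release_fd(-1) == 0);
}
```

A translation unit written this way compiles under gcc, clang and an MSVC-ABI compiler alike; an unguarded `#include <unistd.h>` would compile on Linux and fail only on the Windows legs, which is the class of regression a Linux-only CI cannot see.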
The user explicitly scoped this PR as "Add both CUDA + SYCL Windows build-only legs" (see References) — the most ambitious of the three options I offered. "Build-only" because windows-latest GitHub runners have no GPU hardware, so executing GPU kernels is not possible; what we can do is verify compile + link.
Decision
Add a new top-level windows-gpu-build job in libvmaf-build-matrix.yml running on windows-latest, with two matrix entries:
| Display name | Backend | Toolchain | Build-only? |
|---|---|---|---|
| `Build — Windows MSVC + CUDA (build only)` | CUDA | MSVC + nvcc 13.0.0 | yes |
| `Build — Windows MSVC + oneAPI SYCL (build only)` | SYCL | MSVC + icx-cl 2025.3 | yes |
Both legs:
- Use `ilammy/msvc-dev-cmd@v1` to set up the MSVC dev environment (community-standard action that runs `vcvars64.bat` and exports the env vars). Required because nvcc shells out to `cl.exe` and `icx-cl` is a `cl.exe`-compatible driver.
- Pin CUDA tooling to the exact same version as the Linux CUDA leg: CUDA 13.0.0 (via `Jimver/cuda-toolkit@v0.2.35`). Identical pins keep the MSVC-vs-Linux comparison meaningful: if a build breaks here but not on Linux, it's an MSVC ABI issue, not a tooling-version delta.
- Install oneAPI on Windows via the official `oneapi-src/oneapi-ci` offline-installer pattern: `curl` the Intel Base Toolkit Windows installer, extract the bootstrapper, run with `--components=intel.oneapi.win.cpp-dpcpp-common --eula=accept` plus `NEED_VS{2017,2019,2022}_INTEGRATION=0` to skip the (slow) Visual Studio plug-in steps. Pinned to BaseKit 2025.3.0.372 to match the Linux SYCL leg's `intel-oneapi-compiler-dpcpp-cpp-2025.3`. Two earlier candidates failed: (1) `rscohn2/setup-oneapi@v0` is Linux-only, since every installer URL in its `src/main.js` ends in `.sh`; (2) Chocolatey's `intel-oneapi-basekit` package is not on the community feed (verified 2026-04-19: lookup fast-failed in 3 s with `package was not found with the source(s) listed`). The Intel offline installer is what Intel themselves use in CI (`oneapi-src/oneapi-ci/scripts/install_windows.bat`), so it gives us the most stable, owner-blessed install path on Windows.
- Inject `/experimental:c11atomics` into `CFLAGS` and `CXXFLAGS` before `meson setup` on Windows. libvmaf uses C11 atomics (`stdatomic.h` + `__atomic_*`); MSVC's `<stdatomic.h>` errors with `"C atomic support is not enabled"` unless the `/experimental:c11atomics` compiler flag is set. The flag is opt-in until MSVC ships full C11/C17 atomics support. Setting it via env var is preferable to a meson native file because the flag is purely build-system-conditional, not source-side: gcc / clang / icpx / nvcc don't need it.
- Pull three extra dependencies that the Linux GPU legs get for free from apt but Windows installers do not ship:
  - CUDA `crt` sub-package: added to the `Jimver/cuda-toolkit` `sub-packages` list. Provides `crt/host_config.h` (CUDA Runtime Library compile-time headers), which `cuda_runtime.h` includes unconditionally. Without it, the very first MSVC translation unit that touches the CUDA runtime errors with `Cannot open include file: 'crt/host_config.h'`. (`cuda_cccl` looks like the intuitive name from the docs but is not a valid Windows sub-package: the installer rejects it with exit code `0xE0E07F19`. The Windows installer name is the bare `crt`.)
  - CUDA `nvvm` sub-package: also added to the `sub-packages` list. Ships `nvvm/bin/cicc.exe` (CUDA's LLVM-based device compiler) and `nvvm/libdevice/libdevice.*.bc` (the device library nvcc links against). Without it, the `nvcc` driver binary installs fine but crashes at the first `.cu → PTX` stage with `stderr: The system cannot find the path specified.`, because `nvcc.profile` points `CICC_PATH = $(TOP)/nvvm/bin` and that directory doesn't exist. On Linux the apt `cuda-nvcc-XY` package pulls NVVM in transitively; on Windows the Jimver action requires naming it explicitly. Discovered by reproducing the CI failure locally in a Windows Server 2022 VM (CUDA 13.0.48, 2026-04-19) after eight rounds of CI-loop debugging failed to surface the root cause from log output alone.
  - Level Zero loader from source: `oneapi-src/level-zero` cloned at tag `v1.18.5` (matches the Ubuntu 24.04 apt `level-zero-dev` version, keeping the parity invariant with the Linux SYCL leg) and built via `cmake --build … --target install` to a job-local prefix. Its `include/` and `lib/` are appended to `INCLUDE` and `LIB` via `GITHUB_ENV` so meson's `cc.find_library('ze_loader', required: true)` at `core/src/meson.build:492` resolves. Windows oneAPI BaseKit ships the SYCL runtime but not the L0 loader `ze_loader.lib`; building from source is the Intel-documented path on Windows.
- Make `svml`/`irc` runtime-library lookup Linux-only in `core/src/meson.build`. Those explicit `cc.find_library` calls exist so a non-Intel host linker (gcc/g++) can resolve Intel runtime symbols emitted by icpx-compiled objects. On Windows the host C/C++ compiler is icx-cl itself (the same Intel toolchain that emits those symbols and auto-injects svml/irc at link time), and the Windows lib names differ (`svml_dispmd.lib`, `libirc.lib`), so the bare-name lookup would fail anyway. Guard the block with `if host_machine.system() != 'windows'`.
- Win32 `pthread.h` compat shim: libvmaf core uses the pthread API (mutex / cond / thread-create) across ~14 files, with `#include <pthread.h>` unconditional. MSVC and clang-cl ship no `pthread.h` (MinGW does, via winpthreads). Round-10 surfaced this as `fatal error C1083: Cannot open include file: 'pthread.h'` in six CUDA host TUs once the nvvm device compiler started running. Resolved by adding a header-only shim at `core/src/compat/win32/pthread.h` that maps the in-use pthread subset onto Win32 SRWLOCK + CONDITION_VARIABLE + `_beginthreadex`, mirroring the long-standing `compat/gcc/stdatomic.h` pattern. Wired in via a new `pthread_dependency` declared in `core/meson.build` and gated on `cc.check_header('pthread.h')` failing, so MinGW and POSIX paths are untouched. Shim covers `pthread_t`, `pthread_mutex_*`, `pthread_cond_*`, `pthread_create`, `pthread_join`, `pthread_detach`, plus `PTHREAD_MUTEX_INITIALIZER` / `PTHREAD_COND_INITIALIZER`: exactly the surface in use, no more. Floor is Vista+ for SRWLOCK / CONDITION_VARIABLE; `windows-2022` runners are well above that. Build-only legs do not exercise the shim's runtime semantics, but the GPU host TUs that link against libvmaf now compile cleanly under MSVC.
- icpx-cl Windows host-arg handling: SYCL's `icpx` driver on Windows targets `x86_64-pc-windows-msvc` and rejects `-fPIC` outright (`unsupported option '-fPIC' for target 'x86_64-pc-windows-msvc'`). PIC is the default for Windows DLLs anyway, so dropping the flag on Windows is the correct build-system fix, not a workaround. Resolved in `core/src/meson.build` by introducing `sycl_pic_arg = host_machine.system() != 'windows' ? ['-fPIC'] : []` and threading it into both `sycl_common_args` and `sycl_feature_args` in place of the hard-coded `-fPIC` token. The same SYCL `.cpp` translation units transitively include `feature_collector.h`, which pulls in `pthread.h`, so the Win32 pthread shim path is also appended to `sycl_inc_flags` on Windows, mirroring the `cuda_extra_includes` handling for the nvcc fatbin `custom_target`. icpx `custom_target` invocations bypass meson's regular `dependencies:` plumbing the same way nvcc fatbins do, so the include-path threading must be explicit.
- Skip the test step entirely. `windows-latest` has no GPU; running even CPU-only tests would consume runner minutes for no signal beyond what the Linux legs already provide.
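To make the pthread shim item above more concrete, here is a minimal sketch of how a pthread subset can be mapped onto SRWLOCK, CONDITION_VARIABLE and `_beginthreadex`. It is not the contents of `core/src/compat/win32/pthread.h`: it covers only lock/unlock, wait/signal, create and join, it drops the thread's return value, and the `shim_*` and `demo_*` names are hypothetical. On non-Windows hosts the sketch falls through to the system `<pthread.h>`, so the same demo compiles on both sides.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
#include <process.h>
#include <errno.h>
#include <stdlib.h>

typedef SRWLOCK pthread_mutex_t;
typedef CONDITION_VARIABLE pthread_cond_t;
typedef HANDLE pthread_t;
#define PTHREAD_MUTEX_INITIALIZER SRWLOCK_INIT
#define PTHREAD_COND_INITIALIZER CONDITION_VARIABLE_INIT

static inline int pthread_mutex_lock(pthread_mutex_t *m) { AcquireSRWLockExclusive(m); return 0; }
static inline int pthread_mutex_unlock(pthread_mutex_t *m) { ReleaseSRWLockExclusive(m); return 0; }
static inline int pthread_cond_signal(pthread_cond_t *c) { WakeConditionVariable(c); return 0; }
static inline int pthread_cond_wait(pthread_cond_t *c, pthread_mutex_t *m)
{
    /* Flags == 0: the SRW lock is held in exclusive mode. */
    return SleepConditionVariableSRW(c, m, INFINITE, 0) ? 0 : EINVAL;
}

/* _beginthreadex expects an unsigned __stdcall entry point, so a small
 * trampoline adapts the void *(*)(void *) pthread signature. */
struct shim_start { void *(*fn)(void *); void *arg; };
static unsigned __stdcall shim_trampoline(void *p)
{
    struct shim_start s = *(struct shim_start *)p;
    free(p);
    s.fn(s.arg); /* return value is dropped in this sketch */
    return 0;
}
static inline int pthread_create(pthread_t *t, const void *attr,
                                 void *(*fn)(void *), void *arg)
{
    struct shim_start *s = malloc(sizeof *s);
    (void)attr; /* attributes are not supported here */
    if (!s) return ENOMEM;
    s->fn = fn;
    s->arg = arg;
    *t = (HANDLE)_beginthreadex(NULL, 0, shim_trampoline, s, 0, NULL);
    if (!*t) { free(s); return EAGAIN; }
    return 0;
}
static inline int pthread_join(pthread_t t, void **ret)
{
    if (ret) *ret = NULL;
    if (WaitForSingleObject(t, INFINITE) != WAIT_OBJECT_0) return EINVAL;
    CloseHandle(t);
    return 0;
}
#else
#include <pthread.h>
#endif

struct demo_state { pthread_mutex_t lock; pthread_cond_t cond; int ready; };

/* Worker: sets the flag under the lock and wakes the waiter. */
static void *demo_worker(void *p)
{
    struct demo_state *s = p;
    pthread_mutex_lock(&s->lock);
    s->ready = 1;
    pthread_cond_signal(&s->cond);
    pthread_mutex_unlock(&s->lock);
    return NULL;
}

/* Starts one worker, waits for its signal, joins it.
 * Returns the flag value seen by the waiter, or -1 if the thread
 * could not be created. */
int demo_run(void)
{
    struct demo_state s = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0 };
    pthread_t t;
    int seen, rc;

    if (pthread_create(&t, NULL, demo_worker, &s) != 0) return -1;
    pthread_mutex_lock(&s.lock);
    while (!s.ready) pthread_cond_wait(&s.cond, &s.lock);
    seen = s.ready;
    pthread_mutex_unlock(&s.lock);
    rc = pthread_join(t, NULL);
    assert(rc == 0);
    (void)rc;
    return seen;
}
```

Because the calling code (`demo_worker`, `demo_run`) never mentions a Win32 type, it is unchanged between the two branches, which is the property that lets the unconditional `#include <pthread.h>` in the core sources stay as it is. A build-only leg proves only that such a shim compiles and links; the runtime behaviour of the Windows branch would still need a machine that actually executes it.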
The two job names are pinned to required status checks on master immediately after this PR's merge (21 → 23 contexts; counting the two Linux DNN legs from ADR-0120 if that PR landed first).
Alternatives considered
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Status quo (Linux GPU only) | Zero CI cost. | MSVC build-portability regressions only surface from downstream user reports. | The whole motivation for this ADR is closing exactly that hole. |
| Add only the CUDA Windows leg | Half the CI minutes; CUDA is the more popular GPU backend. | SYCL on Windows is the less-tested backend → most likely to bit-rot → most valuable to gate. | Half coverage of the hole isn't satisfying; user scope was both. |
| Add Windows GPU legs but as `experimental: true` (informational) | Avoids gating master on Windows GPU runner flakiness (Jimver/cuda-toolkit network occasionally times out). | Defeats the purpose: an informational green-vs-yellow distinction won't surface PR regressions to authors who don't notice yellow. | Pin them as required; flakiness is rare enough to absorb the occasional re-run. |
| Run actual tests via Windows GPU self-hosted runner | Real GPU-on-CI coverage, not just compile/link. | Requires self-hosted runner infrastructure with a discrete GPU; security/maintenance overhead disproportionate to the benefit; out of scope for this PR. | Out of scope. Could be a separate ADR if the fork ever justifies a self-hosted Windows GPU host. |
| Use cmake instead of meson on Windows | cmake has slightly better MSVC integration historically. | The fork has already standardised on meson everywhere; introducing cmake just for Windows GPU legs would create an unmaintained second build path. | Meson works fine on Windows + MSVC; just need correct env vars. |
Consequences
Positive:
- MSVC + CUDA build regressions and MSVC + oneAPI SYCL build regressions surface in CI on the PR that introduces them, not from downstream user reports months later.
- Windows tier-1 status is upheld for the GPU backends, not just for CPU.
- The `vmaf.exe` artifact uploaded by the MinGW CPU job and the (eventual, future) MSVC GPU `vmaf.exe` artifact share the same test-portability story going forward.
Negative:
- Two additional `windows-latest` matrix runs per PR. Each Windows runner is ~2× the cost of a Linux runner in GHA minutes. Estimated cost: ~25 min wall-clock added per CI run (parallel across the matrix), ~50 GHA minutes added per run. Acceptable on the public fork's free tier.
- Build-only ≠ runtime-tested. A regression that compiles cleanly but produces wrong output on Windows GPUs would still slip through. Mitigated by the Ubuntu CUDA / SYCL legs, which catch most behavioural regressions.
- One new third-party action in the workflow: `ilammy/msvc-dev-cmd@v1`. Widely used with a good security record, but additional supply-chain surface beyond the existing `Jimver/cuda-toolkit`. The SYCL leg additionally pulls a signed installer from `registrationcenter-download.intel.com` over HTTPS. That host is Intel-owned infrastructure, the same source `oneapi-src/oneapi-ci` uses.
- The Intel offline-installer URL hard-codes the BaseKit version (`2025.3.0.372`) and a per-release directory id. When the Linux SYCL leg bumps oneAPI, this URL must be updated in lockstep, because drift defeats the parity invariant. See `docs/rebase-notes.md` entry 0022 for the touch-list.
Neutral / follow-ups:
- Branch protection re-pinned atomically with this ADR's merge to add `Build — Windows MSVC + CUDA (build only)` and `Build — Windows MSVC + oneAPI SYCL (build only)` as required contexts.
- A future ADR may add a Windows GPU runtime-test job once a self-hosted Windows GPU runner is justified.
- A future ADR may pin the third-party actions to commit SHAs (consistent with whatever SHA-pinning policy the rest of the repo adopts).
References
- ADR-0115 — matrix consolidation; this ADR adds a new job to the consolidated workflow.
- ADR-0116 — Title Case display names; the two new legs follow that convention.
- ADR-0120 — sister ADR (DNN-on matrix legs) landed immediately before this one.
- ADR-0037 — branch-protection policy that the post-merge re-pin updates.
- ilammy/msvc-dev-cmd@v1
- Jimver/cuda-toolkit@v0.2.35
- Intel `oneapi-src/oneapi-ci`: official Windows install pattern
- NVIDIA Windows CUDA sub-package list
- MSVC `/experimental:c11atomics` opt-in flag
- `req` (paraphrased): user picked "Add both CUDA + SYCL Windows build-only legs" via the post-cascade scope popup, after I offered it as the most-ambitious of three scope choices for Windows GPU coverage.
- Per-surface doc impact: this ADR documents the workflow-file change and the branch-protection delta. The CUDA / SYCL backends themselves are unchanged; no `docs/backends/` edit needed beyond the ADR.