ADR-0103: vmaf_sycl_import_d3d11_surface ships as a staging-texture H2D path, not zero-copy¶
- Status: Accepted
- Date: 2026-04-18
- Deciders: Lusoris, Claude (Anthropic)
- Tags: sycl, windows, api
Context¶
The public header core/include/libvmaf/libvmaf_sycl.h declared vmaf_sycl_import_d3d11_surface() under #ifdef _WIN32 alongside its doc block ("Creates a staging texture, copies the decoded surface, maps it for CPU read, and uploads via H2D memcpy"), but no implementation existed anywhere in the tree. Issue #27 surfaced this as a link-time failure: any Windows consumer compiling against libvmaf-sycl would see undefined reference to vmaf_sycl_import_d3d11_surface at link time, because the symbol's declaration was visible but no TU exported it.
The options were:
- Remove the declaration — small diff, closes the issue, but permanently locks out Windows D3D11 callers (MediaFoundation, DXVA2, DirectX 11 VideoProcessor) from the SYCL backend.
- Implement the staging-texture H2D path — matches the existing doc block, larger diff, requires a Windows validation path.
- Keep the declaration, implement as
-ENOSYSstub — no useful behaviour gained.
The user chose option 2.
Decision¶
Surface¶
Implemented in a new TU core/src/sycl/d3d11_import.cpp. The entire file body is wrapped in #ifdef _WIN32 so the TU is a compile-clean no-op on Linux / macOS — it slots into sycl_sources alongside dmabuf_import.cpp without gating the Meson source list on host_machine.system().
Call flow¶
caller's ID3D11Texture2D* ──▶ CopyResource ──▶ staging tex
(D3D11_USAGE_DEFAULT) (D3D11_USAGE_STAGING,
D3D11_CPU_ACCESS_READ)
│
▼
Map()
│
▼
vmaf_sycl_upload_plane
(mapped.pData, RowPitch)
│
▼
SYCL shared buffer (USM)
│
▼
Unmap()
Release(staging)
Fast path for caller-provided staging textures¶
If the caller already hands us a staging texture (Usage == D3D11_USAGE_STAGING and CPUAccessFlags & D3D11_CPU_ACCESS_READ), the function skips the CreateTexture2D + CopyResource prelude and maps the caller's texture directly, honouring their subresource index. This matters for decoder pipelines that already stage on the encoder's side (e.g. DXVA2 output samples that arrive pre-staged).
COM bindings¶
The implementation uses #define COBJMACROS and the C-style COM macros (ID3D11Device_CreateTexture2D, ID3D11Texture2D_Release, ID3D11DeviceContext_Map, etc.). These are defined identically in both MSVC's d3d11.h and mingw-w64's port, so a single source works under Intel oneAPI DPC++ (which wraps MSVC on Windows) and under any future MinGW-based SYCL build. No #ifdef _MSC_VER split was necessary after verifying the Wine-derived header family exports all 25 required APIs (CreateTexture2D, GetDesc, CopyResource, GetImmediateContext, Map, Unmap, Release, plus the D3D11_MAPPED_SUBRESOURCE / D3D11_TEXTURE2D_DESC types).
Meson wiring¶
core/src/meson.buildappendsd3d11_import.cpptosycl_sources. On non-Windows the TU compiles to an empty.o(verified: 3312-byte icpx output on Linux).- When
host_machine.system() == 'windows', the build adds-ld3d11 -ldxgitosycl_surface_deps. On non-Windows nothing is linked.
Validation¶
- Automated: MinGW CI job
windows.ymlbuilds with-Denable_sycl=false(Intel oneAPI DPC++ isn't available on the stock GitHubwindows-latestrunner and paid runner minutes are off the table per the user's standing constraint). The MinGW job therefore does not exercise this TU at all — it's gated behindis_sycl_enabledinmeson.build. - Manual: The user boots a local Windows VM with Intel oneAPI DPC++ installed and runs the reproducer at
docs/development/windows-d3d11-import.md. That doc lists EINVAL-path smoke tests (no D3D11 device needed) plus a full reproducer skeleton using MediaFoundation's H.264 decoder.
Alternatives considered¶
| Alternative | Why rejected |
|---|---|
| Remove the declaration from the public header | Permanently locks Windows D3D11 callers out of the SYCL backend. The fork's MediaFoundation-ingestion path would need its own CPU-stage-and-upload helper, duplicating the work this function centralises. |
| Zero-copy via DXGI NT-handle sharing + SYCL D3D11 interop | Intel oneAPI DPC++ as of 2025.1 has sycl::ext::oneapi::experimental::external_memory for Vulkan/OpenGL interop, but no documented ID3D11Resource import path. Implementing shared-NT-handle → HANDLE → external_memory_descriptor ourselves would require replicating a chunk of the DirectX interop contract and hasn't been validated by Intel. Revisit when DPC++ ships first-class D3D11 interop. |
Zero-copy via IDXGIKeyedMutex shared surface | Would need the caller to set up a shared surface at decoder init time — breaks the drop-in contract of the existing vmaf_sycl_import_va_surface. Also requires the SYCL side to open the same shared handle, which hits the same DPC++-doesn't-document-D3D11-interop wall as above. |
MSVC-only implementation gated on #ifdef _MSC_VER | mingw-w64's d3d11.h exports the same COM bindings as MSVC's (verified: 25 API hits in the Wine-derived header family). The compiler split would have been premature pessimism. |
Direct ID3D11Texture2D::Map on the source tex (no staging) | Source textures from decoders are almost always D3D11_USAGE_DEFAULT — they can't be mapped. The staging round-trip is what the D3D11 programming model requires for CPU read. The fast-path for pre-staged textures above covers the one case where this isn't needed. |
Consequences¶
- Windows consumers compiling against libvmaf-sycl now get a working
vmaf_sycl_import_d3d11_surfaceinstead of a link error. - The implementation is not zero-copy: throughput is bounded by PCIe upstream (staging Map) + PCIe downstream (SYCL H2D). For 1080p8 input on a typical PCIe Gen3 x8 iGPU that's ≲ 2 ms per call; for 2160p10 it's ≲ 12 ms. Docs (sycl/overview.md) flag this clearly so callers don't mistake the surface for a zero-copy path.
- Runtime validation is manual, not automated. The validation-gap is deliberate per the user's "no paid GitHub Actions" constraint and documented in
docs/development/windows-d3d11-import.md. - Build system gains a pair of Windows-only linker deps (
d3d11,dxgi). Both are in the Windows SDK / mingw-w64 headers package by default — no new external dependency to pin. - Merge friendliness:
d3d11_import.cppis a new file, not a modification of any existing SYCL TU. Easy to revert if the zero-copy path becomes available later.
References¶
- Issue #27 — vmaf_sycl_import_d3d11_surface declared but not implemented
- Microsoft — ID3D11DeviceContext::Map method
- Microsoft — D3D11_USAGE enumeration
- Intel — oneAPI DPC++ Compiler, external_memory extension
- ADR-0040 — multi-input DNN session API (unrelated; cited for the "new TU lives next to sibling imports" pattern this ADR reuses).