ADR-0238: Vulkan VmafPicture preallocation surface (API parity with CUDA / SYCL)¶
- Status: Accepted
- Date: 2026-05-02
- Deciders: Lusoris
- Tags: vulkan, api, preallocation, fork-local, parity
Context¶
The CUDA backend ships vmaf_cuda_preallocate_pictures / vmaf_cuda_fetch_preallocated_picture (with HOST / HOST_PINNED / DEVICE methods); the SYCL backend ships vmaf_sycl_preallocate_pictures / vmaf_sycl_picture_fetch (HOST / DEVICE). Vulkan ships nothing — docs/api/gpu.md flagged this as a known limitation:
Per-feature picture preallocation API is not yet exposed — Vulkan kernels allocate device-side VkBuffers internally per feature. The CUDA/SYCL-style
VmafVulkanPicturePreallocationMethodsurface is a follow-up.
The gap blocks two real use cases:
- Driver-style integration: a caller that owns its decode loop wants to write source frames directly into the buffers libvmaf's compute kernels will read, removing the host → device staging copy. Without preallocation, every frame round-trips through the per-extractor internal buffers.
- Memory budgeting: in a long-running pipeline (FFmpeg per-shot tooling, encoder-side ROI scoring), pinning the picture pool depth up front lets operators reason about the resident set instead of tracking per-feature internal allocators.
Closing the parity gap is purely additive — the existing import-image zero-copy path (ADR-0186 / ADR-0251) remains the right answer for external-VkImage callers; preallocation serves the host-driven path.
Decision¶
We will ship VmafVulkanPicturePreallocationMethod (NONE / HOST / DEVICE) and the matching vmaf_vulkan_preallocate_pictures + vmaf_vulkan_picture_fetch entry points, modelled on the SYCL surface (the SYCL pool is the simpler reference; CUDA's HOST_PINNED option maps to a CUDA-only allocator and has no Vulkan analogue).
HOST allocates pictures via the regular vmaf_picture_alloc — useful when the score consumer is CPU-side. DEVICE backs each picture's luma plane with a host-visible Vulkan buffer (VMA AUTO_PREFER_HOST); the persistent mapped pointer is exposed as pic->data[0], so the caller writes once and the kernel descriptor sets read the same memory. A new VMAF_PICTURE_BUFFER_TYPE_VULKAN_DEVICE tags these pictures so cross-backend extractors can refuse mixed backings.
Pool depth is fixed at the canonical frames-in-flight = 2 (matches SYCL's compile-time constant); the depth is not part of the public surface in this PR. ADR-0251's max_outstanding_frames knob is unrelated — that controls the import-image fence ring, not the preallocation pool. A separate follow-up can grow the configuration struct if a real workload needs depth > 2.
Alternatives considered¶
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Mirror SYCL surface (chosen) | Shortest review path; SYCL pool is already battle-tested in the FFmpeg integration; same enum names + same pool-depth contract | Inherits SYCL's "luma-only" pool (chroma allocation deferred) | Picked: API parity is the goal; the chroma deferral is consistent across SYCL + Vulkan and tracked separately |
| Mirror CUDA surface (HOST / HOST_PINNED / DEVICE) | Three methods feels more complete; CUDA users see familiar names | HOST_PINNED has no Vulkan analogue (VMA AUTO_PREFER_HOST is the closest, but it's not "pinned" in the CUDA sense) | Rejected: would ship a method we'd then have to caveat as "treated like HOST" |
| Skip the public surface; expose a fork-internal helper | Smaller blast radius; less ABI commitment | Doesn't close the parity gap noted in docs/api/gpu.md; FFmpeg integrators still bounce off the missing surface | Rejected: the gap is explicit and visible |
Bake preallocation into vmaf_vulkan_state_init (auto-pool-on-import) | One fewer call site for callers | Forces every Vulkan caller to pay the pool cost; breaks the import-image zero-copy path which deliberately doesn't pre-allocate | Rejected: opt-in is the right ergonomics |
Add a pic_cnt parameter to the configuration struct in this PR | Configurable from day one | Premature; SYCL's hard-coded 2 has shipped without complaint; growing the struct later is additive | Rejected: ship the minimal surface, grow it if a workload demands depth > 2 |
Consequences¶
- Positive:
- Closes the CUDA / SYCL / Vulkan API parity gap for picture preallocation, removing the explicit "follow-up" caveat from
docs/api/gpu.md. - Lets FFmpeg-side host-driven integrations (the non-zero-copy path) write source frames directly into the buffers libvmaf compute kernels read, removing one host → device staging copy.
- The new
VMAF_PICTURE_BUFFER_TYPE_VULKAN_DEVICEtag enables cross-backend extractors to refuse mixed-backing inputs in future work. - Pool tear-down lives in a single hook in
vmaf_close(), matching the SYCL pattern; no per-extractor cleanup needed. - Negative:
- One additional translation unit (
core/src/vulkan/picture_vulkan_pool.c, ~180 LOC). - Public ABI grows two entry points + one enum + one struct. Backwards compatibility is preserved (additive only).
- The pool currently allocates only the Y plane — same as SYCL. Chroma-aware extractors that want pre-allocated U/V planes (none of the current Vulkan kernels do) need a follow-up.
- Neutral / follow-ups:
- Pool depth tunable. The current 2 matches SYCL's compile-time constant; if a workload needs depth > 2, grow the
VmafVulkanPictureConfigurationstruct additively (mirroring ADR-0251 follow-up #3's pattern). - Chroma plane preallocation. Out of scope here; tracked as a separate item once a Vulkan kernel actually consumes pre-allocated chroma.
- External-handles parity. The pool currently uses the imported state's VkInstance / VkDevice via the fork-internal
vmaf_vulkan_state_context()accessor. External callers (vmaf_vulkan_state_init_external) work transparently; no additional plumbing required. - Cross-backend parity gate stays unchanged at
places=4— preallocation only changes where the buffer lives, not which bytes the kernel reads.
FFmpeg-side adoption (deferred, by design)¶
CLAUDE.md §12 r14 requires ffmpeg-patches/ updates when libvmaf adds an entry point that the patches consume. None of the in-tree patches (0002 cuda / 0003 sycl / 0004 vulkan-selector / 0006 libvmaf_vulkan) call any *_preallocate_pictures / *_picture_fetch entry point today — neither for CUDA nor for SYCL. Those filters go through the regular per-frame vmaf_picture_alloc host path. Switching them to consume the preallocation pool is a separate optimization with two distinct angles:
vf_libvmaf_vulkan(zero-copy import path, patch0006): this filter receives external VkImages viavmaf_vulkan_import_imageand never allocates VmafPictures itself. Preallocation is orthogonal — no change needed, ever.vf_libvmafhost path withvulkan_device=N: currently allocates host-backed pictures per frame. Switching tovmaf_vulkan_preallocate_pictures(DEVICE)would let FFmpeg's frame data land directly in the kernel-read buffer. Real perf win, but it's a separate PR with its own benchmark gate (mirrors the pattern of ADR-0251's measurement-gate flip). For symmetry with SYCL — which also ships preallocation but whose FFmpeg filter doesn't consume it yet — we deliberately keep this PR's scope at "API parity surface" rather than "FFmpeg integration optimization". When the FFmpeg adoption PR lands, it's additionally gated on demonstrating no host → device staging copy inperf recordtraces against an unchanged Netflix golden score.
ffmpeg-patches/series.txt is unchanged by this PR; the apply --check gate against the pinned n8.1 baseline passes unmodified.
References¶
- Source:
req2026-05-02 — popup answer "do 1 first then 4" (test_speed fix, then per-feature picture preallocation for Vulkan). docs/api/gpu.md— flagged the gap.core/include/libvmaf/libvmaf_cuda.h— CUDA reference surface (VmafCudaPicturePreallocationMethod).core/include/libvmaf/libvmaf_sycl.h— SYCL reference surface (chosen mirror).core/src/sycl/picture_sycl.cpp— SYCL pool implementation (the model forpicture_vulkan_pool.c).- ADR-0186 — the existing zero-copy import path; preallocation is a parallel surface, not a replacement.
- ADR-0251 —
max_outstanding_framesknob. Orthogonal to this ADR (controls the fence ring, not the picture pool). - Companion research digest:
docs/research/0045-vulkan-picture-preallocation.md.
Status update 2026-05-08: Accepted¶
Audited as part of the 2026-05-08 ADR Proposed sweep (Research-0086).
Acceptance criteria verified in tree at HEAD 0a8b539e:
core/include/libvmaf/libvmaf_vulkan.hdeclaresVmafVulkanPicturePreallocationMethod(line 153),VmafVulkanPictureConfiguration::pic_prealloc_method(line 165),vmaf_vulkan_preallocate_pictures(line 180),vmaf_vulkan_picture_fetch(line 191).- Verification command:
grep -nE "VmafVulkanPicturePreallocationMethod|vmaf_vulkan_preallocate_pictures|vmaf_vulkan_picture_fetch" core/include/libvmaf/libvmaf_vulkan.h.