Research-0045: Vulkan picture preallocation — option-space digest¶
- Date: 2026-05-02
- Companion ADR: ADR-0238
Question¶
CUDA + SYCL ship a public picture-preallocation surface (*_preallocate_pictures
*_picture_fetch); Vulkan does not. What's the smallest surface that closes the parity gap without locking the design to a single host-allocator pattern?
Reference surfaces compared¶
| Trait | CUDA (libvmaf_cuda.h) | SYCL (libvmaf_sycl.h) | Vulkan (this digest) |
|---|---|---|---|
| Methods | NONE, HOST, HOST_PINNED, DEVICE | NONE, HOST, DEVICE | NONE, HOST, DEVICE |
| Pool depth | Caller-controlled (via vmaf_cuda_ring_buffer) | Compile-time pic_cnt = 2 | Compile-time pic_cnt = 2 (mirrors SYCL) |
| Backing | cudaMalloc / cudaMallocHost / pinned | sycl::malloc_device / sycl::malloc_host | VMA AUTO_PREFER_HOST VkBuffer / regular host |
| Plane coverage | All planes | Y plane only | Y plane only (mirrors SYCL) |
| Buffer-type tag | VMAF_PICTURE_BUFFER_TYPE_CUDA_* | VMAF_PICTURE_BUFFER_TYPE_SYCL_DEVICE | VMAF_PICTURE_BUFFER_TYPE_VULKAN_DEVICE (new) |
| Lock model | None (single-threaded fetch) | std::mutex | pthread_mutex_t (no C++ in vulkan/) |
The SYCL surface is the cleaner reference. CUDA's HOST_PINNED is a CUDA-allocator-specific concept; VMA's AUTO_PREFER_HOST is the closest analogue but isn't "pinned" in the CUDA sense — exposing it under the same name would invite the wrong mental model.
Decisions implied by the option survey¶
- Three methods, not four —
NONE/HOST/DEVICE. NoHOST_PINNEDanalogue. - Compile-time pool depth = 2 — matches SYCL exactly. Growing to a configurable depth is additive (mirror ADR-0251 follow-up #3's pattern of growing
VmafVulkanConfigurationwith an optionalunsigned). - Y-plane only DEVICE backing — matches SYCL. Chroma kernel work in this fork still allocates its own buffers per-feature; no Vulkan extractor currently consumes preallocated chroma.
pthread_mutex_tinstead of C++ —core/src/vulkan/is pure C; the SYCL pool'sstd::mutexdoesn't translate. The round-robin counter is the only contended state.- Pool lives on
VmafContext, not onVmafVulkanState— matches both CUDA and SYCL. The state is the GPU resource handle; the pool is a per-context resource that's created viavmaf_vulkan_preallocate_picturesand torn down invmaf_close().
Lifetime / fail-paths surveyed¶
The SYCL pool's allocation unwind on partial failure is the right model: pic_cnt allocations attempted in sequence; on the i-th failure, free pictures [0, i) and the pool struct itself, then return the original error. The new C implementation (picture_vulkan_pool.c::pool_unwind) preserves this contract.
The DEVICE method introduces a bookkeeping wrinkle: VmafVulkanBuffer already maintains a host_ptr → buffer map for the legacy vmaf_vulkan_picture_alloc shim. The pool path bypasses that map by attaching the buffer handle to the picture's priv cookie + release_picture callback — when the pool unrefs a pic, the standard VmafPicture refcount path frees the buffer through VMA directly. No double-free risk since the pool is the sole owner of the cookie.
Test contract¶
Six smoke tests pin the API contract without dispatching GPU work:
NONEis a no-op (returns 0, allocates nothing).HOSTallocates, fetches round-robin, unrefs cleanly.DEVICEallocates, fetches, the host-mapped pointer is writable, ASan / UBSan don't flag a use-after-free on unref.- Fetch without preallocate falls back to a host-backed picture (mirrors the SYCL fallback contract for callers that ignore the preallocation surface).
- Unknown method →
-EINVAL. - NULL args →
-EINVAL(no crash).
End-to-end scoring against pool-allocated pictures lives in the cross-backend parity gate (unchanged — preallocation only changes where bytes live, not which bytes the kernel reads).
What this digest deliberately defers¶
- Pool depth tunable. SYCL has shipped at depth 2 for the full FFmpeg integration without complaint. If a real workload needs more, grow
VmafVulkanPictureConfigurationadditively. - Chroma plane preallocation. Both SYCL and Vulkan kernels are luma-only today. A future U/V-aware extractor pulls this in.
- External-handles parity. The pool uses the imported state's VkInstance/VkDevice via the fork-internal
vmaf_vulkan_state_context()accessor; external-handles callers (state_init_external) work transparently with no extra plumbing. - HOST_PINNED-like option. VMA's allocator API has no "pinned in the CUDA sense" mode. If a workload turns up that needs it, the right path is a new method (
VMAF_VULKAN_PICTURE_PREALLOCATION_METHOD_HOST_VISIBLE_DEVICE_LOCAL, ReBAR / SAM territory).
References¶
- VMA
AUTO_PREFER_HOSTsemantics: https://gpuopen.com/learn/vulkan-renderers-memory-allocation/. - SYCL pool reference:
core/src/sycl/picture_sycl.cpp(vmaf_sycl_picture_pool_init). - CUDA preallocation reference:
core/include/libvmaf/libvmaf_cuda.h,core/src/libvmaf.c::vmaf_cuda_preallocate_pictures. - Parallel surfaces: ADR-0186 / ADR-0251 (zero-copy import) — not replaced by this PR; preallocation serves the host-driven path.