ADR-0423: Metal IOSurface zero-copy import (T8-IOS)¶
- Status: Accepted
- Date: 2026-05-11
- Deciders: kilian, Claude (Anthropic)
- Tags: metal, ffmpeg-patches, gpu, t8-ios
Context¶
The Metal backend runtime (ADR-0420 / T8-1b) and the first eight feature kernels (ADR-0421 / T8-1c–j) are live: a caller that hands libvmaf a software VmafPicture can already score it on an Apple Silicon GPU via the --backend metal / metal_device= CLI surfaces (ADR-0422). The remaining gap is the FFmpeg hwdec zero-copy path — VideoToolbox-decoded AVFrames arrive as AV_PIX_FMT_VIDEOTOOLBOX with data[3] -> CVPixelBufferRef, and the only way to feed them into libvmaf today is the hwdownload,format=nv12 round-trip the regular libvmaf filter forces. That defeats the unified-memory zero-copy posture ADR-0420 paid for on Apple Silicon: every frame round-trips GPU→host→GPU through vmaf_picture_alloc even though the IOSurface backing the source CVPixelBufferRef was already addressable from any Apple-Family-7 MTLDevice.
The Vulkan flavour solved the symmetric problem in ADR-0184 (scaffold) + ADR-0186 (impl): caller-supplied VkImage handles, an external-init entry point that adopts FFmpeg's VkDevice so the source images are valid on libvmaf's compute device, plus a dedicated libvmaf_vulkan filter (ffmpeg-patches/0006-libvmaf-add-libvmaf-vulkan-filter.patch) shipping the consumer side. We want the same shape for Metal so the FFmpeg integration story stays uniform.
Three constraints originally pushed the work toward a two-PR rollout (scaffold + impl, mirroring ADR-0184 → ADR-0186): API + patch coupling, obj-c++ review surface, and the missing AVMetalDeviceContext in FFmpeg n8.1.1. The scaffold landed in this PR's first commit; the impl (originally pencilled in as a follow-up "T8-IOS-b") was folded into the same PR after review feedback that an audit-first scaffold with no runtime exercise on Apple Silicon adds little reviewer value over a single working-impl PR. The ADR therefore documents the end state (scaffold + impl in one), keeping the Vulkan two-phase rollout as the historical reference rather than a hard contract for new backends.
Decision¶
Land the IOSurface import path in a single PR carrying both the C-API surface and the working implementation:
- Public API additions in
core/include/libvmaf/libvmaf_metal.h:VmafMetalExternalHandles,vmaf_metal_state_init_external,vmaf_metal_picture_import,vmaf_metal_wait_compute,vmaf_metal_read_imported_pictures. Lifetime + same-device model symmetric to the Vulkan import surface (ADR-0184 / ADR-0186). - Implementation in
core/src/metal/picture_import.mm:IOSurfaceLock(kIOSurfaceLockReadOnly)+ per-row memcpy into a shared-storageVmafPictureallocated viavmaf_picture_alloc. Ring depth of 2 slots matches the SYCL preallocation pool and the Vulkan v1 default. The texture-direct path ([MTLDevice newTextureWithDescriptor:iosurface:plane:]/CVMetalTextureCacheCreateTextureFromImage) remains available as a future optimisation if a Metal feature kernel ever needs per-sample texture access; v1 routes everything through the same VmafPicture host-pointer pipeline the rest of the scoring extractors consume, which is the cheapest path on Apple Silicon's unified-memory architecture (a Shared-storage MTLBuffer copy and a CPU memcpy from a locked IOSurface have the same memory cost). - FFmpeg-side filter in
ffmpeg-patches/0013-libvmaf-add-libvmaf-metal-filter.patch: registerslibvmaf_metal, acceptsAV_PIX_FMT_VIDEOTOOLBOX, pulls IOSurfaces viaCVPixelBufferGetIOSurface, routes through the new C-API. Passeshandles.device = 0so libvmaf falls back toMTLCreateSystemDefaultDeviceuntil upstream FFmpeg ships anAVMetalDeviceContext; the documented limitation is a multi-GPU Mac Pro device-match gap covered by the same-device invariant inherited from ADR-0184. - Tests in
core/test/test_metal_smoke.c: input-validation assertions on every new entry point + a default-device-success-or--ENODEVskip semantics forvmaf_metal_state_init_external.
Alternatives considered¶
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Single PR carrying API + filter + runtime + tests | One review, no -ENOSYS dead-code path. | Mixes 3 review domains (C-API, FFmpeg, obj-c++); ADR-0184 precedent shows two-phase reviews catch design issues earlier. | Rejected — Vulkan's two-phase rollout caught the same-device constraint at scaffold time (would have wasted a full impl pass to discover post-hoc). |
Skip the dedicated filter, extend the regular libvmaf filter to consume IOSurfaces when metal_device >= -1 and input is VideoToolbox | One filter, fewer config knobs. | Couples the metal_device option to a specific hwaccel input format; breaks pixfmt negotiation for the software path (regular libvmaf filter expects AV_PIX_FMT_YUV420P etc.); diverges from Vulkan / SYCL precedent which uses dedicated filters per hwdec. | Rejected — uniformity with libvmaf_sycl / libvmaf_vulkan wins. |
Defer the entire surface until FFmpeg ships AVMetalDeviceContext | Avoids the "pick default MTLDevice" hack at config_props_metal. | Upstream has no public timeline; we'd block the whole IOSurface story on something we don't control. Single-GPU Apple Silicon Macs (the common case) don't need the device-match guarantee anyway. | Rejected — the default-device pick is a documented limitation with a clean upgrade path. |
Use CVMetalTextureCacheCreateTextureFromImage directly (no IOSurface extraction) | One Apple API instead of two. | The cache requires a CVPixelBufferRef; works on macOS but couples libvmaf to CoreVideo. Direct newTextureWithDescriptor:iosurface: keeps the C API IOSurface-only and lets non-CoreVideo callers (e.g. ScreenCaptureKit, AVPlayer hooks) import frames too. | Rejected for the public surface; the impl TU may still use the cache internally for biplanar layouts where descriptor chains are cumbersome. |
Consequences¶
- Positive:
- Symbol parity with the Vulkan import path: identical entry-point shape, lifetime model, and same-device contract.
- The FFmpeg filter ships now and starts surfacing usage / option feedback before the runtime lands. Reviewers can validate the user-facing CLI without waiting on obj-c++ review.
-
Lawrence's tap (
lusoris/homebrew-tap) gets the matching--enable-libvmaf-metal-filteractivation in a separate follow-up commit; this scaffold doesn't change the tap. -
Negative:
config_props_metalpicks a default MTLDevice until FFmpeg exposesAVMetalDeviceContext. Multi-GPU Mac Pro hosts may pick the wrong device; documented limitation, tracked as a follow-up once upstream ships the device-context API.-
v1 takes a CPU memcpy on every plane (no texture-direct path). On Apple Silicon's unified-memory architecture this costs the same as a Shared-storage MTLBuffer copy; if a future Metal kernel grows a per-sample texture access pattern that benefits from the IOSurface backing directly, a follow-up ADR can layer
newTextureWithDescriptor:iosurface:plane:on top without changing the public C API. -
Neutral / follow-ups:
- Tap: add
--enable-libvmaf-metal-filtertoFormula/ffmpeg.rbinlusoris/homebrew-tapto activate the filter for Homebrew users (separate PR; this one only adds the libvmaf surface + FFmpeg patch). - GPU-parity gate: no impact — the filter calls the same scoring pipeline as the software path, so cross-backend tolerances apply unchanged.
- GPU-parity gate: no impact — the filter calls the same scoring pipeline as the software path, so cross-backend tolerances apply unchanged once T8-IOS-b lands.
References¶
- ADR-0184 — Vulkan import scaffold precedent (audit-first stubs).
- ADR-0186 — Vulkan import implementation precedent (drops the -ENOSYS contract).
- ADR-0420 — Metal runtime that this builds on.
- ADR-0421 — first Metal kernel set; defines the
MTLSharedEventsynchronisation model the import path piggybacks on. - ADR-0422 — CLI flags for the software-input Metal path; the filter exposes the same
metal_device/--backend metalstory for hwdec input. ffmpeg-patches/0006-libvmaf-add-libvmaf-vulkan-filter.patch— template the patch 0013 dispatch body mirrors.- Source:
req(user direction: "create t8-1d+patch etc..." — paraphrased; user clarified mid-session that the intent was the IOSurface filter, not the already-merged dispatch-files PR). - Pairs with:
ffmpeg-patches/0013-libvmaf-add-libvmaf-metal-filter.patch.