Research-0089: MoltenVK feasibility on the fork's Vulkan shader inventory
- Date: 2026-05-09
- Author: Lusoris, Claude (Anthropic)
- Companion ADR: ADR-0338
Question
Will the fork's existing Vulkan compute kernels (per ADR-0127 and the kernel ADRs 0176–0252) run end-to-end on Apple Silicon via MoltenVK, the Khronos-supported open-source Vulkan-on-Metal translation layer? And if not, which kernels are at risk and why?
TL;DR
Probably yes for 6 of 7 shaders; medium risk on moment.comp. The fork's Vulkan shaders rely on a small, well-defined set of GLSL extensions. The non-atomic int64-arithmetic path lowers to Metal's native long type and is supported on M1+. The single shader (moment.comp) using atomicAdd on int64 (GL_EXT_shader_atomic_int64) requires Metal Tier-2 argument buffers per the MoltenVK Runtime User Guide — supported on M1+ but the most fragile capability dependency. Any failure on the macOS runner is most likely to surface there first.
Method
- Inventory the fork's Vulkan shaders under core/src/feature/vulkan/shaders/ and grep for GLSL #extension directives + atomic intrinsics.
- Cross-reference each capability against the MoltenVK Runtime User Guide, the authoritative source for known translation-layer gaps.
- Verify Apple Silicon GPU capability tiers (macos-latest is Apple Silicon; M1+ supports Tier-2 argument buffers).
- Validate Homebrew install layout from the Homebrew/homebrew-core molten-vk.rb formula source.
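The inventory step can be approximated with a short script. The following is a minimal sketch, not the tooling actually used for this digest: the function names are hypothetical, and the regular expressions are naive (they would also match directives or calls that appear inside comments, and they ignore imageAtomic* built-ins).

```python
import re
from pathlib import Path

# Matches lines such as "#extension GL_EXT_shader_atomic_int64 : require".
EXTENSION_RE = re.compile(r"^\s*#\s*extension\s+(\w+)\s*:\s*(\w+)", re.MULTILINE)
# Matches calls to GLSL atomic built-ins such as atomicAdd(...) or atomicMax(...).
ATOMIC_RE = re.compile(r"\b(atomic[A-Z]\w*)\s*\(")


def scan_shader(source: str) -> dict:
    """Return the extensions and atomic built-ins referenced in one GLSL source."""
    return {
        "extensions": sorted({m.group(1) for m in EXTENSION_RE.finditer(source)}),
        "atomics": sorted({m.group(1) for m in ATOMIC_RE.finditer(source)}),
    }


def scan_directory(shader_dir: str) -> dict:
    """Scan every *.comp file directly under shader_dir, keyed by file name."""
    return {
        path.name: scan_shader(path.read_text(encoding="utf-8"))
        for path in sorted(Path(shader_dir).glob("*.comp"))
    }
```

Calling scan_directory("core/src/feature/vulkan/shaders/") from the repository root would produce one entry per compute shader, which can then be compared by hand against the MoltenVK Runtime User Guide.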
Shader inventory + capability matrix
| Shader | Extensions used | Atomic ops? | MoltenVK risk |
|---|---|---|---|
| vif.comp | int64 arithmetic (non-atomic) | None | low |
| adm.comp | int64 arithmetic (non-atomic) | None | low |
| motion.comp | int64 arithmetic, per-WG SAD reduction | None (subgroup-uniform) | low |
| motion_v2.comp | int64 arithmetic | None | low |
| psnr.comp | int64 per-WG reduction | None | low |
| float_psnr.comp | float only | None | low |
| moment.comp | GL_EXT_shader_atomic_int64 | atomicAdd on int64 | medium |
(The cambi_*.comp shaders are not in this matrix because they use only int32 arithmetic and int32 shared memory — entirely within MoltenVK's vanilla Vulkan 1.0 surface.)
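The matrix follows a simple rule: a shader is as risky as its riskiest capability. A minimal sketch of that rule is shown below; the capability labels and the INVENTORY mapping are illustrative transcriptions of the table, not identifiers from the fork.

```python
# Risk levels ordered from least to most severe (only the two used in the matrix).
RISK_ORDER = ["low", "medium"]

# Capability -> risk, transcribed from the matrix; the labels are illustrative.
CAPABILITY_RISK = {
    "float_only": "low",
    "int64_arithmetic": "low",
    "int64_atomic": "medium",
}

# Shader -> capabilities, transcribed from the matrix.
INVENTORY = {
    "vif.comp": ["int64_arithmetic"],
    "adm.comp": ["int64_arithmetic"],
    "motion.comp": ["int64_arithmetic"],
    "motion_v2.comp": ["int64_arithmetic"],
    "psnr.comp": ["int64_arithmetic"],
    "float_psnr.comp": ["float_only"],
    "moment.comp": ["int64_atomic"],
}


def shader_risk(capabilities):
    """Return the highest risk level among the given capabilities."""
    return max(
        (CAPABILITY_RISK[c] for c in capabilities),
        key=RISK_ORDER.index,
        default="low",
    )


def risk_summary(inventory):
    """Map each shader name to its overall risk level."""
    return {name: shader_risk(caps) for name, caps in inventory.items()}
```

Running risk_summary(INVENTORY) reproduces the last column of the table: six shaders at low risk and moment.comp at medium.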
Capability deep-dive
int64 arithmetic (non-atomic)
GL_EXT_shader_explicit_arithmetic_types_int64 lowers SPIR-V OpTypeInt 64 1 operations to Metal's long type. MoltenVK's bundled SPIRV-Cross handles this without special configuration. Apple GPUs have had native 64-bit integer support since the M1, so no driver configuration is required.
Conclusion: low risk. Five of the seven listed shaders fall entirely in this bucket; float_psnr.comp uses no int64 at all and is likewise low risk.
atomicAdd on int64 (moment.comp)
GL_EXT_shader_atomic_int64 requires the VK_KHR_shader_atomic_int64 Vulkan extension. The MoltenVK Runtime User Guide notes this extension as supported with the constraint:
Requires GPU Tier 2 argument buffers support.
Apple Silicon (M1, M1 Pro/Max/Ultra, M2, M3, …) supports Tier-2 argument buffers per Apple's Metal Feature Set Tables. GHA macos-latest runs on Apple Silicon (Homebrew prefix /opt/homebrew confirms M-series), so the capability is present.
The risk surface is driver maturity, not capability presence: SPIRV-Cross's int64-atomic translation has historically been a maturity hot-spot, and a regression in a future MoltenVK release could break this shader without breaking the others. The advisory lane catches that class of regression.
Conclusion: medium risk. The capability is present but the translation path has historically been the most fragile.
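Capability presence can be checked on the runner before the smoke tests execute. VK_KHR_shader_atomic_int64 exposes two feature bits, shaderBufferInt64Atomics and shaderSharedInt64Atomics; which of them moment.comp needs depends on where its accumulator lives, which this digest does not record. The sketch below parses a plain-text device report and assumes the "name = true/false" line layout that vulkaninfo prints, which is an assumption about that tool's output and not a stable interface. The function name is hypothetical.

```python
import re

# Matches report lines such as "    shaderBufferInt64Atomics = true".
FEATURE_RE = re.compile(
    r"^\s*(shader(?:Buffer|Shared)Int64Atomics)\s*=\s*(true|false)\s*$",
    re.MULTILINE,
)


def int64_atomic_features(report: str) -> dict:
    """Extract the two int64-atomic feature bits from a text device report.

    A feature may be printed under more than one struct; it is reported as
    True only if every occurrence says true.
    """
    found = {}
    for name, value in FEATURE_RE.findall(report):
        found[name] = found.get(name, True) and value == "true"
    return found
```

A lane could fail fast with a clear message when the required bit is missing or false, which separates "capability absent" from "translation regressed".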
VK_KHR_external_memory_fd (DMABUF import)
The fork's Vulkan runtime uses DMABUF import for FFmpeg hw-frame zero-copy (per ADR-0127 §Decision and ADR-0186). MoltenVK does not support VK_KHR_external_memory_fd; Apple's IOSurface import path is the Metal-native equivalent and is not Vulkan-exposed.
Mitigation: ADR-0127 already specifies that on MoltenVK the external-memory path falls back to a host-staged copy. The smoke tests do not exercise import; they allocate via VMA host-visible buffers, which works.
Conclusion: not a blocker for the smoke-test lane. Becomes relevant once the fork wants zero-copy ffmpeg-on-macOS.
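The fallback amounts to a check against the device's advertised extension list. The sketch below illustrates that decision only; the function name and the returned labels are hypothetical and do not correspond to symbols in the fork's runtime, and a real DMABUF path may require further extensions beyond the one named here.

```python
# Extension the zero-copy import path depends on, per the section above.
REQUIRED_FOR_IMPORT = frozenset({"VK_KHR_external_memory_fd"})


def choose_frame_upload_path(device_extensions, required=REQUIRED_FOR_IMPORT):
    """Pick zero-copy import when every required extension is advertised,
    otherwise fall back to a host-staged copy."""
    if required.issubset(set(device_extensions)):
        return "dmabuf_import"
    return "host_staged_copy"
```

On MoltenVK the advertised list lacks VK_KHR_external_memory_fd, so the function returns "host_staged_copy".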
Pipeline statistics queries / app-controlled allocations / PVRTC
Per the MoltenVK User Guide these are documented gaps. None apply to the fork's shaders or runtime use:
- Pipeline statistics: not used.
- App-controlled allocations: VMA is allocation-policy-agnostic.
- PVRTC: no compressed textures.
Install layout (verified)
From homebrew-core molten-vk.rb:
inreplace "MoltenVK/icd/MoltenVK_icd.json",
"./libMoltenVK.dylib",
(lib/"libMoltenVK.dylib").relative_path_from(prefix/"etc/vulkan/icd.d")
(prefix/"etc/vulkan").install "MoltenVK/icd" => "icd.d"
Resolved on Apple Silicon (/opt/homebrew prefix):
- ICD descriptor: /opt/homebrew/etc/vulkan/icd.d/MoltenVK_icd.json
- MoltenVK dylib: /opt/homebrew/lib/libMoltenVK.dylib
- Loader (separate formula vulkan-loader): /opt/homebrew/lib/libvulkan.dylib
- Headers (vulkan-headers): /opt/homebrew/include/vulkan/
- glslc (shaderc): /opt/homebrew/bin/glslc
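The Khronos loader searches its built-in default ICD directories unless VK_ICD_FILENAMES is set, in which case it loads only the manifests listed there (newer loader releases also accept VK_DRIVER_FILES, which supersedes it). Pinning the env var to MoltenVK's JSON path therefore makes the lane deterministic regardless of which other ICDs the runner image has.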
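Before pinning the variable, it can help to confirm that the manifest actually points at an existing dylib. The sketch below reads an ICD manifest, resolves its library_path the way the loader documents it (a relative path is taken relative to the manifest's directory, a bare file name is left to the dynamic linker), and builds an environment with the variable pinned. The function names are hypothetical.

```python
import json
import os
from pathlib import Path


def resolve_icd_library(manifest_path: str) -> Path:
    """Return the driver library an ICD manifest refers to."""
    manifest = Path(manifest_path)
    data = json.loads(manifest.read_text(encoding="utf-8"))
    raw = data["ICD"]["library_path"]
    if "/" not in raw:
        # Bare file name: resolved by the dynamic linker's search path.
        return Path(raw)
    library = Path(raw)
    if not library.is_absolute():
        # Relative paths are relative to the directory holding the manifest.
        library = manifest.parent / library
    return Path(os.path.normpath(library))


def pinned_env(manifest_path: str, base_env=None) -> dict:
    """Copy an environment and pin VK_ICD_FILENAMES to a single manifest."""
    env = dict(os.environ if base_env is None else base_env)
    env["VK_ICD_FILENAMES"] = str(manifest_path)
    return env
```

With the Homebrew layout above, a manifest entry of "../../../lib/libMoltenVK.dylib" under /opt/homebrew/etc/vulkan/icd.d/ resolves to /opt/homebrew/lib/libMoltenVK.dylib, and the result of pinned_env can be passed as the env argument of subprocess.run when launching the smoke tests.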
Expected lane wall-clock cost
Estimate based on the current Build — macOS clang (CPU) lane (the only existing macos-latest reference) plus brew-install overhead for the Vulkan stack:
| Step | Estimate |
|---|---|
| Checkout + python + meson install | ~30s |
| brew install ninja, nasm, ccache, llvm, pkg-config | ~60s |
| brew install molten-vk, vulkan-loader, vulkan-headers, shaderc | ~90s |
| meson setup + ninja (Vulkan build) | ~6–9 min |
| Smoke tests (3 binaries) | ~10s |
| Total | ~9–12 min |
Well within the 60-min macos-latest runner timeout. Cost is roughly equivalent to the existing macOS CPU+DNN lane.
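The total can be checked by summing the per-step bounds. The sketch below uses the estimates from the table, expressed in seconds as (low, high) pairs; the step labels are abbreviated for readability.

```python
# (low, high) estimates in seconds, taken from the table above.
STEPS_SECONDS = {
    "checkout + python + meson install": (30, 30),
    "brew install build tools": (60, 60),
    "brew install Vulkan stack": (90, 90),
    "meson setup + ninja (Vulkan build)": (6 * 60, 9 * 60),
    "smoke tests (3 binaries)": (10, 10),
}


def total_minutes(steps):
    """Sum the low and high bounds separately and convert to minutes."""
    low = sum(lo for lo, _ in steps.values())
    high = sum(hi for _, hi in steps.values())
    return low / 60, high / 60
```

The bounds come to 550 s and 730 s, or roughly 9.2 to 12.2 minutes, which matches the rounded total in the table.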
Why advisory, not required
Per ADR-0338:
- MoltenVK is a moving target. A fresh-install brew bump can surface a regression in SPIR-V → MSL translation that breaks the atomic-int64 path; a required lane would block every PR until upstream MoltenVK fixes it (out of our control).
- The lane is paid for by macos-latest minutes; a flaky required lane burns budget.
- Mirrors the precedent set by ADR-0127's Arc-A380 nightly lane (advisory until self-hosted runner registered).
The promotion criterion is concrete: one green run on master flips continue-on-error off and adds the job name to required-aggregator.yml.
Open questions
- MoltenVK release cadence vs the fork's PR cadence — unanswered. If the bump-and-break cycle is faster than our PR cycle, the lane oscillates. Tracked under docs/state.md as a future watch item.
- Native Metal backend overlap — the parallel scaffold lands separately. Once both exist, this digest's relevance is bounded to the period before the Metal backend matures.
References
- MoltenVK Runtime User Guide
- Homebrew molten-vk formula
- VK_KHR_shader_atomic_int64
- Apple Metal Feature Set Tables
- GitHub Actions macOS runner spec
- Fork shader inventory: core/src/feature/vulkan/shaders/
- ADR-0127: Vulkan compute backend.
- ADR-0338: the lane this digest informs.