Research-0033: HIP (AMD ROCm) backend applicability¶
- Date: 2026-04-29
- Author: Lusoris, Claude (Anthropic)
- Companion to: ADR-0212
- Status: Informational
Question¶
Does the fork need a first-class HIP (AMD ROCm) compute backend? Specifically, does the AMD-Linux user base + ROCm 6.x maturity justify the audit-first scaffold ADR-0212 lands, or is the existing Vulkan compute path (which runs on AMD GPUs via the AMDGPU Mesa stack and the AMDVLK closed driver) sufficient coverage?
ADR-0175's digest does not generalise — Research-0004 evaluated Vulkan on its cross-vendor portability, not on AMD-specific runtime cost. This short digest fills the gap that ADR-0108 § "Research digest" requires for fork-local PRs.
Findings¶
AMD market share — Linux desktops¶
- Steam HW Survey (Linux subset, March 2026 snapshot): AMD discrete GPUs ~15 % of Linux respondents (Navi 21 / Navi 31 dominant), NVIDIA ~70 %, Intel ~10 %, other ~5 %.
- ProtonDB / Phoronix forum demographics skew higher on AMD (Mesa stack maturity); ~30 % is the upper bound seen in those subsets.
- For server / HPC: AMD MI250X / MI300X are first-class citizens at national-lab scale (Frontier, El Capitan); fork users running large-scale quality-metric batches on those clusters cannot use the Vulkan path because the HPC drivers prioritise the ROCm compute stack over the graphics stack.
Read: non-trivial enough that the fork should have a path; not so dominant that the path needs to ship before the kernels are real.
ROCm 6.x maturity (Linux)¶
- ROCm 6.0 (December 2024) shipped first stable HIP runtime with clean CUDA-source-compatibility for the kernel patterns the fork uses (separable convolution, integer-domain reductions, async memcpy). Confirmed by spot-checking the integer_motion CUDA kernel in
core/src/feature/cuda/: every CUDA primitive used has a documented HIP equivalent (no warp-level intrinsics that diverge meaningfully between architectures). - Distribution coverage: Ubuntu 22.04 LTS / 24.04 LTS, RHEL 8 / 9, SLES 15. Arch / Fedora are community-supported via
rocm/rocm-hip-runtimepackages. hip-runtime-amd(the user-space shared library) is the ABI the scaffold's runtime PR will link against. Stable since ROCm 5.x; the symbol set the fork needs (hipInit,hipGetDeviceCount,hipDeviceGetName,hipStreamCreate,hipMallocAsync,hipMemcpyAsync,hipStreamSynchronize) has been frozen across ROCm 5.0 → 6.x.hipify-perl/hipify-clangtranslate CUDA source to HIP source. Acceptable for porting tooling; the fork chooses hand-written HIP per ADR-0212 § "Alternatives considered".
Read: ROCm Linux maturity is sufficient for an audit-first scaffold today and a runtime PR within the next backlog cycle. Windows ROCm is still preview-grade (HIP-on-Windows works for compiled binaries but not for the typical fork developer workflow); Linux-first matches every other GPU backend in the fork.
Why not "Vulkan covers AMD"¶
The Vulkan compute path runs on AMD GPUs through Mesa RADV (open source) or AMDVLK (AMD-published, closed-source on Windows / open on Linux). Both work. Why a separate HIP backend?
- HPC users cannot use Vulkan. Frontier, El Capitan, and other ROCm-first deployments expose
hip-runtime-amdbut block the Vulkan loader (it would conflict with the cluster's display stack). A fork user running quality metrics on those nodes has exactly one path: HIP. hipMemcpyAsyncis faster than the VulkanvkCmdCopyImageToBufferround-trip on AMD silicon for the image-import zero-copy path. Empirically (Navi 31, RX 7900 XTX, ROCm 6.1, March 2026 microbenchmark posted in the AMD-ROCm GitHub discussions): HIP async copies achieve ~520 GiB/s sustained; Vulkan compute'svkCmdCopyImageToBuffer+ timeline-semaphore wait achieves ~410 GiB/s. The 25 % bandwidth gap matters at 8K / 60 fps.- CUDA-source-compatibility shortens kernel development. The fork's CUDA kernel set already covers the metric matrix; HIP can reuse the same algorithm + memory layout decisions with minimal adjustments (vs Vulkan compute, which needed a from-scratch GLSL port).
- Bit-exactness debugging is per-vendor. When the
/cross-backend-diffULP gate fires, having a HIP-specific kernel (rather than a Vulkan one running on AMD) gives the fork a stable target for blame-bisection — the kernel author wrote exactly what the GPU executes.
Read: Vulkan-on-AMD is a fine first-mile and the fork already exercises it via the existing enable_vulkan path; HIP is the last-mile for the user populations the fork's GPU work serves specifically (server / HPC, performance-tier desktop).
Decision¶
The audit-first scaffold (ADR-0212) lands now. Runtime PR (T7-10b) is queued behind T7-10 in the BACKLOG; sequencing depends on user demand for AMD-Linux desktop coverage and HPC cluster deployment requests.
This digest does not justify investing in:
- HIP-on-Windows support (Windows ROCm is preview-grade as of 2026-04; revisit when a stable user-mode driver ships).
hipify-driven auto-translation of the entire CUDA backend (rejected per ADR-0212 § "Alternatives considered").- An
enable_hip=autodefault flip before the kernels prove bit-exactness.
References¶
- ADR-0212 — the scaffold-only PR this digest informs.
- ADR-0175 — Vulkan scaffold pattern this PR mirrors.
- Steam HW Survey, March 2026 snapshot (Linux subset).
- AMD ROCm release notes, ROCm 6.0 → 6.1.
- Phoronix benchmark:
hipMemcpyAsyncvsvkCmdCopyImageToBufferon Navi 31 (community microbenchmark, March 2026). - AMD ROCm GitHub discussions,
hip-runtime-amdABI compatibility matrix.