ADR-0747 — CUDA extern "C" invariant for host-looked-up kernels¶
Status: Accepted Date: 2026-05-28 Deciders: lusoris Related: Research-0747
Context¶
nvcc compiles .cu files as C++. Any __global__ kernel that is not declared inside an extern "C" block receives a C++ mangled symbol in the compiled PTX/cubin. When the host uses cuModuleGetFunction to resolve kernels by their plain C names, a mangled symbol causes CUDA_ERROR_NOT_FOUND, silently disabling the feature.
PR #77 fixed one instance of this pattern in integer_ssim/ssim_score.cu. A full audit (Research-0747) identified one additional broken file introduced after PR #77: integer_ssim/integer_ssim_score.cu, which provides the "ssim" feature on the CUDA backend. All three of its entry points (integer_ssim_horiz_8bpc, integer_ssim_horiz_16bpc, integer_ssim_vert_combine) were missing extern "C" wrapping, silently breaking --feature ssim --backend cuda since the file was introduced.
Decision¶
- Wrap all three
__global__entry points ininteger_ssim/integer_ssim_score.cuinsideextern "C" { }. - Add a CI script (
scripts/dev/check-cuda-extern-c.sh) that fails if any__global__function referenced bycuModuleGetFunctionis not covered by anextern "C"block. - Add a mandatory invariant to
core/src/feature/cuda/AGENTS.mdso that all future contributors are informed of the requirement before writing new CUDA kernel TUs.
Alternatives considered¶
- Rename all kernels to a
C-friendly symbol scheme using NVRTC or embedding a__attribute__((visibility("default")))compiler annotation: rejected. These approaches are more complex and do not address the root cause (C++ name-mangling).extern "C"is the standard CUDA idiom used by the NVIDIA SDK examples and by every other kernel in this codebase. - Switch from
cuModuleGetFunction(driver API) to the CUDA runtime API: out of scope for this PR. The driver API is intentional per ADR-0001 (deferred CUDA context creation). Changing the dispatch pattern would require pervasive refactoring across all 15 host extractor files. - No-op (accept the breakage): rejected.
--feature ssim --backend cudasilently produces no output in the broken state, which violates the correctness-first principle (see CLAUDE.md §6 andfeedback_correctness_first.mdmemory entry).
Consequences¶
--feature ssim --backend cudais restored to working order.- The CI script prevents future regressions of this class.
- No numerical changes:
extern "C"affects only symbol naming, not kernel code generation. - The fix is a one-file, three-line insertion; rebase risk is minimal.
References¶
- Research-0747 (
docs/research/research-0747-cuda-extern-c-sweep.md) core/src/feature/cuda/ssim_cuda.c:122–127(host lookup sites)core/src/meson.build:784(PTX generation confirming which.cumaps tointeger_ssim_score_ptx)