ADR-0266: HIP fifth kernel-template consumer — float_ansnr_hip¶
- Status: Accepted
- Date: 2026-05-03
- Deciders: Lusoris, Claude (T7-10b follow-up)
- Tags:
hip,gpu,feature-extractor,kernel-template
Context¶
The HIP backend's kernel-template scaffold (ADR-0241) needs a stable quorum of consumers before the runtime PR (T7-10b) can flip the kernel_template.c helper bodies from -ENOSYS to live HIP calls. Each consumer pins a shape of usage that the runtime PR's helper bodies must satisfy: 1-counter atomic readback (integer_psnr_hip, ADR-0241), float-partial readback (float_psnr_hip, ADR-0254), single-float-per-block bypass (ciede_hip, ADR-0259), and four-counter atomic readback (float_moment_hip, ADR-0260) are the four already landed.
This ADR adds a fifth consumer — float_ansnr_hip, the HIP twin of core/src/feature/cuda/float_ansnr_cuda.c (297 LOC). It pins a new shape: interleaved (sig, noise) per-block float partials where two doubles are reduced on the host before the score formula. Same submit_pre_launch bypass as ciede_hip (no atomic, no memset required), but doubles the partial-element width and exercises the post-reduction two-feature emission path (float_ansnr + float_anpsnr from a single kernel pass).
Decision¶
We add core/src/feature/hip/float_ansnr_hip.{c,h} as the fifth kernel-template consumer. The TU mirrors the CUDA twin call-graph-for-call-graph: identical state struct, identical init/submit/collect/close lifecycle, identical template-helper invocations. init() returns -ENOSYS until T7-10b lands the runtime helpers; the smoke test pins the registration shape; CHANGELOG and rebase-notes carry the addition.
Alternatives considered¶
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
float_ansnr_hip (chosen) | Smallest unported CUDA twin (297 LOC); pins the (sig, noise) interleaved-partials readback shape that no prior consumer exercises; same submit_pre_launch bypass as ciede_hip so the runtime PR's helper-body work is the same shape we already validated. | Doubles per-block partial width vs the third consumer. | — |
integer_motion_v2_hip | Pins the temporal-extractor + ping-pong-buffer shape. | Different concern — taken as the sixth consumer in the same PR (ADR-0267). | — |
float_motion_hip (361 LOC) | Single-dispatch motion mirror. | 21% larger than integer_motion_v2_cuda.c; overlaps the temporal-extractor shape that motion_v2_hip already pins. | Defer to a later batch. |
integer_ssim_hip (384 LOC) | Multi-feature emission. | 30% larger than float_ansnr_cuda.c. | Defer to a later batch. |
Consequences¶
- Positive: One more concrete shape (
(sig, noise)interleaved float partials) is now pinned ahead of T7-10b, reducing the surface area the runtime PR has to validate against. The runtime PR's cross-backend-diff gate (ADR-0214) inherits a fifth consumer with no kernel work: just the standard helper-body flip. - Negative: One more
#if HAVE_HIPextractor row infeature_extractor.cto keep aligned with the runtime PR's picture buffer-type plumbing. - Neutral / follow-ups: AGENTS.md note pins the
submit_pre_launchbypass as a load-bearing invariant. The runtime PR will swap the-ENOSYSbody for live calls without touching this TU.
References¶
- ADR-0212 — HIP audit-first scaffold (T7-10).
- ADR-0241 — first consumer + kernel template.
- ADR-0254 — second consumer (
float_psnr_hip, PR #324). - ADR-0259 — third consumer (
ciede_hip, PR #330). - ADR-0260 — fourth consumer (
float_moment_hip, PR #330). - ADR-0267 — sixth consumer (this PR).
- ADR-0246 — GPU kernel-template pattern.
- CUDA twin:
core/src/feature/cuda/float_ansnr_cuda.c. - Source:
req(user dispatch) — "Add the fifth and sixth HIP runtime kernel-template consumers. ... Pick two more cleanest CUDA twins to port."