HIP kernel parity coverage: round 2 audit (2026-05-30)

Companion research digest for ADR-0883. Quantifies the HIP-side coverage gap remaining after ADR-0868 / PR #351 round-1, ranks the residual extractors by user-visible impact, and justifies the 5-test selection that round-2 ships.

Inventory

find core/src/feature/hip -maxdepth 1 -name '*_hip.c' returns 17 HIP extractor source files, one extractor per file. The build-only hip_hsaco_stubs.c registers no extractor and does not match the glob.

| Extractor source | Registered name | Tested before round-2? | Round-2 ships test? |
|---|---|---|---|
| `integer_adm_hip.c` | `adm_hip` | yes: `test_hip_adm_parity` (ADR-0539) | n/a |
| `integer_motion_v2_hip.c` | `motion_v2_hip` | yes: `test_hip_motion3_parity` | n/a |
| `integer_psnr_hip.c` | `psnr_hip` | yes: `test_hip_psnr_parity` (PR #351) | n/a |
| `integer_vif_hip.c` | `vif_hip` | yes: `test_hip_vif_parity` (PR #351) | n/a |
| `ciede_hip.c` | `ciede_hip` | no | yes: `test_hip_ciede_parity` |
| `integer_psnr_hvs_hip.c` | `psnr_hvs_hip` | no | yes: `test_hip_psnr_hvs_parity` |
| `integer_motion_hip.c` | `motion_hip` (v1) | no | yes: `test_hip_motion_parity` |
| `integer_ssim_hip.c` | `integer_ssim_hip` | no | yes: `test_hip_ssim_parity` |
| `integer_ms_ssim_hip.c` | `integer_ms_ssim_hip` | no | yes: `test_hip_ms_ssim_parity` |
| `integer_cambi_hip.c` | `cambi_hip` | no | deferred: round 3 (CPU `cambi` needs encoded-bitrate metadata) |
| `ssimulacra2_hip.c` | `ssimulacra2_hip` | no | deferred: PR #290 actively refactoring this file |
| `speed_chroma_hip.c` | `speed_chroma_hip` | no | deferred: round 3 (CPU `speed_chroma` has no scalar reference yet) |
| `speed_temporal_hip.c` | `speed_temporal_hip` | no | deferred: round 3 (same) |
| `float_adm_hip.c` | `float_adm_hip` | no | deferred: float-pipeline rework upstream of ADR-0119 |
| `float_motion_hip.c` | `float_motion_hip` | no | deferred: float-pipeline rework |
| `float_psnr_hip.c` | `float_psnr_hip` | no | deferred: float-pipeline rework |
| `float_ssim_hip.c` | `float_ssim_hip` | no | deferred: float-pipeline rework |
| `float_moment_hip.c` | `float_moment_hip` | no | deferred: internal helper, no public CPU twin |

Coverage delta: 4 / 17 → 9 / 17 (24 % → 53 %) parity-gated HIP extractors after this PR.
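The percentages in the coverage delta can be reproduced with integer arithmetic. The sketch below assumes they were rounded to the nearest whole percent; `coverage_percent` is a hypothetical helper name, not something from the repository.

```c
#include <assert.h>

/* Hypothetical helper: covered/total as a whole percent, rounded to
 * nearest. Adding total/2 before the division implements the rounding
 * without floating point. */
static unsigned coverage_percent(unsigned covered, unsigned total)
{
    assert(total > 0 && covered <= total);
    return (covered * 100u + total / 2u) / total;
}
```

With the counts quoted above, `coverage_percent(4, 17)` gives 24 and `coverage_percent(9, 17)` gives 53.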

Selection rationale

The 5 round-2 picks were ranked on three axes:

1. Score appearance in the libvmaf-2.x.x default model: ssim, ms_ssim, ciede2000, psnr_hvs, and motion all surface as feature columns that the default predictor consumes either directly or via aggregation.
2. CHUG re-extract column coverage: every CHUG run materialises these 5 features per clip, so a silent HIP regression here would propagate into model training data.
3. Kernel implementation maturity: only extractors with a stable CPU reference and a stable HIP path (no in-flight refactor PRs) were chosen. This excludes ssimulacra2_hip (PR #290) and the float-pipeline family (waiting on the ADR-0119 successor).

cambi_hip and the speed_* family are deferred to round 3 because their CPU references require setup the synthetic-fixture template doesn't carry (encoded-bitrate metadata for cambi; no scalar reference for the speed features as of master tip).

Fixture choices

All 5 tests reuse the test_hip_vif_parity.c template; the only per-test deviations are the ciede chroma pattern and the motion frame count noted below:

- 256×144 YUV420P, 8 bpc (small enough for fast CI; large enough for MS-SSIM's 5-level pyramid to converge).
- Synthetic gradient ((row + col + salt * k) & 0xFF), which produces a non-zero variance signal that exercises every reduction branch.
- Chroma = 128 (mid-grey) except for ciede, which uses a salted chroma pattern because the CIE-Lab path is the whole point of that test.
- 1 frame for everything except motion (needs 2 frames so the t-1 reference exists).
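The fixture described above can be sketched in a few lines of C. This is an illustration, not the template's actual code: `fill_gradient`, `fill_flat_chroma`, and the `FIX_W`/`FIX_H` constants are hypothetical names, and the meaning of `salt` and `k` (taken here as two plain integer parameters) is an assumption, since the digest gives only the formula.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fixture geometry quoted in the digest: 256x144, 8 bits per component. */
enum { FIX_W = 256, FIX_H = 144 };

/* Fill a w x h plane with the salted gradient
 * (row + col + salt * k) & 0xFF. The roles of salt and k are assumed. */
static void fill_gradient(uint8_t *plane, size_t stride,
                          unsigned w, unsigned h,
                          unsigned salt, unsigned k)
{
    assert(plane != NULL && stride >= w);
    for (unsigned row = 0; row < h; row++) {
        for (unsigned col = 0; col < w; col++) {
            plane[row * stride + col] =
                (uint8_t)((row + col + salt * k) & 0xFFu);
        }
    }
}

/* Fill one 4:2:0 chroma plane with mid-grey (128). luma_w and luma_h are
 * the luma dimensions; the chroma plane is half of each. */
static void fill_flat_chroma(uint8_t *plane, size_t stride,
                             unsigned luma_w, unsigned luma_h)
{
    assert(plane != NULL && stride >= luma_w / 2);
    for (unsigned row = 0; row < luma_h / 2; row++)
        memset(plane + row * stride, 128, luma_w / 2);
}
```

A salted chroma pattern for the ciede case could be produced by calling `fill_gradient` on the chroma planes with the halved dimensions, and a second frame for motion by calling it again with a different `salt`; both are guesses at how the real tests vary the template.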

Tolerance choices (ADR-0214)

| Kernel | Filter? | Tolerance | Rationale |
|---|---|---|---|
| `ciede2000` | no | 1e-4 (places=4) | Per-pixel reduction; matches CUDA twin in PR #351 |
| `psnr_hvs` | yes (DCT) | 1e-4 (places=4) | DCT is deterministic; reduction tree is shallow |
| `motion` v1 | yes (Gaussian) | 1e-4 (places=4) | Same kernel as motion3; equal precision budget |
| `ssim` | yes (window) | 1e-3 (places=3) | Per-window mean/var/cov reduction has more rounding |
| `float_ms_ssim` | yes (pyramid) | 1e-3 (places=3) | Multi-scale product amplifies per-scale rounding |
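A parity assertion under these tolerances might look like the sketch below. It assumes that "places=N" maps to an absolute tolerance of 10^-N, as the table's pairing of 1e-4 with places=4 suggests; the exact comparison ADR-0214 mandates is not quoted in this digest, and `tolerance_from_places` and `scores_match` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed mapping: places=4 -> 1e-4, places=3 -> 1e-3. */
static double tolerance_from_places(int places)
{
    assert(places >= 0);
    double tol = 1.0;
    for (int i = 0; i < places; i++)
        tol /= 10.0;
    return tol;
}

/* True when the CPU and HIP scores agree within the absolute tolerance.
 * A NaN on either side makes the final comparison false, so a NaN score
 * never passes. */
static bool scores_match(double cpu, double hip, int places)
{
    double diff = cpu - hip;
    if (diff < 0.0)
        diff = -diff;
    return diff <= tolerance_from_places(places);
}
```

For example, two scores that differ by 5e-4 would pass the places=3 gate used for `ssim` but fail the places=4 gate used for `ciede2000`.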

Skip behaviour

Every test follows the established pattern: vmaf_hip_state_init() is called first; if it returns a non-zero err or leaves hip_state == NULL, the test prints [skip: no HIP device] to stderr and returns success. CI runners without an AMD GPU can therefore run the suite without spurious failures, matching the psnr_hip and vif_hip tests from PR #351.
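The control flow of that skip pattern can be sketched as follows. The real signature of vmaf_hip_state_init() is not given in this digest, so the sketch substitutes a hypothetical stand-in, `fake_hip_state_init`, with a flag that simulates whether a device is present; `FakeHipState`, `run_parity_test`, and `ran_assertions` are likewise illustrative names only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the real HIP state type. */
typedef struct FakeHipState { int device_id; } FakeHipState;

/* Hypothetical stand-in for vmaf_hip_state_init(); the real signature
 * is assumed, not known. Fails when no device is "available". */
static int fake_hip_state_init(FakeHipState **out, int device_available)
{
    static FakeHipState state = { 0 };
    assert(out != NULL);
    if (!device_available) {
        *out = NULL;
        return -1;
    }
    *out = &state;
    return 0;
}

/* Returns 0 (success) both when the test is skipped and when it passes.
 * *ran_assertions records whether the parity section was reached. */
static int run_parity_test(int device_available, int *ran_assertions)
{
    FakeHipState *hip_state = NULL;
    int err = fake_hip_state_init(&hip_state, device_available);

    *ran_assertions = 0;
    if (err != 0 || hip_state == NULL) {
        fprintf(stderr, "[skip: no HIP device]\n");
        return 0; /* skip counts as success */
    }

    /* The CPU-vs-HIP parity assertions would run here. */
    *ran_assertions = 1;
    return 0;
}
```

Checking both the error code and the pointer means a backend that reports success but hands back no state is still treated as "no device" rather than dereferenced.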

Container-verified

Per CLAUDE.md §15, the test files compile against the vmaf-dev-mcp container's HIP toolchain. Local execution requires -Denable_hip=true and an AMD GPU; the dev workstation has no AMD card, so on-host execution exercises the skip path. CI ROCm runners (when present) will exercise the parity assertions.

Round-3 backlog

| Extractor | Blocker | Resolution path |
|---|---|---|
| `cambi_hip` | needs `enc_width`/`enc_height`/`enc_bitdepth` options on the fixture | wrap CPU reference with synthetic encoder metadata |
| `ssimulacra2_hip` | PR #290 actively refactoring partial-alloc paths | wait for #290 merge, then add test |
| `speed_chroma_hip`, `speed_temporal_hip` | no scalar CPU reference exists on master | scope a CPU reference first or call into the existing GPU path twice with different RNG seeds |
| `float_adm_hip`, `float_motion_hip`, `float_psnr_hip`, `float_ssim_hip`, `float_moment_hip` | float pipeline pending ADR-0119 successor cleanup | re-evaluate after the float-pipeline ADR lands |

References

ADR-0214 — cross-backend parity tolerance gate.
ADR-0539 — integer ADM HIP atomicAdd reduction pattern.
ADR-0868 — round-1 GPU kernel coverage gap-fill (PR #351).
ADR-0883 — this work (round 2).
PR #290 — HIP ssimulacra2 partial-alloc leak (skip overlap).
PR #308 — HIP/Metal ENOSYS stubs (no test overlap).
PR #351 — round-1 GPU kernel coverage (psnr_hip + vif_hip).