ADR-0771: SIMD twin coverage inventory and gap prioritisation
- Status: Accepted
- Date: 2026-05-29
- Deciders: lusoris
- Tags: simd, docs, planning
Context
A systematic audit of core/src/feature/ was performed to determine which features have x86 AVX2, AVX-512, and arm64 NEON accelerated twins and which do not. This ADR records the findings and ranks the missing twins by implementation leverage so future /add-simd-path invocations are prioritised correctly.
Audit method: cross-referenced every *.c entry-point file under core/src/feature/ against x86/<feature>_avx2.c, x86/<feature>_avx512.c, and arm64/<feature>_neon.c, then inspected each feature's dispatch block to distinguish "no SIMD file" from "SIMD file exists but dispatcher does not call it".
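The two-step audit method can be sketched roughly as follows. This is a minimal illustration, not the tooling actually used: `isa_t`, `twin_status_t`, `twin_path` and `classify_twin` are hypothetical names introduced here, and only the path patterns (`x86/<feature>_avx2.c`, `x86/<feature>_avx512.c`, `arm64/<feature>_neon.c`) come from the description above.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical ISA tags; illustrative only, not taken from the codebase. */
typedef enum { ISA_AVX2, ISA_AVX512, ISA_NEON } isa_t;

/* Outcome of the audit for one (feature, ISA) pair. */
typedef enum {
    TWIN_NO_FILE,        /* no SIMD source file exists */
    TWIN_NOT_DISPATCHED, /* file exists but the dispatcher never calls it */
    TWIN_COVERED         /* file exists and is wired into the dispatch block */
} twin_status_t;

/* Build the expected twin path relative to core/src/feature/.
 * Returns the snprintf result, or -1 for an unknown ISA tag. */
static int twin_path(char *buf, size_t n, const char *feature, isa_t isa)
{
    switch (isa) {
    case ISA_AVX2:   return snprintf(buf, n, "x86/%s_avx2.c", feature);
    case ISA_AVX512: return snprintf(buf, n, "x86/%s_avx512.c", feature);
    case ISA_NEON:   return snprintf(buf, n, "arm64/%s_neon.c", feature);
    }
    return -1;
}

/* Step 2 of the audit: a file on disk only counts as coverage
 * when the feature's dispatch block actually selects it. */
static twin_status_t classify_twin(int file_exists, int dispatcher_calls_it)
{
    if (!file_exists)
        return TWIN_NO_FILE;
    return dispatcher_calls_it ? TWIN_COVERED : TWIN_NOT_DISPATCHED;
}
```

Keeping file presence and dispatch wiring as separate inputs is what allows the inventory to tell a missing file apart from an orphaned one.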
Decision
We accept the inventory below as the authoritative coverage baseline and record the three highest-leverage gaps for future work. No implementation work is done in this ADR; each gap will be addressed via a dedicated /add-simd-path PR referencing this ADR.
Full coverage matrix
| Feature | AVX2 | AVX-512 | NEON | Notes |
|---|---|---|---|---|
| adm (integer) | YES | YES | YES | Full dispatch in integer_adm.c |
| cambi | YES | YES | YES | |
| ciede | YES | YES | YES | |
| float_adm | YES | YES | YES | |
| float_moment | YES | — | YES | moment_avx512.c never created; AVX-512 dispatch absent |
| float_motion | YES | YES | YES | |
| float_ms_ssim | — | — | — | Delegates to ssim_avx2/neon via iqa_ssim_set_dispatch; effectively covered |
| float_psnr | YES | YES | YES | |
| float_ssim | — | — | — | Same delegation as float_ms_ssim; effectively covered |
| float_vif | — | — | — | Delegates through vif_tools.c, which has inline AVX-512 convolution; effectively covered |
| integer_adm | YES | YES | YES | Full dispatch |
| integer_motion | YES | YES | — | motion_neon.c exports only x_convolution_16_neon; y_convolution and sad have no NEON path |
| integer_motion_v2 | YES | YES | YES | Full dispatch including NEON |
| integer_psnr | YES | YES | YES | Full dispatch |
| integer_ssim | — | — | — | Zero SIMD: no dispatch block, no SIMD includes |
| integer_vif | YES | YES | YES | Full dispatch |
| moment | YES | — | YES | Same as float_moment: moment_avx512.c absent |
| motion | YES | YES | YES | |
| ms_ssim | — | — | — | Delegates to ssim_avx2/neon; effectively covered |
| ms_ssim_decimate | YES | YES | YES | |
| psnr | YES | YES | YES | |
| ssim | YES | YES | YES | |
| ssimulacra2 | YES | YES | YES | |
| vif | YES | YES | YES | |
Dropped features (no SIMD twins expected):
- ansnr: deliberately removed (PR #38 / commit 70ed8b3c). Prior SIMD tests (test_ansnr_simd) were orphaned and removed in follow-up fixes.
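One way to reason about the matrix is to treat each row as a bitmask plus a flag for rows whose Notes column says "effectively covered". The sketch below is an assumption-laden illustration: `coverage_row_t`, the `HAS_*` flags, `missing_twins` and `is_true_gap` are hypothetical names, not identifiers from the codebase.

```c
#include <stdbool.h>

/* One bit per ISA column of the matrix (hypothetical encoding). */
enum {
    HAS_AVX2   = 1 << 0,
    HAS_AVX512 = 1 << 1,
    HAS_NEON   = 1 << 2,
    HAS_ALL    = HAS_AVX2 | HAS_AVX512 | HAS_NEON
};

typedef struct {
    const char *feature;
    unsigned    twins;     /* HAS_* bits for columns marked YES */
    bool        delegated; /* true when Notes say "effectively covered" */
} coverage_row_t;

/* ISAs for which the row has no twin of its own. */
static unsigned missing_twins(const coverage_row_t *row)
{
    return HAS_ALL & ~row->twins;
}

/* A row is a true gap only if something is missing AND the feature
 * does not inherit SIMD through a shared, already-accelerated helper. */
static bool is_true_gap(const coverage_row_t *row)
{
    return !row->delegated && missing_twins(row) != 0;
}
```

Under this encoding, float_ssim (no twins, delegated) is not a gap, while integer_ssim (no twins, not delegated) and moment (AVX-512 missing) are.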
True gaps (no effective SIMD coverage)
- integer_ssim: zero SIMD (AVX2, AVX-512, NEON all absent). The inner loops (gaussian_filter_init, horizontal/vertical accumulation, SSIM term) are fully scalar. integer_ssim is the codec-side SSIM surface used by every non-float pipeline.
- integer_motion NEON incomplete: motion_neon.c provides only x_convolution_16_neon; the equally hot y_convolution_8/16_neon and sad_neon are absent. On arm64 hosts, integer_motion therefore runs its SAD and Y-convolution in scalar code. integer_motion_v2 does NOT share this gap (it has a complete NEON dispatch).
- moment / float_moment AVX-512: moment_avx2.c and moment_neon.c exist; moment_avx512.c does not. The two hot kernels (compute_1st_moment, compute_2nd_moment) are load-heavy reductions that benefit from wider vector lanes on Skylake-X / Ice Lake / Sapphire Rapids class hardware.
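To illustrate why a moment reduction maps well onto wide vectors, the following portable C sketch keeps 16 independent partial sums, roughly mirroring the 16 float lanes of a 512-bit register, and reduces them once at the end. It is not the project's kernel: the function names, signatures, the element-based stride and the double accumulators are assumptions for illustration, and a real AVX-512 twin would use intrinsics instead of the inner lane loop.

```c
#include <stddef.h>

#define LANES 16 /* floats per 512-bit vector; illustrative assumption */

/* Scalar reference: mean of pic[x]^order over a w*h plane, order 1 or 2.
 * stride is counted in elements (assumption). */
static double moment_scalar(const float *pic, int w, int h, int stride, int order)
{
    double sum = 0.0;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            double v = pic[(size_t)y * stride + x];
            sum += (order == 2) ? v * v : v;
        }
    }
    return sum / ((double)w * h);
}

/* Lane-blocked variant: LANES independent accumulators per block,
 * a scalar tail for leftover columns, one horizontal reduction at the end. */
static double moment_blocked(const float *pic, int w, int h, int stride, int order)
{
    double acc[LANES] = {0};
    double tail = 0.0;
    for (int y = 0; y < h; y++) {
        const float *row = pic + (size_t)y * stride;
        int x = 0;
        for (; x + LANES <= w; x += LANES) {
            for (int l = 0; l < LANES; l++) { /* one "vector" step */
                double v = row[x + l];
                acc[l] += (order == 2) ? v * v : v;
            }
        }
        for (; x < w; x++) { /* columns that do not fill a block */
            double v = row[x];
            tail += (order == 2) ? v * v : v;
        }
    }
    double sum = tail;
    for (int l = 0; l < LANES; l++)
        sum += acc[l];
    return sum / ((double)w * h);
}
```

Because the blocked variant adds values in a different order than the scalar one, the two can differ in the last bits on general floating-point input, so a real twin would likely need a tolerance-based comparison against the scalar reference.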
Alternatives considered
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Add SIMD for float_ssim / float_vif directly | Direct control of the dispatch path | Both already delegate through shared helpers that have SIMD; redundant work | Rejected — delegation is effective |
| Batch all three gaps into one PR | Fewer PRs | Large diff, hard to review, one failure blocks all | Rejected — each gap gets its own /add-simd-path call |
| No-op / defer indefinitely | Zero effort | arm64 runtime performance keeps lagging behind x86 AVX2 instead of reaching parity; integer_ssim on GPU-less servers stays fully scalar | Rejected |
Consequences
- Positive: future /add-simd-path agents have a definitive baseline; no re-audit needed before implementation PRs.
- Negative: three gaps remain open until dedicated PRs land.
- Neutral follow-up: in priority order, integer_ssim AVX2 is gap #1, integer_motion NEON completion is gap #2, and moment AVX-512 is gap #3 (see research digest for rationale).
References
- Code audit: core/src/feature/x86/, core/src/feature/arm64/, and dispatcher blocks in integer_*.c / float_moment.c (2026-05-29).
- ADR-0179: prior float_moment AVX2 work that established the moment_avx2.c pattern.
- Research digest: docs/research/simd-twin-inventory-2026-05-29.md.