ADR-0245: SIMD bit-exact test harness shared header¶
- Status: Accepted
- Date: 2026-04-29
- Deciders: Lusoris, Claude (Anthropic)
- Tags: simd, test, dx, fork-local
Context¶
Every fork-added SIMD parity test (test_psnr_hvs_avx2.c, test_psnr_hvs_neon.c, test_moment_simd.c, test_motion_v2_simd.c, test_ssimulacra2_simd.c) re-implements the same scaffolding:
- A
xorshift32PRNG for reproducible inputs (six file-local copies before this change). - A portable aligned allocator that branches on
_MSC_VER || __MINGW32__— added in PR #198 once and then copy-pasted into each new SIMD test. - An x86 AVX2 CPUID gate (
vmaf_get_cpu_flags_x86()+VMAF_X86_CPU_FLAG_AVX2+ ag_has_avx2static + a "skipping" stderr line). - A
memcmp-based byte-equal assertion or a relative-tolerance assertion, both wrapped inmu_assert.
The repeated boilerplate is ~50–100 LOC per file. New SIMD ports (planned: SSIMULACRA 2 SVE2, additional metric NEON twins) would multiply the duplication. Each copy is also a potential drift point: when PR #198 fixed MinGW aligned_alloc we had to touch every test file individually.
The Power-of-10-compliant fix is a single shared header with no behavior changes — the helpers do not touch the SIMD data path itself; they only own setup, assertion, and CPU-feature gating.
Decision¶
We will introduce core/test/simd_bitexact_test.h, a shared static inline header that exposes:
simd_test_xorshift32(state)— Marsaglia 2003 PRNG.simd_test_aligned_malloc(size, align)/simd_test_aligned_free(ptr)— POSIX/MinGW/MSVC portable.simd_test_fill_random_f32/_u16/_i32_mod— deterministic fillers covering the existing tests' input ranges.simd_test_have_avx2()— runtime CPUID gate for x86, no-op elsewhere.SIMD_BITEXACT_ASSERT_MEMCMP(scalar_buf, simd_buf, n_bytes, label)— byte-compares two buffers and prints the first diverging byte index on failure.SIMD_BITEXACT_ASSERT_RELATIVE(scalar_val, simd_val, rel_tol, label)— relative-tolerance double comparison (the moment-style "tolerance-bounded, not bit-exact" contract per ADR-0179).
The four representative SIMD tests (test_psnr_hvs_avx2.c, test_psnr_hvs_neon.c, test_moment_simd.c, test_motion_v2_simd.c) migrate to the harness as proof. The unchanged-ssimulacra2 test stays put for now (its filler has a near-identical but not byte-identical FP rounding path; migrating it risks shifting input bit patterns and is out of scope for this PR).
Alternatives considered¶
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
Macros for fixture allocation (e.g. SIMD_BITEXACT_TEST_FIXTURE(name, w, h, bpc)) | Cuts the visible test body further; one-line scaffolding | Hides allocation lifetime behind a macro, breaks ASan / UBSan stack traces, fixed structure cannot express motion_v2's 4-buffer adversarial layout, fights Power-of-10 rule 2 (loop bounds visible) | Selected the inline-helper path: explicit simd_test_aligned_malloc calls keep lifetime visible and let each test compose its own buffer set. |
C-template-style helper functions taking a (scalar_fn, simd_fn) pair via function pointers | Maximum dedup; one helper drives every test | Forces every kernel to share a signature; the four target tests have wildly different signatures (compute_1st_moment vs od_bin_fdct8x8 vs motion_score_pipeline_16); function pointers also defeat clang-tidy's static-call-graph analysis | Rejected: signature heterogeneity dominates. |
| Per-feature harness (one helper header per feature family) | Tightest scoping | Multiplies the maintenance surface — six metric families would mean six near-identical headers | Rejected: the centralised harness is the linker-and-include-tree minimum. |
Random with fixed seed via srand/rand | Standard library, no PRNG to maintain | rand is in the project's banned-functions list (docs/principles.md §1.2 rule 30), and the per-host RAND_MAX makes outputs non-portable | Rejected: banned, and reproducibility across CI hosts is a hard requirement for bit-exact tests. |
Deterministic-pattern (no PRNG; e.g. pix = i ^ j) | No PRNG state; trivially auditable | Misses tail-lane and signed-overflow edges that motion_v2's adversarial-negative fixture and psnr_hvs's 12-bit DCT seeds depend on | Rejected as the default, kept as one of three named fillers (simd_test_fill_random_*). Tests can still hand-author deterministic patterns where they make sense (constant-field, delta-input cases in test_psnr_hvs_avx2.c). |
Consequences¶
- Positive: ~106 LOC removed across the four migrated tests; new SIMD parity tests cost ~20 LOC of test-body code instead of ~50–100 LOC of scaffolding + test body. Future fixes to the portable-aligned-allocator (the next MinGW gotcha after PR #198) land in one file. The relative-tolerance assertion produces a structured failure message (scalar / simd / rel / tol) that the previous per-file
mu_asserttext did not. - Negative: Test TUs gain one extra header in their include graph. Callers must
#include "test.h"BEFOREsimd_bitexact_test.hbecausetest.hlacks a header guard and would redefine themu_reportstatic inlineif pulled in twice; an inline comment in each migrated test calls this out. A future T7-style sweep should add header guards totest.hand remove the include-order constraint. - Neutral / follow-ups:
test_ssimulacra2_simd.cis intentionally untouched in this PR — itsfill_randomformulation differs in FP rounding order fromsimd_test_fill_random_f32, and changing the input bit pattern of an existing bit-exact test risks masking a regression. A separate dedup PR with a SIMD snapshot rerun under/cross-backend-diffcan migrate it. The harness'sSIMD_BITEXACT_ASSERT_MEMCMPshould also replacemu_assert("...", memcmp(...) == 0)patterns elsewhere in the test tree opportunistically.
References¶
- PR #198 — portable aligned allocator MinGW fix that motivated centralising the helper.
- ADR-0138 — bit-exactness contract for SIMD parity tests.
- ADR-0140 —
places=4/places=8validation gates the harness preserves verbatim. - ADR-0179 — moment SIMD tolerance contract that drives
SIMD_BITEXACT_ASSERT_RELATIVE. - Source:
req— fork-local DX simplification request 2026-04-29.