ADR-0985: SYCL parity divergence investigation - float_ssim + ssimulacra2 on Arc A380
- Status: Proposed
- Date: 2026-06-03
- Deciders: Lusoris, Claude (Anthropic)
- Tags: sycl, parity, ci, gpu, precision, arc
Context
The cross-backend parity matrix run on Intel Arc A380 (Research-0730, 2026-05-27) recorded three FAIL rows in the SYCL matrix:
| Feature | Max abs diff | Tolerance | Status |
|---|---|---|---|
| float_ssim | 2.68e-04 | 5.0e-05 | FAIL |
| ssimulacra2 | 8.72e-02 | 5.0e-03 | FAIL |
| float_ansnr | 1.59e-04 | 5.0e-05 | FAIL |
A follow-up investigation (Research-0985, 2026-06-03) found:
- `float_ansnr`: Stale row. The extractor was removed in PR #38. No ANSNR code exists in the tree. This row is closed.
- `float_ssim`: Two-cause divergence. The primary cause is a formula difference between the CPU (L×C×S with `sqrt(var_ref*var_cmp)`) and all GPU backends (combined Wang 2004 Eq. (13) with `2*covar`). This formula difference is intentional; the same divergence is present for CUDA and Vulkan. The secondary cause is Arc A380's lack of native fp64, which amplifies fp32 accumulation error over the full-frame reduction (~566×314 pixels). The same SYCL kernel passes at places=4 on RTX 4090 and lavapipe.
- `ssimulacra2`: Arc A380 lacks native fp64. The 3-pole IIR Charalampidis blur in `ssimulacra2_sycl.cpp` accumulates fp32 rounding over up to 329 rows of recurrence state per pyramid scale (6 scales). The XYB + pooling + combine stages already run on host. Divergence is ~17× over the places=2 contract on Arc A380; RTX 4090 passes places=2. This cannot be fixed without hardware access.
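The two float_ssim forms named above can be compared with a minimal Python sketch. It assumes the standard Wang 2004 constants for 8-bit data and C3 = C2/2; under that assumption the three-term L×C×S form and the combined Eq. (13) form agree in exact arithmetic, so in this sketch any difference comes only from where rounding happens. The function names, the `rnd` hook, and the sample window statistics are hypothetical and are not taken from the CPU or SYCL sources.

```python
import ctypes
import math


def f32(x):
    """Round a Python float (fp64) to the nearest fp32 value."""
    return ctypes.c_float(x).value


# Assumed constants: Wang 2004 defaults for an 8-bit dynamic range.
C1 = (0.01 * 255.0) ** 2
C2 = (0.03 * 255.0) ** 2
C3 = C2 / 2.0


def ssim_lcs(mu_x, mu_y, var_x, var_y, covar, rnd=float):
    """Three-term form: luminance * contrast * structure, using sqrt(var_x*var_y)."""
    sigma_prod = rnd(math.sqrt(rnd(var_x * var_y)))
    lum = rnd(rnd(2.0 * mu_x * mu_y + C1) / rnd(mu_x * mu_x + mu_y * mu_y + C1))
    con = rnd(rnd(2.0 * sigma_prod + C2) / rnd(var_x + var_y + C2))
    stru = rnd(rnd(covar + C3) / rnd(sigma_prod + C3))
    return rnd(rnd(lum * con) * stru)


def ssim_combined(mu_x, mu_y, var_x, var_y, covar, rnd=float):
    """Combined form (Wang 2004 Eq. (13)), using 2*covar and no square root."""
    num = rnd(rnd(2.0 * mu_x * mu_y + C1) * rnd(2.0 * covar + C2))
    den = rnd(rnd(mu_x * mu_x + mu_y * mu_y + C1) * rnd(var_x + var_y + C2))
    return rnd(num / den)


# Hypothetical per-window statistics: means, variances, covariance.
stats = (100.0, 102.0, 400.0, 380.0, 370.0)

diff_fp64 = abs(ssim_lcs(*stats) - ssim_combined(*stats))
diff_fp32 = abs(ssim_lcs(*stats, rnd=f32) - ssim_combined(*stats, rnd=f32))
print("fp64 difference between forms:", diff_fp64)
print("fp32 difference between forms:", diff_fp32)
```

The sketch rounds only at a few intermediate points and does not model a full-frame reduction, so it illustrates the rounding-order effect rather than reproducing the measured 2.68e-04.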
Decision
No kernel code change is made in this ADR. The decision is:
- Close the `float_ansnr` row: the extractor no longer exists.
- Add a `float_ssim_sycl` parity test (test_sycl_float_ssim_parity) with places=3 (5e-04) tolerance. This tolerance accommodates both the intentional formula difference and Arc A380 fp32 accumulation drift. fp64-capable hardware (RTX 4090, lavapipe) passes at places=4.
- Add a clarifying comment to `integer_ssim_sycl.cpp` documenting the Wang Eq. (13) combined formula and its divergence from the CPU L×C×S path.
- Add open tracking rows to `docs/state.md` for the `float_ssim` and `ssimulacra2` Arc A380 calibration gap, pointing to Research-0985 and the required follow-up work (device-calibration YAML entries per ADR-0234 / Research-0730 §6).
The actual calibration entries and any kernel fix require hardware access and measurement-driven follow-up PRs. This ADR records the investigation findings and the immediate no-hardware-required deliverables.
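The relationship between a places=N contract and the absolute tolerances quoted in this ADR (places=3 as 5e-04, places=4 as 5.0e-05, places=2 as 5.0e-03) can be sketched as follows. The helper names are hypothetical, and the strict less-than comparison is an assumption about how the gate evaluates a row; the max-abs-diff values are the Arc A380 measurements from the Context table.

```python
def places_to_tolerance(places):
    """Absolute tolerance implied by a places=N contract: 0.5 * 10**-N."""
    return 0.5 * 10.0 ** (-places)


def passes(max_abs_diff, places):
    """Assumed gate rule: the row passes if the diff is strictly below tolerance."""
    return max_abs_diff < places_to_tolerance(places)


def tolerance_ratio(max_abs_diff, places):
    """How many times over (or under) the contract a measured diff is."""
    return max_abs_diff / places_to_tolerance(places)


# Arc A380 measurements recorded in Research-0730.
FLOAT_SSIM_DIFF = 2.68e-04
SSIMULACRA2_DIFF = 8.72e-02

print("float_ssim at places=4:", passes(FLOAT_SSIM_DIFF, 4))
print("float_ssim at places=3:", passes(FLOAT_SSIM_DIFF, 3))
print("ssimulacra2 ratio over places=2:", tolerance_ratio(SSIMULACRA2_DIFF, 2))
```

Under these assumptions the recorded float_ssim diff fails the places=4 contract and fits inside places=3, which is the reason the new test is pinned at places=3.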
Alternatives considered
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Change GPU float_ssim formula to match CPU L×C×S | Removes formula divergence | Changes GPU output for all hardware (CUDA, Vulkan, SYCL, HIP, Metal); requires matching changes in 5 TUs; adds sqrt per pixel to every kernel | Cannot be validated without full cross-backend re-measurement; high-blast-radius change |
| Add Arc A380 device-calibration entry now | Closes the parity gate formally | Requires measurement of scripts/ci/gpu_ulp_calibration.yaml impact and a separate ADR with hardware evidence | Deferred to follow-up ADR; this PR cannot run the gate |
| Move ssimulacra2 IIR blur to host | Trivially achieves places=4 | Defeats the purpose of the GPU kernel (all compute on CPU) | Not implemented in this ADR |
| Kahan-compensated IIR summation | Potentially halves accumulation error | Adds 6 FP ops per output; cannot be validated without Arc A380 hardware | Deferred to follow-up with hardware access |
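To make the Kahan-compensated option concrete, the following Python sketch emulates fp32 arithmetic on the host and compares a naive running sum with a Kahan-compensated one. It is applied to a plain accumulation rather than to the 3-pole IIR recurrence, so it only illustrates the compensation technique and says nothing about the actual gain on Arc A380. All names and the input sequence are hypothetical.

```python
import ctypes
import math


def f32(x):
    """Round a Python float (fp64) to the nearest fp32 value."""
    return ctypes.c_float(x).value


def naive_sum_f32(values):
    """Accumulate in emulated fp32, rounding after every addition."""
    total = 0.0
    for v in values:
        total = f32(total + f32(v))
    return total


def kahan_sum_f32(values):
    """Kahan summation in emulated fp32: carry the rounding error forward."""
    total = 0.0
    comp = 0.0  # running compensation for lost low-order bits
    for v in values:
        y = f32(f32(v) - comp)          # apply the correction to the new term
        t = f32(total + y)              # low-order bits of y may be lost here
        comp = f32(f32(t - total) - y)  # recover what was lost
        total = t
    return total


values = [0.1] * 100_000
# Reference: exactly rounded sum of the fp32-rounded inputs.
reference = math.fsum(f32(v) for v in values)

naive_err = abs(naive_sum_f32(values) - reference)
kahan_err = abs(kahan_sum_f32(values) - reference)
print("naive fp32 error:", naive_err)
print("Kahan fp32 error:", kahan_err)
```

Whether a comparable reduction carries over to the recurrence state in the blur kernel would still need to be measured on the target hardware.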
Consequences
- Positive: `float_ssim_sycl` now has a unit test (previously completely untested). The formula difference is documented in code. The float_ansnr stale row is closed. Two open tracking rows in `docs/state.md` prevent the Arc A380 calibration gap from being forgotten.
- Negative: None; no kernel changes, no score changes.
- Neutral / follow-ups: A follow-up ADR for device-calibration YAML entries (Arc A380 `float_ssim` places=3, `ssimulacra2` places=1) is required before Arc A380 can be promoted to a required CI lane per Research-0730 §6.1.
References
- `req` (task brief): "investigate SYCL parity failures for float_ssim (2.68e-4), ssimulacra2 (8.72e-2), float_ansnr (1.59e-4); verify float_ansnr fully gone; identify divergence source; patch if high-confidence"
- Research-0985: SYCL parity divergence investigation (this companion document)
- Research-0730: Cross-backend parity — Intel Arc A380 (2026-05-27) — original measurements
- ADR-0214: GPU-parity CI gate — tolerance table and promotion criteria
- ADR-0188/ADR-0189: float_ssim GPU kernel, places=4 contract
- ADR-0192/ADR-0201/ADR-0206: ssimulacra2 precision contracts and cbrt-on-host fix
- ADR-0564: integer_ssim_sycl fp64-free precision, places=4-5