ADR-0407: AdaptiveCpp as a second SYCL toolchain¶
- Status: Accepted
- Date: 2026-05-08
- Deciders: Lusoris, Claude (Anthropic)
- Tags: sycl, build, toolchain, fork-local, ci, contributor-experience
Context¶
The fork's -Denable_sycl=true build path has been hard-wired to Intel's icpx compiler (oneAPI DPC++) since the SYCL backend landed (ADR-0027, ADR-0217). The full Intel oneAPI Base Toolkit installer is ~2.6 GB, ships closed-source binaries, and is awkward to obtain on hosts that do not have Intel hardware to target. This is a real friction point for new contributors who only need to type-check or run unit tests against the SYCL TUs — the bar is currently "install all of oneAPI", even when the contributor machine has no Intel iGPU, no Arc GPU, and no NPU.
Topic B of Research-0086 recommended GO-AS-SECOND-TOOLCHAIN for AdaptiveCpp (formerly OpenSYCL / hipSYCL). AdaptiveCpp is a community-driven, BSL-licensed SYCL implementation built on LLVM. It supports CUDA, HIP, OpenMP CPU, and a generic SPIR-V backend, and it is the only realistic open-source SYCL implementation in 2026.
The audit identified two showstoppers: (1) ten kernel sites use the Intel-specific [[intel::reqd_sub_group_size(32)]] attribute (and one uses SG_SIZE parameterised) that AdaptiveCpp rejects; (2) the core/src/meson.build SYCL block hard-codes icpx-only flags (-fsycl, -fp-model=precise) and Intel runtime libraries (svml, irc). Both are mechanical to wrap.
Decision¶
Add AdaptiveCpp as a second supported SYCL toolchain. Intel oneAPI icpx remains the primary toolchain.
Concretely:
core/src/feature/sycl/sycl_compat.h(new, ~60 LOC). One public macro:VMAF_SYCL_REQD_SG_SIZE(N). Expands to the Intel attribute under__INTEL_LLVM_COMPILER, to a no-op underSYCL_IMPLEMENTATION_ACPP/SYCL_IMPLEMENTATION_HIPSYCL. The ten previously hard-coded call sites switch to the macro.core/src/meson.builddetects the configuredsycl_compilerby basename.acpp/syclcc/syclcc-clangtake the AdaptiveCpp path:--acpp-targets=<value>instead of-fsycl,-ffp-contract=offinstead of-fp-model=precise, andlibacpp-rt.so(withlibhipSYCL-rt.solegacy fallback) as the runtime library. Intelsvml/ircare skipped under AdaptiveCpp.core/meson_options.txtaddssycl_acpp_targets(defaultgeneric) so contributors can override the AdaptiveCpp target list (e.g.omp;cuda:sm_75).sycl_compilerdescription is updated to advertise both toolchains.docs/development/sycl-toolchains.md(new) documents the AdaptiveCpp install path (Arch AURadaptivecpp, source build), the capability matrix, the numerical-conformance gap vs icpx, and the supported--acpp-targetsvalues.
Numerical conformance. --acpp-targets=omp (CPU OpenMP) is not bit-identical to icpx and not bit-identical to scalar CPU; the fork's feedback_golden_gate_cpu_only rule already documents that no GPU/SYCL backend is bit-identical to scalar CPU. The AdaptiveCpp lane is a "yet another non-bit-identical backend" with its own ULP-tolerance budget when the cross-backend gate is extended to cover it (deferred to a follow-up PR; this PR ships the build plumbing only).
Alternatives considered¶
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Stay icpx-only | Zero new code; one supported path; bit-identical-to-icpx is the only acceptance bar. | 2.6 GB closed-source toolchain remains a contributor blocker; CI runners without Intel hardware cannot exercise SYCL TUs. | Solves the wrong problem — Research-0086 Topic B specifically flagged contributor friction as a real cost. |
| Replace icpx with AdaptiveCpp | One toolchain, open source, smaller install. | Loses Intel discrete-GPU codegen quality; OpenVINO / NPU enablement story is icpx-coupled; published-binary ABI changes. | Net regression for the fork's primary user — Intel hardware. icpx stays primary. |
| Wrap differences in a pure CMake/meson "compatibility shim" without source-level macros | No source churn. | Pre-processor branch logic still has to live somewhere; [[intel::reqd_sub_group_size]] cannot be hidden behind a meson flag because it appears in kernel-lambda attribute position. | Source-level macro is the minimal-surface fix. |
Drop the reqd_sub_group_size attribute entirely under both toolchains | One code path, no compat header. | The attribute is load-bearing for icpx codegen quality on Arc / Battlemage; removing it loses a measured ~5–8 % on motion_v2_sycl per the T7-8 bench audit. | Trades the wrong cost — keep the attribute for icpx, neutralise it for acpp. |
Consequences¶
Positive:
- Contributors without Intel hardware can install AdaptiveCpp from the AUR (or build from source) and get a working
-Denable_sycl=truebuild. - A future CI lane can run SYCL unit tests on stock
ubuntu-latestrunners via--acpp-targets=omp, catching toolchain-monoculture bugs that only one vendor would otherwise hide. - The
sycl_compat.hshim is a useful lever: any future Intel- or vendor-specific SYCL extension (e.g. thegroup_loadrewrite proposed in Research-0086 Topic A.4) lands behind a macro symmetric toVMAF_SYCL_REQD_SG_SIZE.
Negative:
- One more toolchain to support means one more CI lane's worth of drift to audit on each rebase; the new ULP-tolerance budget for acpp's CPU OpenMP backend is yet to be measured.
- Sub-optimal sub-group selection on Intel HW under AdaptiveCpp: the runtime picks the "natural" sub-group size per backend, which is not always 32 on Xe-HPG / Xe-HPC; throughput on Intel devices under acpp may trail icpx output by a few percent on the affected kernels.
Neutral / follow-ups:
- A separate PR can add
.github/workflows/sycl-acpp.yml(sized ~50 LOC in Research-0086 Topic B.4) that builds with AdaptiveCpp on a stockubuntu-latestrunner. make lint/clang-tidyhave no AdaptiveCpp-specific lane in this PR; the existingscripts/ci/clang-tidy-sycl.shwrapper (ADR-0217) targets icpx paths and is unaffected.- The Netflix golden gate (CPU-only, CLAUDE.md §8) is unaffected — it does not exercise the SYCL backend.
- ADR-0217 gets a
### Status update 2026-05-08: AdaptiveCpp addedappendix per the ADR-0028 maintenance rule.
References¶
- Research-0086 §Topic B — the SYCL toolchain audit that proposed this work.
- ADR-0027 — current SYCL experimental flags, all icpx-coupled.
- ADR-0217 — multi-version oneAPI recipe + clang-tidy wrapper.
- ADR-0220 — fp64-free kernel contract (preserved verbatim under both toolchains).
core/src/sycl/AGENTS.md— SYCL invariants on rebase.- AdaptiveCpp upstream: https://adaptivecpp.github.io/AdaptiveCpp/ (compiler identification macros, target syntax).
- AUR
adaptivecpp25.10.0-2 — the pinned reference for Arch contributors. req— user direction 2026-05-08: "GO-AS-SECOND-TOOLCHAIN recommendation in the digest".