ADR-0022: Inference runtime is ONNX Runtime via execution providers

  • Status: Accepted
  • Date: 2026-04-17
  • Deciders: Lusoris, Claude (Anthropic)
  • Tags: ai, dnn, cuda, sycl, build

Context

The fork already ships CPU / CUDA / SYCL / HIP backends. Tiny-AI inference needs a runtime that spans the same matrix without forking inference code per backend, and must be safely sandboxable with an operator allowlist.

Decision

We will use the ONNX Runtime (ORT) C API and gate the build with the Meson option -Denable_dnn=auto. ORT execution providers (EPs) map to libvmaf backends as follows: CPU → CPU EP, CUDA → CUDA EP, SYCL/Intel → OpenVINO EP, HIP → ROCm EP.
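The backend-to-EP mapping above can be sketched as a small lookup in C. This is a minimal illustration, not the fork's actual code: the enum `dnn_backend` and the function `dnn_ep_name_for_backend` are hypothetical names. The returned strings are the provider names ONNX Runtime reports for these EPs.

```c
#include <stddef.h>

/* Hypothetical backend identifiers; the fork's real enum may differ. */
enum dnn_backend {
    DNN_BACKEND_CPU,
    DNN_BACKEND_CUDA,
    DNN_BACKEND_SYCL,
    DNN_BACKEND_HIP
};

/* Return the ONNX Runtime execution provider name for a backend,
 * following the mapping stated in this ADR.
 * Returns NULL for a value outside the enum. */
const char *dnn_ep_name_for_backend(enum dnn_backend backend)
{
    switch (backend) {
    case DNN_BACKEND_CPU:  return "CPUExecutionProvider";
    case DNN_BACKEND_CUDA: return "CUDAExecutionProvider";
    case DNN_BACKEND_SYCL: return "OpenVINOExecutionProvider";
    case DNN_BACKEND_HIP:  return "ROCMExecutionProvider";
    }
    return NULL;
}
```

A caller could compare the returned name against the providers available in the linked ORT build and fall back to the CPU EP when the requested one is missing. That fallback policy is an assumption here, not something this ADR specifies.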

Alternatives considered

| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| TensorRT | Fast on NVIDIA | NVIDIA-only; no CPU/SYCL/HIP path | Splits inference code per backend |
| OpenVINO direct | Great on Intel | Intel-focused; no CUDA/HIP | Same |
| Per-backend native runtimes | Max perf | Huge duplication; each backend diverges | Too expensive |
| ONNX Runtime (chosen) | One runtime; execution providers match our backend matrix 1:1; sandboxed graph | Some perf overhead vs native | Rationale: avoids forking inference paths; op allowlist restricts further |

Consequences

  • Positive: single inference code path; execution provider selection mirrors backend selection.
  • Negative: ORT pulls in transitive deps; must maintain op allowlist.
  • Neutral / follow-ups: ADR-0039 wires the runtime op-allowlist + registry.
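To make the op-allowlist maintenance cost concrete, here is a minimal sketch in C of an allowlist check. Everything in it is an assumption for illustration: the names `dnn_allowed_ops`, `dnn_op_is_allowed` and `dnn_first_disallowed_op` are hypothetical, and the listed operators are example ONNX op types only. The real list and registry are the subject of ADR-0039.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative allowlist of ONNX op types; the real list is defined
 * elsewhere (ADR-0039), not by this sketch. */
static const char *const dnn_allowed_ops[] = { "Conv", "Relu", "Add", "Gemm" };

/* Return true if op_type exactly matches an allowlist entry. */
bool dnn_op_is_allowed(const char *op_type)
{
    if (op_type == NULL)
        return false;
    size_t n = sizeof(dnn_allowed_ops) / sizeof(dnn_allowed_ops[0]);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(op_type, dnn_allowed_ops[i]) == 0)
            return true;
    }
    return false;
}

/* Scan a model's op types and return the index of the first one that
 * is not allowlisted, or -1 if every op is allowed. */
int dnn_first_disallowed_op(const char *const *op_types, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (!dnn_op_is_allowed(op_types[i]))
            return (int)i;
    }
    return -1;
}
```

Rejecting a model at load time on the first disallowed op keeps the check simple. Every new model architecture may then require an allowlist update, which is the maintenance burden noted under the negative consequences.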

References

  • Source: Q5.3
  • Related ADRs: ADR-0020, ADR-0021, ADR-0023, ADR-0039