
ADR-1089: Block non-standard ONNX operator domains in the DNN wire scanner

  • Status: Accepted
  • Date: 2026-06-07
  • Deciders: Lusoris
  • Tags: security, ai, dnn, fork-local

Context

The ONNX Runtime (ORT) dispatches operators by the tuple (domain, op_type), not by op_type alone. The standard built-in op set uses domain "" (empty string, equivalent to "ai.onnx"). Custom or third-party op sets register under non-empty, non-standard domain strings (e.g. "com.microsoft", "com.evil").

The fork's wire-format scanner (core/src/dnn/onnx_scan.c) checked only NodeProto.op_type (field 4) against the allowlist and never read NodeProto.domain (field 7). A crafted model could therefore set op_type = "Conv" (matching the allowlist) and domain = "com.evil". The scanner would accept the node, but ORT would dispatch to whatever custom op was registered under ("com.evil", "Conv"), executing arbitrary code as the VMAF process.
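The bypass can be illustrated with a minimal sketch. The struct, the three-entry allowlist, and the function names below are hypothetical stand-ins chosen for illustration, not the scanner's actual code; they only contrast a check on op_type alone with a check on the (domain, op_type) pair.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified view of a decoded NodeProto. */
struct node_view {
    const char *op_type; /* NodeProto.op_type, field 4 */
    const char *domain;  /* NodeProto.domain, field 7; "" when absent */
};

/* Illustrative allowlist only; the real list lives in the scanner. */
static const char *const k_allowed_ops[] = { "Conv", "Relu", "Add" };

static bool op_type_allowed(const char *op_type)
{
    for (size_t i = 0; i < sizeof k_allowed_ops / sizeof k_allowed_ops[0]; i++) {
        if (strcmp(op_type, k_allowed_ops[i]) == 0)
            return true;
    }
    return false;
}

/* Behaviour described as the gap: only op_type is consulted. */
static bool accept_op_type_only(const struct node_view *n)
{
    return op_type_allowed(n->op_type);
}

/* Behaviour after the decision: the domain must be standard as well. */
static bool accept_domain_and_op_type(const struct node_view *n)
{
    bool standard = n->domain[0] == '\0' || strcmp(n->domain, "ai.onnx") == 0;
    return standard && op_type_allowed(n->op_type);
}
```

With a node of { "Conv", "com.evil" }, the first predicate accepts and the second rejects, which is the difference this ADR closes.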

This is a defence-in-depth bypass: ORT's own sandboxing is the primary barrier, but the scanner's purpose is to reject obviously hostile inputs before ORT ever instantiates a session. Allowing non-standard domains defeats that purpose entirely.

Decision

The scanner reads NodeProto.domain (field 7) at every node level (including inside control-flow subgraphs via the existing recursive descent). Any domain string that is neither the empty string "" nor "ai.onnx" causes the scanner to return -EPERM, the same code returned for a disallowed op type. The first_bad out-pointer is populated with the rejected domain string for caller diagnostics.
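As a rough sketch of what such a check could look like at the wire level, the following walks the fields of one serialized NodeProto and inspects field 7. The function names, the signature, and the -EINVAL code for malformed input are assumptions made for this example; the actual read_domain helper and the recursive descent into subgraphs are not reproduced here.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode one base-128 varint; returns bytes consumed, or 0 if malformed. */
static size_t read_varint(const uint8_t *p, size_t len, uint64_t *out)
{
    uint64_t v = 0;
    for (size_t i = 0; i < len && i < 10; i++) {
        v |= (uint64_t)(p[i] & 0x7f) << (7 * i);
        if ((p[i] & 0x80) == 0) {
            *out = v;
            return i + 1;
        }
    }
    return 0;
}

/*
 * Hypothetical helper: scan one serialized NodeProto and check its domain
 * (field 7, length-delimited). Returns 0 when the domain is absent, "" or
 * "ai.onnx"; -EPERM for any other domain; -EINVAL on malformed input (an
 * assumption of this sketch). On -EPERM the rejected domain is copied,
 * truncated and NUL-terminated, into first_bad.
 */
static int check_node_domain(const uint8_t *buf, size_t len,
                             char *first_bad, size_t first_bad_len)
{
    size_t pos = 0;
    while (pos < len) {
        uint64_t tag, n;
        size_t used = read_varint(buf + pos, len - pos, &tag);
        if (used == 0)
            return -EINVAL;
        pos += used;

        switch (tag & 7) {
        case 0: /* varint payload: decode and skip */
            used = read_varint(buf + pos, len - pos, &n);
            if (used == 0)
                return -EINVAL;
            pos += used;
            break;
        case 1: /* fixed 64-bit payload */
            if (len - pos < 8)
                return -EINVAL;
            pos += 8;
            break;
        case 5: /* fixed 32-bit payload */
            if (len - pos < 4)
                return -EINVAL;
            pos += 4;
            break;
        case 2: /* length-delimited payload */
            used = read_varint(buf + pos, len - pos, &n);
            if (used == 0 || n > len - pos - used)
                return -EINVAL;
            pos += used;
            if ((tag >> 3) == 7) { /* NodeProto.domain */
                const uint8_t *s = buf + pos;
                int standard = n == 0 || (n == 7 && memcmp(s, "ai.onnx", 7) == 0);
                if (!standard) {
                    if (first_bad && first_bad_len > 0) {
                        size_t c = n < first_bad_len - 1 ? (size_t)n : first_bad_len - 1;
                        memcpy(first_bad, s, c);
                        first_bad[c] = '\0';
                    }
                    return -EPERM;
                }
            }
            pos += (size_t)n;
            break;
        default: /* deprecated group wire types are not expected here */
            return -EINVAL;
        }
    }
    return 0;
}
```

The comparison uses the length-prefixed bytes directly because protobuf strings are not NUL-terminated; a strcmp on the raw buffer would read past the field.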

Standard ONNX-ML ("ai.onnx.ml") and all vendor/custom domains are blocked. No exceptions are provided; if a future consumer requires an ONNX-ML op, a separate ADR must justify it.

Alternatives considered

| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Allowlist specific non-standard domains (e.g. "ai.onnx.ml") | Supports ONNX-ML ops | Wider attack surface; every added domain requires auditing all its ops | No current consumer needs ONNX-ML; default to narrow |
| Reject domain field presence entirely (require empty/absent domain) | Simplest rule | Rejects explicitly-tagged standard-domain models (domain="ai.onnx") exported by some tools | Too strict; legitimate exporters may emit "ai.onnx" explicitly |
| No change (rely on ORT sandboxing) | Zero code change | Scanner's stated purpose is pre-ORT rejection; partial allowlist is misleading if bypassable | Violates the stated security invariant |
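To make the trade-offs concrete, the chosen rule and the first two alternatives can be written as predicates over a domain string. This is an illustrative sketch with hypothetical function names, assuming the domain has already been extracted as a NUL-terminated string; the "no change" option corresponds to accepting every domain.

```c
#include <stdbool.h>
#include <string.h>

/* Chosen policy: only the default domain, spelled "" or "ai.onnx". */
static bool policy_chosen(const char *d)
{
    return d[0] == '\0' || strcmp(d, "ai.onnx") == 0;
}

/* Alternative: additionally allowlist the ONNX-ML domain. */
static bool policy_allow_onnx_ml(const char *d)
{
    return policy_chosen(d) || strcmp(d, "ai.onnx.ml") == 0;
}

/* Alternative: require an empty (or absent) domain. */
static bool policy_empty_only(const char *d)
{
    return d[0] == '\0';
}
```

Under these definitions, "ai.onnx" passes the chosen policy but fails the empty-only one, and "ai.onnx.ml" passes only the ONNX-ML allowlist variant; a vendor domain such as "com.microsoft" fails all three.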

Consequences

  • Positive: The (domain, op_type) tuple is now fully gated before ORT instantiates a session. A custom op cannot shadow an allowlisted built-in by using a non-standard domain.
  • Negative: Models that explicitly tag ops with any domain other than "" or "ai.onnx" are rejected, even if ORT would run them safely. In practice all in-tree models use the default (empty) domain, so no regression is expected.
  • Neutral / follow-ups: The read_domain helper is ~50 LOC and fully covered by 5 new unit tests. The onnx_scan.h doc comment is updated to document the four-field scope. No public API or CLI surface changes.

References

  • ONNX protobuf spec: NodeProto.domain = field 7 (string), standard domain is "" or "ai.onnx", ONNX-ML extension is "ai.onnx.ml".
  • ORT dispatch: IExecutionProvider::GetKernelRegistry() keyed on (op_type, domain, opset_version) — see ORT source onnxruntime/core/framework/op_kernel.h.
  • Prior scanner ADRs: ADR-0169 (Loop/If control-flow ops), ADR-0171 (Loop-count cap), ADR-0258 (Resize admitted).
  • Security audit: parallel agent wf_b08e0c22-717-7, 2026-06-07.