ADR-1089: Block non-standard ONNX operator domains in the DNN wire scanner
- Status: Accepted
- Date: 2026-06-07
- Deciders: Lusoris
- Tags: security, ai, dnn, fork-local
Context
The ONNX Runtime (ORT) dispatches operators by the tuple (domain, op_type), not by op_type alone. The standard built-in op set uses domain "" (empty string, equivalent to "ai.onnx"). Custom or third-party op sets register under non-empty, non-standard domain strings (e.g. "com.microsoft", "com.evil").
The fork's wire-format scanner (core/src/dnn/onnx_scan.c) checked only NodeProto.op_type (field 4) against the allowlist, but never read NodeProto.domain (field 7). A crafted model could therefore set op_type = "Conv" (matching the allowlist) and domain = "com.evil". The scanner would accept it, but ORT would dispatch to whatever custom op was registered under ("com.evil", "Conv") — executing arbitrary code as the VMAF process.
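To make the bypass concrete, the following is a minimal sketch of how such a node looks on the wire. It assumes only the standard protobuf encoding (tag byte = field number shifted left by 3, OR-ed with the wire type; wire type 2 is length-delimited). The names `crafted_node`, `find_string_field`, and `legacy_scan_accepts` are hypothetical illustrations, not the code in core/src/dnn/onnx_scan.c, and the walker handles only short string fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hand-encoded NodeProto with two fields:
 *   field 4 (op_type) = "Conv"     -> tag (4 << 3) | 2 = 0x22
 *   field 7 (domain)  = "com.evil" -> tag (7 << 3) | 2 = 0x3A */
static const uint8_t crafted_node[] = {
    0x22, 0x04, 'C', 'o', 'n', 'v',
    0x3A, 0x08, 'c', 'o', 'm', '.', 'e', 'v', 'i', 'l',
};

/* Hypothetical helper: copy the payload of the first occurrence of `field`
 * into `out` as a NUL-terminated string. Demo-only: accepts single-byte tags
 * and lengths below 128, and only length-delimited fields.
 * Returns 1 if found, 0 if absent, -1 if the buffer is malformed. */
static int find_string_field(const uint8_t *buf, size_t len, unsigned field,
                             char *out, size_t out_cap)
{
    size_t pos = 0;
    while (pos < len) {
        uint8_t tag = buf[pos++];
        if ((tag & 0x80) != 0 || (tag & 0x07) != 2 || pos >= len)
            return -1;
        size_t n = buf[pos++];
        if ((n & 0x80) != 0 || n > len - pos)
            return -1;
        if ((unsigned)(tag >> 3) == field) {
            if (n >= out_cap)
                return -1;
            memcpy(out, buf + pos, n);
            out[n] = '\0';
            return 1;
        }
        pos += n; /* skip a field we are not looking for */
    }
    return 0;
}

/* Hypothetical stand-in for the pre-fix behaviour: only op_type is consulted,
 * against a one-entry toy allowlist. */
static int legacy_scan_accepts(const uint8_t *buf, size_t len)
{
    char op_type[32];
    if (find_string_field(buf, len, 4, op_type, sizeof op_type) != 1)
        return 0;
    return strcmp(op_type, "Conv") == 0;
}
```

Under these assumptions, `legacy_scan_accepts(crafted_node, sizeof crafted_node)` returns 1 even though field 7 carries "com.evil", which is exactly the gap described above.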
This is a defence-in-depth bypass: ORT's own sandboxing is the primary barrier, but the scanner's purpose is to reject obviously hostile inputs before ORT ever instantiates a session. Allowing non-standard domains defeats that purpose entirely.
Decision
The scanner reads NodeProto.domain (field 7) at every node level (including inside control-flow subgraphs via the existing recursive descent). Any domain string that is neither the empty string "" nor "ai.onnx" causes the scanner to return -EPERM, the same code returned for a disallowed op type. The first_bad out-pointer is populated with the rejected domain string for caller diagnostics.
Standard ONNX-ML ("ai.onnx.ml") and all vendor/custom domains are blocked. No exceptions are provided; if a future consumer requires an ONNX-ML op, a separate ADR must justify it.
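The domain rule can be sketched as a small policy function. This is an illustration under stated assumptions, not the actual read_domain helper: the names `domain_is_standard` and `check_node_domain`, and the buffer-plus-capacity shape of the `first_bad` parameter, are hypothetical. The sketch assumes the caller has already located the field 7 payload and its length.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical policy check: only "" and "ai.onnx" count as standard.
 * The comparison is length-based because protobuf strings are not
 * NUL-terminated, and exact because "ai.onnx.ml" shares the "ai.onnx" prefix. */
static int domain_is_standard(const char *domain, size_t len)
{
    static const char std_domain[] = "ai.onnx";
    if (len == 0)
        return 1;
    return len == sizeof std_domain - 1 && memcmp(domain, std_domain, len) == 0;
}

/* Hypothetical gate: returns 0 when the domain is allowed, -EPERM otherwise.
 * On rejection the offending domain is copied into first_bad (truncated to
 * fit and NUL-terminated) so the caller can report it. */
static int check_node_domain(const char *domain, size_t len,
                             char *first_bad, size_t first_bad_cap)
{
    if (domain_is_standard(domain, len))
        return 0;
    if (first_bad != NULL && first_bad_cap > 0) {
        size_t n = len < first_bad_cap - 1 ? len : first_bad_cap - 1;
        memcpy(first_bad, domain, n);
        first_bad[n] = '\0';
    }
    return -EPERM;
}
```

A prefix match such as `strncmp(domain, "ai.onnx", 7)` would wrongly admit "ai.onnx.ml", which is why the sketch compares the full length before comparing bytes.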
Alternatives considered
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Allowlist specific non-standard domains (e.g. "ai.onnx.ml") | Supports ONNX-ML ops | Wider attack surface; every added domain requires auditing all its ops | No current consumer needs ONNX-ML; default to narrow |
| Reject domain field presence entirely (require empty/absent domain) | Simplest rule | Rejects explicitly-tagged standard-domain models (domain="ai.onnx") exported by some tools | Too strict; legitimate exporters may emit "ai.onnx" explicitly |
| No change — rely on ORT sandboxing | Zero code change | Scanner's stated purpose is pre-ORT rejection; partial allowlist is misleading if bypassable | Violates the stated security invariant |
Consequences
- Positive: The (domain, op_type) tuple is now fully gated before ORT instantiates a session. A custom op cannot shadow an allowlisted built-in by using a non-standard domain.
- Negative: Models that explicitly tag ops with any non-empty non-"ai.onnx" domain are rejected, even if ORT would run them safely. In practice all in-tree models use the default (empty) domain, so no regression is expected.
- Neutral / follow-ups: The read_domain helper is ~50 LOC and fully covered by 5 new unit tests. The onnx_scan.h doc comment is updated to document the four-field scope. No public API or CLI surface changes.
References
- ONNX protobuf spec: NodeProto.domain = field 7 (string), standard domain is "" or "ai.onnx", ONNX-ML extension is "ai.onnx.ml".
- ORT dispatch: IExecutionProvider::GetKernelRegistry() keyed on (op_type, domain, opset_version); see ORT source onnxruntime/core/framework/op_kernel.h.
- Prior scanner ADRs: ADR-0169 (Loop/If control-flow ops), ADR-0171 (Loop-count cap), ADR-0258 (Resize admitted).
- Security audit: parallel agent wf_b08e0c22-717-7, 2026-06-07.