DNN session API — libvmaf/dnn.h
The DNN surface in core/include/libvmaf/dnn.h lets callers load and run tiny ONNX models from C, either attached to a VmafContext (so DNN scores show up next to SVM scores in the normal VMAF report) or as a standalone session (luma-in / luma-out filter-style inference, no VmafContext required).
This is the runtime half of the tiny-AI surface; training lives in ai/. See ADR-0022 (ORT as the inference runtime) and ADR-0023 (the four user surfaces: CLI, C API, ffmpeg, training).
Availability check
vmaf_dnn_available() returns 1 if libvmaf was built with -Denable_dnn=enabled and ONNX Runtime is linked, and 0 otherwise. When it returns 0, every other entry point in dnn.h returns -ENOSYS. This is the cheap way to branch between DNN and classic-only build configs at runtime without wrapping every call in its own check.
Device config — VmafDnnConfig
typedef enum VmafDnnDevice {
    VMAF_DNN_DEVICE_AUTO = 0,
    VMAF_DNN_DEVICE_CPU = 1,
    VMAF_DNN_DEVICE_CUDA = 2,
    VMAF_DNN_DEVICE_OPENVINO = 3, /* OpenVINO GPU with CPU fallback */
    VMAF_DNN_DEVICE_ROCM = 4,
    VMAF_DNN_DEVICE_COREML = 5,
    VMAF_DNN_DEVICE_COREML_ANE = 6,
    VMAF_DNN_DEVICE_COREML_GPU = 7,
    VMAF_DNN_DEVICE_COREML_CPU = 8,
    VMAF_DNN_DEVICE_OPENVINO_NPU = 9,
    VMAF_DNN_DEVICE_OPENVINO_CPU = 10,
    VMAF_DNN_DEVICE_OPENVINO_GPU = 11,
} VmafDnnDevice;

typedef struct VmafDnnConfig {
    VmafDnnDevice device;
    int device_index; /* multi-GPU index; 0 for single-GPU/CPU */
    int threads;      /* CPU EP intra-op threads; 0 = ORT default */
    bool fp16_io;     /* request fp16 tensors when supported */
} VmafDnnConfig;
- device = AUTO — tries CUDA, OpenVINO GPU, ROCm, CoreML, then CPU. OpenVINO NPU is intentionally explicit-only because small graphs can pay a noticeable NPU power-state latency floor.
- device = CPU — forces ORT's CPU execution provider.
- device = CUDA — tries CUDAExecutionProvider when the linked ORT build exports it. If CUDA EP append fails, the session falls back to CPU and still opens.
- device = OPENVINO — tries OpenVINO device_type=GPU, then device_type=CPU.
- device = OPENVINO_NPU / _CPU / _GPU — pins the OpenVINO EP to a single device_type (NPU, CPU, or GPU). Missing EP support or absent silicon still degrades to CPU through the common fallback.
- device = ROCM — tries ROCMExecutionProvider, then falls back to CPU.
- device = COREML / _ANE / _GPU / _CPU — tries CoreMLExecutionProvider. The base selector lets CoreML choose compute units; the variants set MLComputeUnits to CPUAndNeuralEngine, CPUAndGPU, or CPUOnly. Non-Apple ORT builds fall back to CPU.
- fp16_io — enables fp32-to-fp16 staging for model slots declared as FLOAT16. OpenVINO also receives the precision=FP16 EP option.
- threads = 0 — lets ORT pick. Set explicitly when pinning affinity or benchmarking.
Pass NULL for cfg in any function that accepts one to use VMAF_DNN_DEVICE_AUTO with zero device index, zero threads, and no fp16 I/O.
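The AUTO preference order described above can be pictured as a first-match scan over the providers that are actually usable. The sketch below is self-contained and does not call libvmaf or ONNX Runtime: the DemoEp enum, the availability bitmask, and demo_pick_auto() are hypothetical names used only to illustrate the ordering. The real selection logic lives inside the library and may differ in detail.

```c
#include <stddef.h>

/* Hypothetical provider identifiers for illustration only;
 * they are not part of libvmaf/dnn.h. */
typedef enum DemoEp {
    DEMO_EP_CPU = 0,
    DEMO_EP_CUDA,
    DEMO_EP_OPENVINO_GPU,
    DEMO_EP_ROCM,
    DEMO_EP_COREML,
} DemoEp;

/* Returns the first provider in the documented AUTO order whose bit is
 * set in `available` (bit i corresponds to DemoEp value i). CPU is the
 * final fallback and is returned when nothing else matches. */
static DemoEp demo_pick_auto(unsigned available)
{
    static const DemoEp order[] = {
        DEMO_EP_CUDA, DEMO_EP_OPENVINO_GPU, DEMO_EP_ROCM, DEMO_EP_COREML,
    };
    for (size_t i = 0; i < sizeof(order) / sizeof(order[0]); i++) {
        if (available & (1u << order[i]))
            return order[i];
    }
    return DEMO_EP_CPU;
}
```

Note that the OpenVINO NPU never appears in the scan, which matches the explicit-only rule for that device.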
Attached mode — vmaf_use_tiny_model
Register a tiny ONNX model on a live VmafContext. The model participates in the per-frame pipeline; its outputs appear in the report alongside SVM scores. Use this when you want "VMAF + tiny AI score" in the same run.
Returns:
- 0 — success.
- -ENOSYS — built without DNN support.
- -EINVAL — bad args (null ctx or onnx_path).
- -ENOENT — onnx_path does not exist or is not a regular file.
- -E2BIG — file exceeds the compile-time 50 MB cap (VMAF_DNN_DEFAULT_MAX_BYTES — defence against adversarial bloat; see ADR-0039). The historical VMAF_MAX_MODEL_BYTES env override was retired in T7-12.
- -ENOMEM — session allocation failed (ORT env, session options, or internal buffer allocation).
- Negative errno from the operator-allowlist walk if the model contains a banned op.
Equivalent CLI flag: --tiny-model <path> (usage/cli.md).
Attached scores are written to the normal feature collector:
- A single-output model preserves the historical key: the sidecar name field (or vmaf_tiny_model when no sidecar name exists).
- A multi-output model emits one scalar score per graph output. The key is <sidecar-name>_<output-name>, where output-name comes from sidecar output_names[] when the array length matches the ONNX output count, or from the ONNX graph output name otherwise. The suffix is sanitized to [A-Za-z0-9_]; duplicate sanitized suffixes fall back to deterministic output<slot>_<attempt> keys.
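As a rough illustration of the multi-output key rule, the following sketch joins a sidecar name and an output name and restricts the suffix to [A-Za-z0-9_]. It is not libvmaf's code: demo_output_key() is a hypothetical helper, and replacing each disallowed byte with an underscore (rather than dropping it) is an assumption, since the text above only states the allowed character set. The duplicate-suffix fallback is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Writes "<sidecar_name>_<output_name>" into dst, replacing every byte of
 * output_name outside [A-Za-z0-9_] with '_'. Replacement instead of
 * removal is an assumption of this sketch. Returns 0 on success, -1 when
 * dst is too small to hold the key and its terminator. */
static int demo_output_key(char *dst, size_t dst_len,
                           const char *sidecar_name, const char *output_name)
{
    assert(dst != NULL && sidecar_name != NULL && output_name != NULL);
    size_t name_len = strlen(sidecar_name);
    size_t out_len = strlen(output_name);
    if (name_len + 1 + out_len + 1 > dst_len)
        return -1;

    memcpy(dst, sidecar_name, name_len);
    dst[name_len] = '_';

    char *suffix = dst + name_len + 1;
    for (size_t i = 0; i < out_len; i++) {
        char c = output_name[i];
        int ok = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                 (c >= '0' && c <= '9') || c == '_';
        suffix[i] = ok ? c : '_';
    }
    suffix[out_len] = '\0';
    return 0;
}
```

With a sidecar name of nr_metric and a graph output called score/mos:0, this sketch yields nr_metric_score_mos_0.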
Codec-aware tiny-model inputs — vmaf_dnn_set_codec_context
Codec-conditioned tiny models (e.g. the v2 ladder regressor) accept a small categorical block alongside the per-frame features: encoder identity, preset ordinal, and CRF / QP. vmaf_dnn_set_codec_context populates that block on the attached tiny model so the loop body does not need to re-supply it per frame.
int vmaf_dnn_set_codec_context(VmafContext *ctx,
const char *codec_name,
const char *preset,
int crf);
Parameters
| Parameter | Notes |
|---|---|
| ctx | Context with a tiny model already attached via vmaf_use_tiny_model(). |
| codec_name | Encoder name (libx264, libx265, libsvtav1, libvpx-vp9, h264_nvenc, ...). NULL or "" selects the "unknown" bucket. |
| preset | Preset string (medium, slow, p4, 5, ...). NULL defaults to ordinal 5 (mid-tier). |
| crf | CRF / QP integer; clamped to [0, 63]. |
Returns
| Code | Meaning |
|---|---|
| 0 | Codec block written, or the model accepted the "unknown" bucket. |
| -ENOENT | codec_name is non-NULL but not in the model's encoder_vocab; the "unknown" bucket was used. |
| -ENOSYS | libvmaf was built without DNN support (-Denable_dnn=disabled). |
| -EINVAL | ctx is NULL or no tiny model is attached. |
| -ENOTSUP | The attached model has no codec block (rank-4 image model or rank-2 single-input model). |
Equivalent CLI flags (the vmaf CLI calls this internally): --tiny-codec, --tiny-preset, --tiny-crf — see usage/cli.md. Per-codec / per-preset vocabularies live in the model's sidecar JSON under encoder_vocab and preset_vocab; the loader bakes them into the runtime descriptor at vmaf_use_tiny_model() time.
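The parameter and return tables can be read as a small lookup-and-clamp routine. The sketch below is a self-contained approximation under stated assumptions: demo_clamp_crf() and demo_encoder_bucket() are hypothetical helpers, the vocabulary is passed in by the caller rather than read from a sidecar, and placing the "unknown" bucket one past the last vocabulary entry is an assumption made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Clamp CRF / QP to the documented [0, 63] range. */
static int demo_clamp_crf(int crf)
{
    if (crf < 0)
        return 0;
    if (crf > 63)
        return 63;
    return crf;
}

/* Looks up codec_name in vocab[0..n). On a hit, stores the index in
 * *bucket and returns 0. NULL or "" stores the "unknown" bucket and
 * returns 0. A non-empty name missing from the vocabulary also stores the
 * "unknown" bucket but returns -ENOENT, mirroring the return table.
 * Using index n for the "unknown" bucket is an assumption of this sketch. */
static int demo_encoder_bucket(const char *codec_name,
                               const char *const *vocab, size_t n,
                               size_t *bucket)
{
    assert(vocab != NULL && bucket != NULL);
    *bucket = n; /* assumed position of the "unknown" bucket */
    if (codec_name == NULL || codec_name[0] == '\0')
        return 0;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(codec_name, vocab[i]) == 0) {
            *bucket = i;
            return 0;
        }
    }
    return -ENOENT;
}
```

A caller that treats -ENOENT as a warning rather than a failure still gets a usable bucket, which is the behaviour the return table describes.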
Tiny-model auto-resize — vmaf_dnn_set_resize_mode
NCHW tiny models declare a fixed input shape at training time (e.g. 224×224 for the nr_metric_v1 NR scorer). When the user-supplied frame dims don't match, the per-frame dispatch resamples the luma plane to the model dims using the selected filter before invoking ONNX Runtime. Bit-exact when source dims already equal model dims (the routine forwards to vmaf_tensor_from_luma unchanged). Introduced in ADR-0550.
typedef enum VmafDnnResizeMode {
    VMAF_DNN_RESIZE_DISABLED = 0, /* default; mismatch → -ERANGE */
    VMAF_DNN_RESIZE_BILINEAR = 1, /* OpenCV INTER_LINEAR / torchvision */
    VMAF_DNN_RESIZE_NEAREST = 2,  /* nearest, floor coord */
    VMAF_DNN_RESIZE_BICUBIC = 3,  /* Catmull-Rom (a = -0.5) */
} VmafDnnResizeMode;
int vmaf_dnn_set_resize_mode(VmafContext *ctx, VmafDnnResizeMode mode);
Filter semantics
| Mode | Equivalent | When to use |
|---|---|---|
| DISABLED | None — size mismatch returns -ERANGE | Parity harnesses; strict-mode pipelines (default). |
| BILINEAR | torchvision Resize(..., antialias=False) / OpenCV INTER_LINEAR | Every shipped NR / image-input tiny-AI model was trained against this. |
| NEAREST | OpenCV INTER_NEAREST; deterministic floor of source coord | Cheaper; debugging dispatch without a filter parameter. |
| BICUBIC | Separable Catmull-Rom (a = -0.5); torchvision BICUBIC | Parity with exporters that used transforms.Resize(interpolation=BICUBIC). |
The three filter modes produce scores that differ by approximately 2% on the same input — treat filter choice as a model hyperparameter and document it alongside the model checkpoint.
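To make the NEAREST row concrete, here is a minimal nearest-neighbour resampler that takes the floor of the scaled source coordinate. It is a sketch, not libvmaf's resampler: demo_resize_nearest() is a hypothetical name, it handles a single 8-bit plane only, and it computes the floor with integer division rather than floating point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nearest-neighbour resize of an 8-bit plane. Each destination pixel
 * reads the source pixel at floor(dst_coord * src_dim / dst_dim), which
 * integer division computes exactly. Strides are in bytes. */
static void demo_resize_nearest(const uint8_t *src, size_t src_stride,
                                int src_w, int src_h,
                                uint8_t *dst, size_t dst_stride,
                                int dst_w, int dst_h)
{
    assert(src != NULL && dst != NULL);
    assert(src_w > 0 && src_h > 0 && dst_w > 0 && dst_h > 0);
    for (int y = 0; y < dst_h; y++) {
        int sy = (int)(((int64_t)y * src_h) / dst_h);
        for (int x = 0; x < dst_w; x++) {
            int sx = (int)(((int64_t)x * src_w) / dst_w);
            dst[(size_t)y * dst_stride + (size_t)x] =
                src[(size_t)sy * src_stride + (size_t)sx];
        }
    }
}
```

When the source and destination dimensions are equal, every pixel maps to itself, so the output is a plain copy of the input.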
Resize-mode returns
| Code | Meaning |
|---|---|
| 0 | Resize mode updated; takes effect on the next vmaf_read_pictures(). |
| -EINVAL | ctx is NULL, or mode is outside the enum range. |
| -ENOSYS | libvmaf was built without DNN support. |
Equivalent CLI flag: --tiny-resize <bilinear|nearest|bicubic|disabled> — see usage/cli.md. May be called before or after vmaf_use_tiny_model(); the setting is sticky for the lifetime of the context.
Standalone sessions — VmafDnnSession
Standalone mode is for filter-style inference that does not need a VmafContext — e.g. a learned de-banding preprocessor that mutates a luma plane before downstream processing.
typedef struct VmafDnnSession VmafDnnSession;
int vmaf_dnn_session_open (VmafDnnSession **out, const char *onnx_path, const VmafDnnConfig *cfg);
void vmaf_dnn_session_close(VmafDnnSession *sess);
Both *_session_open and vmaf_use_tiny_model apply the same size-cap + operator-allowlist walk. See ADR-0039 for the allowlist and ADR-0041 for an example of an extractor that uses a session under the hood.
Luma-only convenience call
int vmaf_dnn_session_run_luma8(VmafDnnSession *sess,
const uint8_t *in, size_t in_stride,
int w, int h,
uint8_t *out, size_t out_stride);
Runs one luma-in / luma-out pass. Only works when the ONNX graph has:
- exactly one float32 input of static shape [1, 1, H, W],
- exactly one output of the same shape.
The implementation:
- Reads in (uint8 luma), normalises to [0, 1] (applies mean/std from the sidecar JSON if present).
- Runs ORT.
- De-normalises, rounds, clamps to [0, 255], writes out.
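The host-side arithmetic in the steps above can be sketched without ONNX Runtime. The helpers below are hypothetical (none of the demo_ names exist in libvmaf), the optional sidecar mean/std step is left out, and the model itself is replaced by an identity pass so that only the normalise and de-normalise stages are shown. Clamping before rounding gives the same result as rounding before clamping for this range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* uint8 luma -> float in [0, 1]. */
static float demo_luma8_to_unit(uint8_t v)
{
    return (float)v / 255.0f;
}

/* float -> uint8 luma: scale by 255, clamp to [0, 255], round to nearest. */
static uint8_t demo_unit_to_luma8(float x)
{
    float scaled = x * 255.0f;
    if (!(scaled > 0.0f)) /* negative values, zero, and NaN map to 0 */
        return 0;
    if (scaled > 255.0f)
        return 255;
    return (uint8_t)(scaled + 0.5f);
}

/* Applies the round trip to a strided plane; strides are in bytes.
 * A real model would transform the float values between the two steps. */
static void demo_luma8_roundtrip(const uint8_t *in, size_t in_stride,
                                 int w, int h,
                                 uint8_t *out, size_t out_stride)
{
    assert(in != NULL && out != NULL);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            float unit = demo_luma8_to_unit(in[(size_t)y * in_stride + (size_t)x]);
            out[(size_t)y * out_stride + (size_t)x] = demo_unit_to_luma8(unit);
        }
    }
}
```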
Errors:
- -EINVAL — sess, in, or out is NULL.
- -ENOTSUP — graph shape isn't the supported NCHW [1,1,H,W] luma layout, or ORT returned fewer output elements than w*h.
- -ERANGE — w / h don't match the graph's static input shape. Use vmaf_dnn_session_run() for dynamic shapes.
- -ENOSYS — libvmaf was built without DNN support (-Denable_dnn=disabled); every DNN entry point returns -ENOSYS in that configuration.
10/12/16-bit convenience call
int vmaf_dnn_session_run_plane16(VmafDnnSession *sess,
const uint16_t *in, size_t in_stride,
int w, int h, int bpc,
uint16_t *out, size_t out_stride);
The bit-depth-extended sibling of _luma8. Used by the ffmpeg vmaf_pre filter for yuv420p10le / yuv422p10le / yuv444p10le (and the 12-bit LE counterparts), and — at any supported bit depth — to run the same session on chroma planes at their sub-sampled dimensions. Added in ADR-0170 (T6-4).
- in / out are packed uint16 little-endian single-plane buffers.
- in_stride / out_stride are in bytes (not samples) — same convention as _luma8, so a 10-bit 1920×1080 plane has stride ≥ 1920 * 2.
- bpc in range 9..16 selects the normalisation divisor (1 << bpc) - 1. Passing bpc=8 returns -EINVAL — use _luma8 for 8-bit input.
The model must still declare [1, 1, H, W] static shape; the only new freedom is the bit depth of the host-side buffer the loader normalises from. A single learned_filter_v1 session works for both luma and chroma — re-call with chroma W/H (the shape is declared dynamic, see the open() comment).
Errors match _luma8, plus -EINVAL for a bpc outside [9, 16].
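The byte-stride convention and the bit-depth divisor are easy to get wrong, so a small worked sketch may help. The helpers below are hypothetical and independent of libvmaf. They assume a little-endian host, so that the packed little-endian samples can be read as native uint16_t values.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Normalisation divisor for a bit depth: (1 << bpc) - 1.
 * Returns -EINVAL outside the documented 9..16 range. */
static int demo_plane16_divisor(int bpc)
{
    if (bpc < 9 || bpc > 16)
        return -EINVAL;
    return (1 << bpc) - 1;
}

/* Minimum stride in bytes for a tightly packed uint16 plane of width w. */
static size_t demo_plane16_min_stride(int w)
{
    return (size_t)w * sizeof(uint16_t);
}

/* Reads sample (x, y) from a plane whose stride is given in bytes and
 * normalises it to [0, 1]. Row addressing is done on a byte pointer
 * because the stride counts bytes, not samples. */
static int demo_plane16_read(const uint16_t *plane, size_t stride_bytes,
                             int x, int y, int bpc, float *out)
{
    int divisor = demo_plane16_divisor(bpc);
    if (plane == NULL || out == NULL || divisor < 0)
        return -EINVAL;
    const unsigned char *row =
        (const unsigned char *)plane + (size_t)y * stride_bytes;
    uint16_t v;
    memcpy(&v, row + (size_t)x * sizeof(v), sizeof(v));
    *out = (float)v / (float)divisor;
    return 0;
}
```

For a 10-bit 1920-wide plane this gives a minimum stride of 3840 bytes and a divisor of 1023.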
General named-binding call
For models with multiple inputs / outputs or non-luma shapes, use the general call:
typedef struct VmafDnnInput {
    const char *name;     /* bind by graph name; NULL = positional */
    const float *data;    /* row-major float32 */
    const int64_t *shape; /* rank dims */
    size_t rank;
} VmafDnnInput;

typedef struct VmafDnnOutput {
    const char *name; /* bind by graph name; NULL = positional */
    float *data;      /* caller-owned */
    size_t capacity;  /* element count allocated */
    size_t written;   /* OUT: element count produced */
} VmafDnnOutput;

int vmaf_dnn_session_run(VmafDnnSession *sess,
                         const VmafDnnInput *inputs, size_t n_inputs,
                         VmafDnnOutput *outputs, size_t n_outputs);
Name-binding (name != NULL) resolves by the ONNX graph's declared input / output names. Positional binding (name == NULL) uses the tensor's array index. Mix is allowed but discouraged — pick one style per session.
Errors:
- -ENOSYS — built without DNN support.
- -EINVAL — mismatched arity, null pointers, rank zero.
- -ENOMEM — allocation failure (per-input staging buffer, or tensor creation).
- -ENOSPC — some outputs[i].capacity is smaller than the produced tensor. On this return, outputs[i].written is populated with the required element count (the code sets written = produced before the capacity check), so the caller can resize and retry with the same bindings.
- -EIO — ORT failure (bad graph, EP crash, OOM on device). The diagnostic is logged via the VmafContext log callback if one is configured (for sessions opened without a VmafContext, logging goes through the library's global log sink).
See ADR-0040 for the rationale behind multi-input/output + named binding.
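The -ENOSPC contract suggests a resize-and-retry pattern on the caller side. The sketch below shows that pattern against a stand-in: DemoOutput mirrors only the three VmafDnnOutput fields involved, and demo_run() is a hypothetical substitute for the real run call that reproduces the documented ordering (written is set before the capacity check). Real code would call vmaf_dnn_session_run() with its full argument list instead.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical mirror of the VmafDnnOutput fields used here. */
typedef struct DemoOutput {
    float *data;
    size_t capacity;
    size_t written;
} DemoOutput;

/* Stand-in for the real run call: pretends the model produces `produced`
 * elements, records that count in `written`, then checks capacity. */
static int demo_run(DemoOutput *out, size_t produced)
{
    out->written = produced;
    if (out->capacity < produced)
        return -ENOSPC;
    for (size_t i = 0; i < produced; i++)
        out->data[i] = (float)i;
    return 0;
}

/* Runs once; on -ENOSPC grows the buffer to `written` elements and
 * retries a single time. The caller frees out->data. */
static int demo_run_with_retry(DemoOutput *out, size_t produced)
{
    assert(out != NULL);
    int err = demo_run(out, produced);
    if (err != -ENOSPC)
        return err;
    float *grown = realloc(out->data, out->written * sizeof(*grown));
    if (grown == NULL)
        return -ENOMEM;
    out->data = grown;
    out->capacity = out->written;
    return demo_run(out, produced);
}
```

Retrying only once keeps the loop bounded: if the second attempt still fails, the error is returned to the caller unchanged.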
Thread-safety
- A single VmafDnnSession is not re-entrant. Driving inference from two threads requires either per-thread sessions or external locking.
- Opening multiple sessions concurrently is safe; they do not share state beyond process-global ORT singletons.
- Attaching a tiny model via vmaf_use_tiny_model() is subject to the same single-driver rule as the rest of the VmafContext API — see index.md.
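One way to apply the external-locking option is to pair each shared session with a mutex and route every inference call through a wrapper. The sketch below assumes POSIX threads and uses only hypothetical names (DemoLockedSession, demo_locked_run, and the callback type). The session is held as an opaque pointer so the example compiles without libvmaf.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical wrapper pairing an opaque session pointer with a mutex.
 * In real code `sess` would be a VmafDnnSession pointer. */
typedef struct DemoLockedSession {
    void *sess;
    pthread_mutex_t lock;
} DemoLockedSession;

typedef int (*DemoRunFn)(void *sess, void *arg);

/* Returns 0 on success or a negative errno value from mutex creation. */
static int demo_locked_init(DemoLockedSession *ls, void *sess)
{
    assert(ls != NULL);
    ls->sess = sess;
    return -pthread_mutex_init(&ls->lock, NULL);
}

/* Serialises calls into the session: only one thread at a time
 * executes run(). */
static int demo_locked_run(DemoLockedSession *ls, DemoRunFn run, void *arg)
{
    assert(ls != NULL && run != NULL);
    pthread_mutex_lock(&ls->lock);
    int err = run(ls->sess, arg);
    pthread_mutex_unlock(&ls->lock);
    return err;
}

static void demo_locked_destroy(DemoLockedSession *ls)
{
    assert(ls != NULL);
    pthread_mutex_destroy(&ls->lock);
}

/* Example callback: counts invocations instead of running a model. */
static int demo_count_call(void *sess, void *arg)
{
    (void)sess;
    ++*(int *)arg;
    return 0;
}
```

Per-thread sessions avoid the lock entirely at the cost of loading the model once per thread, so the choice depends on how much memory each session uses.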
Runnable example — standalone luma filter
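#include <errno.h>
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h> /* calloc, malloc, free */
#include <string.h>
#include <libvmaf/dnn.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <model.onnx>\n", argv[0]);
        return 2;
    }
    if (!vmaf_dnn_available()) {
        fprintf(stderr, "libvmaf was built without DNN support\n");
        return 2;
    }
    VmafDnnSession *sess = NULL;
    VmafDnnConfig cfg = { .device = VMAF_DNN_DEVICE_AUTO };
    int err = vmaf_dnn_session_open(&sess, argv[1], &cfg);
    if (err < 0) {
        fprintf(stderr, "open failed: %d (%s)\n", err, strerror(-err));
        return 1;
    }
    const int W = 1920, H = 1080;
    /* Zero-initialised input so the run is well defined before real data is filled in. */
    uint8_t *in = calloc((size_t)W * H, 1);
    uint8_t *out = malloc((size_t)W * H);
    if (!in || !out) {
        fprintf(stderr, "out of memory\n");
        free(in); free(out);
        vmaf_dnn_session_close(sess);
        return 1;
    }
    /* ...fill `in` with luma from your pipeline... */
    /* Strides are in bytes; for tightly packed 8-bit luma they equal W. */
    err = vmaf_dnn_session_run_luma8(sess, in, W, W, H, out, W);
    if (err < 0) fprintf(stderr, "run failed: %d\n", err);
    free(in); free(out);
    vmaf_dnn_session_close(sess);
    return err < 0 ? 1 : 0;
}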
Build:
Only works when libvmaf was built with -Denable_dnn=enabled.
Sigstore signature verification — vmaf_dnn_verify_signature
The fork ships a Sigstore-keyless verification primitive for tiny-AI ONNX bundles, surfaced by both the C API (vmaf_dnn_verify_signature in core/include/libvmaf/dnn.h) and the CLI flag --tiny-model-verify on the vmaf binary (see docs/usage/cli.md).
Behaviour. When invoked with the path to an ONNX file and a path to its sibling Sigstore bundle (.sigstore or .sig/.cert pair), the function shells out to the system cosign binary in offline verification mode and returns 0 on a passing signature, a negative errno on failure, and the cosign stderr text via *err (caller frees).
Build / platform requirements.
- Requires enable_dnn=enabled at meson configure (no-op on enable_dnn=disabled builds — returns -ENOSYS).
- Requires cosign on $PATH at runtime. The function looks up cosign via posix_spawnp; absence is reported as a clear cosign not found error message rather than silently passing.
- Windows is not supported: the function returns -ENOSYS unconditionally on Windows builds (core/src/dnn/model_loader.c short-circuits on _WIN32). The Sigstore offline-verify path depends on posix_spawnp and a few sibling POSIX primitives that do not have a clean Windows equivalent in our build matrix.
CLI coupling. The --tiny-model-verify flag on the vmaf CLI sets a context-level boolean that triggers the verification call inside vmaf_use_tiny_model after the ONNX bundle path is resolved. A failing verification aborts model load with a clear error; passing verification logs an info-level confirmation including the cosign identity / issuer that was matched.
Provenance contract. The Sigstore bundle is produced by the fork's release-please / Sigstore signing pipeline (per ADR-0010 + the model-registry policy in docs/ai/model-registry.md). Bundles shipped under model/tiny/*.sig + *.cert carry a keyless OIDC identity tied to the GitHub Actions workflow that built the model.
Known limitations
- Attached mode supports multiple scalar output tensors, not vector or image output tensors. vmaf_use_tiny_model() / vmaf_ctx_dnn_attach() can now record every ONNX output when each output tensor contains exactly one scalar value. If any attached output tensor has more than one element, the frame run returns -ENOTSUP. Use the standalone vmaf_dnn_session_run() API for caller-owned vector/image output buffers. See ADR-0646.
- Attached mode caps routed outputs at eight tensors. This mirrors VMAF_ORT_MAX_IO, the existing ORT wrapper stack-array limit. Models with more outputs should collapse related values into a standalone session output or ship a future ADR that raises the cap across the ORT wrapper and sidecar parser together.
- Operator allowlist covers the set required by tiny FR / NR / filter models shipped in model/tiny/; untrusted models with new op types will be rejected at _open. Extend the allowlist via the registry — see ADR-0039.
- EP selection is a preference, not a hard requirement: when a requested provider is missing from the linked ORT build, session open falls back to CPU. Call vmaf_dnn_session_attached_ep() and assert on the returned string if your application needs to fail on missing CUDA / OpenVINO / CoreML / ROCm.
- VmafDnnConfig.fp16_io only changes slots whose ONNX element type is FLOAT16; float32 model inputs and outputs stay float32. It is therefore harmless but not a speed switch for fp32-only graphs.
- There is no callback / progress hook; inference is synchronous per call.
- Sessions are heap-only; no stack-allocated variant.
Related
- ADR-0022 — choice of ONNX Runtime.
- ADR-0023 — where the CLI / C API / ffmpeg / training surfaces intersect.
- ADR-0036 — Wave 1 scope (LPIPS, saliency, per-shot CRF, vmaf_post, allowlist Loop/If, MCP VLM).
- ADR-0039 — operator allowlist + model registry.
- ADR-0040 — multi-input/output named-binding API.
- ADR-0041 — LPIPS-SqueezeNet extractor (consumer of this API).
- ADR-0042 — tiny-AI doc specialisation.
- ../ai/inference.md — CLI-side tiny-AI walkthrough.