ADR-0550: Auto-resize input plane to NR tiny-model dims + `--tiny-resize` flag

Status: Accepted
Date: 2026-05-18
Deciders: lusoris, Claude
Tags: ai, cli, dnn, api

Context

The fork's NR tiny-model dispatch (vmaf_ctx_dnn_run_frame_nchw in core/src/libvmaf.c) is the per-frame handler that feeds the ONNX Runtime session for image-input tiny models (rank-4 NCHW). The shipped NR scorer model/tiny/nr_metric_v1.onnx has a fixed input shape of [1, 1, 224, 224] (KoNViD-1k middle-frame MobileNet trained at 224x224 grayscale; see model/tiny/nr_metric_v1.json:notes).

Before this ADR the dispatch hard-errored with -ERANGE on any dimension mismatch, including the common case of running a YUV fixture (e.g. src01_hrc01_576x324.yuv at 576x324) through the 224x224 model. The error propagated through vmaf_read_pictures to the break in the CLI loop, so the user saw 0 frames scored, an empty frames array in the JSON output, and a bare "0 frames" footer, with no actionable diagnostic. The post-fix probe (Finding 11) flagged this as the next user-facing blocker after fix/auto-select-engages-cuda (commit e5d26e238) unlocked CUDA auto-select.

The legacy contract was symmetric with the FR (full-reference) tiny models, where the user supplies a matched pair and the kernel operates on bit-exact native dims. NR scoring breaks that assumption: there is no reference to match against, and the model's training resolution is an internal property of the checkpoint (visible only in the sidecar notes string), not a public part of the CLI surface. Forcing every NR user to pre-scale their YUV to the exact model dims is unrealistic — KoNViD-1k inputs vary, the training resolution differs per checkpoint, and the failure mode is silent.

Decision

The per-frame NCHW dispatch gains the ability to auto-resize the luma plane to the model's static input shape when the dims differ, using a deterministic separable filter selected at runtime. Auto-resize is opt-in: the default mode is DISABLED, under which a dimension mismatch still returns -ERANGE (the pre-ADR-0550 hard error). This preserves strict mode for parity harnesses and avoids a silent free parameter. To enable auto-resize, the operator must explicitly pass --tiny-resize {bilinear,nearest,bicubic}. The filter choice must be documented alongside the model checkpoint, because bilinear, nearest, and bicubic produce scores that differ by approximately 2% on the same input.

The three supported filters are bilinear (matching torchvision Resize(..., antialias=False) and OpenCV INTER_LINEAR), nearest (floor-coord, cheapest), and bicubic (Catmull-Rom a=-0.5, for parity with transforms.Resize(interpolation=BICUBIC) exporters).
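To make the three filters concrete, here is a minimal sketch of the sampling rules they imply. It is not the fork's vmaf_tensor_from_luma_resize. All function names are hypothetical, and two things are assumed rather than confirmed by this ADR: the half-pixel-center coordinate mapping, and reading "floor-coord" as the integer floor of dst_i * src_n / dst_n. Output samples stay in the source's 0..255 range; any normalisation the model expects is out of scope.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* Keys cubic-convolution weight; a = -0.5 gives the Catmull-Rom kernel. */
static float cubic_weight(float x, float a)
{
    x = fabsf(x);
    if (x <= 1.0f)
        return ((a + 2.0f) * x - (a + 3.0f)) * x * x + 1.0f;
    if (x < 2.0f)
        return (((x - 5.0f) * x + 8.0f) * x - 4.0f) * a;
    return 0.0f;
}

/* Nearest (assumed floor-coord rule): integer floor of dst_i * src_n / dst_n. */
static size_t nearest_index(size_t dst_i, size_t src_n, size_t dst_n)
{
    size_t i = dst_i * src_n / dst_n;
    return i < src_n ? i : src_n - 1;
}

/* Map a destination index to a source coordinate (assumed half-pixel centers). */
static float src_coord(size_t dst_i, size_t src_n, size_t dst_n)
{
    return ((float)dst_i + 0.5f) * (float)src_n / (float)dst_n - 0.5f;
}

/* Clamp f to the plane, split it into two neighbour indices and a weight. */
static float split_coord(float f, size_t n, size_t *i0, size_t *i1)
{
    if (f < 0.0f)
        f = 0.0f;
    size_t lo = (size_t)f; /* f >= 0, so truncation equals floor */
    if (lo > n - 1)
        lo = n - 1;
    *i0 = lo;
    *i1 = lo + 1 < n ? lo + 1 : n - 1;
    return f - (float)lo;
}

/* Bilinear resize of an 8-bit luma plane into a float plane, no antialiasing. */
static void resize_bilinear_u8(const uint8_t *src, size_t src_w, size_t src_h,
                               size_t src_stride, float *dst,
                               size_t dst_w, size_t dst_h)
{
    assert(src && dst && src_w > 0 && src_h > 0 && dst_w > 0 && dst_h > 0);
    for (size_t y = 0; y < dst_h; y++) {
        size_t y0, y1;
        float ty = split_coord(src_coord(y, src_h, dst_h), src_h, &y0, &y1);
        const uint8_t *r0 = src + y0 * src_stride;
        const uint8_t *r1 = src + y1 * src_stride;
        for (size_t x = 0; x < dst_w; x++) {
            size_t x0, x1;
            float tx = split_coord(src_coord(x, src_w, dst_w), src_w, &x0, &x1);
            float top = (float)r0[x0] * (1.0f - tx) + (float)r0[x1] * tx;
            float bot = (float)r1[x0] * (1.0f - tx) + (float)r1[x1] * tx;
            dst[y * dst_w + x] = top * (1.0f - ty) + bot * ty;
        }
    }
}
```

Because the three rules weight neighbouring samples differently, they generally produce different tensors from the same frame, which is why the filter choice has to travel with the checkpoint.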

When the source dims already equal the model dims, the resize helper forwards verbatim to vmaf_tensor_from_luma, keeping the matched-dims path bit-identical regardless of the selected filter.
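The control flow described in this section can be sketched as a small decision function. This is an illustration under assumptions, not the code in core/src/libvmaf.c: the enum names and plan_resize are hypothetical, and only the behaviour comes from this ADR (matched dims always forward, a mismatch under DISABLED returns -ERANGE, and a mismatch with a filter selected resizes).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical mode enum. DISABLED is the zero value, so a zero-initialised
 * struct field starts in strict mode without any explicit setup. */
enum resize_mode {
    RESIZE_DISABLED = 0,
    RESIZE_BILINEAR,
    RESIZE_NEAREST,
    RESIZE_BICUBIC
};

/* What the per-frame dispatch should do with the luma plane. */
enum resize_action {
    ACTION_FORWARD, /* dims match: take the legacy path unchanged */
    ACTION_RESIZE   /* dims differ and a filter was selected */
};

/* Returns 0 and sets *out, or -ERANGE when dims differ in strict mode. */
static int plan_resize(enum resize_mode mode,
                       size_t src_w, size_t src_h,
                       size_t model_w, size_t model_h,
                       enum resize_action *out)
{
    assert(out);
    /* Matched dims ignore the filter entirely, keeping that path bit-identical. */
    if (src_w == model_w && src_h == model_h) {
        *out = ACTION_FORWARD;
        return 0;
    }
    /* Strict default: a mismatch is still a hard error. */
    if (mode == RESIZE_DISABLED)
        return -ERANGE;
    *out = ACTION_RESIZE;
    return 0;
}
```

Checking for matched dims before looking at the mode is what makes the fast path independent of the selected filter.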

Alternatives considered

Option	Pros	Cons	Why not chosen
DISABLED default, explicit opt-in (chosen)	Preserves the pre-ADR-0550 strict mode for parity harnesses. No silent free parameter: the operator must name the filter and can document it alongside the model checkpoint. Zero-init struct field = DISABLED, so no runtime initialization overhead.	Requires an explicit `--tiny-resize bilinear` for the most common NR workflow; first-time users see -ERANGE until they read the docs.	Lowest-footgun approach: the ~2% score spread across filters means auto-selecting bilinear is a silent bias that downstream consumers cannot detect. DISABLED-as-default makes the choice explicit and auditable.
Auto-resize, default bilinear	Removes the silent-zero-frames footgun in the most common NR workflow. Matches torchvision / OpenCV training-time convention.	Filter choice is a silent free parameter producing ~2% score spread on the same input. A pipeline that forgets to set `--tiny-resize` gets bilinear silently; a model trained against bicubic will see a small but systematic bias.	Rejected per user direction: bilinear/nearest/bicubic producing ~2% spread is a footgun. DISABLED forces the operator to choose consciously.
Fail fast with a more informative diagnostic	Zero behavioural change; easy to ship; surfaces the mismatch loudly.	The user still has to pre-scale the YUV externally (ffmpeg `scale=224:224` etc.). Doubles the disk + decode cost. Re-implements at the user level what every PyTorch dataloader does in 1 line. The diagnostic does not actually unblock any workflow.	Rejected: makes the tool usable by experts only.
Require matched YUV (refuse to attach the model unless dims match)	Fails at model-load time, not at frame 0 — the diagnostic is even louder.	Still requires the user to pre-scale every NR input. Breaks the FR path's symmetry.	Rejected: same usability ceiling as the diagnostic option with extra rigidity.
Insert a `Resize` op into the ONNX graph at model-load time	The model's static input shape becomes dynamic for free; no C-side resize needed.	Requires a full ONNX graph mutation pass on every load. The fork's `op_allowlist.c` already gates `Resize` (ADR-0258); a runtime-injected op opens an attack surface. Untenable for `.int8.onnx` quantised siblings.	Rejected: solves the same problem at much higher implementation cost and ecosystem risk.
Run the resize inside ONNX Runtime via a `SessionOptions` graph transformer	Native to ORT; no C-side maintenance burden.	The `SessionOptions::AddTransformer` C API is internal-only and not part of the stable ORT contract. The transformer would still have to know the source dims at first-frame time, not at model-load time.	Rejected: tighter coupling to a non-stable ORT surface for no functional gain.

Consequences

Positive:
vmaf --no-reference --tiny-model nr_metric_v1.onnx --tiny-resize bilinear works out-of-the-box against any YUV resolution.
The default (DISABLED) makes the filter choice explicit and auditable; no silent score bias from an auto-selected filter.
The matched-dims fast path is bit-identical to the legacy code (forwards verbatim to vmaf_tensor_from_luma), so the Netflix golden gate is unaffected.
Public C API gains a small, focused setter (vmaf_dnn_set_resize_mode) instead of growing VmafDnnConfig with another flag — keeps the surface factored.
Negative:
One more knob to document and version. The CLI grammar gains a new --tiny-resize keyword set; any downstream wrapper that pre-parses CLI flags must accept it.
First-time users see -ERANGE until they add --tiny-resize bilinear (or equivalent). The error message must be actionable.
Neutral / follow-ups:
A future tiny-model sidecar field (expected_resize: "bilinear" | "bicubic" | ...) could drive the default per-model; until that exists, the CLI flag + global default is the contract.
docs/ai/inference.md gains an ## Auto-resize for image tiny models section documenting DISABLED-as-default and the ~2% spread warning.
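For downstream wrappers that pre-parse CLI flags, the --tiny-resize keyword set could be handled along these lines. This is a hedged sketch only: parse_tiny_resize and the enum are hypothetical names, and returning -EINVAL for an unknown keyword is an assumption, since this ADR does not specify that error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the resize modes; DISABLED is the zero value. */
enum tiny_resize_kw {
    TINY_RESIZE_KW_DISABLED = 0,
    TINY_RESIZE_KW_BILINEAR,
    TINY_RESIZE_KW_NEAREST,
    TINY_RESIZE_KW_BICUBIC
};

/* Maps a --tiny-resize argument to a mode. Returns 0 on success, or -EINVAL
 * (assumed error code) for NULL or unknown keywords, leaving *out untouched. */
static int parse_tiny_resize(const char *arg, enum tiny_resize_kw *out)
{
    static const struct {
        const char *name;
        enum tiny_resize_kw mode;
    } table[] = {
        { "bilinear", TINY_RESIZE_KW_BILINEAR },
        { "nearest",  TINY_RESIZE_KW_NEAREST  },
        { "bicubic",  TINY_RESIZE_KW_BICUBIC  },
    };

    if (!arg || !out)
        return -EINVAL;
    for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        if (strcmp(arg, table[i].name) == 0) {
            *out = table[i].mode;
            return 0;
        }
    }
    return -EINVAL;
}
```

Rejecting unknown keywords rather than falling back to a filter keeps the choice explicit, in line with the DISABLED default.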

References

req: post-fix probe Finding 11 — "Fix the NR-model input-size mismatch in VMAFx/vmafx surfaced by the post-fix probe …" (paraphrased: NR-path dispatch returns -ERANGE when the model's expected spatial dims don't match the YUV; the loop breaks at frame 0 -> 0 frames scored).
ADR-0524 — symbolic batch dim acceptance in vmaf_ctx_dnn_attach (shares the per-frame NCHW dispatch).
ADR-0258 — Resize op admitted to the ONNX allowlist (used by U-2-Net / mobilesal / saliency models that resize internally).
ADR-0042 — tiny-AI docs required per PR (this ADR ships docs/ai/inference.md §Auto-resize).
core/src/dnn/tensor_io.c::vmaf_tensor_from_luma_resize (this PR).
core/src/libvmaf.c::vmaf_ctx_dnn_run_frame_nchw (this PR wires the resize through).
Source: req (post-fix probe Finding 11, 2026-05-18).
