ADR-0425: vmaf-roi-score saliency materialiser

Status: Accepted
Date: 2026-05-14
Deciders: Lusoris
Tags: tooling, ai, saliency, vmaf, fork-local

Context

ADR-0296 shipped vmaf-roi-score as an Option C scaffold: run vmaf twice, blend the full-frame and saliency-masked scores, and defer the actual YUV mask materialiser until saliency_student_v1 was available. That dependency is now present in model/tiny/saliency_student_v1.onnx, so leaving --saliency-model as a path that exits with code 64 keeps the user-facing tool unusable for its main purpose.
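To make the "blend" step concrete, here is a minimal sketch of combining the two scores. It assumes a simple linear blend; this ADR does not state the formula ADR-0296 uses, and the function name `blend_scores` and the `weight` parameter are hypothetical.

```python
def blend_scores(full_frame_score, masked_score, weight=0.5):
    """Linearly blend two VMAF scores.

    Assumption: a convex combination with a single weight. The real tool
    may use a different formula or parameter name.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (1.0 - weight) * full_frame_score + weight * masked_score
```

With `weight=0.0` the result is the ordinary full-frame score, so under this assumption the blend degrades gracefully to the behaviour without a saliency mask.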

The materialiser must stay out of libvmaf C-side pooling. Option A remains a separate future decision because it affects numerical correctness and cross-backend parity. This follow-up only fills the Python tool-level substitution path described by ADR-0296.

Decision

We will wire vmaf-roi-score --saliency-model to materialise a temporary 8-bit planar YUV file by running ONNX saliency inference on the reference frame, replacing low-saliency distorted pixels with reference pixels, and then scoring that temporary file with the existing vmaf CLI. The first implementation supports yuv420p, yuv422p, and yuv444p; higher-bit-depth formats remain rejected with a clear error.
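The substitution step can be sketched for a single frame as follows. This is an illustration under stated assumptions, not the tool's actual code: the names `plane_layout` and `substitute_frame`, the hard `threshold` of 0.5, and the nearest-sample lookup of the luma-resolution mask for chroma planes are all hypothetical choices.

```python
# (horizontal, vertical) chroma divisors for the supported 8-bit layouts.
CHROMA_DIVISORS = {
    "yuv420p": (2, 2),
    "yuv422p": (2, 1),
    "yuv444p": (1, 1),
}


def plane_layout(pix_fmt, width, height):
    """Return (plane_w, plane_h, dx, dy) for the Y, U and V planes."""
    if pix_fmt not in CHROMA_DIVISORS:
        # Higher-bit-depth and other formats are rejected up front.
        raise ValueError(f"unsupported pixel format for masking: {pix_fmt}")
    dx, dy = CHROMA_DIVISORS[pix_fmt]
    cw = (width + dx - 1) // dx   # round up for odd dimensions
    ch = (height + dy - 1) // dy
    return [(width, height, 1, 1), (cw, ch, dx, dy), (cw, ch, dx, dy)]


def substitute_frame(ref, dist, mask, pix_fmt, width, height, threshold=0.5):
    """Copy reference pixels into the distorted frame where saliency is low.

    ref, dist: bytes holding one full 8-bit planar frame each.
    mask: row-major sequence of floats at luma resolution.
    threshold: hypothetical cut-off; values below it count as low saliency.
    """
    planes = plane_layout(pix_fmt, width, height)
    frame_size = sum(pw * ph for pw, ph, _, _ in planes)
    if len(ref) != frame_size or len(dist) != frame_size:
        raise ValueError("frame size does not match pix_fmt and dimensions")
    if len(mask) != width * height:
        raise ValueError("mask must have width * height entries")

    out = bytearray(dist)
    offset = 0
    for pw, ph, dx, dy in planes:
        for y in range(ph):
            for x in range(pw):
                # Map the plane sample back to its top-left luma position.
                if mask[(y * dy) * width + (x * dx)] < threshold:
                    idx = offset + y * pw + x
                    out[idx] = ref[idx]
        offset += pw * ph
    return bytes(out)
```

Writing the returned bytes for each frame to a temporary file yields a YUV in which only salient regions still carry the distortion, which is what the second vmaf run then scores.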

Alternatives considered

Option	Pros	Cons	Why not chosen
Implement 8-bit YUV materialisation in the Python tool	Completes the documented Option C path; no libvmaf numerical drift; easy to test with injected masks	Limited to 8-bit planar YUV; costs a second `vmaf` run	Chosen as the smallest complete user-facing implementation
Extend directly to 10/12/16-bit YUV	Covers HDR and high-bit-depth workflows immediately	Requires separate plane-width handling and more fixtures; larger failure surface	Deferred so the first usable path stays small and auditable
Implement Option A in libvmaf pooling	More mathematically direct and one-pass	Touches numerical core, model semantics, and cross-backend parity	Out of scope for this follow-up; remains future ADR work
Keep `--saliency-model` scaffolded	No new runtime dependency pressure	User-visible CLI remains mostly unusable	Rejected because `saliency_student_v1` now unblocks the path

Consequences

Positive: vmaf-roi-score --saliency-model now produces a real masked YUV and score instead of returning exit 64.
Positive: the implementation keeps ONNX Runtime lazy-loaded, so synthetic-mask smoke tests do not require runtime extras.
Negative: 10/12/16-bit YUV users still need the ordinary full-frame path or a future materialiser extension.
Neutral / follow-ups: Option A, true per-pixel weighted pooling inside libvmaf, remains deferred and must not be folded into this Python tool follow-up.
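The lazy-loading consequence above can be illustrated with a short sketch. It assumes the tool accepts an injected mask for tests and imports ONNX Runtime only on the real inference path; `open_saliency_session`, `resolve_mask`, and the `infer` callable are hypothetical names, and model pre- and post-processing is deliberately left out.

```python
import importlib


def open_saliency_session(model_path):
    """Import ONNX Runtime on first use and open the model."""
    try:
        ort = importlib.import_module("onnxruntime")
    except ImportError as exc:
        raise RuntimeError(
            "--saliency-model requires the onnxruntime package"
        ) from exc
    return ort.InferenceSession(model_path)


def resolve_mask(width, height, injected_mask=None, infer=None):
    """Prefer an injected mask; call infer() only when none was supplied.

    infer is a hypothetical callable (width, height) -> mask that would
    wrap a session from open_saliency_session() in the real tool.
    """
    if injected_mask is not None:
        if len(injected_mask) != width * height:
            raise ValueError("mask must have width * height entries")
        return list(injected_mask)
    if infer is None:
        raise ValueError("need either injected_mask or an infer callable")
    return infer(width, height)
```

Because nothing at module level imports `onnxruntime`, a smoke test that passes `injected_mask` never touches the runtime extra.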

References

ADR-0296
ADR-0286
Research-0069
Source: req ("just do one")