ADR-0654: Predictor Preserves Saliency Signals

Status: Accepted
Date: 2026-05-20
Deciders: Lusoris, Codex
Tags: ai, vmaf-tune, saliency, predictor

Context

The per-shot predictor input contract already reserves five perceptual signal slots beyond bitrate and geometry: `saliency_mean`, `saliency_var`, `frame_diff_mean`, `y_avg`, and `y_var`. The runtime extractor advertised `--use-saliency`, but its saliency hook called the ROI saliency helper with a stale signature, so the optional branch silently returned zeros. The trainer also zero-filled the same columns even when a refreshed corpus row already carried those values.
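
The Context does not say exactly how the stale signature turned into zeros rather than a visible error. One common way this happens, shown here purely as a hypothetical illustration, is a broad exception handler around an optional call. The signature mismatch raises `TypeError`, the handler returns a neutral value, and nothing is reported. Every name in the sketch (`roi_saliency_helper`, `saliency_hook`, `saliency_hook_strict`) is invented for illustration and is not vmaf-tune code.

```python
# Hypothetical sketch: none of these names come from vmaf-tune.
from typing import Callable, Tuple

Stats = Tuple[float, float]


def roi_saliency_helper(yuv_path: str, width: int, height: int) -> Stats:
    """Stand-in helper whose signature now requires frame geometry."""
    return 0.42, 0.03  # pretend (mean, variance) of a computed saliency map


def saliency_hook(helper: Callable[..., Stats], yuv_path: str) -> Stats:
    """Optional branch still written against the old one-argument signature."""
    try:
        return helper(yuv_path)  # stale call: raises TypeError
    except TypeError:
        return 0.0, 0.0  # the mismatch silently becomes "no saliency signal"


def saliency_hook_strict(helper: Callable[..., Stats], yuv_path: str,
                         width: int, height: int) -> Stats:
    """Current signature, with no blanket handler hiding call errors."""
    return helper(yuv_path, width, height)
```

In this sketch, a test asserting non-zero saliency on a known clip would flag the first variant, while the strict variant fails loudly if the helper's signature drifts again.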

That left the signal-mix audit's saliency / ROI gap open, even though the trained saliency weights and the predictor columns were already in-tree.

Decision

`vmaf-tune predict --use-saliency` will decode the current shot range into temporary raw `yuv420p`, run the configured `saliency_student` ONNX model through the existing `vmaftune.saliency.compute_saliency_map(...)` helper, and feed the resulting saliency mean and variance into `ShotFeatures`. Predictor training will preserve row-provided probe-byte, saliency, and signalstats columns whenever they are present, falling back to the historical bitrate stand-ins and zeros only for legacy rows that lack them.
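
A minimal sketch of the training-side rule, assuming each corpus row arrives as a plain mapping. The five perceptual column names come from the input contract described in Context; `probe_bytes`, `bitrate_kbps`, and `duration_s` are hypothetical column names, and the bytes-from-bitrate formula is an assumed stand-in rather than the historical one. Missing keys and NaN values are treated as "not provided", so only legacy rows take the fallback path.

```python
from typing import Any, Dict, Mapping

# Perceptual slots named in the predictor input contract.
PERCEPTUAL_COLUMNS = ("saliency_mean", "saliency_var", "frame_diff_mean", "y_avg", "y_var")


def _present(row: Mapping[str, Any], key: str) -> bool:
    """True when the row carries a usable value (not missing, None, or NaN)."""
    value = row.get(key)
    return value is not None and value == value  # NaN != NaN


def training_features(row: Mapping[str, Any]) -> Dict[str, float]:
    features: Dict[str, float] = {}
    # Hypothetical probe-byte column; legacy rows get an assumed bitrate-derived stand-in.
    if _present(row, "probe_bytes"):
        features["probe_bytes"] = float(row["probe_bytes"])
    else:
        features["probe_bytes"] = float(row["bitrate_kbps"]) * float(row["duration_s"]) * 1000.0 / 8.0
    # Preserve row-provided perceptual values; zero-fill only the ones that are absent.
    for column in PERCEPTUAL_COLUMNS:
        features[column] = float(row[column]) if _present(row, column) else 0.0
    return features
```

The sketch checks each column independently, so a partially refreshed row keeps whatever real signals it carries instead of being treated as wholly legacy.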

Alternatives considered

| Option | Pros | Cons | Outcome |
|---|---|---|---|
| Keep zero-filling saliency and signalstats in trainer rows | Maximum backward compatibility; no retraining drift from richer rows | Discards the exact signals the predictor input schema already reserves, so refreshed corpora cannot close the saliency gap | Rejected because it preserves the bug-shaped blind spot. |
| Add a new, wider predictor input schema | Cleanly distinguishes old and rich corpora | Invalidates every shipped predictor ONNX and sidecar vector layout | Rejected: too much blast radius for a bug fix, and the existing 14-column layout already has the required slots. |
| Decode shot ranges before saliency inference | Works for any FFmpeg-readable `predict --source` and reuses the proven raw-YUV saliency helper | Adds an optional ffmpeg decode per sampled shot when `--use-saliency` is enabled | Chosen because the CLI advertises container inputs and the helper expects raw `yuv420p`. |
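
The chosen option can be sketched as two small steps: build an FFmpeg command that decodes one shot range to raw `yuv420p`, then reduce the resulting saliency map to the mean and variance that feed `ShotFeatures`. The ffmpeg flags used are standard options, but the exact invocation vmaf-tune uses is not recorded in this ADR. The function names are hypothetical, the map is taken as a flat sequence of floats because the return type of `compute_saliency_map(...)` is not specified here, and using population variance is an assumption.

```python
from statistics import fmean, pvariance
from typing import List, Sequence, Tuple


def decode_shot_cmd(source: str, start_s: float, duration_s: float, out_yuv: str) -> List[str]:
    """Build an ffmpeg command that decodes one shot range to raw yuv420p."""
    return [
        "ffmpeg", "-hide_banner", "-loglevel", "error", "-y",
        "-ss", f"{start_s:.3f}",      # input seek to the shot start
        "-i", source,
        "-t", f"{duration_s:.3f}",    # decode only the shot's duration
        "-an",                        # drop audio; saliency only needs video frames
        "-f", "rawvideo",             # headerless output, as a raw-YUV helper expects
        "-pix_fmt", "yuv420p",
        out_yuv,
    ]


def summarize_saliency(values: Sequence[float]) -> Tuple[float, float]:
    """Reduce a flattened saliency map to (mean, population variance)."""
    return fmean(values), pvariance(values)
```

In this sketch the caller would run the command with `subprocess.run(cmd, check=True)`, pass the temporary file to the saliency helper, and delete it afterwards; that per-shot decode is the cost listed under Consequences.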

Consequences

Positive: Runtime `--use-saliency` now contributes real saliency features instead of silently falling back to zeros, and the predictor can be trained on real corpora with their saliency / signalstats columns preserved.
Negative: `predict --use-saliency` pays for a temporary raw decode and a saliency inference pass per sampled shot. The flag remains opt-in.
Neutral / follow-ups: Future corpus emitters can materialise the same columns directly so predictor training does not need to recompute saliency during model refresh.

References

Research-0091 identified the predictor saliency slots as wired but underused.
ADR-0286 records the saliency-student model contract.
Source: req — "well that means do u2netp (experimental)... we can only learn i guess lol"
Source: req — "well that sounds like a lot to do... go on then"