Skip to content

ADR-0430: Saliency RGB ingest and SSIMULACRA2 public docs

  • Status: Accepted
  • Date: 2026-05-14
  • Deciders: Lusoris, Codex
  • Tags: vmaf-tune, saliency, docs, metrics, fork-local

Context

The public-doc gap scan found two user-facing stale surfaces: the SSIMULACRA2 metric page still called itself a stub even though scalar, SIMD, and GPU implementations are in tree, and the vmaf-tune saliency section still documented luma-replicated RGB as a deferred limitation.

The saliency student was trained for ImageNet-normalised RGB. Feeding luma replicated into all channels is a defensible smoke path, but it discards available chroma from the yuv420p source. The existing saliency pipeline already accepts yuv420p input and has a NumPy preprocessing step, so the implementation cost of nearest-neighbour chroma upsample plus BT.709 limited-range conversion is small.

Decision

vmaf-tune saliency inference will read full yuv420p frames, upsample U/V to luma resolution, convert BT.709 limited-range YUV to RGB, and then apply the existing ImageNet normalisation before calling saliency_student_v1. The SSIMULACRA2 public metric page will become an operator reference rather than a stub index page.

Alternatives considered

Option Pros Cons Why not chosen
Keep luma-replicated RGB Fastest and already tested Leaves a documented deferred limitation in a user-facing saliency path; ignores colour cues the model can consume Rejected
Convert yuv420p to RGB in the saliency preprocessor Closes the deferred limitation; preserves the existing model contract; easy to test without ffmpeg Slightly more CPU per sampled frame; chroma upsample remains nearest-neighbour Chosen
Shell out to ffmpeg for RGB frames Delegates colour conversion to a mature implementation Adds a subprocess dependency to the hot saliency path and complicates tests Rejected
Leave docs/metrics/ssimulacra2.md as a stub index No code/docs churn Contradicts the shipped implementation status and the doc sweep heuristic Rejected

Consequences

  • Positive: Saliency ROI can use colour information from source clips, and users now get a direct SSIMULACRA2 reference page with invocation, output, input formats, backends, and limitations.
  • Negative: Saliency preprocessing does a small amount of extra NumPy work per sampled frame.
  • Neutral / follow-ups: The saliency path still documents aggregate per-clip masks and nearest-neighbour chroma upsampling. Per-frame ROI remains separate future work.

References

  • ADR-0293
  • ADR-0130
  • ADR-0164
  • req: "when i look at the human facing docs we only need to search for (stub) or stub and for \"limitations\" or \"deferred\" to find the next tasks lol (perhaps we can combine some of them to make it a few less pr's lol)"