Skip to content

Research-0655: Saliency Feature Materializer

Question

How should existing AI feature tables receive real saliency signals without turning every trainer into an implicit video decoder and saliency runner?

Findings

  • Existing refreshed corpus tables are row-oriented JSONL/parquet assets, so saliency needs to be added as row-level aggregates before model selection can measure it.
  • vmaftune.saliency.compute_saliency_map already owns the model invocation and expects raw yuv420p plus explicit geometry.
  • Most local corpora are encoded video files, not raw YUV, so a reusable materializer must decode a bounded sample through FFmpeg and must probe geometry when table rows are incomplete.
  • A status column is more useful than fail-fast behaviour for large local sweeps: operators need to know which rows decoded, which were skipped, and which failed because the source or geometry was missing.

Result

Ship one table materializer under ai/scripts/ with JSONL/parquet I/O, bounded FFmpeg decode, ffprobe fallback, saliency_mean / saliency_var columns, and status values for operational triage.

Verification

ai/tests/test_materialize_saliency_features.py covers JSONL roundtrip, existing-column skips, missing-source status, ffprobe geometry fallback, and the fake FFmpeg-to-saliency handoff.