`vmaf-tune --saliency-aware`

`vmaf-tune recommend-saliency` runs a single saliency-aware encode. It materialises a saliency sidecar from the shipped saliency model, translates that sidecar into codec-specific ROI/QP controls, and then dispatches the encode through the normal codec-adapter path.

The implementation lives in `tools/vmaf-tune/src/vmaftune/saliency.py` and is wired through `tools/vmaf-tune/src/vmaftune/cli.py`.

Quick Start

vmaf-tune recommend-saliency \
    --src ref.yuv \
    --width 1920 --height 1080 --pix-fmt yuv420p \
    --framerate 24 --duration 10 \
    --encoder libx264 \
    --preset medium --crf 23 \
    --saliency-offset -3 \
    --output roi.mp4
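
When the command is driven from a script, it can help to assemble the argument list programmatically. The following is a minimal sketch, not part of vmaf-tune: `build_argv` is a hypothetical helper, and it assumes `vmaf-tune` is on `PATH`. The flag names and defaults are the ones listed on this page; the range check on the EMA weight mirrors the documented `(0, 1]` range.

```python
# Hypothetical helper (not part of vmaf-tune): builds the argv for the
# Quick Start command from Python values, using only documented flags.
AGGREGATORS = ("mean", "ema", "max", "motion-weighted")


def build_argv(src, output, width, height, *, pix_fmt="yuv420p",
               framerate=24.0, duration=0.0, encoder="libx264",
               preset="medium", crf=23, saliency_offset=-3,
               aggregator="mean", ema_alpha=0.6):
    if aggregator not in AGGREGATORS:
        raise ValueError(f"unknown aggregator: {aggregator!r}")
    # The documented range for --saliency-ema-alpha is (0, 1].
    if aggregator == "ema" and not (0.0 < ema_alpha <= 1.0):
        raise ValueError("ema_alpha must be in (0, 1]")

    argv = [
        "vmaf-tune", "recommend-saliency",
        "--src", str(src),
        "--width", str(width), "--height", str(height),
        "--pix-fmt", pix_fmt,
        "--framerate", str(framerate), "--duration", str(duration),
        "--encoder", encoder, "--preset", preset, "--crf", str(crf),
        "--saliency-offset", str(saliency_offset),
        "--saliency-aggregator", aggregator,
    ]
    # The EMA weight only matters for the ema aggregator.
    if aggregator == "ema":
        argv += ["--saliency-ema-alpha", str(ema_alpha)]
    argv += ["--output", str(output)]
    return argv


argv = build_argv("ref.yuv", "roi.mp4", 1920, 1080, framerate=24, duration=10)
# To run it: subprocess.run(argv, check=True)
```

Returning a list rather than a shell string avoids quoting problems with paths that contain spaces.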

What Happens

1. `load_saliency_sidecar()` loads an existing sidecar or runs the saliency model to create one.
2. `build_roi_plan()` converts frame-level saliency into encoder ROI controls.
3. `run_saliency_encode()` dispatches the codec-specific encode.
4. Unsupported ROI encoders exit with code 2 and a structured error message by default; pass `--saliency-fallback-plain` to accept a plain encode instead (ADR-0546).

Supported saliency ROI encoders are:

| Encoder | ROI channel |
| --- | --- |
| `libx264` | `-x264-params qpfile=...` |
| `libaom-av1` | patched FFmpeg `-qpfile ...` bridge |
| `libx265` | `-x265-params zones=...` |
| `libsvtav1` | `-svtav1-params qp-file=...` |
| `libvvenc` | `-vvenc-params ROIFile=...` |
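
The unsupported-encoder behaviour described above can be sketched as a small decision function. This is an illustration under stated assumptions, not the code in `saliency.py`: `choose_dispatch` and its return values are hypothetical, the structured error message is omitted because its format is not documented here, and `libvpx-vp9` is used only as an example of an encoder name that is absent from the table.

```python
import logging
import os

# Encoders listed in the table above as having an ROI channel.
ROI_ENCODERS = frozenset(
    {"libx264", "libaom-av1", "libx265", "libsvtav1", "libvvenc"}
)
EXIT_UNSUPPORTED = 2  # documented exit code for unsupported ROI encoders


def choose_dispatch(encoder, fallback_plain=False, environ=None):
    """Return "roi" or "plain", or exit with code 2 (hypothetical helper)."""
    env = os.environ if environ is None else environ
    if encoder in ROI_ENCODERS:
        return "roi"
    # --saliency-fallback-plain and VMAFTUNE_SALIENCY_FALLBACK_OK=1 are
    # documented as equivalent opt-ins to a plain encode.
    if fallback_plain or env.get("VMAFTUNE_SALIENCY_FALLBACK_OK") == "1":
        logging.error("no ROI dispatch for %s; using a plain encode", encoder)
        return "plain"
    raise SystemExit(EXIT_UNSUPPORTED)


print(choose_dispatch("libx265", environ={}))
print(choose_dispatch("libvpx-vp9", fallback_plain=True, environ={}))
```

Failing by default means a script cannot silently produce a non-ROI encode while believing saliency was applied; the plain fallback has to be requested explicitly.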

The shipped default model is documented in saliency_student_v1.md.

Flags

| Flag | Default | Notes |
| --- | --- | --- |
| `--src PATH` | — | Source clip. |
| `--width / --height` | — | Source geometry. |
| `--pix-fmt` | `yuv420p` | Source pixel format. |
| `--framerate` | `24.0` | Source framerate. |
| `--duration` | `0.0` | Source duration. |
| `--encoder` | `libx264` | Codec adapter. |
| `--preset` | `medium` | Codec preset. |
| `--crf` | `23` | Base quality before ROI offsets. |
| `--saliency-offset` | `-3` | QP/quality offset applied to salient regions. |
| `--saliency-model PATH` | shipped model | Override saliency ONNX path. |
| `--saliency-aggregator` | `mean` | Temporal reducer for sampled per-frame saliency masks. One of `mean`, `ema`, `max`, `motion-weighted`. See Temporal aggregation below. |
| `--saliency-ema-alpha` | `0.6` | Current-frame weight when `--saliency-aggregator=ema`. Range `(0, 1]`; higher values weight recent frames more heavily. |
| `--saliency-fallback-plain` | off | ADR-0546: when the chosen encoder has no ROI dispatch, accept a plain encode instead of exiting with code 2. An ERROR is logged. Equivalent to setting `VMAFTUNE_SALIENCY_FALLBACK_OK=1`. Supported ROI encoders: `libx264`, `libaom-av1`, `libx265`, `libsvtav1`, `libvvenc`. |
| `--ffmpeg-bin` | `ffmpeg` | FFmpeg binary. |
| `--output PATH` | — | Encoded output. |

Temporal aggregation

`--saliency-aggregator` controls how the per-frame saliency masks produced by `saliency_student_v1` are reduced to the single ROI pattern applied to the encode pass.

| Aggregator | Behaviour | Use when |
| --- | --- | --- |
| `mean` | Per-pixel arithmetic mean across all sampled frames. Preserves the historical implementation. | Default, stable clips, and baseline comparisons. |
| `ema` | Exponential moving average; `--saliency-ema-alpha` is the weight of the current frame. Older frames decay geometrically. | Clips with scene changes or motion bursts where the most-recent frames dominate the salient region. |
| `max` | Per-pixel maximum over all sampled masks. | Missing a briefly salient object is worse than over-protecting background; conservative choice for sports or highlight reels. |
| `motion-weighted` | Weighted mean where each sampled frame is weighted by its luma delta from the previous sampled frame. Still frames contribute less than high-motion frames. | Motion-heavy clips where foreground objects define the perceptually important regions. |

All four reducers use the same `saliency_student_v1` ONNX weights and the same downstream QP-offset mapping, so changing the aggregator does not change the model contract or the encoder sidecar format. The default (`mean`) matches pre-ADR-0396 behaviour and is suitable for most clips.
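
To make the four reducers concrete, here is a dependency-free sketch that treats each mask as a flat list of per-pixel floats. It illustrates the behaviour described in the table and is not the shipped implementation. Several details are assumptions made for this example: the first frame seeds the EMA, the luma delta is the mean absolute difference between consecutive sampled frames, the first frame gets weight 0 in `motion-weighted`, and a clip with no motion at all falls back to the plain mean.

```python
def reduce_masks(masks, mode="mean", alpha=0.6, lumas=None):
    """Reduce per-frame saliency masks (flat float lists) to a single mask."""
    if not masks:
        raise ValueError("need at least one mask")
    n_px = len(masks[0])

    if mode == "mean":
        return [sum(m[i] for m in masks) / len(masks) for i in range(n_px)]

    if mode == "max":
        return [max(m[i] for m in masks) for i in range(n_px)]

    if mode == "ema":
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        acc = list(masks[0])  # assumption: the first frame seeds the average
        for mask in masks[1:]:
            # alpha is the weight of the current frame; older frames decay.
            acc = [alpha * cur + (1.0 - alpha) * old
                   for cur, old in zip(mask, acc)]
        return acc

    if mode == "motion-weighted":
        if lumas is None or len(lumas) != len(masks):
            raise ValueError("need one luma frame per mask")
        weights = [0.0]  # assumption: the first frame has no predecessor
        for prev, cur in zip(lumas, lumas[1:]):
            delta = sum(abs(c - p) for c, p in zip(cur, prev)) / len(cur)
            weights.append(delta)
        total = sum(weights)
        if total == 0.0:  # assumption: a fully static clip uses the mean
            return reduce_masks(masks, "mean")
        return [sum(w * m[i] for w, m in zip(weights, masks)) / total
                for i in range(n_px)]

    raise ValueError(f"unknown aggregator: {mode!r}")


masks = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
print(reduce_masks(masks, "max"))             # [1.0, 1.0]
print(reduce_masks(masks, "ema", alpha=0.5))  # [0.75, 0.75]
```

With `alpha=1.0` the EMA reduces to the last sampled mask, which is why the documented range includes 1 but excludes 0.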
