ADR-0414: Saliency-aware ROI for x265 / SVT-AV1 / libvvenc adapters¶
- Status: Accepted
- Date: 2026-05-09
- Deciders: lusoris, Claude (Anthropic)
- Tags:
vmaf-tune,saliency,codec-adapter,roi,fork-local
Context¶
The fork's x264 adapter already delivers per-frame QP-offset maps via the saliency.py pipeline (DUTS-trained U-Net, saliency_student_v1.onnx). x264 consumes the map through its ASCII --qpfile mechanism (ADR-0293). The three other software encoders in the Phase A adapter set — libx265, libsvtav1, and libvvenc — each expose a distinct ROI mechanism:
- libx265:
--zones=startfrm,endfrm,q=<delta>passed through-x265-params zones=…(the same KV channel the two-passpass=Nflag uses). - SVT-AV1:
--qp-file(v1.7+) accepts a plain-text file of space-separated QP delta rows at 64×64 super-block granularity, passed through-svtav1-params qp-file=…. - libvvenc:
--roi/ROIFileaccepts a CSV of comma-separated QP delta rows at 64×64 CTU granularity, passed through-vvenc-params ROIFile=….
PR #456 attempted to add this support but was closed because it predated the F.3/F.5/conformal-intervals surface and conflicted with --two-pass / --with-uncertainty in the CLI. This ADR re-ports the feature cleanly on top of current master without touching the conformal surface.
Decision¶
We will extend vmaftune.saliency with three new pure-function formatters:
write_x265_zones_arg— reduces the per-block offset map to a single spatial mean and emits an x265--zonesstring covering the full clip duration.write_svtav1_qpoffset_map— writes a space-separated QP-offset file at 64×64 super-block granularity (one frame per frame, blank-line separator).write_vvenc_roi_csv— writes a comma-separated ROI-delta CSV at 64×64 CTU granularity (same frame layout, comma delimiter instead of space).
Corresponding augment_extra_params_with_* helpers follow the same pattern as the existing x264 helper so callers stay symmetric.
Each adapter gains a supports_saliency_roi: bool = True field and a delegating method (zones_from_saliency, qpmap_from_saliency, roi_from_saliency) that wires through to the formatter. The x265 adapter's existing supports_two_pass: bool = True and supports_qpfile: bool = False fields are unchanged — zones-based saliency ROI is orthogonal to 2-pass.
Alternatives considered¶
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| x264 qpfile format for all three codecs | Simpler — one formatter | libx265 / SVT-AV1 / vvenc do not honour x264's ASCII qpfile syntax | Rejected: wrong format per encoder docs |
| Per-MB (16×16) block granularity for SVT-AV1 / vvenc | Matches x264 MB granularity | SVT-AV1 super-block is 64×64; vvenc CTU is 64×64; sub-SB granularity is undefined | Rejected: encoders document 64×64 as the ROI-map unit |
| Single-zone x265 approach (mean offset, all frames) | Simple; matches per-clip aggregate saliency mask semantics | Loses within-clip spatial variation | Chosen: per-clip aggregate is the established posture (ADR-0293); temporal zones are a follow-up |
Reuse saliency_aware_encode to dispatch per-codec | One call site | Would require branching on encoder identity inside the helper | Rejected: ADR-0294 forbids codec-identity branches in the encode driver |
Consequences¶
- Positive: x265, SVT-AV1, and VVenC encodes can consume the DUTS-trained saliency map via their native ROI channel with no behaviour change to existing x264 / two-pass / conformal-intervals paths.
- Negative: x265 zones deliver a single spatial-mean delta per clip (not true per-block granularity) because x265's
--zoneshas temporal granularity only. Per-block spatial fidelity requires a future x265 qpfile port. - Neutral / follow-ups:
- The SVT-AV1 qpmap granularity is 64×64; the saliency map is typically 720p/1080p, so the block grid can be small (e.g. 12×17 at 720p). This is expected and accepted.
- Temporal per-zone x265 support (multiple
startfrm,endfrmwindows) is deferred to a follow-up PR. write_svtav1_qpoffset_mapandwrite_vvenc_roi_csvreplicate the same spatial mask across all frames because the saliency pipeline produces a per-clip aggregate. Frame-level temporal saliency is a separate effort.
References¶
- ADR-0293 (x264 saliency-aware ROI — parent design)
- ADR-0286 (
saliency_student_v1.onnxmodel) - ADR-0247 (
vmaf-roiQP-offset convention) - ADR-0333 (Phase F two-pass — unchanged by this ADR)
- SVT-AV1
Source/App/EncApp/EbAppConfig.cread_qp_map_file, tagv2.1.0 - VVenC
VVEncAppCfg.hROI_FILEkey, tagv1.14.0(SHA9428ea86) - libx265 encoder options table —
zoneskey, x265 docs 3.x - Re-port of closed PR #456 (design review comment cited)