`vmaf-tune` — quality-aware encode automation harness¶

vmaf-tune is a fork-added Python tool (ADR-0237, Research-0044) that drives FFmpeg over an encoder parameter grid, scores each encode with vmaf, and emits a JSONL corpus of (source, encoder, params, bitrate, vmaf) rows.

Subcommands at a glance¶

Subcommand	Purpose	ADR / research
`corpus`	Multi-codec encoder grid sweep + scoring	ADR-0237
`recommend`	Target-VMAF / target-bitrate predicate (Phase B)	Research-0061, buckets 4+5
`tune-per-shot`	Per-shot CRF zones	ADR-0392
`recommend-saliency`	Saliency-aware ROI tuning	ADR-0287 ecosystem; consumes `vmaf-roi` sidecars
`ladder`	Per-title bitrate ladder (Pareto ABR)	ADR-0295
`fast`	Predicted-CRF fast path	ADR-0276
`prefilter`	Pelorus deband strengths + CRF joint autotune	ADR-1116
`hdr`	HDR-aware tuning + HDR-VMAF scoring	(PR #434, bucket #9)
`compare`	Apples-to-apples codec comparison at matched VMAF	(PR #435)
`benchmark`	Offline cross-codec report from an existing JSONL	ADR-0424
`sidecar`	Local predictor bias-correction training / inspection	ADR-0394
`encode-profile`	Encode one recommendation from a report profile	ADR-0643

Codec adapters¶

19 adapters under tools/vmaf-tune/src/vmaftune/codec_adapters/:

Family	Software	NVIDIA NVENC	Intel QSV	AMD AMF	Apple VideoToolbox
AV1	`libaom-av1`, `libsvtav1`	`av1_nvenc` (ADR-0290)	`av1_qsv`	`av1_amf`	`av1_videotoolbox` placeholder
H.264	`x264` (Phase A)	`h264_nvenc`	`h264_qsv`	`h264_amf`	`h264_videotoolbox`
HEVC	`x265` (ADR-0288)	`hevc_nvenc`	`hevc_qsv`	`hevc_amf`	`hevc_videotoolbox`
VP9	`libvpx-vp9`	—	—	—	—
VVC	`vvenc`	—	—	—	—

Pipeline¶

ref.yuv ──► vmaf-tune corpus ──► encode (libx264|libx265) ──► vmaf score ──► corpus.jsonl
              │
              └─► encodes are written to --encode-dir, deleted post-score
                  unless --keep-encodes

corpus.jsonl ──► vmaf-tune recommend --target-vmaf T   ──► smallest CRF with VMAF >= T
                                    --target-bitrate B ──► CRF with bitrate closest to B
corpus.jsonl ──► vmaf-tune benchmark --target-vmaf T   ──► encoder ranking

JSON artifact portability¶

Human-facing vmaf-tune CLI JSON outputs, report-style artifacts, Phase-F executor result JSONL (tune_results*.jsonl), and local sidecar state files are strict RFC-8259 JSON. Diagnostic values that are non-finite in memory (NaN, Infinity, -Infinity) are serialized as null rather than as JavaScript-only tokens, so notebooks, dashboards, FFmpeg profile consumers, and MCP clients can parse the files with strict JSON decoders. Corpus JSONL rows remain the training interchange format; their feature-missing semantics are documented separately in the corpus schema below.
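The null-for-non-finite convention can be sketched with the standard library alone. This is a minimal illustration, not the tool's actual serializer; the helper names `sanitize` and `dumps_strict` are hypothetical.

```python
import json
import math


def sanitize(value):
    """Recursively replace non-finite floats with None (serialized as JSON null)."""
    if isinstance(value, float) and not math.isfinite(value):
        return None
    if isinstance(value, dict):
        return {key: sanitize(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [sanitize(item) for item in value]
    return value


def dumps_strict(payload):
    """Serialize to strict JSON; allow_nan=False raises if a NaN slips through."""
    return json.dumps(sanitize(payload), allow_nan=False)


report = {"codec": "libx264", "vmaf": float("nan"), "bitrate_kbps": [1200.0, float("inf")]}
print(dumps_strict(report))
# {"codec": "libx264", "vmaf": null, "bitrate_kbps": [1200.0, null]}
```

Passing `allow_nan=False` is a useful guard in its own right: Python's `json.dumps` otherwise emits the bare tokens `NaN` and `Infinity`, which strict decoders reject.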

Environment variables¶

Variable	Default	Notes
`VMAFTUNE_WORKDIR`	(OS default, typically `/tmp`)	Parent directory under which all vmaf-tune subcommands create their per-run temporary scratch directories (decoded reference YUV, intermediate encodes). Set this to a path on a volume with sufficient free space when the OS `/tmp` is a small `tmpfs` — e.g. a 634-second 1080p60 source decodes to approximately 118 GB of raw YUV420p. The `compare`, `tune-per-shot`, and `ladder` subcommands also accept `--workdir PATH` to override this variable per invocation. In the `vmaf-dev-mcp` container this is pre-set to `/probes/vmaftune-work` (the 435 GB `/probes` bind-mount). Resolution order: `--workdir` flag > `VMAFTUNE_WORKDIR` env var > OS default. (ADR-0598)
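To size a scratch volume before a run, the raw-YUV footprint can be estimated from the source geometry. The sketch below assumes 8-bit `yuv420p` (1.5 bytes per pixel per frame); the function name is illustrative and not part of vmaf-tune. It reproduces the figure quoted in the table for a 634-second 1080p60 source.

```python
def raw_yuv420p_bytes(width, height, fps, duration_s):
    """Bytes needed for 8-bit yuv420p: one luma byte plus half a chroma byte per pixel."""
    frames = round(fps * duration_s)
    return width * height * 3 // 2 * frames


size = raw_yuv420p_bytes(1920, 1080, 60, 634)
print(f"{size / 1e9:.1f} GB, {size / 2**30:.1f} GiB")
# 118.3 GB, 110.2 GiB
```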

Workdir disk-space management (ADR-0577)¶

compare and tune-per-shot both run bisects inside a thread pool. Without a concurrency cap, each codec thread decodes the full reference source in parallel — a 3-codec compare against a 110 GB BBB 1080p source would peak at 330 GB, which overflowed the 420 GB /probes volume in the BBB v13 failure.

Three safeguards are active by default:

Decode concurrency cap (--max-concurrent-decodes 1). A semaphore serialises reference-YUV decode operations across all threads. Peak disk stays at one YUV at a time (110 GB for BBB) regardless of how many codecs run in parallel. Raise the cap on hosts with large volumes and fast disks.
Aggressive cleanup. After each bisect (codec × target VMAF pair) completes, its decoded reference YUV is deleted immediately. Per-iteration .mkv and .decoded.yuv files are deleted as soon as scoring finishes. Scratch accumulation is therefore bounded to the working set of the currently active bisect, not the full run.
Mid-run disk-space check. Before each iteration's decode, shutil.disk_usage is called and compared against 2× the estimated YUV size. If the volume is too full the bisect fails fast with a structured error message that names the codec and target VMAF, rather than an opaque ffmpeg rc=228 (ENOSPC) followed by a corrupt output JSON.

To reproduce the BBB v13 scenario in tests, monkeypatch shutil.disk_usage to return low free space and pass a container source — the preflight and mid-run checks both fire before any real decode is attempted.
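The decode cap and the disk-space check can be sketched together as follows. This is a simplified illustration under stated assumptions, not the harness code: `InsufficientDiskError`, `check_free_space`, and `decode_reference` are hypothetical names, and the single semaphore permit mirrors the documented default of `--max-concurrent-decodes 1`.

```python
import shutil
import threading

# Assumption: one permit, matching the documented default cap of 1.
DECODE_SLOTS = threading.BoundedSemaphore(1)


class InsufficientDiskError(RuntimeError):
    """Hypothetical structured error that names the codec and target VMAF."""


def check_free_space(workdir, estimated_yuv_bytes, codec, target_vmaf, safety_factor=2):
    """Fail fast when free space is below safety_factor x the estimated YUV size."""
    free = shutil.disk_usage(workdir).free
    needed = safety_factor * estimated_yuv_bytes
    if free < needed:
        raise InsufficientDiskError(
            f"codec={codec} target_vmaf={target_vmaf}: need {needed} bytes free "
            f"in {workdir}, found {free}"
        )


def decode_reference(workdir, estimated_yuv_bytes, codec, target_vmaf, decode_fn):
    """Serialise decodes across threads and check disk space before each one."""
    with DECODE_SLOTS:
        check_free_space(workdir, estimated_yuv_bytes, codec, target_vmaf)
        return decode_fn()
```

In a test, replacing `shutil.disk_usage` with a stub that reports little free space makes `decode_reference` raise before `decode_fn` is ever called, which is the behaviour the paragraph above describes.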

Install¶

The tool ships under tools/vmaf-tune/ as a standalone Python package. Phase A has zero runtime dependencies beyond the standard library.

pip install -e tools/vmaf-tune
# or run directly from the checkout:
python tools/vmaf-tune/vmaf-tune --help

External binaries required at runtime:

ffmpeg with --enable-libx264 (and --enable-libsvtav1 if you pass --encoder libsvtav1) on PATH (or --ffmpeg-bin).
vmaf (this fork's CLI, built via meson) on PATH (or --vmaf-bin).

Predictor Training Corpora¶

vmaftune.predictor_train trains the per-codec ONNX predictors consumed by the fast, per-shot, ladder, and auto paths. --corpus accepts either a single Phase-A JSONL file or a directory of JSONL shards; directory inputs are scanned recursively in sorted order so the trainer can consume .workingdir2/corpus_run/ directly:

python -m vmaftune.predictor_train \
    --corpus .workingdir2/corpus_run \
    --codec libx264 \
    --output-dir .workingdir2/predictor-real

Rows are filtered per codec after schema aliases are normalised. The trainer accepts both current corpus.py rows (encoder, crf, vmaf_score, bitrate_kbps) and older hardware-sweep rows (codec, q/cq, vmaf, actual_kbps). If a codec has no usable rows, that codec still falls back to the documented synthetic-stub corpus and the model card records corpus.kind: synthetic-stub-*. Directory inputs are passed through the same loader as single files; a codec with rows in any shard records corpus.kind: real-N=<rows>.
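The alias normalisation described above can be pictured as a small lookup table. The sketch below uses only the field names listed in this section; the helper names and the choice to drop rows with a missing field are assumptions, not the trainer's actual code.

```python
# Canonical name -> accepted spellings, current schema first.
ALIASES = {
    "encoder": ("encoder", "codec"),
    "crf": ("crf", "q", "cq"),
    "vmaf_score": ("vmaf_score", "vmaf"),
    "bitrate_kbps": ("bitrate_kbps", "actual_kbps"),
}


def normalise_row(row):
    """Return a copy keyed by canonical names, or None if any field is missing."""
    out = {}
    for canonical, candidates in ALIASES.items():
        for name in candidates:
            if name in row:
                out[canonical] = row[name]
                break
        else:
            return None  # no spelling of this field was present
    return out


def rows_for_codec(rows, codec):
    """Normalise first, then filter per codec, in the order the text describes."""
    normalised = (normalise_row(row) for row in rows)
    return [row for row in normalised if row is not None and row["encoder"] == codec]


legacy = {"codec": "h264_nvenc", "cq": 28, "vmaf": 93.1, "actual_kbps": 4100.0}
current = {"encoder": "libx264", "crf": 23, "vmaf_score": 95.2, "bitrate_kbps": 5200.0}
print(rows_for_codec([legacy, current], "h264_nvenc"))
```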

When rows include richer predictor inputs, the trainer now preserves them instead of zero-filling: probe_i_frame_avg_bytes, probe_p_frame_avg_bytes, probe_b_frame_avg_bytes, saliency_mean, saliency_var, frame_diff_mean, y_avg, and y_var. Older rows remain valid; missing probe-byte columns use the deterministic bitrate stand-ins and missing saliency / signalstats columns stay 0.0.

vmaf-tune predict --use-saliency uses the same saliency feature slots at runtime. It decodes each sampled shot to temporary yuv420p, runs the configured saliency ONNX, and feeds only saliency_mean / saliency_var into the predictor:

vmaf-tune predict \
    --source source.mp4 \
    --codec libx264 \
    --target-vmaf 96 \
    --use-saliency \
    --saliency-model model/tiny/saliency_student_v1.onnx

This flag is separate from recommend-saliency --saliency-aware, which creates ROI/QP sidecars for the encoder.

Local Sidecar Bias Correction¶

vmaf-tune sidecar exposes the local sidecar model from docs/ai/local-sidecar-training.md as an operator CLI. It never uploads captures and never mutates the shipped predictor; it stores only the online-ridge correction under ${XDG_CACHE_HOME:-~/.cache}/vmaf-tune/sidecar/.

Inspect the current sidecar state:

vmaf-tune sidecar status --codec libx264 --json

Record one observed encode. features.json may be either a flat ShotFeatures object or { "features": { ... } }; the required fields are probe_bitrate_kbps, probe_i_frame_avg_bytes, probe_p_frame_avg_bytes, and probe_b_frame_avg_bytes.

vmaf-tune sidecar record \
    --codec libx264 \
    --features-json features.json \
    --crf 28 \
    --observed-vmaf 94.2

Batch training accepts JSONL with one row per observed encode:

{"features":{"probe_bitrate_kbps":3000,"probe_i_frame_avg_bytes":10000,"probe_p_frame_avg_bytes":2000,"probe_b_frame_avg_bytes":1000,"width":1920,"height":1080,"fps":24},"crf":28,"observed_vmaf":94.2}

vmaf-tune sidecar batch-record --codec libx264 --captures-jsonl captures.jsonl

Prediction reports the bare predictor score, sidecar correction, and clamped final score:

vmaf-tune sidecar predict --codec libx264 --features-json features.json --crf 28 --json

Quick start¶

Generate a 6-row corpus, one row per cell of the (medium, slow) × (22, 28, 34) grid, for one 1080p source clip:

vmaf-tune corpus \
    --source ref.yuv \
    --width 1920 --height 1080 --pix-fmt yuv420p \
    --framerate 24 --duration 10 \
    --preset medium --preset slow \
    --crf 22 --crf 28 --crf 34 \
    --output corpus.jsonl

--source is repeatable — pass one flag per source clip. The grid is the Cartesian product of --preset × --crf.
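As a quick illustration of how the grid expands, the cell list for the quick-start command can be enumerated with `itertools.product`. The variable names are illustrative; the sketch assumes each source is swept over the full preset × CRF grid.

```python
from itertools import product

sources = ["ref.yuv"]
presets = ["medium", "slow"]
crfs = [22, 28, 34]

# One corpus row per (source, preset, crf) cell.
cells = list(product(sources, presets, crfs))
print(len(cells))  # 6
```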

SVT-AV1 example (ADR-0278)¶

The libsvtav1 adapter accepts the same x264-style preset names — they are translated to SVT-AV1's integer presets internally. AV1 CRF values live in 0..63; the Phase A informative window is (20, 50):

vmaf-tune corpus \
    --source ref.yuv \
    --width 1920 --height 1080 --pix-fmt yuv420p \
    --framerate 24 --duration 10 \
    --encoder libsvtav1 \
    --preset medium --preset slow \
    --crf 28 --crf 35 --crf 42 \
    --output corpus_av1.jsonl

The corpus row records the human-readable preset name ("medium"), while the FFmpeg argv carries the integer SVT-AV1 expects (-preset 7).

Codec adapter parameter ranges¶

Each adapter declares its own quality knob, range, and preset vocabulary. The harness validates (preset, crf) against the adapter before invoking FFmpeg.

Encoder	CRF (absolute)	Phase A CRF window	Default CRF	Presets
`libx264`	`0..51`	`(15, 40)`	`23`	`ultrafast`, `superfast`, `veryfast`, `faster`, `fast`, `medium`, `slow`, `slower`, `veryslow`
`libsvtav1`	`0..63`	`(20, 50)`	`35`	`placebo`, `slowest`, `slower`, `slow`, `medium`, `fast`, `faster`, `veryfast`
`libvpx-vp9`	`0..63`	`(0, 63)`	`32`	`placebo`, `slowest`, `slower`, `slow`, `medium`, `fast`, `faster`, `veryfast`, `superfast`, `ultrafast`
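The pre-flight validation can be sketched as a lookup against the table above. `AdapterSpec` and `validate_cell` are hypothetical names, and the check uses the absolute CRF range rather than the Phase A window; the real adapters may expose a different interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdapterSpec:
    crf_min: int
    crf_max: int
    presets: frozenset


_SVT_NAMES = frozenset(
    {"placebo", "slowest", "slower", "slow", "medium", "fast", "faster", "veryfast"}
)

# Values copied from the table above.
ADAPTERS = {
    "libx264": AdapterSpec(0, 51, frozenset({
        "ultrafast", "superfast", "veryfast", "faster", "fast",
        "medium", "slow", "slower", "veryslow",
    })),
    "libsvtav1": AdapterSpec(0, 63, _SVT_NAMES),
    "libvpx-vp9": AdapterSpec(0, 63, _SVT_NAMES | {"superfast", "ultrafast"}),
}


def validate_cell(encoder, preset, crf):
    """Raise ValueError before any encode if the grid cell is out of range."""
    spec = ADAPTERS.get(encoder)
    if spec is None:
        raise ValueError(f"unknown encoder: {encoder}")
    if preset not in spec.presets:
        raise ValueError(f"{encoder}: unsupported preset {preset!r}")
    if not spec.crf_min <= crf <= spec.crf_max:
        raise ValueError(f"{encoder}: crf {crf} outside {spec.crf_min}..{spec.crf_max}")
```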

SVT-AV1 preset name -> integer mapping¶

SVT-AV1 uses integer presets 0..13 (0 = slowest / best, 13 = fastest). The harness maps the x264-style names below to AV1 integers so the corpus row schema is codec-independent:

Name	SVT-AV1 integer	Notes
`placebo`	`0`	Slowest; research-grade only.
`slowest`	`1`
`slower`	`3`
`slow`	`5`
`medium`	`7`	SVT-AV1 default.
`fast`	`9`
`faster`	`11`
`veryfast`	`13`	Fastest.

The mapping is closed and order-stable; see ADR-0294.
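Expressed as code, the table is a closed dictionary lookup. The sketch below copies the integers from the table; the function name `svtav1_preset_args` is illustrative rather than the adapter's actual API.

```python
SVT_AV1_PRESETS = {
    "placebo": 0,
    "slowest": 1,
    "slower": 3,
    "slow": 5,
    "medium": 7,
    "fast": 9,
    "faster": 11,
    "veryfast": 13,
}


def svtav1_preset_args(name):
    """Translate a preset name into the FFmpeg argv fragment for libsvtav1."""
    if name not in SVT_AV1_PRESETS:
        # Closed mapping: unknown names are rejected, never guessed.
        raise ValueError(f"unknown SVT-AV1 preset name: {name!r}")
    return ["-preset", str(SVT_AV1_PRESETS[name])]


print(svtav1_preset_args("medium"))  # ['-preset', '7']
```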

CLI flags¶

Flag	Default	Notes
`--source PATH`	—	Required. Repeatable for multi-source sweeps.
`--width / --height`	—	Required. Source resolution.
`--pix-fmt PFMT`	`yuv420p`	Forwarded to ffmpeg `-pix_fmt`.
`--framerate F`	`24.0`	Source framerate.
`--duration S`	`0`	Source duration in seconds (used for bitrate calc).
`--encoder NAME`	`libx264`	Codec adapter name, e.g. `libx264`, `libx265`, `libsvtav1` (ADR-0294), `h264_amf`, `hevc_amf`, `av1_amf`. See "Codec adapters" above.
`--preset P`	—	Required. Repeatable. Preset name (see "Codec adapter parameter ranges" above).
`--crf N`	—	Required. Repeatable. CRF integer (range varies by codec).
`--output PATH`	`corpus.jsonl`	JSONL destination.
`--encode-dir PATH`	`.workingdir2/encodes`	Scratch dir; gitignored by convention.
`--keep-encodes`	off	Retain encoded files after scoring.
`--vmaf-model NAME`	`vmaf_v0.6.1`	Forwarded to `vmaf --model`. Only used when `--no-resolution-aware` is set; otherwise auto-picked per encode resolution (see "Resolution-aware mode" below).
`--resolution-aware` / `--no-resolution-aware`	on	Auto-pick the VMAF model per encode resolution. Default on.
`--ffmpeg-bin PATH`	`ffmpeg`	Override the ffmpeg binary.
`--ffprobe-bin PATH`	`ffprobe`	Override the ffprobe binary (used for HDR detection).
`--vmaf-bin PATH`	`vmaf`	Override the vmaf binary.
`--score-backend NAME`	`auto`	libvmaf scoring backend: `auto|cpu|cuda|sycl|hip`. See "GPU scoring backend" below. (`vulkan` was removed in ADR-0726.)
`--no-source-hash`	off	Skip `src_sha256` (faster on large YUVs; loses provenance).
`--auto-hdr`	(default)	Probe each source via ffprobe; inject HDR flags + HDR model when PQ / HLG signaling is detected.
`--force-sdr`	off	Treat all sources as SDR; skip HDR detection.
`--force-hdr-pq`	off	Treat all sources as HDR PQ (SMPTE-2084) without probing. Useful for raw YUV refs that ffprobe cannot read color metadata from.
`--force-hdr-hlg`	off	Treat all sources as HDR HLG (ARIB STD-B67) without probing.
`--two-pass`	off	Phase F (ADR-0333 + ADR-0546). Run a 2-pass encode for codecs whose adapter sets `supports_two_pass = True` (today: `libx264`, `libx265`, `libvpx-vp9`, `libaom-av1`, `libvvenc`). Codecs without 2-pass support fall back to single-pass with a stderr warning. Doubles encode wall time.
`--vaapi-device PATH`	`auto`	VA-API DRI render-node for Intel QSV hardware-device init. `auto` selects the first Intel render node from `/sys/class/drm` and falls back to `/dev/dri/renderD128` only when no Intel node is discoverable. Also overridable via `VMAFTUNE_VAAPI_DEVICE` env var; the flag takes precedence. Ignored for non-QSV encoders. (ADR-0641)

Resolution-aware mode¶

VMAF is a resolution-aware metric: the fork ships two production-grade pooled-mean models — vmaf_v0.6.1 (trained on a 1080p viewing setup) and vmaf_4k_v0.6.1 (re-fit for a 4K display). Scoring 4K content against the 1080p model under-counts spatial detail; scoring 1080p content against the 4K model over-counts coding artefacts. The bias is several VMAF points either way — large enough to poison a mixed-resolution ABR-ladder corpus.

When --resolution-aware is on (the default), vmaf-tune picks the model per encode according to a height-only rule that mirrors Netflix's published guidance:

Encode height	Selected model
`≥ 2160` (UHD-1 and up)	`vmaf_4k_v0.6.1`
`< 2160` (everything else, including 1440p / 720p / SD)	`vmaf_v0.6.1`

The fork has no 720p / 1440p / SD model — vmaf_v0.6.1 is the canonical fallback for all sub-2160p content (matches Netflix's recommendation).

The emitted JSONL row's vmaf_model field records the effective model used for that row, not the global --vmaf-model option. Mixed-ladder corpora legitimately contain multiple distinct vmaf_model values across rows; downstream consumers should group / filter by vmaf_model rather than assume a constant.

To force a single model regardless of resolution (e.g. to reproduce a legacy single-model corpus), pass --no-resolution-aware:

vmaf-tune corpus \
    --source ref_4k.yuv --width 3840 --height 2160 \
    --preset medium --crf 23 \
    --no-resolution-aware --vmaf-model vmaf_v0.6.1 \
    --output corpus.jsonl

The Python API exposes the decision rule directly for callers that need to consult it outside the corpus loop:

from vmaftune.resolution import (
    select_vmaf_model_version,    # str: "vmaf_v0.6.1" or "vmaf_4k_v0.6.1"
    select_vmaf_model,            # Path: in-tree model JSON file
    crf_offset_for_resolution,    # int: -2 / 0 / +2 / +4 by resolution band
)

assert select_vmaf_model_version(3840, 2160) == "vmaf_4k_v0.6.1"
assert select_vmaf_model_version(1920, 1080) == "vmaf_v0.6.1"

crf_offset_for_resolution returns a small integer offset that the future search layer (Phase B target-VMAF bisect) can apply when seeding bisect bounds across an ABR ladder. The shipped defaults are codec-agnostic and conservative; Phase B/C/D will learn per-codec offsets from real corpora and override them via the same function signature. See ADR-0289 and Research-0054 for the full rationale.

GPU scoring backend¶

Per ADR-0299, vmaf-tune corpus forwards a --backend NAME argument to the libvmaf CLI so scoring runs on a GPU when one is present. CPU VMAF runs at ~1–2 fps on 1080p; the CUDA / SYCL / HIP backends shipped with this fork (ADR-0667) deliver a 10–30× speedup on the score axis. (The Vulkan backend was removed in ADR-0726; use HIP for vendor-neutral AMD / Intel Arc GPU scoring.)

Modes¶

Value	Behaviour
`auto` (default)	Detect what the local `vmaf` binary supports and what the host hardware advertises, then pick the fastest available backend in vmaf-tune probe order: `cuda → sycl → hip → cpu`. Falls back to CPU silently only when no GPU backend is found.
`cuda` / `sycl` / `hip`	Strict mode. Errors out with `BackendUnavailableError` if the local `vmaf` binary does not support the requested backend or the host hardware is missing. No silent downgrade to CPU — that would mask hardware/build mismatches and lie about wall-clock expectations.
`cpu`	Force the CPU path. Useful for reproducibility against the Netflix golden-data gate or to bypass a known-bad GPU driver day.

Detection heuristics¶

CUDA: nvidia-smi -L returns at least one GPU line.
SYCL: sycl-ls lists at least one :gpu device.
HIP/ROCm: rocminfo reports a gfx* agent, or rocm-smi reports a GPU.

Missing tools degrade to "backend not available" — they never raise hard errors. CPU is always considered available even if the help line is missing.

Probe order vs. libvmaf registry order: vmaf-tune's auto probe order (cuda → sycl → hip → cpu) differs from the libvmaf internal registry order (sycl → cuda → hip → cpu). vmaf-tune probes CUDA first because nvidia-smi provides the most reliable availability detection. The libvmaf registry prioritises SYCL first as the primary continuous-integration GPU target. When comparing timing results, be aware that auto may select different backends in vmaf-tune versus a direct vmaf --backend auto invocation.

Wall-clock expectation (60 s 1080p source, indicative)¶

Score backend	Hardware	Wall-clock	Throughput
`cpu`	AVX2 desktop CPU	~600–1200 s	~1.2–2.5 fps
`cuda`	RTX 30/40-class GPU	~50–120 s	~12–30 fps
`sycl`	Intel Arc / Iris Xe	~80–180 s	~8–18 fps
`hip`	RDNA2 / RDNA3 ROCm host	~80–180 s	~8–18 fps

Numbers are order-of-magnitude only; exact figures depend on the specific feature extractors enabled by the model (vmaf_v0.6.1 versus tiny-AI variants), whether --keep-encodes is on, and host I/O bandwidth. Cross-backend numerical parity is guaranteed to places=4 by the ADR-0214 CI gate.

Examples¶

# Default — auto-pick the fastest backend.
vmaf-tune corpus --source ref.yuv --width 1920 --height 1080 \
    --preset medium --crf 22 --crf 28

# Force CUDA. Errors out clearly if /opt/cuda is missing or the
# vmaf binary was built without CUDA support.
vmaf-tune corpus --source ref.yuv --width 1920 --height 1080 \
    --preset medium --crf 22 --score-backend cuda

# Pin CPU for reproducibility against the Netflix golden gate.
vmaf-tune corpus --source ref.yuv --width 1920 --height 1080 \
    --preset medium --crf 22 --score-backend cpu

Vulkan score backend (`--score-backend=vulkan`)¶

Status: REMOVED — ADR-0726 (2026-05-28). The Vulkan backend was removed from libvmaf. --score-backend=vulkan is no longer a valid value — passing it returns a non-zero exit code with an "unsupported backend" error. Use --score-backend=hip for vendor-neutral GPU scoring on AMD or Intel Arc hosts. The sections below are preserved for historical reference only; none of the workflows described are functional in current builds.

Per ADR-0314 (now superseded by ADR-0726), the vulkan value of --score-backend was the vendor-neutral GPU score path. Use --score-backend=hip for AMD/Intel Arc hosts going forward.

Supported platforms¶

Host	Driver	Status
Linux + AMD RDNA2/RDNA3	Mesa RADV	Production.
Linux + Intel Arc / Iris Xe	Mesa anv	Production.
Linux + NVIDIA	Mesa NVK or proprietary `nvidia` driver	Production; coexists with `--score-backend=cuda`.
Linux CI / no GPU	Mesa lavapipe (software rasteriser)	Slow but functional — the cross-backend parity gate (ADR-0214) runs on lavapipe.
macOS (Apple Silicon + Intel)	MoltenVK 1.2 layered over Metal	Functional.
Windows	Vendor-supplied Vulkan ICD	Functional; not gated in CI yet.

The libvmaf binary needs to be built with -Denable_vulkan=true (default in fork release artefacts). The vulkan value will fail strict-mode validation otherwise — vmaf --help will not advertise vulkan in its --backend alternation.

Verifying Vulkan availability¶

vmaf-tune runs the same probe libvmaf does — vulkaninfo --summary must succeed and report at least one deviceName:

$ vulkaninfo --summary | grep deviceName
        deviceName        = AMD Radeon RX 7900 XTX (RADV NAVI31)

If that command is missing, install the Vulkan SDK loader (Linux: vulkan-tools package; macOS: brew install vulkan-tools).

Example¶

# Vendor-neutral GPU scoring on AMD / Intel Arc / MoltenVK hosts.
vmaf-tune corpus --source ref.yuv --width 1920 --height 1080 \
    --preset medium --crf 22 --crf 28 --score-backend vulkan

Failure mode (no Vulkan loader installed):

vmaf-tune: backend 'vulkan' requested but not available on this host
(available: cpu). Check that the local vmaf binary was built with the
matching backend support and the corresponding runtime/driver is
installed.

The exit code is 2 and no encodes are dispatched — the strict-mode guarantee from ADR-0299 (no silent CPU downgrade) is preserved across all four backend values.

Sample-clip mode¶

Set --sample-clip-seconds N to evaluate each grid cell on the centre N-second slice of the source instead of the full reference. This is a runtime/accuracy trade-off, formalised in ADR-0301.

Speedup. Encode wall-time scales roughly linearly with slice length, so e.g. a 10-second slice of a 60-second source is a ~6x speedup per grid point. The libvmaf scoring pass shrinks by the same factor (it reads only the matching reference window via --frame_skip_ref / --frame_cnt).
Accuracy delta. Expect ~1-2 VMAF points of drift versus full-clip on diverse content (mixed-shot trailers, sports, action), tighter (~0.3-0.5 VMAF points) on uniform content (single-shot interviews, animation, static stills). The delta is per-cell consistent — relative ordering between (preset, crf) cells survives, which is what Phase B (target-VMAF bisect) and Phase C (per-title CRF predictor) actually consume. Full-clip rescoring of the predictor's pick is the recommended Phase C epilogue.
Window placement. Naive centre-anchored: the slice is (duration_s − N) / 2 .. (duration_s + N) / 2. Smarter shot-aware placement (e.g. via transnet_v2) is on the follow-up backlog.
Fallback. If N >= duration_s the harness silently falls back to full-clip mode and tags the row clip_mode="full" — the request is treated as "use the whole source", not as an error.
Bitrate semantics. bitrate_kbps is computed against the encoded duration, so sample-clip rows aren't biased low by dividing slice-bytes by full-source seconds. duration_s keeps the original source provenance.

# 6x faster grid sweep — 10s of a 60s source per cell.
vmaf-tune corpus \
    --source ref.yuv \
    --width 1920 --height 1080 --pix-fmt yuv420p \
    --framerate 24 --duration 60 \
    --preset medium --preset slow \
    --crf 22 --crf 28 --crf 34 \
    --sample-clip-seconds 10 \
    --output corpus.jsonl

Each emitted row carries clip_mode="sample_10s" (or "full"), letting Phase B/C either filter sample rows out, weight them differently, or rescore the chosen cell on the full source.
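The window placement and fallback rules above reduce to a few lines of arithmetic. The helper names are illustrative, and the frame conversion assumes a constant frame rate; the offsets it returns are the kind of values that would be handed to `--frame_skip_ref` / `--frame_cnt`.

```python
def sample_window(duration_s, sample_s):
    """Return (start_s, end_s, clip_mode) for a centre-anchored slice."""
    if sample_s is None or sample_s <= 0 or sample_s >= duration_s:
        # Fallback: a slice at least as long as the source means "use it all".
        return 0.0, float(duration_s), "full"
    start = (duration_s - sample_s) / 2
    return start, start + sample_s, f"sample_{sample_s:g}s"


def frame_window(start_s, sample_s, fps):
    """Translate the slice into a frame offset and a frame count."""
    return round(start_s * fps), round(sample_s * fps)


start, end, mode = sample_window(60, 10)
print(start, end, mode)              # 25.0 35.0 sample_10s
print(frame_window(start, 10, 24))   # (600, 240)
```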

`benchmark` subcommand — corpus-level codec ranking¶

vmaf-tune benchmark is Phase G of the tune workflow. It does not run FFmpeg or libvmaf; it reads an existing Phase-A JSONL corpus and ranks encoders by their best matched-quality point.

For each encoder in the corpus, the command filters to successful rows with finite vmaf_score and bitrate_kbps, then chooses the lowest bitrate row whose VMAF clears --target-vmaf. Encoders that never clear the target stay in the report as status=unmet using their closest VMAF miss, so a too-narrow CRF sweep is visible instead of silently dropped.

vmaf-tune benchmark \
    --from-corpus corpus.jsonl \
    --target-vmaf 92 \
    --baseline-encoder libx264 \
    --format markdown

Output formats:

Format	Use
`markdown`	PR comments and human review.
`json`	Notebooks, dashboards, and follow-up automation.
`csv`	Spreadsheets and quick plots.

--baseline-encoder controls the bitrate-delta column. When omitted, the baseline is the lowest-bitrate encoder that clears the target. The report inherits the corpus coverage: if libx264 was swept over 20 CRFs and libx265 over 3 CRFs, the ranking reflects those sampled points only.
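The per-encoder selection rule described in this section can be sketched as follows. The row keys come from the corpus schema; the `"met"` status label and the final sort order (met encoders first, then by bitrate) are assumptions made for illustration, since only `status=unmet` is named in the text.

```python
import math
from collections import defaultdict


def _finite(value):
    return isinstance(value, (int, float)) and math.isfinite(value)


def rank_encoders(rows, target_vmaf):
    """Pick one point per encoder: lowest bitrate clearing the target, else closest miss."""
    by_encoder = defaultdict(list)
    for row in rows:
        if row.get("exit_status", 0) != 0:
            continue  # failed encode or score
        if _finite(row.get("vmaf_score")) and _finite(row.get("bitrate_kbps")):
            by_encoder[row["encoder"]].append(row)

    report = []
    for encoder, candidates in by_encoder.items():
        clearing = [r for r in candidates if r["vmaf_score"] >= target_vmaf]
        if clearing:
            pick, status = min(clearing, key=lambda r: r["bitrate_kbps"]), "met"
        else:
            pick, status = max(candidates, key=lambda r: r["vmaf_score"]), "unmet"
        report.append({
            "encoder": encoder,
            "status": status,
            "vmaf_score": pick["vmaf_score"],
            "bitrate_kbps": pick["bitrate_kbps"],
        })
    report.sort(key=lambda e: (e["status"] != "met", e["bitrate_kbps"]))
    return report
```

Keeping unmet encoders in the result, rather than filtering them out, is what makes a too-narrow CRF sweep visible in the report.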

Corpus JSONL schema¶

Each row is one JSON object on its own line. The full key list is exported as vmaftune.CORPUS_ROW_KEYS for programmatic consumers and versioned via vmaftune.SCHEMA_VERSION (currently 3 — v2 added clip_mode for sample-clip mode under ADR-0301; v3 added the HDR provenance triple hdr_transfer / hdr_primaries / hdr_forced when corpus.iter_rows was wired to hdr.detect_hdr + hdr.hdr_codec_args per the ADR-0300 status update of 2026-05-08). Bumping the schema is a coordinated change with Phase B/C; do not edit row shape without bumping the version.

Key	Type	Description
`schema_version`	int	Currently `3`. v3 adds the HDR provenance triple, shot statistics, canonical-6 aggregates (ADR-0366), and the `enc_internal_*` aggregates (ADR-0332).
`run_id`	str	Per-row UUID4 hex.
`timestamp`	str	UTC ISO-8601 (seconds precision).
`src`	str	Path to the reference YUV.
`src_sha256`	str	SHA-256 of the reference (empty if `--no-source-hash`).
`width` / `height`	int	Source dimensions.
`pix_fmt`	str	Source pixel format.
`framerate`	float	Source framerate.
`duration_s`	float	Source duration in seconds.
`encoder`	str	Codec adapter name (e.g. `libx264`).
`encoder_version`	str	Detected encoder version (e.g. `libx264-164`).
`preset`	str	Encoder preset.
`crf`	int	Quality knob value.
`extra_params`	list[str]	Additional encoder argv (Phase A: `[]`).
`encode_path`	str	Path to encoded file (empty if not retained).
`encode_size_bytes`	int	Encoded file size.
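`bitrate_kbps`	float	`(encode_size_bytes × 8 / 1000) / encoded duration in seconds`. The encoded duration equals `duration_s` in full-clip mode and the slice length in sample-clip mode.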
`encode_time_ms`	float	Wall-clock encode time.
`vmaf_score`	float	Pooled-mean VMAF (`NaN` if scoring skipped/failed).
`vmaf_model`	str	Model version string (e.g. `vmaf_v0.6.1`).
`score_time_ms`	float	Wall-clock scoring time.
`ffmpeg_version`	str	Detected ffmpeg version.
`vmaf_binary_version`	str	Detected vmaf binary version.
`exit_status`	int	First non-zero of (encode, score) exit codes.
`hdr_transfer`	str	`""` (SDR), `"pq"` (SMPTE-2084) or `"hlg"` (ARIB STD-B67). Schema v3+.
`hdr_primaries`	str	Raw ffprobe `color_primaries` (e.g. `bt2020`); empty for SDR. Schema v3+.
`hdr_forced`	bool	`true` iff the user overrode detection via `--force-hdr-*` / `--force-sdr`. Schema v3+.
`clip_mode`	str	`"full"` (default) or `"sample_<N>s"` per `--sample-clip-seconds`. Schema v2+.
`shot_count`	int	Number of TransNet-V2 shots in the source (`0` when shot detection unavailable). v3+.
`shot_avg_duration_sec`	float	Mean shot length in seconds (`0.0` when unavailable). v3+.
`shot_duration_std_sec`	float	Population std of shot lengths in seconds — content-class proxy (animation: low; live action: high). v3+.
`adm2_mean`	float	Per-frame ADM2 mean (canonical-6). `NaN` when scoring skipped. v3+ (ADR-0366).
`vif_scale0_mean` … `motion2_std`	float	Remaining canonical-6 mean/std aggregates (12 columns total). v3+ (ADR-0366).
`enc_internal_qp_mean`	float	Per-frame QP mean from x264 pass-1 stats. `0.0` for opt-out adapters. v3+ (ADR-0332).
`enc_internal_qp_std`	float	Per-frame QP standard deviation. v3+ (ADR-0332).
`enc_internal_bits_mean`	float	Per-frame bit-cost mean (`tex+mv+misc`). v3+ (ADR-0332).
`enc_internal_bits_std`	float	Per-frame bit-cost standard deviation. v3+ (ADR-0332).
`enc_internal_mv_mean`	float	Per-frame motion-vector bit-cost mean. v3+ (ADR-0332).
`enc_internal_mv_std`	float	Per-frame motion-vector bit-cost standard deviation. v3+ (ADR-0332).
`enc_internal_itex_mean`	float	Mean intra-texture cost across I/i frames. v3+ (ADR-0332).
`enc_internal_ptex_mean`	float	Mean predicted-texture cost across P/B/b frames. v3+ (ADR-0332).
`enc_internal_intra_ratio`	float	Fraction of macroblocks coded as intra. v3+ (ADR-0332).
`enc_internal_skip_ratio`	float	Fraction of macroblocks coded as skip. v3+ (ADR-0332).

The ten enc_internal_* columns are populated for adapters that declare supports_encoder_stats = True (currently libx264 and libx265). The parser normalizes x264 macroblock counters and x265 icu / pcu / scu CTU counters into the same intra / predicted / skip ratio columns. Hardware encoders (NVENC / AMF / QSV / VideoToolbox) and the AV1 / VVC software encoders (libaom-av1 / libsvtav1 / libvvenc) opt out and emit 0.0 for every column so the schema is uniform across the corpus. The trade-off: per-encode wall-clock cost roughly doubles for opt-in adapters because the harness runs a stats-only -pass 1 invocation before the production CRF encode.

Example row¶

{
  "schema_version": 3,
  "run_id": "0a3b1c8b...",
  "timestamp": "2026-05-03T16:00:00+00:00",
  "src": "ref.yuv",
  "src_sha256": "",
  "width": 1920, "height": 1080, "pix_fmt": "yuv420p",
  "framerate": 24.0, "duration_s": 10.0,
  "encoder": "libx264", "encoder_version": "libx264-164",
  "preset": "medium", "crf": 28,
  "extra_params": [],
  "encode_path": "",
  "encode_size_bytes": 845210,
  "bitrate_kbps": 676.168,
  "encode_time_ms": 4321.0,
  "vmaf_score": 92.41,
  "vmaf_model": "vmaf_v0.6.1",
  "score_time_ms": 1820.5,
  "ffmpeg_version": "6.1.1",
  "vmaf_binary_version": "3.0.0-lusoris.0",
  "exit_status": 0,
  "clip_mode": "full",
  "shot_count": 12,
  "shot_avg_duration_sec": 0.83,
  "shot_duration_std_sec": 0.41,
  "adm2_mean": 0.973, "adm2_std": 0.12,
  "enc_internal_qp_mean": 25.23,
  "enc_internal_qp_std": 0.12,
  "enc_internal_bits_mean": 4975.0,
  "enc_internal_bits_std": 1820.5,
  "enc_internal_mv_mean": 60.3,
  "enc_internal_mv_std": 32.1,
  "enc_internal_itex_mean": 8000.0,
  "enc_internal_ptex_mean": 1500.0,
  "enc_internal_intra_ratio": 0.07,
  "enc_internal_skip_ratio": 0.16
}
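As a sanity check on the schema, the `bitrate_kbps` value in the example row follows from its `encode_size_bytes` and `duration_s` fields. The function below simply restates the formula from the schema table; its name is illustrative.

```python
def bitrate_kbps(encode_size_bytes, encoded_duration_s):
    """Bytes to kilobits (x 8 / 1000), divided by the encoded duration in seconds."""
    return (encode_size_bytes * 8 / 1000) / encoded_duration_s


# Values taken from the example row: 845210 bytes over 10.0 seconds.
print(round(bitrate_kbps(845210, 10.0), 3))  # 676.168
```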

`recommend` subcommand — target-VMAF / target-bitrate¶

vmaf-tune recommend consumes the corpus (either pre-built via --from-corpus PATH.jsonl or generated on the fly from --source + grid flags) and applies one of two predicates:

--target-vmaf T — return the row with the smallest CRF whose vmaf_score >= T. If no row clears the bar, the row with the highest VMAF is returned and the predicate is annotated (UNMET) in the output. Exit code is still 0 for an honest closest-miss.
--target-bitrate KBPS — return the row whose bitrate_kbps is closest (absolute distance) to KBPS. Ties on distance go to the smaller CRF (higher quality).

The two flags are mutually exclusive — argparse rejects passing both with exit code 2.

When reading a pre-built corpus, recommend ignores rows whose exit_status is non-zero and rows with missing or non-finite vmaf_score. The --encoder and --preset flags act as filters in that mode, so a mixed-codec corpus can be reused without first splitting it into per-codec files.
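The two predicates and the row filter can be sketched as plain functions over corpus rows. This follows the rules exactly as worded above; the function names are hypothetical and the sample rows are made-up values for illustration only.

```python
import math


def usable_rows(rows, encoder=None, preset=None):
    """Drop failed rows and non-finite scores; apply the optional filters."""
    kept = []
    for row in rows:
        score = row.get("vmaf_score")
        if row.get("exit_status", 0) != 0:
            continue
        if not isinstance(score, (int, float)) or not math.isfinite(score):
            continue
        if encoder is not None and row.get("encoder") != encoder:
            continue
        if preset is not None and row.get("preset") != preset:
            continue
        kept.append(row)
    return kept


def pick_target_vmaf(rows, target):
    """Smallest CRF whose VMAF clears the target; else highest VMAF, flagged unmet."""
    clearing = [r for r in rows if r["vmaf_score"] >= target]
    if clearing:
        return min(clearing, key=lambda r: r["crf"]), True
    return max(rows, key=lambda r: r["vmaf_score"]), False


def pick_target_bitrate(rows, target_kbps):
    """Closest bitrate by absolute distance; ties go to the smaller CRF."""
    return min(rows, key=lambda r: (abs(r["bitrate_kbps"] - target_kbps), r["crf"]))


rows = usable_rows([
    {"crf": 22, "vmaf_score": 95.0, "bitrate_kbps": 5000.0, "exit_status": 0},
    {"crf": 26, "vmaf_score": 92.5, "bitrate_kbps": 3000.0, "exit_status": 0},
    {"crf": 30, "vmaf_score": 89.0, "bitrate_kbps": 2000.0, "exit_status": 0},
    {"crf": 34, "vmaf_score": float("nan"), "bitrate_kbps": 1500.0, "exit_status": 1},
])
print(pick_target_vmaf(rows, 92.0)[0]["crf"])    # 22
print(pick_target_bitrate(rows, 2500.0)["crf"])  # 26 (tie with 30, smaller CRF wins)
```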

Use a pre-built corpus¶

# Phase A — build once.
vmaf-tune corpus --source ref.yuv --width 1920 --height 1080 \
    --framerate 24 --duration 10 \
    --preset medium --crf 18 --crf 22 --crf 26 --crf 30 --crf 34 \
    --output corpus.jsonl

# Smallest CRF whose VMAF >= 92.
vmaf-tune recommend --from-corpus corpus.jsonl --target-vmaf 92.0

# CRF whose bitrate is closest to 5 Mbps.
vmaf-tune recommend --from-corpus corpus.jsonl --target-bitrate 5000

Build the corpus on the fly¶

vmaf-tune recommend \
    --source ref.yuv --width 1920 --height 1080 \
    --framerate 24 --duration 10 \
    --preset medium --crf 18 --crf 22 --crf 26 --crf 30 \
    --target-vmaf 92.0

If --preset and --crf are omitted, recommend sweeps medium × range(18, 36, 2) as a sensible default for ad-hoc runs.

Output¶

Default output is a single human-readable line on stdout, e.g.

encoder=libx264 preset=medium crf=22 vmaf=95.000 bitrate_kbps=5000.00 \
  predicate=target_vmaf>=92.0 margin=+3.000

Pass --json to get the full corpus row as a JSON object on stdout instead — convenient for piping into other tooling.

`recommend` flags¶

Flag	Default	Notes
`--target-vmaf T`	—	Smallest CRF whose `vmaf_score >= T`.
`--target-bitrate KBPS`	—	CRF whose `bitrate_kbps` is closest to `KBPS`.
`--from-corpus PATH`	—	Read rows from a pre-built JSONL. Skips encode + score.
`--source / --width / --height / --framerate / --duration`	—	Build a corpus on the fly. Required when `--from-corpus` is omitted.
`--encoder / --preset / --crf`	`libx264` / `medium` / `[18,20,...,34]`	Sweep grid (when building). Filter (when loading).
`--json`	off	Emit the winning row as JSON instead of the prose summary.

`fast` subcommand — proxy + Bayesian + GPU-verify (Phase A.5)¶

vmaf-tune fast is the seconds-to-minutes alternative to the Phase A grid for the recommendation use case. It runs an Optuna TPE search over the integer CRF axis, scores each trial with the fr_regressor_v2 proxy (ADR-0291) on the canonical-6 libvmaf features extracted from a short probe encode, then runs one real-encode + libvmaf verify pass at the recommended CRF before reporting. The slow grid stays canonical (ADR-0276): fast is opt-in, and when the proxy/verify gap exceeds the configured tolerance it exits with code 3 so the caller can fall back to the grid.

Install¶

The fast-path needs Optuna in addition to the core install:

pip install 'vmaf-tune[fast]'

The shipped [fast] extra is the only correct install path; the core package stays zero-extra-dep so corpus generation works on hosts that never run the fast path.

Smoke run (no ffmpeg / no ONNX / no GPU)¶

The --smoke flag swaps the proxy + verify pipeline for a deterministic synthetic CRF→VMAF curve so CI on bare hosts still exercises the search loop:

vmaf-tune fast --target-vmaf 92.0 --smoke --n-trials 12

{
  "encoder": "libx264",
  "n_trials": 12,
  "notes": "smoke mode — synthetic predictor; no ffmpeg / ONNX / GPU. See ADR-0276 + ADR-0304 + Research-0076 for the production path.",
  "predicted_kbps": 1954.27,
  "predicted_vmaf": 82.65,
  "proxy_verify_gap": null,
  "recommended_crf": 27,
  "smoke": true,
  "target_vmaf": 92.0,
  "verify_vmaf": null
}

Production run¶

vmaf-tune fast \
    --src ref.yuv --width 1920 --height 1080 \
    --framerate 24 --pix-fmt yuv420p \
    --encoder libx264 --preset medium \
    --target-vmaf 92.0 \
    --crf-min 18 --crf-max 40 \
    --n-trials 30 \
    --score-backend auto \
    --output recommendation.json

The recommendation lands as a single JSON object — same schema recommend and predict already emit, plus the fast-path-specific verify_vmaf and proxy_verify_gap diagnostics:

{
  "encoder": "libx264",
  "target_vmaf": 92.0,
  "recommended_crf": 22,
  "predicted_vmaf": 92.41,
  "predicted_kbps": 4820.0,
  "n_trials": 30,
  "smoke": false,
  "notes": "production: TPE over 30 trials with v2 proxy; GPU verify gap = 0.612 VMAF (tolerance 1.50).",
  "verify_vmaf": 91.80,
  "proxy_verify_gap": 0.612,
  "score_backend": "cuda"
}

Exit codes¶

Code	Meaning
`0`	Recommendation produced; proxy/verify gap within tolerance.
`2`	Argument validation error (missing `--src`, bad CRF range, ...).
`3`	Out-of-distribution: proxy/verify gap exceeded `--proxy-tolerance`. The recommendation is still emitted; callers should fall back to the slow Phase A grid (`vmaf-tune corpus` + `vmaf-tune recommend`).

Fall-back idiom¶

vmaf-tune fast --src ref.yuv --width 1920 --height 1080 \
    --target-vmaf 92.0 --output rec.json \
  || vmaf-tune recommend --source ref.yuv --width 1920 --height 1080 \
        --preset medium --target-vmaf 92.0 --output rec.json

The || chain covers both the argument-validation case (rc=2) and the OOD case (rc=3), so the slow grid is the safety net whenever the fast path fails or is not confident.
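When the harness is driven from Python rather than a shell, the same fall-back can be written around the exit codes in the table above. The sketch below is an assumption-laden illustration: `recommend_with_fallback` is a hypothetical helper, and the `run` parameter exists only so the logic can be exercised without the real binaries.

```python
import subprocess

# Exit codes documented for `vmaf-tune fast`.
FAST_OK, FAST_BAD_ARGS, FAST_OOD = 0, 2, 3

REASONS = {
    FAST_BAD_ARGS: "argument validation error",
    FAST_OOD: "proxy/verify gap exceeded the tolerance",
}


def recommend_with_fallback(fast_cmd, grid_cmd, run=subprocess.run):
    """Run the fast path first; fall back to the grid on any non-zero exit.

    Returns (path_taken, returncode_of_that_path).
    """
    fast = run(fast_cmd)
    if fast.returncode == FAST_OK:
        return "fast", fast.returncode
    reason = REASONS.get(fast.returncode, "unexpected failure")
    print(f"fast path declined ({reason}); running the grid instead")
    grid = run(grid_cmd)
    return "grid", grid.returncode
```

Unlike the shell `||` idiom, this form lets the caller log which of the two documented failure modes triggered the fall-back.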

`fast` flags¶

Flag	Default	Notes
`--src PATH`	—	Source video. Required outside `--smoke`.
`--width / --height`	`0`	Raw-YUV geometry. Required outside `--smoke`.
`--pix-fmt`	`yuv420p`	ffmpeg pix_fmt for the probe + verify encodes.
`--framerate`	`24.0`	Reference framerate.
`--target-vmaf T`	—	Quality target on the standard `[0, 100]` scale. Required.
`--encoder`	`libx264`	Codec adapter; must be in `ENCODER_VOCAB_V2` for production mode.
`--preset`	`medium`	Encoder preset for the probe + verify encodes.
`--crf-min / --crf-max`	`10` / `51`	TPE search range over the integer CRF axis.
`--n-trials`	`30` (prod), `50` (smoke)	TPE trial budget.
`--time-budget-s`	`300`	Soft wall-clock cap for Optuna. TPE stops scheduling new trials after the timeout; an in-flight probe finishes.
`--proxy-tolerance`	`1.5`	Max abs proxy/verify gap before exit code `3`.
`--sample-chunk-seconds`	`5.0`	Probe-slice duration per TPE trial.
`--smoke`	off	Synthetic curve; no ffmpeg / ONNX / GPU.
`--score-backend`	`auto`	Verify-pass backend (`auto`/`cpu`/`cuda`/`sycl`/`hip`). (`vulkan` removed in ADR-0726.)
`--ffmpeg-bin / --vmaf-bin`	`ffmpeg` / `vmaf`	Tool paths.
`--vmaf-model`	`vmaf_v0.6.1`	libvmaf model for the verify pass.
`--encode-dir`	`.workingdir2/fast`	Scratch dir for probe + verify encodes.
`--output`	stdout	JSON destination for the recommendation payload.

`prefilter` subcommand — Pelorus deband + CRF joint autotune (ADR-1116)¶

vmaf-tune prefilter runs a VMAF-in-the-loop search that jointly tunes the Pelorus deband pre-filter's strengths and the encoder CRF, returning the lowest-bitrate combination that hits a target VMAF. This is the control-plane ("mode 2") seam between vmafx and Pelorus (Pelorus ADR-0106; contract in Pelorus ADR-0110).

vmafx stays Vulkan-free: it only emits the ffmpeg -vf "pelorus_deband_vulkan=range=..:thry=..:.." string and scores the encoded output. The deband filter runs inside ffmpeg — so the live loop requires an ffmpeg build with the Pelorus Vulkan deband filter. When that filter is absent the subcommand refuses the live run with a clear message and points you at --smoke.

The frozen knob contract¶

The search space is exactly the 10 tunable knobs Pelorus ADR-0110 freezes (name / type / range / default). vmafx hard-codes this table — renaming, narrowing, or retyping a knob is a coordinated two-repo break:

Knob	Type	Range	Default	Meaning
`range`	int	1–31	15	reference-sampling radius (px)
`thry`	float	0.0–0.25	0.012	luma flat-test threshold
`thrc`	float	0.0–0.25	0.012	chroma flat-test threshold
`grainy`	float	0.0–0.4	0.006	luma grain amplitude
`grainc`	float	0.0–0.4	0.0	chroma grain amplitude
`softness`	float	0.0–1.0	0.5	soft-blend transition width
`detail`	float	0.0–0.25	0.06	detail-mask activity threshold
`dither`	enum	0–2	2	0=none, 1=bayer8, 2=bluenoise
`dynamic`	bool	0–1	1	re-seed grain each frame
`protect`	bool	0–1	1	gate debanding off textured regions

The out-of-contract options (sample, blur, planes, meta) are never swept — they are pipeline-topology / reporting switches set once per run outside the optimizer.
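To make the contract concrete, the table above can be mirrored as a small Python structure that validates overrides and renders the filter string. This is a sketch only: `KNOBS` and `build_vf` are hypothetical names, and encoding the enum and bool knobs as integers is an assumption based on the 0–2 and 0–1 ranges in the table.

```python
# name: (type, min, max, default), copied from the contract table.
KNOBS = {
    "range": (int, 1, 31, 15),
    "thry": (float, 0.0, 0.25, 0.012),
    "thrc": (float, 0.0, 0.25, 0.012),
    "grainy": (float, 0.0, 0.4, 0.006),
    "grainc": (float, 0.0, 0.4, 0.0),
    "softness": (float, 0.0, 1.0, 0.5),
    "detail": (float, 0.0, 0.25, 0.06),
    "dither": (int, 0, 2, 2),   # 0=none, 1=bayer8, 2=bluenoise
    "dynamic": (int, 0, 1, 1),  # bool encoded as 0/1 (assumption)
    "protect": (int, 0, 1, 1),  # bool encoded as 0/1 (assumption)
}


def build_vf(overrides):
    """Render a pelorus_deband_vulkan filter string from validated knob values."""
    parts = []
    for name, value in overrides.items():
        if name not in KNOBS:
            raise KeyError(f"{name!r} is not one of the contract knobs")
        kind, lo, hi, _default = KNOBS[name]
        value = kind(value)
        if not lo <= value <= hi:
            raise ValueError(f"{name}={value} is outside [{lo}, {hi}]")
        parts.append(f"{name}={value}")
    return "pelorus_deband_vulkan=" + ":".join(parts)
```

For example, `build_vf({"range": 12, "thry": 0.018, "grainy": 0.008})` yields `pelorus_deband_vulkan=range=12:thry=0.018:grainy=0.008`, while an out-of-contract key such as `blur` is rejected.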

How the joint search works¶

Each Optuna TPE trial proposes a full (deband-dict, crf) and the loop runs [pelorus_deband_vulkan=...] → HW encode → VMAF score for it. The objective is |achieved_vmaf − target| + λ·kbps, so the search converges on the lowest-bitrate deband+CRF that hits the target. CRF is an ordinal integer dimension joined to the 10 deband dimensions in one study — the two axes are co-optimised, not searched in nested loops (debanding shifts the rate-quality curve, so they are not separable). The result reports the recommended strengths, CRF, achieved VMAF, and the per-probe VMAF log.

The search engine is the same TPESampler study the fast subcommand uses (ADR-0276 / ADR-0304) — prefilter only constructs the joint search space and the objective.
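The objective is easy to try in isolation. The sketch below uses a seeded random search as a stand-in for the Optuna TPE study, a made-up two-dimensional surface (`synthetic_probe`) in place of the deband, encode, and score loop, and an arbitrary λ of 1e-4; all three are assumptions chosen to keep the example dependency-free.

```python
import random


def objective(achieved_vmaf, kbps, target_vmaf, lam=1e-4):
    """|achieved_vmaf - target| + lambda * kbps, as described above."""
    return abs(achieved_vmaf - target_vmaf) + lam * kbps


def synthetic_probe(crf, grainy):
    """Hypothetical toy surface standing in for deband -> encode -> VMAF."""
    vmaf = 100.0 - 0.02 * (crf - 10) ** 2 - 20.0 * grainy
    kbps = 20000.0 * 0.9 ** crf * (1.0 + 2.0 * grainy)
    return vmaf, kbps


def joint_search(target_vmaf, n_trials=200, seed=0):
    """Sample (crf, grainy) jointly and keep the lowest-loss trial."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        crf = rng.randint(18, 40)        # ordinal integer dimension
        grainy = rng.uniform(0.0, 0.4)   # one deband knob, for brevity
        vmaf, kbps = synthetic_probe(crf, grainy)
        loss = objective(vmaf, kbps, target_vmaf)
        if best is None or loss < best[0]:
            best = (loss, crf, grainy, vmaf, kbps)
    return best
```

Each trial draws CRF and the deband knob together, so the two axes are explored in one study rather than in nested loops.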

Quick start (smoke)¶

Exercise the joint search end-to-end with a synthetic surface — no ffmpeg, no Vulkan, no GPU:

vmaf-tune prefilter --target-vmaf 92 --smoke --n-trials 40

Live loop (requires a Pelorus-enabled ffmpeg)¶

vmaf-tune prefilter \
  --src ref.yuv --width 1920 --height 1080 \
  --target-vmaf 93 --encoder libx264 \
  --crf-min 18 --crf-max 40 \
  --ffmpeg-bin /opt/ffmpeg-pelorus/bin/ffmpeg

The emitted recommendation includes a ready-to-use recommended_vf fragment, e.g.:

pelorus_deband_vulkan=range=12:thry=0.018:grainy=0.008

Restrict the swept knobs with one or more --sweep-knob flags (the rest stay at the filter default):

vmaf-tune prefilter --target-vmaf 93 --smoke \
  --sweep-knob range --sweep-knob grainy

`prefilter` flags¶

Flag	Default	Notes
`--src PATH`	—	Source video. Required for the live loop.
`--width / --height`	`0`	Raw-YUV geometry. Required for the live loop.
`--pix-fmt`	`yuv420p`	ffmpeg pix_fmt for the probe encodes.
`--framerate`	`24.0`	Reference framerate.
`--target-vmaf T`	—	Quality target on the `[0, 100]` scale. Required.
`--encoder`	`libx264`	Codec adapter that performs the post-deband encode.
`--preset`	`medium`	Encoder preset for the probe encodes.
`--filter`	`pelorus_deband`	Pre-encode filter adapter to autotune.
`--sweep-knob KNOB`	all 10	Repeatable; restricts the swept deband knobs.
`--crf-min / --crf-max`	`18` / `40`	Joint TPE search range over CRF.
`--n-trials`	`60` (live), `40` (smoke)	TPE trial budget.
`--time-budget-s`	`600`	Soft wall-clock cap for Optuna.
`--seed`	`0`	TPE sampler seed (reproducible search).
`--smoke`	off	Synthetic deband+CRF surface; no ffmpeg / Vulkan / GPU.
`--score-backend`	`auto`	Probe-score backend (`auto`/`cpu`/`cuda`/`sycl`/`hip`).
`--ffmpeg-bin / --vmaf-bin`	`ffmpeg` / `vmaf`	Tool paths.
`--vmaf-model`	`vmaf_v0.6.1`	libvmaf model for the probe scores.
`--neg`	off	Use the VMAF NEG model variant.
`--encode-dir`	`.workingdir2/prefilter`	Scratch dir for probe encodes.
`--output`	stdout	JSON destination for the recommendation payload.

Note: the live encode path is unit-tested with a mocked encode/score loop but has not been run against a real Pelorus-enabled ffmpeg in this environment (the filter is not installed here). See docs/state.md → T-PREFILTER-LIVE-ENCODE-UNTESTED-2026-06-14.

Codec adapters¶

Phase A wires libx264 end-to-end through the search loop. Additional codec adapters land as one-file additions under tools/vmaf-tune/src/vmaftune/codec_adapters/ and join the registry without touching the search loop. The currently-registered adapters are discoverable via vmaftune.codec_adapters.known_codecs().

`libaom-av1`¶

Google's reference AV1 encoder. Adapter shipped via ADR-0279.

Encoder name (FFmpeg): libaom-av1.
Quality knob: -crf integer in [0, 63] (default 35); higher CRF is lower quality.
Speed knob: -cpu-used integer in [0, 9] (0 = slowest/best, 9 = fastest). The adapter exposes a human-readable preset vocabulary that matches x264/x265 so a single sweep axis covers all four encoders.

`--preset` name	libaom `-cpu-used`
`placebo`	0 (slowest, highest quality)
`slowest`	1
`slower`	2
`slow`	3
`medium`	4 (default)
`fast`	5
`faster`	6
`veryfast`	7
`superfast`	8
`ultrafast`	9 (fastest)

Sample FFmpeg invocation produced by the adapter (Phase B+ wires the codec args into vmaftune.encode):

ffmpeg -i ref.y4m -c:v libaom-av1 -crf 35 -cpu-used 4 -an -y out.mkv
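The preset table and the sample invocation translate directly into a lookup. The following is a sketch with hypothetical names (`LIBAOM_CPU_USED`, `libaom_codec_args`), not the shipped adapter; it only reproduces the mapping and ranges stated above.

```python
# Preset name -> libaom -cpu-used, from the table above.
LIBAOM_CPU_USED = {
    "placebo": 0, "slowest": 1, "slower": 2, "slow": 3, "medium": 4,
    "fast": 5, "faster": 6, "veryfast": 7, "superfast": 8, "ultrafast": 9,
}


def libaom_codec_args(preset="medium", crf=35):
    """Build the codec slice of the FFmpeg argv for libaom-av1."""
    if preset not in LIBAOM_CPU_USED:
        raise ValueError(f"unknown preset {preset!r}")
    if not 0 <= crf <= 63:
        raise ValueError(f"crf {crf} is outside [0, 63]")
    return [
        "-c:v", "libaom-av1",
        "-crf", str(crf),
        "-cpu-used", str(LIBAOM_CPU_USED[preset]),
    ]
```

With the defaults this returns the `-c:v libaom-av1 -crf 35 -cpu-used 4` slice of the sample invocation.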

libaom vs SVT-AV1 trade-offs¶

libaom-av1 and libsvtav1 both target the AV1 bitstream but sit at different points on the speed/quality curve. Use the table below as a rough decision aid; the per-corpus numbers belong in your local sweep output, not here.

Concern	libaom-av1	libsvtav1
Encode wall time at matched preset	meaningfully slower	meaningfully faster
Quality at slow presets (matched bitrate)	slightly higher per AOM benchmarks	slightly lower
Quality at fast presets	comparable	comparable, sometimes ahead
CRF range	`0..63`	`0..63`
Best fit	offline / high-quality archive encodes	live, batch, large catalog

The fork's vmaf-tune corpus rows record the exact (encoder, preset, crf, vmaf_score, encode_time_ms, bitrate_kbps) tuple, so Phase C/D predictors can pick whichever encoder dominates the relevant region of the rate-distortion plane on a given source.

SVT-AV1-HDR (juliobbv-p/svt-av1-hdr) is a runtime variant of the libsvtav1 adapter, not a separate adapter. Its FFmpeg examples still use -c:v libsvtav1; the difference is which SVT library the FFmpeg binary is linked against. Use ADAPTER@VARIANT tokens plus --encoder-ffmpeg-bin to compare mainline SVT-AV1 and SVT-AV1-HDR in one report:

vmaf-tune compare \
    --src clip.mkv --width 3840 --height 2160 --pix-fmt yuv420p10le \
    --target-vmafs 94,96,98 \
    --encoders libsvtav1,libsvtav1@svt-av1-hdr \
    --ffmpeg-bin /opt/ffmpeg-8.1.1-main/bin/ffmpeg \
    --encoder-ffmpeg-bin libsvtav1@svt-av1-hdr=/opt/ffmpeg-8.1.1-svtav1-hdr/bin/ffmpeg \
    --format json --output svtav1-vs-hdr.json

The row label stays human-readable (codec = libsvtav1@svt-av1-hdr), while machine-readable provenance records adapter = libsvtav1, runtime_variant = svt-av1-hdr, and the exact ffmpeg_bin path. Tokens without a per-token binding use the global --ffmpeg-bin.
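One way to picture the token handling is the sketch below. The function names are hypothetical and the parsing rules are inferred from the example command (`ADAPTER@VARIANT` tokens, `TOKEN=PATH` bindings); the provenance keys mirror the ones named in the paragraph above.

```python
def parse_encoder_token(token):
    """Split 'adapter@variant' into (adapter, variant); variant may be None."""
    adapter, sep, variant = token.partition("@")
    if not adapter or (sep and not variant):
        raise ValueError(f"malformed encoder token {token!r}")
    return adapter, (variant or None)


def resolve_ffmpeg_bins(encoders, global_bin, bindings=()):
    """Attach an ffmpeg binary to every token in a comma-separated list.

    bindings holds 'TOKEN=PATH' strings, as passed to --encoder-ffmpeg-bin.
    """
    per_token = dict(binding.split("=", 1) for binding in bindings)
    rows = []
    for token in encoders.split(","):
        adapter, variant = parse_encoder_token(token)
        rows.append({
            "codec": token,                # human-readable row label
            "adapter": adapter,
            "runtime_variant": variant,
            # Tokens without a binding use the global --ffmpeg-bin.
            "ffmpeg_bin": per_token.get(token, global_bin),
        })
    return rows
```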

Hardware encoders (NVENC)¶

Phase A also wires the NVIDIA NVENC family for hardware-accelerated sweeps:

Adapter `--encoder`	FFmpeg encoder	Hardware required
`h264_nvenc`	`h264_nvenc`	NVIDIA Kepler+ (most modern GPUs)
`hevc_nvenc`	`hevc_nvenc`	NVIDIA Maxwell 2nd-gen+ (GTX 960+)
`av1_nvenc`	`av1_nvenc`	NVIDIA Ada Lovelace+ (RTX 40-series, L40, L4)

NVENC's quality knob is -cq (constant-quality target), the closest analogue to libx264 CRF. The fork's CQ window is the same [15, 40] perceptually informative range used for libx264; the hardware accepts [0, 51].

NVENC has seven preset levels (p1 fastest → p7 slowest). The CLI takes the same mnemonic preset names as libx264 and maps them:

Mnemonic	NVENC preset
`ultrafast`, `superfast`, `veryfast`	`p1`
`faster`	`p2`
`fast`	`p3`
`medium` (default)	`p4`
`slow`	`p5`
`slower`	`p6`
`slowest`, `placebo`	`p7`

Example:

vmaf-tune corpus \
    --source ref.yuv \
    --width 1920 --height 1080 --pix-fmt yuv420p \
    --framerate 24 --duration 10 \
    --encoder h264_nvenc \
    --preset medium --preset slow \
    --crf 23 --crf 28 --crf 34 \
    --output corpus_nvenc.jsonl

(The --crf flag carries the quality value regardless of whether the encoder names it CRF or CQ; the adapter forwards it as -cq for NVENC.)

Hardware vs software trade-off¶

NVENC is 10–100× faster than the software encoders at the cost of quality. Empirically, h264_nvenc at medium typically loses 3–5 VMAF points versus libx264 medium at the same bitrate, depending on content complexity. The Pareto frontier is genuinely different, which is precisely why the harness treats NVENC as separate codec entries rather than a flag on libx264. Use NVENC when you need a large corpus quickly or when the production pipeline is GPU-encoded; use software when you need the perceptually best encode at a given bitrate.

If FFmpeg reports Encoder h264_nvenc not found (or one of the sibling encoders), the FFmpeg build wasn't compiled with --enable-nvenc or the GPU lacks the relevant generation. The harness records the failure as exit_status != 0 and skips scoring, so a partial corpus over a heterogeneous fleet is still well-formed.

Hardware encoders (AMF)¶

vmaf-tune ships software (libx264) and hardware adapters under one codec-adapter contract; the harness search loop is identical for all of them. AMD AMF (Advanced Media Framework) covers H.264, HEVC, and AV1 on AMD GPUs (see ADR-0282):

Encoder	Codec	Hardware	FFmpeg flag	Quality knob
`h264_amf`	H.264 / AVC	Any AMD GPU with AMF	`-c:v h264_amf`	`-qp_i / -qp_p` (cqp)
`hevc_amf`	HEVC / H.265	Any AMD GPU with AMF	`-c:v hevc_amf`	`-qp_i / -qp_p` (cqp)
`av1_amf`	AV1	RDNA3+ only (RX 7000 series and newer)	`-c:v av1_amf`	`-qp_i / -qp_p` (cqp)

Requirements:

AMD GPU with AMF support (AV1 needs RDNA3 silicon or newer).
FFmpeg built with --enable-amf.
The AMF runtime / driver installed (Adrenalin on Windows; the open-source Mesa AMF stack or AMD Pro driver on Linux).

Behavioural notes:

Preset compression: 7 levels collapse to 3. AMF exposes only three -quality rungs (quality, balanced, speed) where libx264 / NVENC / QSV expose seven preset names. The adapter maps preset names onto AMF rungs as follows:

Preset names	AMF `-quality`
`placebo`, `slowest`, `slower`, `slow`	`quality`
`medium` (default)	`balanced`
`fast`, `faster`, `veryfast`, `superfast`, `ultrafast`	`speed`

This is opinionated: AMF's hardware pipeline does not expose finer steps. Callers that need finer granularity should pin -qp_i / -qp_p (the qp quality knob) instead of the preset.

Rate control is constant-QP. AMF -rc cqp plus matched -qp_i / -qp_p is the closest analogue to x264 CRF. Range is 0..51; the harness exposes the (15, 40) Phase A informative window for cross-codec comparability.

Example:

vmaf-tune corpus \
    --encoder h264_amf \
    --preset slow --preset medium --preset fast \
    --crf 23 --crf 28 --crf 34 \
    --output corpus_amf.jsonl

The raw FFmpeg invocation the adapter emits looks like:

ffmpeg -i ref.yuv -c:v h264_amf \
       -quality balanced -rc cqp -qp_i 23 -qp_p 23 \
       -an out.mkv

Coarse-to-fine CRF search (ADR-0306)¶

Sweeping every CRF from 0..51 (52 encodes per source × preset) is wasteful when the only question is "what's the smallest CRF whose VMAF still meets my target?". The fork ships a 2-pass coarse-to-fine search that visits ~15 points instead of 52, a ~3.5× wall-time speedup with no measurable quality regression.

How it works:

Coarse pass at --coarse-step over [10, 50]; by default that's [10, 20, 30, 40, 50] (5 encodes).
Fine pass at --fine-step within ±--fine-radius of the best-coarse CRF. With defaults that's the 10 unique CRFs around the best (e.g. [25..29, 31..35] if best-coarse is 30).
1-pass shortcut: when the highest-CRF coarse point already meets the VMAF target, no refinement is needed (lower bitrate would have to come from CRFs above the coarse grid, which the fine pass wouldn't probe anyway). The search stops after the coarse pass.

`corpus --coarse-to-fine`¶

Enable the search with the --coarse-to-fine flag on vmaf-tune corpus; --coarse-step, --fine-step, and --fine-radius tune the two passes.
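As a rough illustration of the coarse-to-fine CRF search (ADR-0306) described in this document, the sketch below counts which CRFs the two passes would visit. It is not the shipped implementation: `coarse_to_fine` is a hypothetical function, `score` stands in for an encode plus VMAF measurement, and "best-coarse" is assumed to mean the highest coarse CRF that still meets the target (falling back to the highest-VMAF coarse point when none does), which the text does not spell out.

```python
def coarse_to_fine(score, target_vmaf, lo=10, hi=50,
                   coarse_step=10, fine_step=1, fine_radius=5,
                   crf_min=0, crf_max=51):
    """Return {crf: vmaf} for every CRF the two passes probe.

    score(crf) -> VMAF is supplied by the caller.
    """
    visited = {}

    def probe(crf):
        # Each CRF is encoded and scored at most once.
        if crf not in visited:
            visited[crf] = score(crf)
        return visited[crf]

    # Coarse pass: [10, 20, 30, 40, 50] with the defaults.
    coarse = list(range(lo, hi + 1, coarse_step))
    passing = [crf for crf in coarse if probe(crf) >= target_vmaf]

    # 1-pass shortcut: the highest coarse CRF already meets the target.
    if passing and passing[-1] == coarse[-1]:
        return visited

    # Assumption: refine around the highest passing coarse CRF,
    # or around the highest-VMAF point when nothing passes.
    if passing:
        best = passing[-1]
    else:
        best = max(coarse, key=lambda crf: visited[crf])

    # Fine pass: +/- fine_radius around best, skipping visited CRFs.
    for crf in range(best - fine_radius, best + fine_radius + 1, fine_step):
        if crf_min <= crf <= crf_max:
            probe(crf)
    return visited
```

With a monotone toy curve such as `lambda crf: 100.0 - 0.3 * crf` and a target of 92, the coarse pass probes 5 CRFs and the fine pass adds 10 more around CRF 20, for 15 encodes in total.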

Hardware encoders (QSV)¶

Beyond libx264, the registry exposes the three Intel QSV (Quick Sync Video) hardware adapters added in ADR-0281. They share the QSV preset vocabulary (veryslow, slower, slow, medium, fast, faster, veryfast; all of them are also valid x264 preset names) and use -global_quality N (ICQ rate control, range 1..51, semantically similar to CRF).

Adapter name	FFmpeg encoder	Quality knob	Hardware required
`h264_qsv`	`h264_qsv`	`global_quality` (1–51)	Intel iGPU 7th-gen+ (Kaby Lake or newer) or Arc / Battlemage
`hevc_qsv`	`hevc_qsv`	`global_quality` (1–51)	Intel iGPU 7th-gen+ (10-bit needs 11th-gen+) or Arc / Battlemage
`av1_qsv`	`av1_qsv`	`global_quality` (1–51)	Intel iGPU 12th-gen+ only or Arc / Battlemage

Hardware-device initialisation (Linux). FFmpeg's QSV bridge on Linux requires an explicit VA-API device chain before the input and a pixel-format conversion filter after it. vmaf-tune injects these automatically for all QSV encoders:

# Before -i:
-init_hw_device vaapi=va:/dev/dri/renderD129
-init_hw_device qsv=qsv_dev@va
-filter_hw_device va
# After -i, before -c:v:
-vf format=nv12,hwupload=extra_hw_frames=64

Without this chain every QSV encode fails with -22 Invalid argument. The VA-API device defaults to auto: vmaf-tune walks /dev/dri/by-path and /sys/class/drm/renderD*/device/vendor, selects the first Intel vendor node (0x8086), and falls back to /dev/dri/renderD128 only when no Intel render node is discoverable. Override with --vaapi-device /dev/dri/renderD129 or VMAFTUNE_VAAPI_DEVICE=/dev/dri/renderD129 when you need to pin a specific node. (ADR-0641)

The underlying FFmpeg invocations look like:

ffmpeg \
    -init_hw_device vaapi=va:/dev/dri/renderD129 \
    -init_hw_device qsv=qsv_dev@va \
    -filter_hw_device va \
    -i src.mkv \
    -vf format=nv12,hwupload=extra_hw_frames=64 \
    -c:v h264_qsv -preset medium -global_quality 23 -an out.mkv
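The device selection can be approximated as follows. This sketch covers only the sysfs vendor check and the renderD128 fall-back; it does not walk /dev/dri/by-path, and both function names are hypothetical. The `override` argument plays the role of --vaapi-device or VMAFTUNE_VAAPI_DEVICE.

```python
from pathlib import Path

INTEL_VENDOR = "0x8086"


def pick_vaapi_device(sys_drm=Path("/sys/class/drm"),
                      dev_dri=Path("/dev/dri"), override=None):
    """Pick the first Intel render node, else fall back to renderD128."""
    if override:
        return str(override)
    for vendor_file in sorted(sys_drm.glob("renderD*/device/vendor")):
        try:
            vendor = vendor_file.read_text().strip().lower()
        except OSError:
            continue  # unreadable node: keep looking
        if vendor == INTEL_VENDOR:
            node = vendor_file.parent.parent.name  # e.g. "renderD129"
            return str(dev_dri / node)
    return str(dev_dri / "renderD128")


def qsv_input_prefix(device):
    """argv slice that goes before -i for any QSV encode."""
    return [
        "-init_hw_device", f"vaapi=va:{device}",
        "-init_hw_device", "qsv=qsv_dev@va",
        "-filter_hw_device", "va",
    ]
```

Taking the sysfs root as a parameter keeps the lookup testable on hosts that have no DRM devices at all.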

vmaf-tune validates the (preset, global_quality) pair before spawning ffmpeg and probes ffmpeg -encoders for the requested encoder; if libmfx / VPL is not compiled in, the harness raises RuntimeError with a build-time hint rather than letting ffmpeg emit an Encoder not found line buried in stderr.

Apple VideoToolbox adapters¶

The h264_videotoolbox, hevc_videotoolbox, and prores_videotoolbox registry entries cover Apple Silicon (M-series) and T2-equipped Intel Macs via the VideoToolbox.framework hardware encoder. The H.264 and HEVC adapters were added in ADR-0283; the ProRes adapter follows the same registry pattern (see ADR-0283 Status update 2026-05-09). All three share _videotoolbox_common.py for the preset → -realtime mapping; H.264 / HEVC also share the -q:v quality knob, while ProRes uses the integer profile tier instead.

Adapter name	FFmpeg encoder	Quality knob	Hardware required
`h264_videotoolbox`	`h264_videotoolbox`	`q:v` (0..100, higher = better)	Apple Silicon or Intel Mac with T2
`hevc_videotoolbox`	`hevc_videotoolbox`	`q:v` (0..100, higher = better)	Apple Silicon or Intel Mac with T2
`prores_videotoolbox`	`prores_videotoolbox`	`profile:v` (0..5, higher tier = better)	Apple Silicon M1 Pro / Max / Ultra or later

VideoToolbox exposes only a binary -realtime {0,1} flag instead of a multi-valued preset, so the harness's nine-name preset vocabulary collapses onto that boolean per the table in _videotoolbox_common.py: ultrafast/superfast/veryfast/faster/fast → realtime=1 (low-latency fast path); medium/slow/slower/veryslow → realtime=0 (offline / quality-priority). The mapping is intentionally lossy — VT cannot expose a finer dial.

The underlying FFmpeg invocations look like:

ffmpeg -i src.mkv -c:v h264_videotoolbox -realtime 0 -q:v 60 -an out.mkv
ffmpeg -i src.mkv -c:v hevc_videotoolbox -realtime 0 -q:v 60 -an out.mkv
ffmpeg -i src.mkv -c:v prores_videotoolbox -realtime 0 -profile:v hq -an out.mov

AV1 hardware encoding is intentionally not wired — Apple Silicon has no AV1 hardware encoder block as of 2026 and FFmpeg exposes no av1_videotoolbox. Use libaom-av1 or libsvtav1 for AV1 on macOS.

vmaf-tune validates the (preset, quality) pair via the adapter (-q:v for H.264 / HEVC; integer tier id for ProRes) and probes ffmpeg -encoders for the requested encoder; if VideoToolbox is unavailable (e.g. the host is Linux), the harness raises RuntimeError rather than letting ffmpeg emit Encoder not found.

ProRes tier reference¶

ProRes is a fixed-rate intermediate codec — there is no CRF / QP scalar. Quality is selected entirely by the tier, and bitrate is implicit in the tier × resolution × frame-rate combination. The harness's --crf flag carries the integer tier id (the FFmpeg profile:v AVOption value); the adapter emits the canonical FFmpeg alias on the argv for diagnosability.

`crf` value	FFmpeg alias	Marketing name	Typical use
0	`proxy`	ProRes 422 Proxy	Offline editing, dailies
1	`lt`	ProRes 422 LT	Broadcast acquisition
2	`standard`	ProRes 422	Mainline broadcast master
3	`hq`	ProRes 422 HQ	High-end broadcast / film master (default)
4	`4444`	ProRes 4444	Graphics, alpha, colour grading
5	`xq`	ProRes 4444 XQ	High-dynamic-range / wide-gamut master

Source: FFmpeg libavcodec/videotoolboxenc.c prores_options AVOption table (verified against an FFmpeg n8.1.1 checkout).
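The tier table amounts to a six-entry lookup from the integer carried in --crf to the alias placed on the argv. A minimal sketch, with hypothetical names and the default tier taken from the table:

```python
# Integer tier id (carried in --crf) -> FFmpeg profile:v alias.
PRORES_TIERS = {
    0: "proxy", 1: "lt", 2: "standard", 3: "hq", 4: "4444", 5: "xq",
}


def prores_codec_args(tier=3, realtime=False):
    """Build the codec slice of the argv for prores_videotoolbox."""
    if tier not in PRORES_TIERS:
        raise ValueError(f"ProRes tier {tier} is outside 0..5")
    return [
        "-c:v", "prores_videotoolbox",
        "-realtime", "1" if realtime else "0",
        "-profile:v", PRORES_TIERS[tier],
    ]
```

Emitting the alias rather than the number keeps the command line readable when a failed encode has to be diagnosed from logs.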

ProRes is intra-only — every frame is a keyframe — so --keyint / --force-keyframes flags are accepted but have no rate-distortion effect. The harness still emits them so the muxer's seek-table density is predictable across codecs.

Saliency-aware encoding (`recommend-saliency --saliency-aware`)¶

Bucket #2 of the PR #354 audit (see ADR-0293) wires the fork-trained saliency_student_v1 ONNX model (ADR-0286) into vmaf-tune so a single command can produce an encode that biases bits toward salient regions (faces, focal subjects, action) and saves bits on background.

Synopsis¶

vmaf-tune recommend-saliency \
    --src ref.yuv --width 1920 --height 1080 --framerate 24 \
    --duration-frames 240 \
    --preset medium --crf 23 \
    --saliency-aware \
    [--saliency-offset -4] \
    [--saliency-model model/tiny/saliency_student_v1.onnx] \
    [--saliency-aggregator mean|ema|max|motion-weighted] \
    --output out.mp4

How it works¶

compute_saliency_map() samples the requested frame window from the source YUV, runs those frames through saliency_student_v1.onnx (ImageNet-normalised RGB derived from YUV, NCHW [1, 3, H, W]), and reduces the per-frame saliency outputs into one mask in [0, 1].
saliency_to_qp_map() linearly maps the mask to per-pixel QP deltas — --saliency-offset is the QP delta at peak saliency (negative means better quality on salient regions). Background gets the symmetric positive delta. The output is clamped to [-12, +12] (matching the vmaf-roi sidecar convention from ADR-0247).
The pixel-level QP-offset map is dispatched to the encoder-specific ROI channel (see the encoder-targets table below); each channel reduces the map to its native granularity before writing any sidecar file or composing the argv slice.
The augmented extra_params (qpfile path, zones string, or ROI-map path) are appended to the FFmpeg command and the normal encode path runs.
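Steps 2 and 3 can be sketched without any dependencies. The functions below are illustrative stand-ins with hypothetical names, not the `vmaftune.saliency` helpers. The linear form `offset * (2 * s - 1)` is one mapping that satisfies the stated endpoints (the offset at peak saliency, the symmetric positive delta at background) and is an assumption, as is averaging and rounding within each block.

```python
def sketch_qp_map(mask, offset=-4.0, clamp=12.0):
    """Map a saliency mask in [0, 1] to per-pixel QP deltas.

    s = 1 gets `offset`, s = 0 gets `-offset`, clamped to [-clamp, +clamp].
    """
    return [
        [max(-clamp, min(clamp, offset * (2.0 * s - 1.0))) for s in row]
        for row in mask
    ]


def sketch_block_reduce(qp_map, block=16):
    """Reduce a per-pixel QP map to one integer delta per block x block tile."""
    height, width = len(qp_map), len(qp_map[0])
    blocks = []
    for y in range(0, height, block):
        row = []
        for x in range(0, width, block):
            # Edge tiles may be smaller than block x block.
            tile = [
                qp_map[j][i]
                for j in range(y, min(y + block, height))
                for i in range(x, min(x + block, width))
            ]
            row.append(round(sum(tile) / len(tile)))
        blocks.append(row)
    return blocks
```

A 16-pixel block matches the macroblock granularity listed for libx264; the 64×64 encoders would call the same reduction with `block=64`.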

Trade-off¶

Axis	Direction
Bitrate (same VMAF)	−10 % to −20 % for content with strong attention focus (faces, action, sport). Background-uniform content sees little change.
Encode time	+5 % typical (saliency inference + per-MB reduce; per-frame model time is sub-millisecond on CPU at SD/HD).
Decode time	unchanged (the bitstream is standard-compliant for all five supported encoders).
Quality (VMAF)	unchanged at the clip-mean level; concentrated where the eye looks.

Numbers are indicative. Today's recommend-saliency subcommand is a one-shot encode at --crf (or the adapter default); target-VMAF selection remains the job of recommend, compare, or a caller-provided bisect loop.

Temporal aggregation¶

--saliency-aggregator controls how sampled frame masks become the single ROI pattern applied to the encode:

Aggregator	Behaviour	Use when
`mean`	Per-pixel arithmetic mean; preserves the original implementation.	Default, stable clips, and baseline comparisons.
`ema`	Exponential moving average with `--saliency-ema-alpha` as the current-frame weight.	Motion or cuts make the latest sampled frames more representative.
`max`	Per-pixel maximum over sampled masks.	Missing a brief salient object is worse than over-protecting background.
`motion-weighted`	Weighted mean where each sampled frame is weighted by luma delta from the previous sampled frame.	Motion-heavy clips where changing frames should dominate the aggregate.

All four reducers use the same saliency_student_v1 weights and the same downstream QP mapping, so changing the reducer does not change the model contract or encoder sidecar format.
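The four reducers differ only in how they combine the sampled masks. The sketch below works on flattened masks and is an illustration under assumptions: `aggregate` is a hypothetical name, the EMA is seeded with the first mask, and the motion weight is the mean absolute luma difference to the previous sampled frame plus a small epsilon so a static clip degrades to a plain mean.

```python
def aggregate(masks, mode="mean", ema_alpha=0.5, luma=None):
    """Combine per-frame saliency masks (flat lists in [0, 1]) into one mask.

    luma holds one flat luma frame per mask; only motion-weighted reads it.
    """
    n = len(masks[0])
    if mode == "mean":
        return [sum(m[i] for m in masks) / len(masks) for i in range(n)]
    if mode == "max":
        return [max(m[i] for m in masks) for i in range(n)]
    if mode == "ema":
        acc = list(masks[0])
        for mask in masks[1:]:
            # ema_alpha is the weight of the current frame.
            acc = [ema_alpha * cur + (1.0 - ema_alpha) * prev
                   for cur, prev in zip(mask, acc)]
        return acc
    if mode == "motion-weighted":
        eps = 1e-6
        weights = [eps]  # the first frame has no predecessor
        for prev, cur in zip(luma, luma[1:]):
            delta = sum(abs(a - b) for a, b in zip(cur, prev)) / len(cur)
            weights.append(eps + delta)
        total = sum(weights)
        return [sum(w * m[i] for w, m in zip(weights, masks)) / total
                for i in range(n)]
    raise ValueError(f"unknown aggregator {mode!r}")
```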

Graceful fallback¶

If onnxruntime is not installed or model/tiny/saliency_student_v1.onnx cannot be loaded, recommend-saliency --saliency-aware logs a warning and falls back to a plain encode. This matches the vmaf-roi C sidecar's posture.

If the chosen --encoder has no ROI dispatch implementation (e.g. h264_nvenc, libvpx-vp9, hevc_nvenc) the command exits with code 2 and a structured error message listing the supported codecs. To accept a plain encode instead, pass --saliency-fallback-plain (or set the environment variable VMAFTUNE_SALIENCY_FALLBACK_OK=1); in that case an ERROR is logged rather than a WARNING. This hard-fail posture matches ADR-0498 / ADR-0546.

Encoder targets¶

The saliency pipeline supports five encoder ROI mechanisms:

Encoder	ROI channel	Granularity	argv slot
`libx264`	ASCII `--qpfile` (x264 r2390+)	16×16 luma MB	`-x264-params qpfile=…`
`libaom-av1`	patched FFmpeg `-qpfile` ROI bridge	16×16 luma MB mapped onto libaom MI cells	`-qpfile …`
`libx265`	`--zones` QP delta (per-clip mean)	full-clip spatial mean	`-x265-params zones=0,N,q=<delta>`
`libsvtav1`	`--qp-file` offset map (SVT-AV1 v1.7+)	64×64 super-block	`-svtav1-params qp-file=…`
`libvvenc`	`ROIFile` CSV (VVenC v1.14.0+)	64×64 CTU	`-vvenc-params ROIFile=…`

See ADR-0293 (x264 baseline) and ADR-0414 (x265 / SVT-AV1 / VVenC). The libaom-av1 row uses the shared -qpfile bridge documented in vmaf-tune-ffmpeg.md.

Per-adapter helpers in vmaftune.saliency:

write_x265_zones_arg(block_offsets, duration_frames) → zones string
write_x264_qpfile(block_offsets, out_path, duration_frames) → x264/libaom qpfile
write_svtav1_qpoffset_map(block_offsets, out_path, duration_frames) → Path
write_vvenc_roi_csv(block_offsets, out_path, duration_frames) → Path

Caveats¶

Aggregate mask, not per-frame ROI. The current implementation reduces saliency across the sampled frames and applies one delta pattern across the whole clip. Per-frame ROI is on the roadmap.
x265 zones: spatial mean only. x265's --zones has temporal granularity but not per-block spatial granularity. The zone carries the mean QP delta across all blocks. Per-block x265 ROI requires a future x265 qpfile port (different format from x264).
libaom segment quantisation. The FFmpeg bridge maps 16×16 qpfile deltas onto libaom's MI grid and at most eight segment QPs. Very fine-grained deltas are therefore quantised to the nearest segment.
SVT-AV1 / VVenC: 64×64 granularity. Both encoders document 64×64 as their ROI-map unit. The saliency mask is reduced to this grid via reduce_qp_map_to_blocks(qp_map, block=64) before writing.
RGB saliency input. The student model receives BT.709-limited yuv420p converted to ImageNet-normalised RGB. Chroma is nearest-neighbour upsampled to luma resolution before inference.
Don't use the placeholder. mobilesal_placeholder_v0 and the radial fallback inside vmaf-roi are smoke-test stubs. Pass an explicit --saliency-model pointing at the real fork-trained weights when you want a perceptual benefit.

Reproducer¶

The test suite mocks the ONNX session and the encode runner so it runs without ffmpeg or onnxruntime installed:

pytest tools/vmaf-tune/tests/test_saliency.py \
       tools/vmaf-tune/tests/test_saliency_roi_adapters.py \
       tools/vmaf-tune/tests/test_saliency_roi_codec.py -v

Codec adapter contract¶

The encode driver (tools/vmaf-tune/src/vmaftune/encode.py) is codec-agnostic as of ADR-0297. It looks up the codec adapter via vmaftune.codec_adapters.get_adapter(req.encoder) and asks the adapter for its FFmpeg argv slice — the harness itself never branches on codec identity. Adding a new codec is one file under tools/vmaf-tune/src/vmaftune/codec_adapters/ plus a registry entry; the search loop, corpus row schema, and FFmpeg invocation stay untouched.

A codec adapter is a frozen dataclass exposing:

Member	Type	Purpose
`name`	`str`	Human-readable codec id (`"libx264"`).
`encoder`	`str`	FFmpeg `-c:v` value (`"libx264"`, `"h264_nvenc"`, ...).
`quality_knob`	`str`	Name of the quality knob (`"crf"`, `"cq"`, `"qp"`, ...).
`quality_range`	`tuple[int, int]`	Inclusive `(min, max)` for the knob.
`quality_default`	`int`	Default quality value.
`invert_quality`	`bool`	True when a higher value means lower quality (CRF / QP).
`presets`	`tuple[str, ...]`	Allowed preset names.
`validate(preset, quality) -> None`	method	Raises `ValueError` on out-of-range input.
`ffmpeg_codec_args(preset, quality) -> list[str]`	method	Codec-specific argv slice (e.g. `["-c:v", "libx264", "-preset", "medium", "-crf", "23"]`).
`extra_params() -> tuple[str, ...]`	method (optional)	Additional non-codec argv (e.g. `("-svtav1-params", "tune=0")`).

The dispatcher composes the final ffmpeg command as:

[ffmpeg, -y, -hide_banner, -loglevel info,
 -f rawvideo -pix_fmt <pf> -s WxH -r FR -i <src>,
 *adapter.ffmpeg_codec_args(preset, quality),
 *adapter.extra_params(),
 *req.extra_params,
 <output>]

Adapters that do not yet implement ffmpeg_codec_args (or for which get_adapter raises KeyError) fall back to the legacy x264-CRF shape (-c:v <encoder> -preset <p> -crf <q>) so partial adapters stay drivable end-to-end while their per-codec PRs are in flight.
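A toy adapter and dispatcher make the contract easier to read. Everything here is a sketch: `SketchX264Adapter` and `compose_argv` are hypothetical, the quality default of 23 and the preset tuple are illustrative values rather than the registry's, and the KeyError fall-back path is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchX264Adapter:
    name: str = "libx264"
    encoder: str = "libx264"
    quality_knob: str = "crf"
    quality_range: tuple = (15, 40)  # Phase A window quoted in this document
    quality_default: int = 23        # illustrative, not confirmed
    invert_quality: bool = True      # higher CRF means lower quality
    presets: tuple = ("ultrafast", "superfast", "veryfast", "faster", "fast",
                      "medium", "slow", "slower", "veryslow", "placebo")

    def validate(self, preset, quality):
        lo, hi = self.quality_range
        if preset not in self.presets:
            raise ValueError(f"unknown preset {preset!r}")
        if not lo <= quality <= hi:
            raise ValueError(f"{self.quality_knob}={quality} outside [{lo}, {hi}]")

    def ffmpeg_codec_args(self, preset, quality):
        self.validate(preset, quality)
        return ["-c:v", self.encoder, "-preset", preset,
                f"-{self.quality_knob}", str(quality)]

    def extra_params(self):
        return ()


def compose_argv(adapter, src, output, width, height, pix_fmt, framerate,
                 preset, quality, req_extra=()):
    """Assemble the command in the order shown above, never branching on codec."""
    return [
        "ffmpeg", "-y", "-hide_banner", "-loglevel", "info",
        "-f", "rawvideo", "-pix_fmt", pix_fmt,
        "-s", f"{width}x{height}", "-r", str(framerate), "-i", src,
        *adapter.ffmpeg_codec_args(preset, quality),
        *adapter.extra_params(),
        *req_extra,
        output,
    ]
```

Because the dispatcher only calls `ffmpeg_codec_args` and `extra_params`, swapping in another adapter changes the codec slice without touching the rest of the command.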

parse_versions(stderr, encoder=...) selects a per-codec version probe; missing matches degrade to "unknown" rather than raising.

Adapters in flight¶

Codec	PR	Status
`libx264`	shipped (Phase A)	green
`libx265`	#362	adapter ships; dispatcher unblocks end-to-end
`libsvtav1`	#370	adapter ships; dispatcher unblocks end-to-end
`libaom-av1`	#360	adapter ships; dispatcher unblocks end-to-end
`libvvenc`	#368	adapter ships; dispatcher unblocks end-to-end
`h264_nvenc` / `hevc_nvenc` / `av1_nvenc`	#364	adapter ships; dispatcher unblocks end-to-end
`h264_qsv` / `hevc_qsv`	#367	adapter ships; dispatcher unblocks end-to-end
`h264_amf` / `hevc_amf`	#366	adapter ships; dispatcher unblocks end-to-end
`h264_videotoolbox` / `hevc_videotoolbox`	#373	adapter ships; dispatcher unblocks end-to-end

Codec comparison¶

vmaf-tune compare answers the perennial "should I migrate from x264 to SVT-AV1 yet?" question per-source: given one reference and a target VMAF, run each codec's recommend predicate in parallel and rank the results by smallest file. This is Bucket #7 of the vmaf-tune capability audit; the default CLI backend is Phase B target-VMAF bisect per ADR-0326.

vmaf-tune compare \
    --src ref.yuv \
    --width 1920 --height 1080 --pix-fmt yuv420p \
    --framerate 24 --duration 10 \
    --sample-clip-seconds 4 \
    --target-vmaf 92 \
    --encoders libx264,libx265,libsvtav1,libaom-av1,libvvenc \
    --crf-min 15 --crf-max 40 \
    --format markdown

The real bisect backend needs source geometry because raw YUV does not self-describe. Pass --width and --height explicitly; --pix-fmt, --framerate, and --duration default to the common SDR 24 fps shape but should be set for accurate scoring and bitrate math. Pass --sample-clip-seconds N to evaluate the centre N seconds of the source per bisect iteration; compare forwards matching --frame_skip_ref / --frame_cnt scorer bounds and normalises bitrate against the sample duration. For custom rankers or tests, --predicate-module MODULE:CALLABLE still accepts any importable (codec, src, target_vmaf) -> RecommendResult callable and bypasses the bisect backend.
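The centre-window arithmetic behind --sample-clip-seconds can be sketched like this. The helper names and the rounding choices are assumptions; the text only states that the centre N seconds are scored and that bitrate is normalised against the sample duration.

```python
def centre_window(duration_s: float, sample_s: float, framerate: float) -> tuple[int, int]:
    """Return (frame_skip, frame_cnt) for the centre `sample_s` seconds.

    Falls back to the full clip when sampling is disabled (<= 0) or the
    requested window is not shorter than the source.
    """
    total_frames = int(round(duration_s * framerate))
    if sample_s <= 0 or sample_s >= duration_s:
        return 0, total_frames
    frame_cnt = int(round(sample_s * framerate))
    frame_skip = (total_frames - frame_cnt) // 2
    return frame_skip, frame_cnt


def bitrate_kbps(size_bytes: int, seconds: float) -> float:
    """Bitrate of an encode covering `seconds` of video, in kbit/s."""
    return size_bytes * 8 / 1000 / seconds
```

For the invocation above (10 s at 24 fps with a 4 s sample) this yields a skip of 72 frames and a count of 96 frames, and the encode size is divided by 4 s rather than 10 s.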

Container sources auto-probe their framerate / duration (ADR-0509). When --src is a container (mp4 / mkv / mov / Y4M / ...) and you omit --framerate / --duration, the CLI calls ffprobe on the source and substitutes the probed values before invoking the bisect predicate. This avoids a silent failure mode where the argparse default --framerate=24 against a 60 fps source mis-aligns reference vs. distorted decodes and collapses VMAF to the 4-9 band regardless of CRF. Explicit --framerate / --duration flags still win; a stderr warning fires when an explicit value disagrees with the probed source rate (subsampling is legitimate but should be deliberate). For raw .yuv sources nothing changes: the probe is skipped because raw YUV does not self-describe its framerate.
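The precedence between explicit, probed, and default framerates can be summarised in a few lines. `resolve_framerate`, the tolerance, and the warning wording are hypothetical; only the ordering (explicit wins, then probed, then the argparse default of 24) comes from the text.

```python
import sys
from typing import Optional

ARGPARSE_DEFAULT_FPS = 24.0  # documented default for --framerate


def resolve_framerate(explicit: Optional[float], probed: Optional[float]) -> float:
    """Pick the framerate to use; `probed` is None for raw .yuv sources."""
    if explicit is not None:
        if probed is not None and abs(explicit - probed) > 1e-3:
            # Illustrative wording; the real warning text is not specified here.
            print(f"warning: --framerate {explicit} differs from probed {probed}",
                  file=sys.stderr)
        return explicit
    if probed is not None:
        return probed
    return ARGPARSE_DEFAULT_FPS
```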

Sample output (--format markdown, abridged):

# Codec comparison — target VMAF 92

- Source: `ref.yuv`
- Tool: `vmaf-tune 0.0.1`
- Wall time: 6421.3 ms

| Rank | Codec    | Encoder         | Best CRF | Bitrate (kbps) | Encode time (ms) | VMAF  | Status |
|---:|---|---|---:|---:|---:|---:|---|
| 1  | libaom-av1 | libaom-3.8.0   | 30       |         1500.0 |          18000.0 | 92.40 | ok     |
| 2  | libx265   | libx265-3.5     | 26       |         1700.0 |           4200.0 | 92.00 | ok     |
| 3  | libsvtav1 | libsvtav1-1.7.0 | 32       |         1900.0 |           2800.0 | 92.30 | ok     |
| 4  | libx264   | libx264-164     | 23       |         2400.0 |           1500.0 | 92.10 | ok     |

**Smallest file**: `libaom-av1` at CRF 30 → 1500.0 kbps (VMAF 92.40).

Multi-target rate-quality sweep (schema v2, ADR-0516 + ADR-0534 + ADR-0538)¶

vmaf-tune compare defaults to a 4-point rate-quality sweep covering premium-archival operating points: --target-vmafs 94,96,97,98 (ADR-0538, supersedes the ADR-0534 streaming defaults). The JSON output stamps schema_version: 2 and carries one row per (codec, target_vmaf) pair, plus an optional bisect_samples list per row that records every encode+score probe the underlying target-VMAF bisect computed.

vmaf-tune report --compare-json sweep.json then renders a per-codec rate-quality curve assembled from those probes, with the picked-CRF rows highlighted as larger circled markers and the Pareto frontier (lowest bitrate at each target) drawn as a heavier dashed overlay. Current HTML/Markdown profile cards also include a Quick takeaways block before the charts, short "how to read this" notes, report-local codec identity chips (text badges that link to the upstream codec or vendor project), per-codec failure status, and an embedded encoder_profile JSON payload. The takeaways spell out the smallest successful row at each target, failed/unavailable row count, ladder span, and per-shot CRF spread so non-expert readers do not have to infer the main result from the chart alone.

The profile payload is intentionally machine-readable: vmaf-tune encode-profile can read the HTML, Markdown, or raw JSON and turn one selected recommendation into a concrete FFmpeg encode.

# Out of the box: default 4-point sweep, 2-codec compare, GPU scoring.
vmaf-tune compare \
    --src bbb_1080p_60fps.mp4 \
    --width 1920 --height 1080 --framerate 60 \
    --encoders libx265,libsvtav1 \
    --sample-clip-seconds 3 --max-iterations 3 \
    --score-backend cuda \
    --format json --output sweep.json
vmaf-tune report \
    --src bbb_1080p_60fps.mp4 \
    --compare-json sweep.json --target-vmaf 92 \
    --format html --output sweep_report.html

# Convenience path: render the same profile card directly from compare.
vmaf-tune compare \
    --src bbb_1080p_60fps.mp4 \
    --width 1920 --height 1080 --framerate 60 \
    --encoders libx265,libsvtav1,av1_nvenc,av1_qsv \
    --sample-clip-seconds 3 --max-iterations 3 \
    --score-backend cuda \
    --format both --output sweep_profile.html

# Reuse one recommendation from that profile without re-running the sweep.
vmaf-tune encode-profile \
    --profile sweep_profile.html \
    --src bbb_1080p_60fps.mp4 \
    --codec libsvtav1 --target-vmaf 96 \
    --output bbb_svtav1_vmaf96.mkv

Why these defaults (ADR-0538 supersedes ADR-0530)

94,96,97,98 covers premium-archival. The fork's primary user encodes archival masters at VMAF >= 95 exclusively; VMAF 94 is the subjectively-transparent floor on 4K source, 98 is the near-lossless ceiling. The previous ADR-0530 / ADR-0534 default (75,80,85,90,93) targeted streaming / broadcast workflows that this fork does not service, so its R-Q chart contained no points the user picks CRFs from.
VMAF >= 95 is reachable. The earlier "top stops at 93" caveat was a bisect-harness artefact: the search window defaulted to the codec adapter's narrow quality_range (e.g. libx265 = (15, 40), libsvtav1 = (20, 50)) which the adapter validator additionally enforced as a hard CRF gate. ADR-0538 widens the default search window to the encoder's absolute CRF range — see the High-VMAF bisect contract subsection below for the per-codec table and the contract callers can rely on.
The chart renders from bisect_samples, not from the picked-CRF cells. Connecting picked-CRF rows per codec across targets produced physically impossible downward dips, because the bisect overshoots each target by a different amount. Plotting every probe the bisect already computed shows the genuine per-codec R-Q curve.

High-VMAF bisect contract (ADR-0538)¶

When crf_range is left at its default (the CLI default; no --crf-min / --crf-max passed), bisect_target_vmaf searches the encoder's absolute CRF range, not the codec adapter's perceptually-informative quality_range:

Codec	Absolute CRF range	Informative `quality_range` (legacy default)
`libx264`	`0..51`	`0..51` (already maximal)
`libx265`	`0..51`	`15..40`
`libvpx-vp9`	`0..63`	`0..63` (already maximal)
`libaom-av1`	`0..63`	`0..63` (already maximal)
`libsvtav1`	`0..63`	`20..50`
Other (`_nvenc`, `_qsv`, `_amf`, `_videotoolbox`, `libvvenc`)	falls back to `adapter.crf_min/crf_max` then `adapter.quality_range`	as declared by the adapter

The contract guarantees:

The bisect starts at the encoder's accepted floor. For libx264 / libx265 / libvpx-vp9 / libaom-av1 / libsvtav1 the lowest probe is CRF 0 (lossless), so any reasonable source produces VMAF >= 98 at that CRF and high targets are reachable.
max_iterations >= 6 covers the widest window. ceil(log2(64)) = 6; the CLI default is --max-iterations 8 (+2 safety). For a target near the top of the VMAF scale (>= 97) callers running the bisect programmatically should keep at least 6 iterations.
Overshoot at the floor is OK. If the codec already overshoots the target at CRF 0 (e.g. CRF 0 gives VMAF 99.5 on a low-distortion source against target 96), the bisect narrows toward higher CRFs looking for the highest CRF that still clears the target and returns that one with ok=True. The achieved VMAF will be >= target_vmaf by construction.
The adapter's narrow informative window is bypassed for CRF validation but not for preset validation. Preset names are still checked against the adapter's whitelist; CRFs are checked against the encoder absolute range. Pass --crf-min / --crf-max explicitly to recover the historical narrow-window behaviour.
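The "highest CRF that still clears the target" search can be sketched as a plain integer binary search. This is not the fork's bisect_target_vmaf: `bisect_highest_crf` and the `score_at` callback are hypothetical, and the sketch assumes VMAF never increases as CRF increases.

```python
from typing import Callable, Optional


def bisect_highest_crf(
    score_at: Callable[[int], float],
    target_vmaf: float,
    crf_min: int = 0,
    crf_max: int = 63,
    max_iterations: int = 8,
) -> Optional[int]:
    """Return the highest probed CRF whose VMAF clears the target, or None.

    `score_at(crf)` stands in for one encode+score round trip.
    """
    lo, hi = crf_min, crf_max
    best = None
    for _ in range(max_iterations):
        if lo > hi:
            break
        mid = (lo + hi) // 2
        if score_at(mid) >= target_vmaf:
            best = mid      # clears the target; try a higher (smaller-file) CRF
            lo = mid + 1
        else:
            hi = mid - 1    # misses the target; move toward lower CRF
    return best
```

Because `best` is only ever set from a probe that met the target, any returned CRF has an achieved VMAF at or above `target_vmaf`, which mirrors the ok=True guarantee stated above.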

Single-target legacy (v1, back-compat): pass --target-vmaf NN without --target-vmafs to fall back to the v1 single-target schema (one row per codec at the single target, bar+dot chart). A _TrackedDefaultAction sentinel distinguishes "user passed --target-vmaf explicitly" from "argparse default" so legacy single-target scripts keep working unchanged.

Hardware encoders (*_nvenc, *_qsv, *_amf) are availability-probed before dispatch: a two-stage probe greps ffmpeg -encoders for the encoder name (catches "not compiled into this ffmpeg build") and then runs a 1-frame lavfi nullsrc dummy encode (catches "no compatible GPU at runtime"). Encoders that fail either stage surface as ok=false rows with a stable hardware encoder not available: … error string; the renderer flags them visually and the sweep continues without aborting the run.

The --encoders flag is optional; when omitted it defaults to the CPU set libx265,libsvtav1. Older archival smoke runs also swept libx264 and libvpx-vp9, but those two CPU lanes dominate wall time without covering the fork's current HEVC/AV1 decision points.

BBB v9 artifacts are retained only as historical bug evidence; do not use v9 as a current run baseline. New BBB compare probes should start from the ADR-0641 profile-report path (--format both) and list libx264 / libvpx-vp9 only for an explicit legacy comparison.

`compare` CLI flags¶

Flag	Default	Notes
`--src PATH`	—	Required. Single reference clip.
`--target-vmaf F`	`92.0`	Single VMAF target. When passed explicitly and `--target-vmafs` is at its default sweep, the v1 single-target schema is emitted (ADR-0534 back-compat).
`--target-vmafs LIST`	`94,96,97,98` (ADR-0538, supersedes ADR-0534)	Comma-separated VMAF targets to sweep per codec. The default covers premium-archival operating points (4K archival masters at VMAF 94-98). Pass a single value to opt into the v1 path; pass an explicit multi-value list (e.g. `80,85,90`) to override the sweep range.
`--encoders LIST`	`libx265,libsvtav1`	Comma-separated codec names. Hardware encoders (`h264_nvenc`, `hevc_nvenc`, `av1_nvenc`, `h264_qsv`, `hevc_qsv`, `av1_qsv`, `h264_amf`, `hevc_amf`, `av1_amf`) are accepted; missing-encoder rows skip with a reason rather than failing the whole run.
`--width / --height`	—	Required for the default real-bisect backend.
`--pix-fmt`	`yuv420p`	Source pixel format forwarded to the scorer.
`--framerate`	`24.0` (or auto-probed for container `--src`)	Source framerate. Container sources (mp4 / mkv / mov / Y4M / ...) auto-probe via `ffprobe` when this flag is left at its default; explicit values still win with a stderr-warning on probed-vs-user mismatch (ADR-0509).
`--duration`	`0.0` (or auto-probed for container `--src`)	Source duration in seconds, used for bitrate math. Same auto-probe semantics as `--framerate` for container sources (ADR-0509).
`--sample-clip-seconds`	`0.0`	`0.0` scores the full source. Positive values shorter than `--duration` use a centre sample window for encode, score, and bitrate math (ADR-0301).
`--preset`	adapter default	Preset forwarded to the codec adapter.
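`--crf-min / --crf-max`	encoder absolute CRF range, else adapter range (ADR-0538)	Inclusive CRF search window. Pass both or neither; explicit values restore the narrower historical window.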
`--max-iterations`	`8`	Encode+score round-trip cap per codec.
`--vmaf-model`	`vmaf_v0.6.1`	VMAF model forwarded to the scorer.
`--score-backend`	scorer default	`cpu`, `cuda`, `sycl`, `hip`, or `auto`. (`vulkan` removed in ADR-0726.)
`--ffmpeg-bin / --vmaf-bin`	`ffmpeg` / `vmaf`	Binary overrides.
`--encoder-ffmpeg-bin ENCODER=PATH`	off	Bind one compare token to a specific FFmpeg binary. Use with `ADAPTER@VARIANT` labels such as `libsvtav1@svt-av1-hdr=/opt/ffmpeg-8.1.1-svtav1-hdr/bin/ffmpeg`; unbound tokens use `--ffmpeg-bin`.
`--format`	`markdown`	One of `markdown`, `json`, `csv`, `html`, `both`. `html` and `both` render the profile-card report directly; `both` writes `.html` and `.md` next to `--output` and therefore requires `--output`.
`--no-parallel`	off	Run codecs sequentially (default: thread pool, one per codec).
`--max-workers N`	`len(encoders)`	Cap on the parallel thread pool.
`--predicate-module MOD:FN`	off	Advanced hook that bypasses the bisect backend.
`--no-bisect`	off	Switch to CRF-sweep mode: skip target-VMAF bisect and encode each (codec, CRF) pair from `--crf-sweep` exactly once. See CRF sweep mode below. (ADR-0542)
`--crf-sweep LIST`	—	Comma-separated CRF values to use in `--no-bisect` mode. Example: `18,23,28,33`. Required when `--no-bisect` is passed. (ADR-0542)
`--workdir PATH`	`None`	Directory under which to create the per-run temporary scratch directory for the decoded reference YUV and encodes. Overrides `VMAFTUNE_WORKDIR`. When unset, falls through to `VMAFTUNE_WORKDIR` (if set), then to the OS default (`/tmp`). Pass this when your source is large and `/tmp` is a small `tmpfs` (e.g. 634-second 1080p60 BBB decodes to ~118 GB raw YUV). (ADR-0598)
`--max-concurrent-decodes N`	`1`	Maximum number of reference-YUV decode operations that may run simultaneously across all codec bisect threads. Default `1` (serial decodes) caps peak workdir disk usage to one YUV at a time regardless of how many codecs run in the thread pool. For example, with 3 codecs and a 110 GB BBB source, the default prevents the 330 GB peak that caused the v13 ENOSPC failure — peak stays at 110 GB. Raise to `N` on hosts where `--workdir` points to a volume with sufficient free space and the I/O subsystem can sustain N parallel decode streams. Encoder runs are always parallel; only the decode-to-raw-YUV step is serialised at the default. (ADR-0577)
`--output PATH`	stdout	Write the rendered report to PATH instead of stdout.
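The --workdir fall-through and the disk-usage arithmetic behind --max-concurrent-decodes from the table above can be sketched as follows. Both helper names are hypothetical, and `tempfile.gettempdir()` stands in for "the OS default".

```python
import os
import tempfile
from typing import Mapping, Optional


def resolve_workdir(cli_workdir: Optional[str],
                    env: Mapping[str, str] = os.environ) -> str:
    """--workdir wins, then VMAFTUNE_WORKDIR, then the OS temp directory."""
    if cli_workdir:
        return cli_workdir
    env_dir = env.get("VMAFTUNE_WORKDIR")
    if env_dir:
        return env_dir
    return tempfile.gettempdir()


def peak_decode_gb(yuv_gb: float, n_codecs: int, max_concurrent_decodes: int = 1) -> float:
    """Upper bound on decoded-reference disk usage held at the same time."""
    return yuv_gb * min(n_codecs, max_concurrent_decodes)
```

With a 110 GB decoded source and three codecs, the default of one concurrent decode bounds the peak at 110 GB, while three concurrent decodes reach the 330 GB figure quoted in the table.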

`compare` output schema¶

The JSON / CSV columns are exported as vmaftune.compare.COMPARE_ROW_KEYS: codec, adapter, runtime_variant, ffmpeg_bin, encoder_version, best_crf, bitrate_kbps, encode_time_ms, vmaf_score, target_vmaf, ok, error. Failed rows trail successful ones in the ranking; ok=False rows carry a human-readable error and sentinel numerics (-1 for best_crf, NaN for the floats). adapter, runtime_variant, and ffmpeg_bin are provenance fields for ADAPTER@VARIANT compare runs; they are empty on rows produced by old programmatic predicates that do not bind a runtime variant.
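A minimal sketch of the sentinel convention and the "failed rows trail" ordering is shown below. The key tuple copies the field list above; `failed_row` and `rank_rows` are hypothetical helpers, and leaving unrelated string fields empty is an assumption.

```python
import math

COMPARE_ROW_KEYS = (
    "codec", "adapter", "runtime_variant", "ffmpeg_bin", "encoder_version",
    "best_crf", "bitrate_kbps", "encode_time_ms", "vmaf_score",
    "target_vmaf", "ok", "error",
)


def failed_row(codec: str, target_vmaf: float, error: str) -> dict:
    """Build an ok=False row with -1 / NaN sentinels for the numeric fields."""
    nan = float("nan")
    row = dict.fromkeys(COMPARE_ROW_KEYS, "")
    row.update(codec=codec, best_crf=-1, bitrate_kbps=nan, encode_time_ms=nan,
               vmaf_score=nan, target_vmaf=target_vmaf, ok=False, error=error)
    return row


def rank_rows(rows: list[dict]) -> list[dict]:
    """Successful rows first by ascending bitrate; failed rows keep input order."""
    # NaN never enters the sort key, so the ordering stays well defined.
    return sorted(rows, key=lambda r: (not r["ok"],
                                       r["bitrate_kbps"] if r["ok"] else 0.0))
```

Consumers should test `ok` (or use `math.isnan`) before doing arithmetic on a row, since NaN compares unequal to everything including itself.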

v1 (single-target legacy): emitted when --target-vmaf is passed explicitly and --target-vmafs is left at its default, or when --target-vmafs lists a single value. The JSON has no schema_version key; rows is one row per codec at the single target.

v2 (multi-target sweep, ADR-0516 + ADR-0534 + ADR-0538): emitted when --target-vmafs lists ≥ 2 targets (the new default, ADR-0538). The JSON carries "schema_version": 2, "target_vmafs": [94.0, 96.0, 97.0, 98.0], and rows is one row per (codec, target_vmaf) pair. Each row also carries an optional bisect_samples: [{crf, bitrate_kbps, vmaf_score, encode_time_ms}, ...] list (ADR-0530, additive) recording every encode+score probe the underlying bisect computed; the field is absent on old v2 dumps pre-dating ADR-0530.

vmaf-tune report distinguishes v1 from v2 via the schema_version key (or the presence of target_vmafs) and picks the right chart: v1 renders the bar+dot chart, v2 renders the per-codec rate-quality curve. When bisect_samples is populated the chart plots every probe per codec (deduplicated by CRF, sorted by bitrate) and overlays the picked-CRF rows as larger circled markers; the Pareto frontier stays as the dashed overlay. When bisect_samples is absent (old v2 dump or v1) the chart falls back to the legacy connect-the-dots line with a caveat note in the title. The connect-the-dots representation can show physically impossible downward dips when the bisect's per-target overshoot varies, which is the failure mode ADR-0530 fixes for the new path.

The CSV emitter intentionally drops the bisect_samples column (extrasaction="ignore" on the writer); it stays a JSON-only structured field.

Encode-time normalisation: the encode_time_ms column is wall-clock on whatever machine ran the predicate. Cross-codec time comparisons only make sense when every predicate was run on the same hardware in the same configuration — see Research-0061 Bucket #7.

CRF sweep mode (`--no-bisect`)¶

When --no-bisect is passed together with --crf-sweep, compare skips the target-VMAF bisect entirely and instead encodes each (codec, CRF) pair exactly once. This is useful for operators who want to sweep a fixed CRF ladder (e.g. 18, 23, 28, 33) and inspect the resulting (bitrate, VMAF) pairs — faster than running four separate bisect sweeps and more direct than the corpus pipeline.

vmaf-tune compare \
    --src clip.mp4 \
    --encoders libx264,libx265,libsvtav1 \
    --no-bisect --crf-sweep 18,23,28,33 \
    --duration 60 --sample-clip-seconds 30 \
    --format json --output cmp_sweep.json

3 codecs × 4 CRFs = 12 rows in cmp_sweep.json.

v3 (CRF-sweep, ADR-0542): emitted when --no-bisect is set. The JSON carries "schema_version": 3, "mode": "crf_sweep", "crf_sweep": [18, 23, 28, 33], and rows is one row per (codec, crf) pair. Each row carries codec, adapter, runtime_variant, ffmpeg_bin, crf, bitrate_kbps, vmaf_score, encode_time_ms, encoder_version, ok, and error. The --target-vmaf / --target-vmafs flags are accepted but act as label-only knobs in this mode (they annotate pareto frontier markers when rendered, but do not drive the encode loop). CRF-sweep mode currently requires --format json; render the result with a separate downstream report step once v3 ingestion lands.
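A downstream consumer that needs to tell the three compare payload shapes apart could use a check like the one below. `detect_compare_schema` is a hypothetical helper built only from the discriminators described in this section (the schema_version key, the target_vmafs key, and the absence of both for v1).

```python
import json


def detect_compare_schema(payload: dict) -> int:
    """Classify a compare JSON payload as schema 1, 2, or 3."""
    version = payload.get("schema_version")
    if version in (2, 3):
        return version
    if "target_vmafs" in payload:
        return 2  # v2 dump identified by its target list
    return 1      # v1 carries no schema_version key


# Example: a trimmed v3 payload as it might appear in cmp_sweep.json.
sample = json.loads('{"schema_version": 3, "mode": "crf_sweep", '
                    '"crf_sweep": [18, 23, 28, 33], "rows": []}')
print(detect_compare_schema(sample))
```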

Hardware encoder availability is probed the same way as the bisect path: unavailable encoders (e.g. h264_nvenc without an NVIDIA device) produce ok=false rows for each CRF rather than aborting the entire run.

HDR-aware tuning (Bucket #9, ADR-0300)¶

Phase A auto-detects HDR sources and injects codec-appropriate HDR encode flags + HDR VMAF scoring. Detection runs ffprobe against each --source once at corpus start; the per-source encode argv gets the resulting HDR flag set appended.

What gets detected¶

A source is classified as HDR iff its first video stream carries both of:

color_transfer ∈ {smpte2084 (PQ), arib-std-b67 / hlg} and
color_primaries ∈ {bt2020, bt2020nc, bt2020-ncl, bt2020c, bt2020-cl}.

Mismatched signaling (e.g. PQ transfer with BT.709 primaries) is treated as SDR — misclassifying SDR as HDR is the dangerous failure mode. Mastering-display + max-CLL SEI side data is read when present and propagated to encoders that expose stable FFmpeg SEI flags (x265, SVT-AV1, HEVC NVENC).
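The classification rule above reduces to two set-membership checks on the first video stream. The sketch below assumes the stream is a dict shaped like an ffprobe JSON stream entry (with `color_transfer` and `color_primaries` keys); `classify_hdr` and its return labels are hypothetical.

```python
PQ_TRANSFERS = {"smpte2084"}
HLG_TRANSFERS = {"arib-std-b67", "hlg"}
BT2020_PRIMARIES = {"bt2020", "bt2020nc", "bt2020-ncl", "bt2020c", "bt2020-cl"}


def classify_hdr(stream: dict) -> str:
    """Return "pq", "hlg", or "sdr" for one video stream description."""
    transfer = stream.get("color_transfer", "")
    primaries = stream.get("color_primaries", "")
    if primaries not in BT2020_PRIMARIES:
        return "sdr"  # mismatched or missing primaries are treated as SDR
    if transfer in PQ_TRANSFERS:
        return "pq"
    if transfer in HLG_TRANSFERS:
        return "hlg"
    return "sdr"
```

Requiring both fields to agree errs on the side of SDR, which matches the stated preference for never misclassifying SDR content as HDR.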

Detection modes¶

Mode	When to use
`--auto-hdr` (default)	Mixed corpora; let ffprobe classify each source.
`--force-sdr`	Disable HDR injection entirely (override probe).
`--force-hdr-pq`	Raw YUV refs with no container metadata; you know the source is PQ.
`--force-hdr-hlg`	Same, for HLG.

The four flags are mutually exclusive.

Codec dispatch¶

Encoder	HDR signaling carrier
`libx264`	Container-level `-color_*` flags only (x264 has no in-stream HDR SEI).
`libaom-av1`	Global `-color_*` tags only; no fork-owned private SEI mapping yet.
`libx265`	Global `-color_*` + `-x265-params colorprim=bt2020:transfer=...:colormatrix=bt2020nc[:master-display=...:max-cll=...:hdr10-opt=1]`.
`libsvtav1`	Global `-color_*` + `-svtav1-params color-primaries=9:transfer-characteristics=16` (PQ) or `=18` (HLG) `:matrix-coefficients=9`.
`hevc_nvenc`	`-pix_fmt p010le -profile:v main10` + global `-color_*` + `-master_display` / `-max_cll` (when ffmpeg supports them).
`av1_nvenc`	`-pix_fmt p010le` + global `-color_*` tags.
`hevc_qsv` / `hevc_amf` / `hevc_videotoolbox`	`-pix_fmt p010le -profile:v main10` + global `-color_*` tags.
`av1_qsv` / `av1_amf`	`-pix_fmt p010le` + global `-color_*` tags.
`libvvenc`	Global `-color_*` only (SEI options live behind `--vvenc-params` in newer ffmpeg builds).

Encoders not in the dispatch table emit no HDR flags and the corpus row's hdr_* fields still record the detection result.

HDR VMAF scoring (model-port slot)¶

vmaftune.hdr.select_hdr_vmaf_model(model_dir, transfer="pq"|"hlg") resolves an HDR-trained model JSON via a two-stage lookup:

canonical filename — model/vmaf_hdr_v0.6.1.json (the Netflix research artefact name); preferred when transfer is "pq" or "hlg".
glob fallback — model/vmaf_hdr_*.json (so future revisions can land without code changes).

The fork does not ship the JSON in this PR. Verified 2026-05-08 against Netflix/vmaf master model/: no vmaf_hdr_*.json is present in the upstream public tree; Netflix publishes the artefact in a separate research bundle outside the repo. A fork-local license review is the gating follow-up (ADR-0300 § Status update 2026-05-08). Until then, HDR sources are scored against the SDR model with a one-shot warning logged on the first miss (subsequent misses stay quiet). Resulting vmaf_score values trend low for high-luminance regions and are not directly comparable to SDR scores. Drop a licensed copy at model/vmaf_hdr_v0.6.1.json and the harness picks it up automatically — no code change required.

Content-addressed cache¶

Re-running a corpus sweep after adjusting an unrelated flag should not re-encode and re-score tuples that have not changed. The content-addressed cache turns repeated (src, encoder, preset, crf) combinations into a free hit on the second run, restoring the parsed (bitrate, vmaf, encode_time, score_time) tuple from disk and skipping both subprocess calls. See ADR-0298 for the design.

Key composition¶

The cache key is sha256 of the canonical-JSON-encoded six-tuple:

src_sha256 — content hash of the reference YUV
encoder — adapter name (libx264, …)
preset — encoder preset string
crf — quality knob value (int)
adapter_version — bumps when the codec adapter's argv shape changes
ffmpeg_version — host ffmpeg version string

Dropping any one of these would let stale entries shadow real results when the adapter or ffmpeg is upgraded — the test suite asserts each field flips the key.
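A minimal sketch of such a key derivation follows. The exact canonicalisation (sorted keys, compact separators, UTF-8) and the `cache_key` name are assumptions; the text only specifies sha256 over a canonical JSON encoding of the six fields.

```python
import hashlib
import json


def cache_key(src_sha256: str, encoder: str, preset: str, crf: int,
              adapter_version: str, ffmpeg_version: str) -> str:
    """Hex sha256 over a canonical JSON encoding of the six key fields."""
    payload = json.dumps(
        {
            "src_sha256": src_sha256,
            "encoder": encoder,
            "preset": preset,
            "crf": crf,
            "adapter_version": adapter_version,
            "ffmpeg_version": ffmpeg_version,
        },
        sort_keys=True,          # stable field order
        separators=(",", ":"),   # no whitespace variation
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Sorting keys and fixing the separators makes the encoding deterministic, so the same six values always map to the same file name under meta/ and blobs/.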

Layout¶

The cache lives at $XDG_CACHE_HOME/vmaf-tune/ (or ~/.cache/vmaf-tune/ if the env var is unset). Override with --cache-dir. Layout:

<cache-dir>/
  meta/<key>.json     — parsed (bitrate, vmaf, encode_time, ...) tuple
  blobs/<key>.bin     — opaque encoded artifact
  __index__.json      — last-access timestamps for LRU eviction

Eviction¶

LRU with a default 10 GiB cap (configurable via --cache-size-gb). On every put, the oldest entries are dropped until the total on-disk size sits at or below the cap.
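The eviction rule can be sketched as a pure function over the index data. `evict_lru` and its argument shapes are hypothetical; the real cache reads sizes from disk and timestamps from __index__.json.

```python
def evict_lru(sizes: dict[str, int], last_access: dict[str, float],
              cap_bytes: int) -> list[str]:
    """Return the keys to drop, oldest access first, until total <= cap_bytes."""
    total = sum(sizes.values())
    evicted = []
    for key in sorted(sizes, key=lambda k: last_access.get(k, 0.0)):
        if total <= cap_bytes:
            break
        total -= sizes[key]
        evicted.append(key)
    return evicted
```

Entries with no recorded access time sort as oldest in this sketch, so they are the first candidates for removal.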

Disabling¶

Pass --no-cache to force a re-encode/re-score on every cell. The cache is also automatically skipped when --no-source-hash is active (no stable content key) or when ffmpeg -version cannot be probed before the run.

Caveats¶

The cache is not baked into the JSONL row; the row stays the canonical record, the cache is an opaque sidecar.
Cache hits do not write a synthetic encode_path — that field remains empty unless --keep-encodes is set.
Concurrent runs against a shared cache dir (e.g. NFS) work for reads; writes are last-writer-wins and both writers' bytes are valid by content addressing.

vmaf-tune corpus --source ref.yuv \
    --width 1920 --height 1080 --framerate 24 --duration 10 \
    --preset medium \
    --coarse-to-fine --target-vmaf 92 \
    --output corpus.jsonl

--crf is no longer required — the CRF axis is generated by the search. --target-vmaf is optional here; without it the search still runs both passes and refines around the highest-VMAF coarse point.

`recommend` — pick a CRF for a quality target¶

The recommend subcommand always runs coarse-to-fine and prints the single recommended (preset, crf) pair plus its measured VMAF. It also writes the visited points to --output so callers have the corpus row for downstream analysis.

vmaf-tune recommend \
    --source ref.yuv \
    --width 1920 --height 1080 --framerate 24 --duration 10 \
    --preset medium \
    --target-vmaf 92
# stdout: src=ref.yuv preset=medium crf=27 vmaf=92.341 (visited 15 encodes)

Tunables¶

Flag	Default	Notes
`--coarse-to-fine`	off (corpus); on (recommend)	Activate the 2-pass search.
`--coarse-step N`	`10`	Step for the coarse pass. With defaults gives `[10, 20, 30, 40, 50]`.
`--fine-radius R`	`5`	±R around best-coarse for the fine pass.
`--fine-step S`	`1`	Step for the fine pass.
`--target-vmaf V`	unset	Required for `recommend`; optional for `corpus`.

Timing comparison¶

Numbers below are illustrative — actual encode + score wall time per point varies with source resolution, preset, and the libvmaf backend (cpu/cuda/sycl/hip). The relevant ratio is points visited, not seconds:

Mode	Points visited	Relative wall time
Full grid `--crf 0 ... 51`	52	1.00× (baseline)
Coarse-to-fine, defaults, target met mid-range	15	~0.29× (3.47× faster)
Coarse-to-fine, 1-pass shortcut (target met at coarse max)	5	~0.10× (10.4× faster)
Coarse-to-fine, target unmet (full fine pass anyway)	15	~0.29×

For a 1080p --preset medium clip where one (encode + score) pass takes ~5 s, the coarse-to-fine path drops a single recommend run from ~260 s to ~75 s.

What Phase A does not do¶

No target-VMAF bisect (Phase B).
No per-title CRF prediction (Phase C).
No Pareto ABR ladder generation (Phase E).

Per-title ladder (Phase E)¶

Phase E ships the vmaf-tune ladder subcommand — given one source, sample (resolution × target-VMAF) points, take the Pareto upper-convex hull on (bitrate, vmaf), pick n evenly-spaced rungs along the hull, and emit the result as an HLS master playlist, DASH MPD, or JSON descriptor. This is the "per-title encoding" loop in one command — a fixed authoring-spec ladder is replaced by the ladder that's actually optimal for this title.

See ADR-0295 for the design and the alternatives considered (geometric ladder, JND-spaced, fixed Apple HLS).

The default sampler is wired: for each (resolution, target_vmaf) cell it runs the canonical 5-point CRF sweep 18,23,28,33,38 through the normal Phase A encode-and-score path, picks the closest row to the target, and feeds those points into hull and knee selection. A custom sampler= callback remains supported for callers that want a finer grid, a bisect loop, or a precomputed corpus stream.
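The hull step can be illustrated with a standard monotone-chain upper hull over (bitrate, vmaf) points. This is a generic sketch under stated assumptions, not the fork's hull or knee-selection code; `pareto_upper_hull` is a hypothetical name.

```python
def _cross(o, a, b):
    """Z component of (a - o) x (b - o); positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def pareto_upper_hull(points):
    """Upper convex hull of (bitrate, vmaf) points, trimmed at the VMAF peak."""
    # Keep only the best VMAF per bitrate so no vertical segments remain.
    best = {}
    for bitrate, vmaf in points:
        best[bitrate] = max(vmaf, best.get(bitrate, vmaf))
    pts = sorted(best.items())

    hull = []
    for p in pts:
        # Pop points that lie on or below the chord to the new point.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)

    # A concave hull rises to its peak and then falls; drop the falling tail.
    peak = max(range(len(hull)), key=lambda i: hull[i][1])
    return hull[: peak + 1]
```

Points below the hull cost more bitrate for the same quality than a blend of their hull neighbours, so they are never useful rungs; rung selection then only has to choose among the surviving points.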

Container and Y4M sources¶

--src accepts any input ffmpeg can decode: raw .yuv, .y4m, or a container (.mp4, .mkv, .webm, …). The ladder, corpus, and bisect paths transparently decode the reference once per sweep into a .ref.decoded.yuv sidecar under the encode dir and reuse it across every cell — there is no need to pre-decode the source by hand (ADR-0499). When --duration is set, the reference decode is clamped to that window so a 10-second probe against a multi-minute source produces a bounded YUV instead of materialising the full file (ADR-0498).

The libvmaf CLI reads its inputs as raw planar YUV whenever --width / --height / --pixel_format / --bitdepth are passed, which vmaf-tune always does. A .y4m or .mp4 handed to it directly would therefore be mis-parsed as raw YUV, so the wrapper decodes both to .yuv first.

Canonical 5-rung invocation¶

The default rendition set is the canonical 5-rung 1080p/720p/480p/360p/240p ladder against VMAF targets {95, 90, 85, 75, 65}:

vmaf-tune ladder \
    --src episode01.yuv \
    --encoder libx264 \
    --resolutions 1920x1080,1280x720,854x480,640x360,426x240 \
    --target-vmafs 95,90,85,75,65 \
    --quality-tiers 5 \
    --format hls \
    --output episode01_ladder.m3u8

The output is an HLS master playlist with one #EXT-X-STREAM-INF per rung; bandwidth (in bps) is monotonically increasing. Variant URIs are placeholders — re-point them at your per-rendition playlists when packaging the encoded segments.
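For orientation, a master playlist of the shape described above could be produced by a helper like this one. `hls_master` and the placeholder URI template are hypothetical; the sketch emits only the `BANDWIDTH` and `RESOLUTION` attributes and is not the fork's manifest writer.

```python
def hls_master(renditions, uri_template="rendition_{height}p.m3u8"):
    """Render a minimal HLS master playlist, ascending by bandwidth."""
    lines = ["#EXTM3U"]
    for r in sorted(renditions, key=lambda r: r["bandwidth_bps"]):
        lines.append(
            f'#EXT-X-STREAM-INF:BANDWIDTH={r["bandwidth_bps"]},'
            f'RESOLUTION={r["width"]}x{r["height"]}'
        )
        lines.append(uri_template.format(**r))  # placeholder variant URI
    return "\n".join(lines) + "\n"


print(hls_master([
    {"width": 1920, "height": 1080, "bandwidth_bps": 4_500_000},
    {"width": 1280, "height": 720, "bandwidth_bps": 2_200_000},
]))
```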

Other manifest formats¶

# DASH MPD
vmaf-tune ladder --src ep01.yuv --format dash --output ladder.mpd

# JSON descriptor (machine-readable, vmaf-tune-ladder/v1 schema)
vmaf-tune ladder --src ep01.yuv --format json --output ladder.json

The JSON descriptor carries three top-level fields:

schema — schema identifier ("vmaf-tune-ladder/v1").
renditions[] — the post-hull rungs that select_knees picked. Ascending-bitrate order; each entry has width, height, bitrate_kbps, bandwidth_bps, vmaf, crf.
samples[] — every encoded (resolution, crf) row the sampler scored, pre-hull. Ascending by (pixel_count, bitrate_kbps); same per-entry shape as renditions[]. Added 2026-05-18 per ADR-0501 so vmaf-tune report --ladder-json can render the Pareto-cloud overlay; widened 2026-05-18 per ADR-0505 from one-row-per-target-cell (V4 emit shape) to the full per-CRF sweep — every CRF encoded by the sampler now lands in the array exactly once, de-duplicated by (width, height, crf). The array is always present — callers running with no scored cells get an empty list rather than a missing key.

Cross-resolution scoring against container sources¶

Prior to ADR-0501 the reference leg of a cross-resolution rung was decoded at the source's native geometry while the libvmaf CLI was told to read both legs at the rung target. A 1920x1080 reference was therefore mis-parsed as a 1280x720 frame and emitted a catastrophic VMAF (~21 instead of ~93), collapsing the post-hull ladder to a single rendition. The corpus / ladder paths now downscale the reference YUV sidecar to the rung target via a per-rung -vf scale=W:H filter on the ffmpeg decode call. The per-rung sidecar's filename embeds the target dims (<src>.ref.decoded.<W>x<H>.yuv) so a multi-rung sweep in the same encode_dir doesn't collide on a stale path. Single-resolution ladders and rungs that match the source geometry keep the legacy decode-at-native-geometry path — there's no decode overhead when the target already matches.

ADR-0505 closes the matching gap on the encode side. Before 2026-05-18 the encode driver passed -f rawvideo -pix_fmt yuv420p -s WxH -i src.mp4 against every source, re-interpreting a container's compressed bytes as planar YUV pixels and producing a uniformly-bogus ~50 Mbps encode with VMAF in the 4-9 band regardless of CRF. The corpus now detects container sources by suffix (anything outside _VMAF_RAW_SUFFIXES = {".yuv", ""}) and sets EncodeRequest.source_is_container=True, so ffmpeg auto-detects the format and the rung-target -vf scale=W:H filter handles the resolution change. Raw .yuv sources keep the legacy rawvideo framing.
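The suffix-based container check and the per-rung sidecar naming can be sketched as below. The set literal is quoted from the text; the function names, the case-insensitive comparison, and the use of the source's base name for `<src>` are assumptions.

```python
from pathlib import Path

_VMAF_RAW_SUFFIXES = {".yuv", ""}


def source_is_container(src: str) -> bool:
    """True when the suffix falls outside the raw-YUV set."""
    return Path(src).suffix.lower() not in _VMAF_RAW_SUFFIXES


def rung_sidecar_name(src: str, width: int, height: int) -> str:
    """Per-rung decoded-reference file name with the target dims embedded."""
    return f"{Path(src).name}.ref.decoded.{width}x{height}.yuv"
```

Embedding the dimensions in the name means a 1280x720 rung and an 854x480 rung decoded into the same encode_dir never reuse each other's reference file.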

Rung spacing¶

--spacing log_bitrate (default) doubles bandwidth per rung — Apple HLS authoring-spec convention. --spacing vmaf spaces rungs by equal VMAF gaps, matching how viewers perceive quality steps. uniform is accepted as a legacy alias for vmaf.

Phase E ladder CLI flags¶

Flag	Default	Notes
`--src PATH`	—	Required. Source clip: raw `.yuv`, `.y4m`, or any container ffmpeg can decode.
`--encoder NAME`	`libx264`	Codec adapter (Phase A wires `libx264` only).
`--resolutions WxH,...`	`1920x1080,1280x720,854x480,640x360,426x240`	Canonical 5-rung.
`--target-vmafs F,...`	`95,90,85,75,65`	VMAF targets per resolution.
`--quality-tiers N`	`5`	Rungs to pick from the Pareto hull.
`--spacing`	`log_bitrate`	`log_bitrate` (HLS spec) or `vmaf` (perceptual); `uniform` is a legacy alias for `vmaf`.
`--format`	`hls`	`hls`, `dash`, or `json`.
`--with-uncertainty`	off	Apply the ADR-0279 prune/insert recipe. Sampled `vmaf_interval` payloads win; point-only rows use the active wide-interval threshold as a conservative fallback.
`--uncertainty-sidecar PATH`	default thresholds	Calibration sidecar for the uncertainty recipe.
`--rung-overlap-threshold F`	`0.5`	Adjacent-rung interval overlap threshold for pruning.
`--output PATH`	stdout	Manifest destination.
`--src-width INT`	largest `--resolutions` entry	Actual source width for raw YUV cross-resolution ladders. When the source resolution exceeds a rung target, this value feeds the demuxer-side `-s WxH`; the encode pipe scales to each rung target via `-vf scale=W:H`. Container sources auto-detect geometry and ignore this flag. Added 2026-05-18 per ADR-0498 / Bug #v2-B.
`--src-height INT`	largest `--resolutions` entry	Companion to `--src-width`. Default picks the tallest entry in `--resolutions` so a `--resolutions 1920x1080,1280x720,854x480` ladder against a 1080p raw YUV "just works".
`--score-backend NAME`	`auto`	libvmaf scoring backend used by the default corpus sampler. Accepts `auto|cpu|cuda|sycl|hip` (`vulkan` was removed in ADR-0726; this is the same enum as `corpus --score-backend` / `compare --score-backend`). `auto` picks the fastest available in native-first order (`cuda > sycl > hip > cpu`); a specific name is honoured strictly and the run errors out with RC=2 before any encodes start if the local `vmaf` binary does not advertise it. Use `cpu` to force bit-exact CPU scoring for verification against the Netflix golden gate. Added 2026-05-18 per ADR-0511 / Bug C; HIP and native-first order added by ADR-0667.
`--vmaf-bin PATH`	`vmaf`	Path to the `vmaf` binary used to probe backend availability for `--score-backend`. Added 2026-05-18 per ADR-0511 / Bug C.
`--workdir PATH`	`None`	Directory under which to create the per-run temporary scratch directory. Overrides `VMAFTUNE_WORKDIR`. See `compare --workdir` for full semantics. (ADR-0598)
`--max-concurrent-decodes N`	`1`	Accepted for consistency with `compare` and `tune-per-shot` (ADR-0577). Currently a no-op for `ladder` because the corpus sampler does not use the bisect decode path; effective when the ladder sampler is updated to bisect in a future PR.

Cross-resolution ladders against raw YUV: prior to ADR-0498 the default sampler used the rung target dims as the source dims, which corrupted every encode against a raw YUV source whose actual resolution differed from the requested rung (-s 1280x720 on 1080p bytes = decoded garbage). The ladder now accepts separate source dims and injects a -vf scale=W:H filter for each sub-source rung. Container (.mp4 / .mkv) sources are unaffected — ffmpeg auto-detects their geometry. ADR-0501 closes the corresponding gap on the reference leg: a per-rung .ref.decoded.<W>x<H>.yuv sidecar is now produced so the libvmaf CLI reads both legs at the same geometry.
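
A minimal sketch of the defaulting and filter-injection logic described above, assuming hypothetical helper names (`parse_resolutions`, `rung_filters`); only the `--resolutions` format, the "largest entry" defaults, and the `scale=W:H` filter are taken from this page.

```python
from typing import Dict, List, Optional, Tuple


def parse_resolutions(spec: str) -> List[Tuple[int, int]]:
    """Parse a 'WxH,WxH,...' string into (width, height) pairs."""
    pairs = []
    for item in spec.split(","):
        w, h = item.strip().split("x")
        pairs.append((int(w), int(h)))
    return pairs


def rung_filters(
    spec: str, src_width: Optional[int] = None, src_height: Optional[int] = None
) -> Dict[str, Optional[str]]:
    """Map each rung to its scale filter, or None when it matches the source geometry."""
    rungs = parse_resolutions(spec)
    # Defaults mirror the flag table: widest / tallest entry in --resolutions.
    w = src_width if src_width is not None else max(r[0] for r in rungs)
    h = src_height if src_height is not None else max(r[1] for r in rungs)
    return {
        f"{rw}x{rh}": (None if (rw, rh) == (w, h) else f"scale={rw}:{rh}")
        for rw, rh in rungs
    }
```

Passing explicit source dims matters when the raw YUV is larger than every requested rung, in which case all rungs receive a filter.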

`report` subcommand — stdout aggregate flags¶

The report subcommand renders a profile-card (HTML / Markdown) from one or more JSON dumps emitted upstream of it (--compare-json, --ladder-json, --per-shot-json). It also writes a one-line stdout JSON summary that downstream automation can pipe to jq. The rendered card starts with a Quick takeaways block that summarizes the concrete recommendation and coverage gaps before the detailed tables. The stdout fields:

Key	Meaning
`ok`	`true` when at least one codec row succeeded and no non-`ok` row is a real encode failure. An "encoder unavailable" row (infrastructure gap — codec not built into ffmpeg) does not flip this to `false`. With no codec rows the report is informational and stays `ok=true`.
`degraded`	`true` when at least one codec row is an "encoder unavailable" row. Lets dashboards show the missing codec without flipping the run red. Added 2026-05-18 per ADR-0501 / Bug #V4-C.
`codec_rows` / `codec_rows_ok` / `codec_rows_failed`	Total / succeeded / failed row counts.
`codec_rows_unavailable`	Subset of `codec_rows_failed` whose `error` starts with `"encoder unavailable"`. Added 2026-05-18 per ADR-0501.
`ladder_samples` / `ladder_rungs`	Counts read from `--ladder-json`. `ladder_samples` reads the top-level `samples[]` array (always present in `vmaf-tune-ladder/v1` JSON since ADR-0501).
`shots`	Count of per-shot rows read from `--per-shot-json`.
`outputs`	Paths of the rendered card files.

The "encoder unavailable" discrimination keys on the bisect-stage error prefix added in ADR-0498 ("encoder unavailable (NAME): …"). A row whose error does not start with that prefix is treated as a genuine encode/score failure and flips ok=false.
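
The aggregate rules in the table can be sketched as follows. The row shape (a dict with an `ok` bool and an optional `error` string) and the function name `summarize_codec_rows` are assumptions for illustration; only the field names and the prefix rule come from the text above.

```python
from typing import Dict, List

UNAVAILABLE_PREFIX = "encoder unavailable"


def summarize_codec_rows(rows: List[Dict]) -> Dict:
    """Derive the stdout aggregate fields from codec rows (assumed row shape)."""
    succeeded = [r for r in rows if r.get("ok")]
    failed = [r for r in rows if not r.get("ok")]
    unavailable = [
        r for r in failed if str(r.get("error", "")).startswith(UNAVAILABLE_PREFIX)
    ]
    real_failures = len(failed) - len(unavailable)
    # With no codec rows the report is informational and stays ok.
    ok = True if not rows else (len(succeeded) >= 1 and real_failures == 0)
    return {
        "ok": ok,
        "degraded": len(unavailable) > 0,
        "codec_rows": len(rows),
        "codec_rows_ok": len(succeeded),
        "codec_rows_failed": len(failed),
        "codec_rows_unavailable": len(unavailable),
    }
```

A run with one successful codec and one missing encoder therefore reports `ok=true, degraded=true`, while any other failure text flips `ok` to `false`.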

`encode-profile` subcommand — reuse report recommendations¶

Every HTML, Markdown, and raw JSON profile card embeds encoder_profile with schema vmaftune.encoder_profile.v1 (ADR-0643). The payload contains source metadata, tool/binary provenance, codec metadata, failures, and a sorted list of concrete encoder recommendations. encode-profile reads that payload and runs exactly one selected FFmpeg encode; it never encodes the full ladder or every codec by default.

# Inspect the exact FFmpeg argv first.
vmaf-tune encode-profile \
    --profile sweep_profile.html \
    --src bbb_1080p_60fps.mp4 \
    --codec libsvtav1 \
    --target-vmaf 96 \
    --output bbb_svtav1_vmaf96.mkv \
    --dry-run

# Run the encode once the selected row looks right.
vmaf-tune encode-profile \
    --profile sweep_profile.html \
    --src bbb_1080p_60fps.mp4 \
    --codec libsvtav1 \
    --target-vmaf 96 \
    --output bbb_svtav1_vmaf96.mkv \
    --extra-ffmpeg-arg=-movflags --extra-ffmpeg-arg=+faststart

Selection rules:

With no filters, the first pareto-selected row with the lowest bitrate is used.
--codec NAME narrows the candidate list to one adapter token (libsvtav1, libx265, av1_nvenc, ...).
--target-vmaf F narrows by the exact target stored in the profile.
--recommendation-index N picks the zero-based row after those filters, useful when several rows remain tied or intentionally comparable.
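
The filter-then-index flow can be pictured with the sketch below. It assumes the profile's recommendation list is already sorted so that index 0 is the default pick, and the field names `codec`, `target_vmaf`, and `crf` are hypothetical stand-ins for whatever the `vmaftune.encoder_profile.v1` payload actually carries.

```python
from typing import Dict, List, Optional


def select_recommendation(
    recs: List[Dict],
    codec: Optional[str] = None,
    target_vmaf: Optional[float] = None,
    index: int = 0,
) -> Dict:
    """Apply the optional filters in order, then pick the zero-based row that remains."""
    candidates = list(recs)
    if codec is not None:
        candidates = [r for r in candidates if r["codec"] == codec]
    if target_vmaf is not None:
        # Exact match against the target stored in the profile.
        candidates = [r for r in candidates if r["target_vmaf"] == target_vmaf]
    if not 0 <= index < len(candidates):
        raise IndexError(f"index {index} out of range for {len(candidates)} candidate(s)")
    return candidates[index]
```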

Input handling:

Flag	Default	Notes
`--profile PATH`	—	Required. Accepts report JSON, report HTML, or report Markdown.
`--src PATH`	profile source	Override when the profile was generated on another machine.
`--output PATH`	—	Required encoded output path.
`--source-kind auto|container|raw`	`auto`	`auto` treats `.yuv` / `.raw` / `.rgb` / `.gray` as raw and everything else as FFmpeg-auto-detected container input.
`--width / --height / --framerate / --pix-fmt`	profile values	Required only when the selected source is raw and the profile did not carry those fields.
`--preset`	profile row / adapter default	Override the selected row's preset.
`--duration`	profile source duration	Bounds the input-side encode window.
`--sample-clip-seconds / --sample-clip-start-s`	`0`	Optional input-side clip flags forwarded to FFmpeg.
`--extra-ffmpeg-arg TOKEN`	none	Append one raw FFmpeg argv token after codec args. Repeat as needed; use `--extra-ffmpeg-arg=-movflags` for leading-dash tokens.
`--ffmpeg-bin PATH`	profile `ffmpeg_bin`, then `ffmpeg`	Override the FFmpeg binary.
`--dry-run`	off	Print the selected recommendation and `ffmpeg_argv` without encoding.

Phase F — multi-pass encoding (ADR-0333)¶

Phase F lights up 2-pass encoding for codecs that benefit. Default behaviour stays single-pass; opting in via --two-pass runs the encoder twice — pass 1 analyses the source and writes a stats file to a temp directory, pass 2 reads those stats to make better rate-allocation decisions.

When to use it¶

2-pass encoding pays off most clearly in target-bitrate workflows (VOD ladder generation, codec comparisons at fixed bitrate). Constant-quality (CRF) encodes already adapt QPs frame-by-frame from the encoder's lookahead, so the win at fixed CRF is more modest. Expect:

+1 to +3 VMAF points at a fixed bitrate target on libx265 vs 1-pass ABR (typical-content range; see x265 rate-control docs).
~2× encode wall time — the second pass roughly doubles the cost.

Quick start¶

vmaf-tune corpus \
  --source ref.yuv --width 1920 --height 1080 \
  --pix-fmt yuv420p --framerate 24 --duration 5 \
  --encoder libx265 --preset medium --crf 23 \
  --two-pass \
  --output corpus_2pass.jsonl

The driver materialises a per-encode stats file under tempfile.gettempdir() (e.g. /tmp/vmaftune-2pass-XXXXXX/), runs both passes back-to-back, and removes the stats file plus known encoder sidecars when the run completes — successful or not.

Codec support matrix¶

ADR-0546 closed the contract for every codec adapter. Each adapter now either runs a real two-invocation 2-pass, returns single-invocation quality-boost flags callers can splice into extra_params, or raises a typed error documenting why the encoder cannot support multi-pass.

Codec	`supports_two_pass`	`two_pass_args(1, p)` returns	Notes
`libx264`	yes	`-pass 1 -passlogfile <prefix>`	FFmpeg-native two-invocation 2-pass. ADR-0333.
`libx265`	yes	`-x265-params pass=1:stats=<path>`	x265 routes pass control through its codec-private payload. ADR-0333.
`libvpx-vp9`	yes	`-pass 1 -passlogfile <prefix>`	FFmpeg-native two-invocation 2-pass; CRF mode pinned via `-b:v 0`.
`libaom-av1`	yes	`-pass 1 -passlogfile <prefix>`	FFmpeg-native two-invocation 2-pass. ADR-0546.
`libvvenc`	yes	`-pass 1 -passlogfile <prefix>`	FFmpeg ≥ 6.1 translates `-pass` to VVenC's `RcStatsFile` config. ADR-0546.
`libsvtav1`	no	`-pass 1 -passlogfile <prefix>`	SVT-AV1 forbids multi-pass in CRF mode (verified against v4.1.0: `Svt[error]: CRF does not support multi-pass. Use single pass.`). The adapter still returns VBR-mode argv for callers that explicitly switch into bitrate-targeted mode via `extra_params`. ADR-0546.
`h264_nvenc` / `hevc_nvenc` / `av1_nvenc`	no	`-multipass fullres`	NVENC's multipass is a single-invocation, in-encoder full-resolution analysis. Splice the pass-1 return value into `EncodeRequest.extra_params` for a quality-boosted single-pass encode. Requires NVENC's VBR rate control + target bitrate. ADR-0546.
`h264_qsv` / `hevc_qsv` / `av1_qsv`	no	`-extbrc 1 -look_ahead_depth 40`	Intel QSV's extended-BRC look-ahead is a single-invocation in-encoder pre-analysis window. Compose into `extra_params` for the quality boost. ADR-0546.
`h264_amf` / `hevc_amf` / `av1_amf`	no	`-preanalysis true`	AMD AMF's pre-analysis stage runs inside a single ffmpeg invocation. Compose into `extra_params` for the quality boost. ADR-0546.
`h264_videotoolbox` / `hevc_videotoolbox` / `av1_videotoolbox` / `prores_videotoolbox`	no	`raise VideoToolboxTwoPassUnsupportedError`	Apple `VTCompressionSession` has no multi-pass C API. For true 2-pass on macOS, switch to a software encoder whose `supports_two_pass` is yes (`libx264` / `libx265` / `libaom-av1` / `libvvenc`), all of which ship in the same FFmpeg build. ADR-0546.

When --two-pass is set against a codec where supports_two_pass = False, vmaf-tune writes a one-line warning to stderr and runs single-pass. (Mirrors the saliency.py "unsupported ROI encoder, fallback to plain encode" precedent.) To fail loud instead, callers using the Python API can pass on_unsupported="raise" to run_two_pass_encode. VideoToolbox calls to adapter.two_pass_args() always raise the typed error so callers introspecting the contract can disambiguate "API limitation" from "we forgot to implement".

Hardware quality-boost composition (ADR-0546)¶

Hardware encoders expose their "2-pass equivalent" as single-invocation flags. To combine them with the harness's normal single-pass driver, call adapter.two_pass_args(1, Path("/tmp/_unused")) and splice the result into EncodeRequest.extra_params:

from pathlib import Path

from vmaftune.codec_adapters import get_adapter
from vmaftune.encode import EncodeRequest, run_encode

# Placeholder paths; substitute your own source and destination.
ref = Path("ref.yuv")
out = Path("out_nvenc.mp4")

adapter = get_adapter("h264_nvenc")
# The path argument is a dummy here: NVENC's multipass writes no stats file.
boost = adapter.two_pass_args(1, Path("/tmp/_unused"))  # ('-multipass', 'fullres')

req = EncodeRequest(
    source=ref, width=1920, height=1080, pix_fmt="yuv420p", framerate=24.0,
    encoder="h264_nvenc", preset="slow", crf=22, output=out,
    extra_params=boost,  # quality-boosted single-pass
)
run_encode(req)

Cache interaction¶

The content-addressed encode cache (ADR-0298) keys on pass count, so a 1-pass encode and a 2-pass encode of the same (src, codec, preset, crf) are distinct cache entries — the cache will never serve a 1-pass encode for a 2-pass request.

Sample-clip composition¶

Sample-clip mode (ADR-0297) composes with 2-pass: both passes apply the same -ss <start> -t <N> input slice, and the per-encode stats file is unique per slice. No special handling required.

What Phase A / E / F do not do¶

No target-VMAF bisect (Phase B). Phase E uses the canonical 5-point CRF sweep as its production default sampler.
No per-title or per-shot CRF prediction (Phase C / D).
No real-corpus end-to-end ladder validation against a Netflix per-title baseline yet.
No true two-invocation 2-pass on codecs whose underlying encoder refuses it (today: libsvtav1 in CRF mode; ADR-0546). The adapter still exposes meaningful argv for callers that switch into the encoder's supported mode (e.g. VBR for SVT-AV1).
No encoder-stats parsing for libvpx-vp9: FFmpeg's libvpx passlog is binary first-pass data, not the x264/x265 text schema consumed by encoder_stats.py.

Phase A.5 — opt-in `fast` recommend¶

The fast subcommand (ADR-0276, Research-0060) is an opt-in recommendation surface that combines three acceleration levers (VMAF proxy via fr_regressor_v2, Bayesian search via Optuna's TPE sampler, and GPU-accelerated VMAF for the verify step) to replace the exhaustive grid for the recommendation use case. The slow corpus path stays the canonical ground truth. Production mode runs the sample encode, extracts canonical-6 features, calls the fr_regressor_v2 ONNX proxy, and performs a single verify pass through the selected VMAF backend. --smoke is still available for dependency-free CI and local plumbing checks.

Install¶

pip install -e 'tools/vmaf-tune[fast]'   # adds Optuna

The core install path stays zero-dependency; the [fast] extra is strictly opt-in.

Smoke-mode quick start¶

vmaf-tune fast --smoke --target-vmaf 92

Smoke mode runs Optuna over a synthetic x264-shaped CRF→VMAF curve. No ffmpeg, no ONNX Runtime, no GPU is touched. Output is a single JSON object:

{
  "encoder": "libx264",
  "target_vmaf": 92.0,
  "recommended_crf": 18,
  "predicted_vmaf": 92.39,
  "predicted_kbps": 4121.09,
  "n_trials": 50,
  "smoke": true,
  "notes": "smoke mode — synthetic predictor; ..."
}

CLI flags¶

Flag	Default	Notes
`--src PATH`	—	Source video. Required outside `--smoke`.
`--target-vmaf F`	`92.0`	Quality target on VMAF [0, 100] scale.
`--encoder NAME`	`libx264`	Codec adapter. Production mode accepts any registered adapter; smoke mode is synthetic and x264-shaped.
`--crf-lo N`	`10`	Lower bound of the CRF search.
`--crf-hi N`	`51`	Upper bound of the CRF search.
`--n-trials N`	`50`	Optuna TPE trial count.
`--time-budget-s N`	`300`	Soft wall-clock budget for Optuna. Completed trial count may be lower than `--n-trials` when the timeout is hit.
`--smoke`	off	Synthetic predictor — exercises the pipeline without ffmpeg / ONNX.

Speedup model¶

Per Research-0060 §Speedup model:

Combination	Speedup vs Phase A grid
Phase A grid (baseline)	1×
`fast` (proxy + Bayesian + GPU verify)	≈20–50×
`fast` + NVENC (lever C, follow-up)	≈100–500×

These are upper bounds. The production claim is gated on a recommendation-quality benchmark against the slow grid.

Production limits¶

fast is a recommendation shortcut, not a corpus generator. It still needs a representative source clip, a usable encoder, and a VMAF binary for verification. When the proxy / verify gap exceeds --proxy-tolerance, the CLI exits with code 3 so operators can fall back to the slow grid:

vmaf-tune fast --src ref.yuv --target-vmaf 92 || \
    vmaf-tune recommend --from-corpus corpus.jsonl --target-vmaf 92
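
The shell one-liner above falls back on any non-zero exit. A caller who wants to fall back only on the proxy / verify gap can branch on exit code 3 explicitly. The sketch below is one way to do that from Python; the function name and the injectable `run` parameter are assumptions added for testability, while the command lines are the ones shown above.

```python
import subprocess
from typing import Callable, List

PROXY_GAP_EXIT_CODE = 3  # exit code the text above assigns to a proxy / verify gap


def recommend_with_fallback(
    src: str, target_vmaf: float, corpus: str, run: Callable = subprocess.run
) -> List[str]:
    """Try `vmaf-tune fast`; fall back to the slow grid only on exit code 3."""
    fast_cmd = ["vmaf-tune", "fast", "--src", src, "--target-vmaf", str(target_vmaf)]
    result = run(fast_cmd)
    if result.returncode == 0:
        return fast_cmd
    if result.returncode != PROXY_GAP_EXIT_CODE:
        # Any other failure is surfaced instead of being masked by the fallback.
        raise RuntimeError(f"vmaf-tune fast failed with exit code {result.returncode}")
    slow_cmd = [
        "vmaf-tune", "recommend", "--from-corpus", corpus,
        "--target-vmaf", str(target_vmaf),
    ]
    run(slow_cmd, check=True)
    return slow_cmd
```

The function returns the command that ultimately produced the recommendation, which makes it easy to log which path was taken.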

Per-shot parallelisation remains a separate integration with TransNet V2 and vmaf-perShot.

fast accepts any registered codec adapter in production mode. Smoke mode remains synthetic and x264-shaped by design.

libvpx-vp9 example¶

The VP9 adapter routes --encoder libvpx-vp9 through FFmpeg's libvpx-vp9 wrapper. It maps the shared preset vocabulary to -deadline good -cpu-used N, emits -crf, and forces VP9 constant-quality mode with -b:v 0.

vmaf-tune corpus \
    --encoder libvpx-vp9 \
    --source ref.yuv \
    --width 1920 --height 1080 --pix-fmt yuv420p \
    --framerate 24 --duration 10 \
    --preset medium --preset fast \
    --crf 28 --crf 32 --crf 36 \
    --output corpus_vp9.jsonl

--two-pass is supported for VP9 via FFmpeg's generic -pass / -passlogfile switches. Per-frame encoder-stats columns remain zero for VP9 until a binary libvpx first-pass parser lands.

x265 example¶

x265 ships ten presets (ultrafast … placebo) on the same 0..51 CRF scale as x264; the harness routes --encoder libx265 through ffmpeg's -c:v libx265 path. External ffmpeg must be built with --enable-libx265.

vmaf-tune corpus \
    --encoder libx265 \
    --source ref.yuv \
    --width 1920 --height 1080 --pix-fmt yuv420p \
    --framerate 24 --duration 10 \
    --preset medium --preset slow --preset placebo \
    --crf 23 --crf 28 --crf 34 \
    --output corpus_x265.jsonl

The 10-bit pipeline is enabled by setting --pix-fmt yuv420p10le; the adapter reports the corresponding HEVC profile (main10) via X265Adapter.profile_for(pix_fmt) for downstream consumers that need it.

The software adapters (libx264, libx265, libsvtav1, libaom-av1, libvpx-vp9, libvvenc) all route through the same codec-adapter registry; the search loop does not branch on codec name.

VVenC (H.266 / VVC)¶

vmaf-tune ships a libvvenc codec adapter (ADR-0285) that drives Fraunhofer HHI's open-source VVC encoder via FFmpeg's -c:v libvvenc wrapper. VVC is the ITU-T / ISO standard that succeeds HEVC and delivers roughly 30-50% better compression at equal quality. As a rough rule of thumb, VVenC slow is ~5-10% better quality than HEVC slower at the same bitrate but ~3-5× slower wall-clock — VVC is the "opt in to longer encodes for tighter bitrates" branch of the adapter set.

Quality knob and presets¶

Property	Value
Quality knob	`qp` (forwarded as the integer that the harness's `--crf` flag carries; VVenC's wrapper accepts the value regardless of label)
Quality range	`[17, 50]` (perceptually informative window; full VVenC scale is 0..63)
Default	`32`
Native presets	`faster, fast, medium, slow, slower` (5 levels)

The harness's canonical 10-name preset vocabulary (placebo / slowest / slower / slow / medium / fast / faster / veryfast / superfast / ultrafast) compresses onto VVenC's native 5 levels via a static map: anything strictly slower than slow pins to slower; anything strictly faster than fast pins to faster; the central three names map identically. This matches the projection rule used by the parallel HEVC and AV1 adapters so that predictor inputs stay codec-uniform.
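
Written out as a lookup table, the projection rule looks roughly like this. The dictionary and function names are illustrative; the mapping itself follows the rule stated above for the preset names listed there.

```python
# Static projection of the harness preset names onto VVenC's five native levels.
VVENC_PRESET_MAP = {
    # strictly slower than "slow" pins to "slower"
    "placebo": "slower",
    "slowest": "slower",
    "slower": "slower",
    # the central three names map identically
    "slow": "slow",
    "medium": "medium",
    "fast": "fast",
    # strictly faster than "fast" pins to "faster"
    "faster": "faster",
    "veryfast": "faster",
    "superfast": "faster",
    "ultrafast": "faster",
}


def vvenc_preset(name: str) -> str:
    """Return the native VVenC preset for a harness preset name."""
    try:
        return VVENC_PRESET_MAP[name]
    except KeyError:
        raise ValueError(f"unknown preset: {name!r}") from None
```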

Tuning surface (real VVenC 1.14.0 knobs)¶

The adapter exposes a curated subset of VVenC's config keys via FFmpeg's -vvenc-params key=value:key=value channel. Keys are sourced verbatim from source/Lib/apputils/VVEncAppCfg.h at tag v1.14.0 (SHA 9428ea8636ae7f443ecde89999d16b2dfc421524, accessed 2026-05-09). Every knob defaults to None (library default preserved); the search loop opts into a non-default value only when the corpus row records it.

Knob	VVenC key	Default	Effect / typical use
`perceptual_qpa`	`PerceptQPA`	library default	XPSNR-driven perceptual QPA. Materially shifts the rate-distortion curve; recorded per-row in `encoder_extra_params` for predictor conditioning.
`internal_bitdepth`	`InternalBitDepth`	library default (10 for VVC)	Force 8- or 10-bit internal precision; necessary for HDR profiles.
`tier`	`Tier`	library default (`main`)	`main` or `high`. Caps signalled max bitrate / resolution.
`tiles`	`Tiles`	single tile	`(cols, rows)` partitioning, emitted as `NxM`. Useful for parallel encode of high-resolution content.
`max_parallel_frames`	`MaxParallelFrames`	library default (auto)	Parallel-frames perf knob; `0` disables, `>=2` enables.
`rpr`	`RPR`	library default	VVC reference-picture-resampling. `0` off / `1` on / `2` RPR-ready.
`sao`	`SAO`	library default (on)	Sample Adaptive Offset loop filter. Useful for ablation studies.
`alf`	`ALF`	library default	Adaptive Loop Filter; useful for ablation.
`ccalf`	`CCALF`	library default	Cross-Component ALF (only meaningful when `alf` is on).

Toggles are emitted in field-declaration order so the argv stays byte-stable for cache-key hashing (per ADR-0298). The adapter_version field bumps to "2" for the 2026-05-09 surface — stale cached results are invalidated automatically.
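
A sketch of how declaration-order emission keeps the argv stable: iterating `dataclasses.fields` yields fields in the order they were declared, so the joined string does not depend on how the caller constructed the object. The class name `VVenCKnobs`, the helper `vvenc_params_argv`, and the use of plain integers for toggle values are assumptions; the knob names, VVenC keys, `key=value:key=value` channel, and the `NxM` tile form come from the table above.

```python
from dataclasses import dataclass, fields
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class VVenCKnobs:
    # Every knob defaults to None, meaning "keep the library default".
    perceptual_qpa: Optional[int] = None
    internal_bitdepth: Optional[int] = None
    tier: Optional[str] = None
    tiles: Optional[Tuple[int, int]] = None
    max_parallel_frames: Optional[int] = None
    rpr: Optional[int] = None
    sao: Optional[int] = None
    alf: Optional[int] = None
    ccalf: Optional[int] = None


_VVENC_KEYS = {
    "perceptual_qpa": "PerceptQPA",
    "internal_bitdepth": "InternalBitDepth",
    "tier": "Tier",
    "tiles": "Tiles",
    "max_parallel_frames": "MaxParallelFrames",
    "rpr": "RPR",
    "sao": "SAO",
    "alf": "ALF",
    "ccalf": "CCALF",
}


def vvenc_params_argv(knobs: VVenCKnobs) -> List[str]:
    """Emit non-default knobs as a single -vvenc-params token, in declaration order."""
    parts = []
    for f in fields(knobs):  # dataclass fields iterate in declaration order
        value = getattr(knobs, f.name)
        if value is None:
            continue
        if f.name == "tiles":
            value = f"{value[0]}x{value[1]}"  # (cols, rows) emitted as NxM
        parts.append(f"{_VVENC_KEYS[f.name]}={value}")
    return ["-vvenc-params", ":".join(parts)] if parts else []
```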

NN-VC status (deferred)¶

VVC the standard defines NN-VC tool-points (NN-based intra prediction, NN-based loop filter, NN-based super-resolution), but VVenC 1.14.0 does not ship implementations of any of them. An earlier draft of this adapter exposed an nnvc_intra toggle that emitted -vvenc-params IntraNN=1; that key has never existed in any released VVenC and has been removed (see ADR-0285 §"Status update 2026-05-09"). If upstream VVenC ever lands NN-VC tools the adapter will pick them up via the placeholder pattern from ADR-0339's self-activating adapter set.

External binary requirements¶

Running the VVenC adapter end-to-end requires:

ffmpeg compiled with --enable-libvvenc on PATH (or --ffmpeg-bin).
The libvvenc shared library and headers from https://github.com/fraunhoferhhi/vvenc.

The shipped unit tests mock subprocess.run so the adapter can be exercised without either binary present; integration smoke is gated to a CI runner that has a libvvenc-enabled FFmpeg.

Phase D — per-shot CRF tuning¶

The tune-per-shot subcommand drives the Netflix-style per-shot encoding path. It cuts the source into shots (via the C-side vmaf-perShot binary, which wraps TransNet V2 — see ADR-0223), extracts each shot to a temporary raw-YUV reference, runs the Phase-B target-VMAF bisect for that shot, and emits an FFmpeg encoding plan that produces one segment per shot plus a final concat-demuxer command.

The target-VMAF predicate remains pluggable for advanced callers: --predicate-module MODULE:CALLABLE bypasses the default bisect path, and the Python API still accepts tune_per_shot(..., predicate=...). Codec emission is still portable segment-and-concat output rather than native per-shot mechanisms (--qpfile for x264, --zones for x265, the SVT-AV1 segment table).

Design rationale and the decision matrix live in ADR-0392.

Quick start¶

Container source (geometry auto-probed, no pre-extraction needed — ADR-0542):

vmaf-tune tune-per-shot \
    --src clip.mp4 \
    --target-vmaf 92 \
    --encoder libx264 \
    --output per_shot_encode.mp4 \
    --plan-out plan.json

Raw YUV source (explicit geometry required):

vmaf-tune tune-per-shot \
    --src ref.yuv \
    --width 1920 --height 1080 \
    --framerate 24 \
    --target-vmaf 92 \
    --encoder libx264 \
    --output per_shot_encode.mp4 \
    --plan-out plan.json

The plan is emitted to stdout as JSON unless --plan-out is specified. Pass --script-out plan.sh to also receive a copy-paste shell script of the per-segment + concat commands.

CLI flags¶

Flag	Default	Notes
`--src PATH`	—	Required. Source video. Accepts raw YUV (`.yuv` / `.raw`) or any container format (mp4, mkv, mov, ts, …). For container sources, `--width`, `--height`, `--framerate`, and `--total-frames` are auto-derived from ffprobe. (ADR-0542)
`--width / --height`	auto-probed	Source resolution. Required for raw YUV sources. For container sources the values are auto-derived from ffprobe when these flags are omitted. Pass explicit values to override the probe. (ADR-0542)
`--pix-fmt PFMT`	`yuv420p`	Forwarded to `vmaf-perShot`.
`--framerate F`	auto-probed	Source framerate. Auto-derived from ffprobe for container sources; defaults to `24.0` if the probe cannot determine a rate. (ADR-0542)
`--target-vmaf V`	`92.0`	Per-shot quality target.
`--encoder NAME`	`libx264`	Any registered codec adapter accepted by the Phase-B bisect backend.
`--bitdepth N`	`8`	Forwarded to `vmaf-perShot` (`8`, `10`, or `12`).
`--total-frames N`	`0`	Frame count for the single-shot fallback when `vmaf-perShot` is unavailable. Auto-derived from ffprobe for container sources. (ADR-0542)
`--scene-threshold X`	unset	Override `vmaf-perShot --diff-threshold` (mean-absolute-luma-delta cutoff for cut classification; lower = more shots). Omit to keep the C-side compiled default (`12.0` on 8-bit content). See ADR-0513.
`--max-shot-duration S`	`2.0`	Uniform-time-window splitter (seconds). Any detected shot longer than `S` is sliced into equal-length sub-shots so the per-shot tuner sees a non-degenerate timeline even when the detector under-cuts (e.g. short clips, low-contrast fades). Set to `0` to disable. See ADR-0513.
`--per-shot-bin PATH`	`vmaf-perShot`	Override the shot detector binary.
`--ffmpeg-bin PATH`	`ffmpeg`	Override the FFmpeg binary.
`--vmaf-bin PATH`	`vmaf`	Override the libvmaf CLI used by the per-shot scorer.
`--preset NAME`	adapter default	Codec preset forwarded to Phase-B bisect.
`--crf-min / --crf-max`	adapter range	Optional inclusive CRF search bounds; pass both or neither.
`--max-iterations N`	`8`	Maximum encode+score iterations per detected shot.
`--vmaf-model NAME`	`vmaf_v0.6.1`	VMAF model forwarded to the per-shot scorer.
`--score-backend NAME`	`auto`	libvmaf scoring backend for the per-shot scorer (`auto`, `cpu`, `cuda`, `sycl`, `hip`). (`vulkan` removed in ADR-0726.)
`--predicate-module SPEC`	—	Advanced hook `MODULE:CALLABLE` matching `(shot, target_vmaf, encoder) -> (crf, measured_vmaf)`; bypasses real bisect.
`--workdir PATH`	`None`	Directory under which to create the per-run temporary scratch directory. Overrides `VMAFTUNE_WORKDIR`. See `compare --workdir` for full semantics. (ADR-0598)
`--max-concurrent-decodes N`	`1`	Maximum number of simultaneous reference-YUV decode operations across all per-shot bisect threads. Default `1` (serial decodes). See `compare --max-concurrent-decodes` for full semantics. (ADR-0577)
`--output PATH`	`per_shot_encode.mp4`	Final concatenated encode destination.
`--segment-dir PATH`	see below	Directory for per-shot segment files. Priority order: (1) explicit `--segment-dir`; (2) `<plan-out>.parent/segments` when `--plan-out` is set; (3) `<output>.parent/segments`. If the resolved directory is not writable (e.g. a read-only bind-mount), a `WARN` is emitted to stderr and the command still exits 0 — the plan JSON remains the authoritative deliverable. See ADR-0532.
`--plan-out PATH`	stdout	Write the JSON plan here instead of stdout.
`--script-out PATH`	—	Optional: also emit a copy-paste shell script.
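
The `--segment-dir` priority order from the table can be expressed as a small resolver. This is a sketch under stated assumptions: the function name is hypothetical and the writability check with its `WARN` path is omitted.

```python
from pathlib import Path
from typing import Optional


def resolve_segment_dir(
    output: str, plan_out: Optional[str] = None, segment_dir: Optional[str] = None
) -> Path:
    """Resolve the per-shot segment directory using the documented priority order."""
    if segment_dir is not None:
        return Path(segment_dir)  # (1) explicit --segment-dir wins
    if plan_out is not None:
        return Path(plan_out).parent / "segments"  # (2) next to the plan file
    return Path(output).parent / "segments"  # (3) next to the final encode
```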

Plan JSON schema¶

{
  "encoder": "libx264",
  "framerate": 24.0,
  "predicate": "bisect",
  "target_vmaf": 92.0,
  "shots": [
    {
      "start_frame": 0, "end_frame": 24,
      "crf": 22, "predicted_vmaf": 93.0,
      "bitrate_kbps": 5234.12
    },
    {
      "start_frame": 24, "end_frame": 72,
      "crf": 26, "predicted_vmaf": 92.5,
      "bitrate_kbps": 4182.44
    }
  ],
  "segment_commands": [
    ["ffmpeg", "-y", "-hide_banner", "-ss", "0.000000", "-i", "ref.mp4",
     "-frames:v", "24", "-c:v", "libx264", "-crf", "22",
     "/tmp/segments/shot_0000.mp4"]
  ],
  "concat_command": [
    "ffmpeg", "-y", "-hide_banner", "-f", "concat", "-safe", "0",
    "-i", "/tmp/segments/concat.txt", "-c", "copy", "out.mp4"
  ]
}

start_frame is inclusive, end_frame is exclusive (Python-slice convention). The vmaf-perShot CSV/JSON sidecar uses inclusive end_frame; the planner normalises into the half-open form and the segment commands honour the half-open semantics via -frames:v.
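
The normalisation amounts to adding one to the inclusive end. The helper names below are hypothetical; the convention is the one described above.

```python
from typing import Tuple


def to_half_open(start_frame: int, end_frame_inclusive: int) -> Tuple[int, int]:
    """Convert an inclusive [start, end] range into the half-open [start, end) form."""
    return start_frame, end_frame_inclusive + 1


def frame_count(start_frame: int, end_frame: int) -> int:
    """Number of frames in a half-open shot; this is the value passed to -frames:v."""
    return end_frame - start_frame
```

For the first shot in the plan above, a detector range of frames 0 through 23 becomes `(0, 24)` and yields `-frames:v 24`.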

bitrate_kbps (added in ADR-0531) carries the encoded segment bitrate measured by the Phase-B bisect backend: (segment_size_bytes × 8 / 1000) / shot_duration_s. It is null when a custom --predicate-module is used (no real encode happens in that path) and always a positive finite float for the default bisect backend. The vmaf-tune report renderer uses this field to populate the Bitrate column in the per-shot table; a null / absent value renders as "—".
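
The formula can be checked with a short helper. Deriving `shot_duration_s` from the half-open frame range and the framerate is an assumption of this sketch, as is the function name.

```python
def segment_bitrate_kbps(
    segment_size_bytes: int, start_frame: int, end_frame: int, framerate: float
) -> float:
    """(segment_size_bytes * 8 / 1000) / shot_duration_s, with duration from frames."""
    shot_duration_s = (end_frame - start_frame) / framerate
    if shot_duration_s <= 0:
        raise ValueError("shot must contain at least one frame")
    return (segment_size_bytes * 8 / 1000) / shot_duration_s
```

A one-second shot (24 frames at 24 fps) whose segment is 125000 bytes works out to 1000 kbps.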

Single-shot fallback¶

If the vmaf-perShot binary is not on PATH, or it exits non-zero, the planner falls back to a single shot covering the whole clip ([0, --total-frames)). This keeps tune-per-shot usable as a smoke test on machines that have not built the shot-detector binary yet.

The uniform-time-window splitter (--max-shot-duration, default 2.0 s) still applies to that fallback: even when the detector returns one giant shot, the splitter slices it into roughly ceil(duration / window) equal-length sub-shots so the per-shot tuner produces a useful CRF timeline. Pass --max-shot-duration 0 to restore the historical single-shot behaviour. See ADR-0513.
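
A minimal sketch of the splitter, assuming shots are half-open frame ranges and that sub-shot boundaries are rounded to whole frames. The function name and the rounding strategy are illustrative, not the harness's implementation.

```python
import math
from typing import List, Tuple


def split_shot(
    start_frame: int, end_frame: int, framerate: float, max_shot_duration_s: float
) -> List[Tuple[int, int]]:
    """Slice one shot into ceil(duration / window) roughly equal sub-shots."""
    n_frames = end_frame - start_frame
    if max_shot_duration_s <= 0 or n_frames <= 0:
        # A window of 0 disables the splitter.
        return [(start_frame, end_frame)]
    duration_s = n_frames / framerate
    pieces = min(n_frames, max(1, math.ceil(duration_s / max_shot_duration_s)))
    # Distribute frames as evenly as integer boundaries allow.
    bounds = [start_frame + (n_frames * i) // pieces for i in range(pieces + 1)]
    return list(zip(bounds[:-1], bounds[1:]))
```

A 10-second single-shot fallback at 24 fps with the default 2.0 s window therefore becomes five 48-frame sub-shots.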

Tuning scene sensitivity¶

vmaf-perShot classifies frame N as a cut when the mean absolute luma delta against frame N-1 (in the 8-bit domain) crosses the --diff-threshold cutoff. The compiled-in default is 12.0, which the empirical calibration in ADR-0222 tuned against the testdata fixtures. Real-world content varies: animated material with deep saturated transitions trips the heuristic easily, but short live-action clips with low-contrast scene changes (BBB's underwater segment, indoor talking-heads) under-cut at the default.

Lower --scene-threshold (e.g. 4 - 6) to recover those cuts. Higher values (e.g. 18 - 25) reject motion-rich in-shot bursts that the default classifies as fades. The CLI flag passes through to the C binary verbatim so existing diagnostic scripts that invoke vmaf-perShot directly use the same units.
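
To build intuition for the threshold, the heuristic can be mimicked in a few lines of Python. This is an illustrative analogue of the C-side detector, not its code: frames are modelled as flat lists of 8-bit luma samples, and treating "crosses the cutoff" as a strict `>` comparison is an assumption.

```python
from typing import List, Sequence


def mean_abs_luma_delta(prev: Sequence[int], cur: Sequence[int]) -> float:
    """Mean absolute difference between two equally sized luma planes."""
    if len(prev) != len(cur) or not cur:
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)


def detect_cuts(frames: Sequence[Sequence[int]], diff_threshold: float = 12.0) -> List[int]:
    """Return the indices N where frame N differs from frame N-1 by more than the cutoff."""
    return [
        n
        for n in range(1, len(frames))
        if mean_abs_luma_delta(frames[n - 1], frames[n]) > diff_threshold
    ]
```

Lowering `diff_threshold` makes smaller frame-to-frame changes count as cuts, which is why lower values produce more shots.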

What Phase D does not do¶

Does not run the encodes; it only emits the plan. To execute the plan manually, write the script with --script-out plan.sh and run it with sh plan.sh.
Does not emit native per-codec per-shot mechanisms (x264 --qpfile, x265 --zones, SVT-AV1 segment tables). Per-segment encode plus concat-demuxer is the portable fallback.
Does not handle GOP-aligned shot boundaries — the per-segment approach side-steps this by re-encoding each shot from frame 0.

auto¶

vmaf-tune auto is the Phase F entry point (ADR-0325). One CLI verb composes the per-phase subcommands (corpus, recommend, predict, tune-per-shot, recommend-saliency, ladder, compare) plus the orthogonal modes (HDR auto-detect, sample-clip, resolution-aware) into a deterministic decision tree. F.1 shipped the sequential composition; F.2-F.5 added the short-circuits, confidence-aware fallbacks, and per-content recipes.

Synopsis¶

vmaf-tune auto \
    --src reference.mp4 \
    --target-vmaf 93 \
    --max-budget-bitrate 5000 \
    --allow-codecs libx264,libx265 \
    [--codec libx265] \
    [--sample-clip-seconds 10] \
    [--smoke] \
    [--output plan.json]

The non-smoke path probes source geometry, source duration, and HDR metadata through the same ffprobe/HDR helpers used by the corpus path. Probe failures degrade to conservative 1920x1080 SDR defaults so the planner can still emit a JSON plan and expose which later stages need real evidence. --smoke still exercises the composition end-to-end with mocked sub-phases (no ffmpeg, no ONNX).

The emitted JSON plan records which short-circuits fired under metadata.short_circuits; post-hoc analysis uses this to measure the speedup contribution of each one.

For each non-smoke cell, auto feeds the probed metadata into the existing Predictor path, picks a codec-specific CRF for metadata.effective_predictor_target_vmaf, and records the predictor's estimates as estimated_vmaf and estimated_bitrate_kbps. These are planner estimates, not measured encode results, until the future realise/encode step scores the chosen cells. The per-cell prediction_source key distinguishes the production estimate path ("predictor") from the --smoke composition placeholder ("smoke-placeholder").

Short-circuits¶

The ten short-circuits below are the F.2 surface. Each one is a guarded fast-path with one trigger condition; the predicates are exposed as _should_short_circuit_<N> helpers in tools/vmaf-tune/src/vmaftune/auto.py so they can be unit-tested in isolation.

#	Identifier	Trigger	Skips
1	`ladder-single-rung`	`meta.height < 2160`	Multi-rung ABR ladder evaluation (ADR-0289 / ADR-0295).
2	`codec-pinned`	`--codec` set or `--allow-codecs` resolves to one entry	The `compare.shortlist` stage.
3	`predictor-gospel`	`predict.crf_for_target` returns `GOSPEL` (ADR-0306)	The `recommend.coarse_to_fine` fallback for that cell.
4	`skip-saliency`	`meta.content_class` is photographic / live-action (not animation / screen content)	The `recommend_saliency.maybe_apply` stage (ADR-0293).
5	`sdr-skip`	`not meta.is_hdr` (per ADR-0300 detector)	The HDR resolution + model-selection branch.
6	`sample-clip-propagate`	`--sample-clip-seconds > 0`	Re-deciding clip length per stage; the user-supplied value propagates verbatim (ADR-0301).
7	`skip-per-shot`	`duration < 5min` AND `shot_variance < 0.15`	The `tune_per_shot.refine` pass (ADR-0392).
8	`low-complexity`	`meta.complexity_score < 200 kbps` (probe-encode bitrate)	The `recommend.coarse_to_fine` sweep — the predictor's point estimate is already tight on simple content. `0.0`/`NaN` does not fire (no probe run yet).
9	`baseline-meets-target`	`meta.baseline_vmaf >= target_vmaf`	The full predictor sweep — the default-CRF encode already satisfies the quality target. `0.0`/`NaN` does not fire (no baseline scored yet).
10	`no-two-pass`	`adapter.supports_two_pass == False` (ADR-0333 + ADR-0546)	The two-pass calibration stage. Hardware encoders (`_nvenc`, `_amf`, `_qsv`, `_videotoolbox`) and `libsvtav1` (CRF-mode multi-pass prohibition) fire this. `libx264` / `libx265` / `libvpx-vp9` / `libaom-av1` / `libvvenc` set `supports_two_pass = True`.

The 5-min and 0.15 thresholds in short-circuit #7 are placeholders; F.3 fits them empirically once Phase F has emitted enough labelled compositions to make the fit statistically defensible. The constants live at the top of auto.py (PHASE_D_DURATION_GATE_S, PHASE_D_SHOT_VARIANCE_GATE) so the eventual fit lands as a one-line edit. The 200 kbps threshold in short-circuit #8 is likewise a placeholder: LOW_COMPLEXITY_PROBE_BITRATE_THRESHOLD_KBPS at the top of auto.py.

The evaluation order in SHORT_CIRCUIT_PREDICATES is part of the public contract: tests assert that an earlier-firing predicate doesn't shadow a later one whose result would have been different. Adding a new short-circuit means appending; never reordering.
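The append-only ordering and the "0.0/NaN does not fire" rule can be sketched as below. This is a minimal illustration, not the code in auto.py: the `SourceMeta` dataclass, the predicate function names, and the `PREDICATES` tuple are hypothetical stand-ins, and only four of the ten triggers from the table are reproduced.

```python
import math
from dataclasses import dataclass


@dataclass
class SourceMeta:
    """Hypothetical stand-in for the probed source metadata."""
    height: int = 1080
    is_hdr: bool = False
    duration_s: float = 0.0
    shot_variance: float = 0.0
    complexity_score: float = 0.0  # probe-encode bitrate (kbps); 0.0 = no probe yet


# Placeholder gates quoted in the text (5 min, 0.15, 200 kbps).
DURATION_GATE_S = 5 * 60
SHOT_VARIANCE_GATE = 0.15
LOW_COMPLEXITY_KBPS = 200.0


def ladder_single_rung(meta: SourceMeta) -> bool:
    return meta.height < 2160


def sdr_skip(meta: SourceMeta) -> bool:
    return not meta.is_hdr


def skip_per_shot(meta: SourceMeta) -> bool:
    return (meta.duration_s < DURATION_GATE_S
            and meta.shot_variance < SHOT_VARIANCE_GATE)


def low_complexity(meta: SourceMeta) -> bool:
    score = meta.complexity_score
    # 0.0 and NaN both mean "no probe run yet" and must not fire.
    if math.isnan(score) or score == 0.0:
        return False
    return score < LOW_COMPLEXITY_KBPS


# Order is part of the contract: new entries are appended, never inserted.
PREDICATES = (
    ("ladder-single-rung", ladder_single_rung),
    ("sdr-skip", sdr_skip),
    ("skip-per-shot", skip_per_shot),
    ("low-complexity", low_complexity),
)


def fired(meta: SourceMeta) -> list:
    """Return the identifiers of every predicate that fires, in table order."""
    return [name for name, predicate in PREDICATES if predicate(meta)]


meta = SourceMeta(height=1080, duration_s=120.0, shot_variance=0.05,
                  complexity_score=float("nan"))
print(fired(meta))
```

Each predicate depends only on the metadata it is handed, which is what makes it possible to unit-test one trigger without evaluating the others.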

auto does not dispatch the fast subcommand from inside its tree. fast (ADR-0276 fast-path) is a different operator surface (proxy + Bayesian over a single codec) and remains a sibling, not a child, of auto.

For HDR sources, every emitted cell records the codec-specific hdr_args produced by vmaftune.hdr.hdr_codec_args(codec, info). That means x264 gets only container-level -color_* flags, x265 gets -x265-params SEI signalling, SVT-AV1 gets -svtav1-params, and SDR cells record an empty list after the sdr-skip short-circuit fires.

Winner selection¶

auto now finishes the planning pass by selecting one estimated cell. The chosen cell is marked with "selected": true in cells[]; every other cell has "selected": false. The same decision is copied to metadata.winner so scripts can read a stable object without scanning the cell array.

Winner statuses:

Status	Meaning
`budget_and_quality_met`	At least one cell met `--target-vmaf` and `--max-budget-bitrate`; the lowest estimated bitrate wins.
`quality_met_budget_exceeded`	No cell was inside budget, but at least one cell met the quality target; the smallest budget overage wins.
`target_unmet`	No cell met the quality target; the closest estimated VMAF miss wins so the caller still gets a concrete next encode.
`no_eligible_cells`	No cell carried finite `estimated_vmaf` and `estimated_bitrate_kbps`; this is an input/planner evidence failure.

metadata.winner records cell_index, rung, codec, crf, estimated_vmaf, estimated_bitrate_kbps, quality_margin, and budget_margin_kbps. This is still a planning result: the selected cell is the next encode target, not a substitute for the final encode/score verification pass.
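The four statuses can be read as a small decision procedure. The sketch below is one plausible interpretation, not the shipped selector: the function name `select_winner` is hypothetical, ties are broken by list order, and "smallest budget overage" is assumed to be evaluated only among cells that met the quality target.

```python
import math


def _finite(value) -> bool:
    """True for real, finite numbers; False for None, NaN, and inf."""
    return isinstance(value, (int, float)) and math.isfinite(value)


def select_winner(cells, target_vmaf, max_budget_kbps):
    """Return (status, cell_index) following the four documented statuses."""
    eligible = [
        (i, c) for i, c in enumerate(cells)
        if _finite(c.get("estimated_vmaf"))
        and _finite(c.get("estimated_bitrate_kbps"))
    ]
    if not eligible:
        return "no_eligible_cells", None

    quality_ok = [(i, c) for i, c in eligible
                  if c["estimated_vmaf"] >= target_vmaf]
    if not quality_ok:
        # Closest miss: the highest estimated VMAF below the target.
        i, _ = max(eligible, key=lambda ic: ic[1]["estimated_vmaf"])
        return "target_unmet", i

    in_budget = [(i, c) for i, c in quality_ok
                 if c["estimated_bitrate_kbps"] <= max_budget_kbps]
    if in_budget:
        # Lowest estimated bitrate among cells meeting both constraints.
        i, _ = min(in_budget, key=lambda ic: ic[1]["estimated_bitrate_kbps"])
        return "budget_and_quality_met", i

    # Smallest overage: the lowest bitrate among quality-meeting cells.
    i, _ = min(quality_ok, key=lambda ic: ic[1]["estimated_bitrate_kbps"])
    return "quality_met_budget_exceeded", i


cells = [
    {"estimated_vmaf": 94.0, "estimated_bitrate_kbps": 4200.0},
    {"estimated_vmaf": 93.2, "estimated_bitrate_kbps": 3100.0},
    {"estimated_vmaf": 91.0, "estimated_bitrate_kbps": 2000.0},
]
print(select_winner(cells, target_vmaf=93, max_budget_kbps=5000))
```

Because the result is an index into `cells[]`, the same value can populate both the per-cell `selected` flag and `metadata.winner.cell_index`.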

Execute mode (ADR-0454)¶

After the planning pass, --execute drives real FFmpeg encodes and libvmaf scores for the selected cell(s):

vmaf-tune auto \
    --src reference.mp4 \
    --target-vmaf 93 \
    --max-budget-bitrate 5000 \
    --allow-codecs libx264,libx265 \
    --output plan.json \
    --execute \
    --runs-dir runs/

--execute flags:

Flag	Default	Description
`--execute`	off	Enable execute mode; plan-only when absent.
`--runs-dir PATH`	`runs/`	Destination for encoded files and `tune_results.jsonl`.
`--execute-all`	off	Run every plan cell instead of only the `selected` winner.

Results are appended to <runs-dir>/tune_results.jsonl one row per executed cell. Each row carries the cell metadata (codec, preset, CRF, estimated VMAF/bitrate) merged with encode outcomes (size, encode time, FFmpeg version) and score outcomes (measured VMAF, per-feature means/stds, vmaf-binary version). The file appends on each run so partial runs and incremental re-runs do not overwrite previous results.

The CLI exits with status 1 if at least one cell was executed but none scored successfully (encode failures, vmaf binary absent, etc.); it exits 0 on plan-only runs regardless of plan content.
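The append-on-each-run behaviour and the exit-status rule can be sketched together. Both helper names below are hypothetical, and treating a missing or null `vmaf_score` as "not scored" is an assumption about the row shape, not something this page specifies.

```python
import json
import tempfile
from pathlib import Path


def append_results(runs_dir: Path, rows: list) -> Path:
    """Append one JSON object per line; earlier runs are never truncated."""
    runs_dir.mkdir(parents=True, exist_ok=True)
    out = runs_dir / "tune_results.jsonl"
    with out.open("a", encoding="utf-8") as fh:
        for row in rows:
            fh.write(json.dumps(row) + "\n")
    return out


def exit_status(rows: list) -> int:
    """1 if at least one cell was executed but none scored, else 0."""
    executed = len(rows)
    # Assumption: a row without a numeric vmaf_score failed to score.
    scored = sum(1 for r in rows if r.get("vmaf_score") is not None)
    return 1 if executed > 0 and scored == 0 else 0


with tempfile.TemporaryDirectory() as tmp:
    runs = Path(tmp) / "runs"
    append_results(runs, [{"cell_index": 0, "vmaf_score": 93.1}])  # first run
    out = append_results(runs, [{"cell_index": 1, "vmaf_score": None}])  # re-run
    line_count = len(out.read_text(encoding="utf-8").splitlines())

print(line_count)
```

An empty row list models a plan-only run, which is why it maps to status 0 in this sketch.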

Per-shot execution (ADR-0468)¶

run_plan_per_shot splits the source into shot boundaries (via vmaf-perShot / TransNet V2, ADR-0223) and scores each segment independently. Results land in <runs-dir>/tune_results_per_shot.jsonl:

from pathlib import Path

from vmaftune.executor import run_plan_per_shot

# `plan` is the plan object produced by the planning pass.
per_shot_results = run_plan_per_shot(
    plan, src=Path("reference.mp4"), out_dir=Path("runs/"),
    vmaf_model="vmaf_v0.6.1",
)
for r in per_shot_results:
    print(f"cell {r.row['cell_index']}: "
          f"{r.row['shot_count']} shots, "
          f"weighted VMAF={r.weighted_vmaf:.2f}")

Each top-level row carries shot_count and weighted_vmaf (frame-length-weighted mean of per-shot scores). When vmaf-perShot is absent, the call falls back to a single-shot range covering the whole clip; shot_count == 1 signals this.
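A frame-length-weighted mean simply weights each shot's score by its frame count. The sketch below shows the arithmetic only; the function name and the `(frame_count, vmaf)` pair layout are illustrative assumptions, not the executor's internal representation.

```python
def frame_weighted_vmaf(shots: list) -> float:
    """Frame-length-weighted mean over (frame_count, vmaf) pairs."""
    total_frames = sum(frames for frames, _ in shots)
    if total_frames <= 0:
        raise ValueError("no frames to aggregate")
    return sum(frames * vmaf for frames, vmaf in shots) / total_frames


# Three shots: long shots dominate the aggregate, short ones barely move it.
shots = [(240, 95.0), (60, 90.0), (300, 93.0)]
print(round(frame_weighted_vmaf(shots), 2))  # 93.5

# Single-shot fallback: the aggregate equals the whole-clip score.
print(frame_weighted_vmaf([(600, 92.0)]))  # 92.0
```

With these numbers the unweighted mean would be about 92.67; the weighted value is higher because the 60-frame shot with the lowest score contributes only a tenth of the frames.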

Saliency execution (ADR-0468)¶

run_plan_saliency applies saliency-aware encoding via saliency_aware_encode (see § Saliency-aware encoding) before scoring. Results land in <runs-dir>/tune_results_saliency.jsonl:

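from pathlib import Path

from vmaftune.executor import run_plan_saliency

# `plan` is the plan object produced by the planning pass;
# `total_frame_count` is the number of frames in the source, supplied by the caller.
sal_results = run_plan_saliency(
    plan, src=Path("reference.mp4"), out_dir=Path("runs/"),
    saliency_model_path=Path("model/tiny/saliency_student_v1.onnx"),
    duration_frames=total_frame_count,
)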
for r in sal_results:
    print(f"cell {r.row['cell_index']}: "
          f"saliency_available={r.saliency_available}, "
          f"VMAF={r.row['vmaf_score']}")

saliency_available in the result row is True when the ONNX model ran; False means the encoder fell back to a plain encode (model file missing or onnxruntime not installed). The encode and score always proceed regardless.

Confidence-aware fallbacks (F.3)¶

F.2 treats the predictor's verdict as a binary GOSPEL / FALL_BACK gate (short-circuit #3). F.3 makes the gate continuous by consulting the conformal interval half-width returned by Predictor.predict_vmaf_with_uncertainty (ADR-0393). Two width gates carve the half-width axis into three regions:

Interval width	Outcome	Effect on F.2
`width <= tight_interval_max_width`	`SKIP_ESCALATION`	Predictor is confident; trust the point estimate even when the native verdict said `FALL_BACK`.
`tight < width < wide`	`RECOMMEND_ESCALATION` (on `FALL_BACK` / unknown) or `SKIP_ESCALATION` (on `GOSPEL` / `LIKELY`)	Defer to the native verdict — exactly the F.2 behaviour.
`width >= wide_interval_min_width`	`FORCE_ESCALATION`	Predictor is uncertain; escalate to `recommend.coarse_to_fine` even when the native verdict said `GOSPEL`.

The two thresholds are corpus-derived. The conformal-VQA calibration pipeline ships a JSON sidecar with the canonical keys tight_interval_max_width and wide_interval_min_width; the loader honours per-corpus overrides transparently. When no sidecar is found the loader falls back to the Research-0067 emergency floor (2.0 / 5.0 VMAF) and emits a one-line warning; the floor is documented behaviour, not a magic constant.
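The sidecar-with-floor behaviour might look roughly like the sketch below. The function name `load_thresholds` and the decision to treat a malformed sidecar the same as a missing one are assumptions; only the two key names and the 2.0 / 5.0 floor come from the text above.

```python
import json
import warnings
from pathlib import Path

# Emergency floor quoted above (VMAF units).
EMERGENCY_FLOOR = {
    "tight_interval_max_width": 2.0,
    "wide_interval_min_width": 5.0,
}


def load_thresholds(sidecar: Path) -> dict:
    """Read both width gates from a JSON sidecar, else fall back to the floor."""
    try:
        data = json.loads(sidecar.read_text(encoding="utf-8"))
        return {key: float(data[key]) for key in EMERGENCY_FLOOR}
    except (OSError, ValueError, KeyError, TypeError):
        # Assumption: unreadable or incomplete sidecars also use the floor.
        warnings.warn(
            f"no usable calibration sidecar at {sidecar}; "
            "using emergency floor 2.0 / 5.0 VMAF"
        )
        return dict(EMERGENCY_FLOOR)


thresholds = load_thresholds(Path("does-not-exist.json"))
print(thresholds)
```

Returning a copy of the floor keeps callers from mutating the module-level constant by accident.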

Per-cell decisions are recorded in plan.metadata.confidence_aware_escalations[] (one entry per cell, keyed by rung, codec, verdict, interval_width, decision), and each cell in plan.cells[] carries its own confidence_decision + interval_width keys so JSON consumers don't need to cross-reference the metadata array index.

Per CLAUDE.md feedback_no_test_weakening: the thresholds are calibration outputs. If a sidecar value triggers surprising cell escalations on real data, the fix is a recalibration PR — not a loosening of the F.3 gate here.

The helper _confidence_aware_escalation(verdict, interval_width, thresholds) in tools/vmaf-tune/src/vmaftune/auto.py is exposed for unit testing and direct embedding by downstream tools (the MCP server's auto proxy, the CI corpus collector). It is a pure function of its three inputs.
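The three-region table translates into a short pure function. The sketch below is an illustrative reimplementation, not the shipped helper: `sketch_escalation` is a hypothetical name, verdicts and decisions are modelled as plain strings, and the two gates are passed as separate floats rather than a thresholds object.

```python
def sketch_escalation(verdict: str, interval_width: float,
                      tight: float, wide: float) -> str:
    """Map (native verdict, interval width) to an escalation decision."""
    if interval_width <= tight:
        # Confident predictor: trust the point estimate, even on FALL_BACK.
        return "SKIP_ESCALATION"
    if interval_width >= wide:
        # Uncertain predictor: escalate, even on GOSPEL.
        return "FORCE_ESCALATION"
    # Middle band: defer to the native verdict (the F.2 behaviour).
    # A NaN width fails both comparisons above and also lands here.
    if verdict in ("GOSPEL", "LIKELY"):
        return "SKIP_ESCALATION"
    return "RECOMMEND_ESCALATION"


# Using the emergency-floor gates of 2.0 and 5.0 VMAF.
for verdict, width in [("FALL_BACK", 1.0), ("FALL_BACK", 3.0),
                       ("GOSPEL", 3.0), ("GOSPEL", 6.0)]:
    print(verdict, width, sketch_escalation(verdict, width, 2.0, 5.0))
```

Both boundaries are inclusive on the outer regions, matching the `<=` and `>=` in the table, so a width of exactly 2.0 skips and a width of exactly 5.0 forces escalation.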

Per-content-type recipes (F.4)¶

F.4 layers per-content-type recipe overrides on top of F.1+F.2+F.3. When the upstream classifier (per_shot.detect_shots plus the fork-local content-class heuristics) tags a source as animation, screen_content, live_action_hdr, or ugc, the auto driver applies a small override dict before the F.2 short-circuits evaluate so a recipe can flip force_single_rung and have the ladder stage honour it. Any source whose meta.content_class doesn't match a named recipe falls through to the empty default recipe.

The override keys consumed by the driver are:

Key	Type	Effect
`tight_interval_max_width`	`float`	Narrows / widens the F.3 conformal-tight gate.
`force_single_rung`	`bool`	Arms short-circuit #1 (`ladder-single-rung`) even on >= 2160p sources.
`saliency_intensity`	`str`	Passed through to the saliency stage when not skipped. One of `default`, `aggressive`, `very_aggressive`.
`target_vmaf_offset`	`float`	Additive offset applied to the predictor's effective target VMAF. The input `--target-vmaf` (the gate that ships models) is never shifted by this value — see the no-test-weakening note below.

The five recipe classes ship the following overrides. The values below are the F.5-calibrated thresholds emitted by ai/scripts/calibrate_phase_f_recipes.py and shipped in ai/data/phase_f_recipes_calibrated.json. The calibration was run on 2026-05-09 against the K150K corpus (.workingdir2/konvid-150k/konvid_150k.jsonl, 148 543 rows out of an expected 153 841 — the ingestion was ~96.6 % complete; a re-run on the full corpus is a follow-up PR). Threshold rationale and the per-class proxy-vs-corpus provenance break-down live in Research-0067 §"F.4 recipe-override placeholders" plus the JSON metadata block.

Class	`tight_interval_max_width`	`force_single_rung`	`saliency_intensity`	`target_vmaf_offset`	Source
`animation`	`1.75`	`true`	`aggressive`	`+2.0`	proxy (UGC-anchored)
`screen_content`	(unset)	(unset)	`very_aggressive`	`+1.0`	proxy (UGC-anchored)
`live_action_hdr`	`1.4`	(unset)	(default)	`0.0`	proxy (UGC-anchored)
`ugc`	`3.5`	`false`	`default`	`+1.5`	corpus (K150K)
`default`	(unset)	(unset)	(default)	`0.0`	n/a

K150K is a UGC-only corpus and carries no per-source content_class column; only the ugc row is corpus-derived. The other three rows are calibrated as documented absolute offsets ("proxy") anchored on the F.4 envelope until PR #477's TransNet shot-metadata columns plus a class-labelled subset land. The JSON recipes.<class>._provenance sub-dict records the source per row so future re-calibrations can distinguish corpus-derived from proxy-derived values. UGC's target_vmaf_offset came out empirically positive (+1.5) on K150K because the corpus's MOS distribution has a heavier upper tail than lower tail; the calibration script clamps every offset to the F.4 documented envelope of [-2.0, +2.0] so a pathological corpus cannot push the predictor target outside the regime the planner has been exercised against.

The auto.py runtime loads the JSON at module import via _load_calibrated_recipes; if the JSON file is missing or malformed, the F.4 placeholder constants in _F4_PLACEHOLDER_RECIPES apply as a graceful fallback. The generated JSON includes ADR-0661 run_provenance with the calibration script, argv, source corpus JSONL, row cap, and recipe output target. To regenerate the calibration after the corpus ingestion completes (or when a class-labelled corpus replaces K150K), run:

python ai/scripts/calibrate_phase_f_recipes.py \
    --corpus .workingdir2/konvid-150k/konvid_150k.jsonl \
    --out ai/data/phase_f_recipes_calibrated.json

Recipe rationale (the proxy-derived thresholds remain provisional until a class-labelled corpus is available for re-calibration):

Animation — predictor residuals are tighter on flat colour fields, single-rung ladder is sufficient, and saliency is more aggressive on cel-line edges. Animation is intrinsically more compressible at a given perceptual quality, so the predictor aims ~2 VMAF higher.
Screen content — split-frame structure (low-entropy background + high-detail text/icon regions) benefits from very_aggressive saliency that raises QP on the background while keeping text near-lossless. Predictor target nudged +1.
Live-action HDR — per ADR-0300 the HDR pipeline already runs; the F.3 conformal-tight gate is narrowed to 1.4 because a wide predictor interval on HDR is more suspect than on SDR (the predictor was largely trained on SDR per ADR-0393).
UGC — user-generated content carries higher upstream-encode noise, inconsistent grading, and resolution mismatches; predictor uncertainty is the baseline. Widening the F.3 tight gate to 3.5 avoids over-flagging UGC cells as "needs escalation" simply because the interval is wider than a Netflix-grade reference. The K150K calibration nudges the predictor target up 1.5 because the corpus MOS distribution has a heavier upper tail.

The recipe class is recorded in plan.metadata.recipe_applied (one of animation, screen_content, live_action_hdr, ugc, or default) and the override dict in plan.metadata.recipe_overrides. Each cell in plan.cells[] also carries the resolved saliency_intensity and effective_predictor_target_vmaf so JSON consumers don't need to cross-reference the metadata block.

Per CLAUDE.md memory feedback_no_test_weakening: recipe overrides MUST NOT silently widen the production-flip gate that ships models. They affect predictor thresholds (the effective_predictor_target_vmaf that the predictor aims for; the F.3 width gate that decides per-cell escalation), not the input --target-vmaf that downstream consumers treat as the contract. The driver records the input target_vmaf verbatim in plan.metadata.target_vmaf and the offset target in plan.metadata.effective_predictor_target_vmaf; the two are kept distinct.
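The separation between the input target and the predictor's effective target can be sketched as follows. The function names are hypothetical, and clamping at runtime is an assumption made for the sketch: the text only states that the calibration script clamps offsets to [-2.0, +2.0].

```python
OFFSET_ENVELOPE = (-2.0, 2.0)  # documented F.4 envelope for target_vmaf_offset


def effective_predictor_target(target_vmaf: float, recipe: dict) -> float:
    """Input target plus the recipe offset, with the offset clamped."""
    low, high = OFFSET_ENVELOPE
    offset = float(recipe.get("target_vmaf_offset", 0.0))
    offset = max(low, min(high, offset))
    return target_vmaf + offset


def plan_metadata(target_vmaf: float, recipe_class: str, recipe: dict) -> dict:
    """Record both targets side by side; the input value is never rewritten."""
    return {
        "target_vmaf": target_vmaf,
        "effective_predictor_target_vmaf":
            effective_predictor_target(target_vmaf, recipe),
        "recipe_applied": recipe_class,
        "recipe_overrides": dict(recipe),
    }


ugc_recipe = {"tight_interval_max_width": 3.5, "force_single_rung": False,
              "saliency_intensity": "default", "target_vmaf_offset": 1.5}
meta = plan_metadata(93.0, "ugc", ugc_recipe)
print(meta["target_vmaf"], meta["effective_predictor_target_vmaf"])  # 93.0 94.5
```

A consumer that gates on quality should read `target_vmaf`; only the predictor stage should look at `effective_predictor_target_vmaf`.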

The helper _apply_recipe_override(meta, plan_state, thresholds) in tools/vmaf-tune/src/vmaftune/auto.py resolves the recipe and returns a (recipe_class, recipe, effective_thresholds) triple; the get_recipe_for_class(content_class) helper returns a fresh override dict for any of the five canonical class strings. Both are pure functions; the table at module scope (_CONTENT_RECIPE_TABLE) holds factory callables so each call returns a fresh dict that callers may mutate without affecting subsequent runs.
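The factory-callable pattern exists so that one caller's mutation cannot leak into the next lookup. A minimal sketch, using the override values from the recipe table above; `SKETCH_RECIPE_TABLE` and `sketch_get_recipe` are hypothetical names, and the fall-through to the empty default for unknown classes mirrors the behaviour described earlier in this section.

```python
# Each entry is a zero-argument factory, so every lookup builds a new dict.
SKETCH_RECIPE_TABLE = {
    "animation": lambda: {
        "tight_interval_max_width": 1.75,
        "force_single_rung": True,
        "saliency_intensity": "aggressive",
        "target_vmaf_offset": 2.0,
    },
    "screen_content": lambda: {
        "saliency_intensity": "very_aggressive",
        "target_vmaf_offset": 1.0,
    },
    "live_action_hdr": lambda: {
        "tight_interval_max_width": 1.4,
        "target_vmaf_offset": 0.0,
    },
    "ugc": lambda: {
        "tight_interval_max_width": 3.5,
        "force_single_rung": False,
        "saliency_intensity": "default",
        "target_vmaf_offset": 1.5,
    },
    "default": lambda: {},
}


def sketch_get_recipe(content_class: str) -> dict:
    """Return a fresh override dict; unknown classes get the empty default."""
    factory = SKETCH_RECIPE_TABLE.get(content_class,
                                      SKETCH_RECIPE_TABLE["default"])
    return factory()


first = sketch_get_recipe("animation")
first["target_vmaf_offset"] = -1.0  # caller mutates its own copy
second = sketch_get_recipe("animation")
print(second["target_vmaf_offset"])  # 2.0
```

Storing dict literals directly in the table would hand every caller the same object, so the mutation above would silently change the recipe for all later runs in the process.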

References: ADR-0397 §F.4, ADR-0393, ADR-0300, Research-0067.

Tests¶

pytest tools/vmaf-tune/tests/

The shipped suite mocks subprocess.run, so it requires neither ffmpeg nor a built vmaf. Real-binary integration coverage will land when the codec adapter set widens.
