ADR-0283: vmaf-tune Apple VideoToolbox codec adapters
- Status: Accepted
- Date: 2026-05-03
- Deciders: lusoris (with Claude)
- Tags: tooling, ai, ffmpeg, codec, hardware-encoder, apple, fork-local
Context
tools/vmaf-tune/ (the quality-aware encode automation harness from ADR-0237) has accumulated adapters for the NVIDIA NVENC, AMD AMF, and Intel QSV hardware families, all under the same contract. Apple VideoToolbox is the missing hardware path on macOS. It is needed for the per-title CRF predictor's coverage on Apple Silicon hosts and for any future codec-aware regressor retrain that wants to learn the VT distortion signature.
Apple's VideoToolbox is the only hardware path on M-series and on Intel Macs with a T2 chip. FFmpeg exposes two encoders backed by it (h264_videotoolbox, hevc_videotoolbox); AV1 hardware encoding is not available on Apple Silicon as of 2026 and is intentionally omitted from this PR.
The harness's codec-adapter contract (tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py) is multi-codec from day one (per tools/vmaf-tune/AGENTS.md) — new codecs are one-file additions under codec_adapters/ and the search loop never branches on codec identity. This PR adds the two VideoToolbox adapters in line with that contract.
Decision
Add H264VideoToolboxAdapter and HEVCVideoToolboxAdapter under tools/vmaf-tune/src/vmaftune/codec_adapters/, sharing a single _videotoolbox_common.py for the quality-knob and preset mapping. Both adapters carry invert_quality=False and a [0, 100] quality range — the harness's crf row slot now carries whatever native quality value each adapter declares, with downstream consumers interpreting the knob via the adapter registry.
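How a downstream consumer might read the crf slot through the registry can be sketched as follows. This is a minimal illustration under stated assumptions: QualityKnob, REGISTRY, and normalized_quality are hypothetical names, the libx264 row is included only as a CRF-style contrast, and reading invert_quality=True as "lower value = better" is an inference from the text rather than a confirmed definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityKnob:
    """Hypothetical per-adapter description of the native quality knob."""
    quality_range: tuple[int, int]
    invert_quality: bool  # assumption: True means a lower value is better (CRF-style)


# Illustrative registry; the real adapter registry is not reproduced here.
REGISTRY = {
    "libx264": QualityKnob((0, 51), True),             # CRF-style contrast row
    "h264_videotoolbox": QualityKnob((0, 100), False),
    "hevc_videotoolbox": QualityKnob((0, 100), False),
}


def normalized_quality(encoder: str, crf: int) -> float:
    """Map a corpus row's (encoder, crf) pair to 0.0 (worst) .. 1.0 (best)."""
    knob = REGISTRY[encoder]
    lo, hi = knob.quality_range
    if not lo <= crf <= hi:
        raise ValueError(f"{crf} outside [{lo}, {hi}] for {encoder}")
    fraction = (crf - lo) / (hi - lo)
    return 1.0 - fraction if knob.invert_quality else fraction
```

With this sketch, normalized_quality("h264_videotoolbox", 75) gives 0.75, while normalized_quality("libx264", 0) gives 1.0: the same crf column is only meaningful once the encoder name selects the knob.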
The nine-name preset taxonomy maps onto VT's coarser -realtime flag: ultrafast/superfast/veryfast/faster/fast → -realtime 1; medium/slow/slower/veryslow → -realtime 0. The mapping is intentionally lossy — VT cannot expose a finer dial.
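The preset mapping above can be sketched as a small lookup. This is not the contents of _videotoolbox_common.py: REALTIME_BY_PRESET and videotoolbox_args are hypothetical names, and rejecting out-of-range quality values (rather than clamping them) is an assumption.

```python
# Hypothetical sketch of the nine-name preset -> -realtime mapping.
REALTIME_BY_PRESET = {
    "ultrafast": 1, "superfast": 1, "veryfast": 1, "faster": 1, "fast": 1,
    "medium": 0, "slow": 0, "slower": 0, "veryslow": 0,
}

QUALITY_RANGE = (0, 100)  # -q:v scale, higher = better


def videotoolbox_args(encoder: str, preset: str, quality: int) -> list[str]:
    """Build the encoder-specific FFmpeg arguments for one encode."""
    if preset not in REALTIME_BY_PRESET:
        raise ValueError(f"unknown preset: {preset!r}")
    lo, hi = QUALITY_RANGE
    if not lo <= quality <= hi:
        raise ValueError(f"quality {quality} outside [{lo}, {hi}]")
    return [
        "-c:v", encoder,
        "-realtime", str(REALTIME_BY_PRESET[preset]),
        "-q:v", str(quality),
    ]
```

For example, videotoolbox_args("hevc_videotoolbox", "slow", 65) returns ["-c:v", "hevc_videotoolbox", "-realtime", "0", "-q:v", "65"]. Because five presets collapse to one flag value and four to the other, two rows that differ only in preset within the same group produce identical encoder arguments.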
Alternatives considered
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Two adapters sharing _videotoolbox_common.py (chosen) | Codec-specific files stay tiny (~30 LOC each); shared preset/quality logic deduplicated; matches the per-codec one-file-per-adapter convention from AGENTS.md. | One extra module file. | Best fit for the codec-adapter contract. |
| Single videotoolbox.py with two classes | One file. | Mixes h264 + hevc encoder names in one module; breaks the "one file per codec" convention; harder to grep for. | Convention drift. |
| Inline -realtime mapping inside each adapter | No shared file. | Duplicated preset → realtime logic between two adapters; preset list copy-pasted. | DRY violation. |
| Map presets to a synthetic -prio_speed/-allow_sw matrix | More expressive than -realtime. | VideoToolbox does not expose those as encoder options uniformly across macOS versions; brittle. | Out-of-band knobs, not stable. |
| Skip VideoToolbox until a macOS CI runner exists | Avoid adding code that can't be exercised on Linux CI. | Blocks fr_regressor_v2_hw's hardware-codec coverage; macOS contributors can already exercise locally. | Test-mockable; subprocess boundary is the seam. |
Consequences
- Positive: a future codec-aware regressor retrain can include Apple-Silicon-encoded distortions in its corpus once the harness runs on a macOS host; the codec-adapter contract is exercised by one more live adapter family (proves the registry pattern). Tests mock subprocess.run so the suite stays Linux-CI-runnable.
- Negative: VT's -realtime is a coarse preset axis; the harness cannot drive a finer speed/quality dial on Apple hardware. The -q:v quality scale (0..100, higher = better) differs from CRF's scale (0..51, lower = better), so corpus consumers must read encoder + crf together via the adapter registry; this is documented in docs/usage/vmaf-tune.md.
- Neutral / follow-ups: a real macOS CI runner is out of scope here; the smoke gate runs on Linux against mocked subprocess. AV1 hardware encoding will land when av1_videotoolbox ships in FFmpeg + a supported macOS version.
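The "tests mock subprocess.run" point can be illustrated with the standard library alone. The sketch below is an assumption about the shape of such a test, not the suite's actual code: run_encode and check_encode_with_mocked_subprocess are hypothetical names, and the FFmpeg command line is only an example.

```python
import subprocess
from unittest import mock


def run_encode(argv: list[str]) -> int:
    """Hypothetical stand-in for the harness's encode step."""
    completed = subprocess.run(argv, capture_output=True, text=True, check=False)
    return completed.returncode


def check_encode_with_mocked_subprocess() -> list[str]:
    """Call run_encode with subprocess.run patched out; return the argv it saw."""
    fake = subprocess.CompletedProcess(args=[], returncode=0, stdout="", stderr="")
    with mock.patch("subprocess.run", return_value=fake) as patched:
        rc = run_encode([
            "ffmpeg", "-i", "in.y4m",
            "-c:v", "h264_videotoolbox", "-realtime", "0", "-q:v", "60",
            "out.mp4",
        ])
    assert rc == 0
    patched.assert_called_once()
    # First positional argument of the single recorded call is the argv list.
    return patched.call_args.args[0]
```

No ffmpeg binary and no Apple hardware are touched: the test asserts on the command line the adapter would have issued, which is what lets the same check run on a Linux CI host.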
References
- ADR-0237: vmaf-tune umbrella spec.
- tools/vmaf-tune/AGENTS.md: codec-adapter contract invariants.
- docs/usage/vmaf-tune.md: user-facing CLI surface.
- Source: req (paraphrased): user requested Apple VideoToolbox encoder adapters; the originally-coupled codec-vocab schema expansion is split into a separate follow-up PR awaiting a fresh production retrain (per ADR-0235 + ADR-0291 ship-gate).
Status update 2026-05-09: ProRes adapter added
The macOS hardware-encoder coverage trio is completed by adding prores_videotoolbox alongside the original h264_videotoolbox and hevc_videotoolbox adapters. The body of this ADR remains frozen per the "Immutable once Accepted" rule (docs/adr/README.md §Conventions); this appendix records the follow-on that landed in the same registry pattern without re-opening the original decision.
The ProRes adapter is deliberately a sibling of the H.264 / HEVC VT adapters — same registry shape, same _videotoolbox_common.py, same nine-name preset → -realtime mapping. The one shape difference is the quality knob:
| Adapter | Knob | Range | Direction |
|---|---|---|---|
| h264_videotoolbox / hevc_videotoolbox | q:v | 0..100 | higher = better |
| prores_videotoolbox | profile:v | 0..5 | higher tier = better |
ProRes is a fixed-rate intermediate codec — it has no CRF / QP / q:v style scalar. The harness's crf slot carries the integer ProRes tier id (0=proxy, 1=lt, 2=standard (422), 3=hq (422 HQ), 4=4444, 5=xq (4444 XQ)) per the FFmpeg prores_videotoolbox profile AVOption table (verified against libavcodec/videotoolboxenc.c prores_options in FFmpeg n8.1.1). The corpus row's existing encoder + crf pair carries the tier choice; downstream consumers translate the integer back via the adapter registry, so no schema change is needed.
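The tier translation described above can be sketched as a lookup from the integer in the crf slot to a profile name. PRORES_TIERS and prores_profile_args are hypothetical names, and passing the profile by name rather than by integer is an assumption; the tier ids and names are the ones listed in the paragraph above.

```python
# Hypothetical sketch: integer tier id (stored in the crf slot) -> profile name.
PRORES_TIERS = {
    0: "proxy",
    1: "lt",
    2: "standard",  # ProRes 422
    3: "hq",        # ProRes 422 HQ
    4: "4444",
    5: "xq",        # ProRes 4444 XQ
}


def prores_profile_args(tier: int) -> list[str]:
    """Translate a tier id into FFmpeg arguments for prores_videotoolbox."""
    if tier not in PRORES_TIERS:
        raise ValueError(f"ProRes tier must be in 0..5, got {tier}")
    return ["-c:v", "prores_videotoolbox", "-profile:v", PRORES_TIERS[tier]]
```

For example, prores_profile_args(3) returns ["-c:v", "prores_videotoolbox", "-profile:v", "hq"]. The corpus row still stores only the integer 3 alongside the encoder name, which is why no schema change is required.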
Hardware availability: M1 Pro / Max / Ultra and every later M-series chip ship the ProRes hardware block; Intel Macs with a T2 chip do not have it (FFmpeg falls back to the software prores_aw / prores_ks encoders there).
ENCODER_VOCAB invariant: ProRes is not in the live ENCODER_VOCAB_V2 (12 slots, frozen by ADR-0291) nor in the ENCODER_VOCAB_V3 scaffold (16 slots, scaffold-only by ADR-0302). The prores_videotoolbox slot will land as part of a future v4 vocab bump tracked under T-FR-V2-VOCAB-V3-RETRAIN — until then, the proxy-fast-path raises ProxyError on ProRes input and the harness falls back to the live-encode loop, which works unchanged.
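The fallback behaviour can be pictured as a try/except around the fast path. Everything in this sketch is illustrative: the ProxyError class is a local stand-in for the harness's own exception, KNOWN_ENCODERS is a placeholder rather than the real 12-slot vocab, and proxy_score and score are hypothetical names.

```python
from typing import Callable


class ProxyError(Exception):
    """Local stand-in for the harness's ProxyError (real definition not shown)."""


# Placeholder vocab; the real ENCODER_VOCAB_V2 contents are not reproduced here.
KNOWN_ENCODERS = frozenset({"example_encoder"})


def proxy_score(encoder: str, crf: int) -> float:
    """Hypothetical proxy fast path: refuses encoders outside the vocab."""
    if encoder not in KNOWN_ENCODERS:
        raise ProxyError(f"{encoder} is not in the encoder vocab")
    return 0.0  # placeholder prediction, not a real model output


def score(encoder: str, crf: int, live_encode: Callable[[str, int], float]) -> float:
    """Try the proxy first; fall back to a live encode when it refuses."""
    try:
        return proxy_score(encoder, crf)
    except ProxyError:
        return live_encode(encoder, crf)
```

Under these assumptions, a call such as score("prores_videotoolbox", 3, live_encode=my_live_loop) never reaches the placeholder prediction and always returns whatever the live-encode callable produces.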
Reference: tools/vmaf-tune/src/vmaftune/codec_adapters/prores_videotoolbox.py.