ADR-0324: Ensemble training kit — portable Phase-A + LOSO retrain bundle¶
- Status: Accepted
- Date: 2026-05-06
- Deciders: Lusoris, Claude
- Tags: ai, fr-regressor, ensemble, tooling, fork-local
Context¶
Lawrence (collaborator with different hardware than the lead user) has volunteered to run a fr_regressor_v2_ensemble_v1 retrain against his own copy of the Netflix Public Dataset to provide a second PROMOTE/HOLD verdict — independent confirmation of the gate result the lead user emitted in PR #422 (real-corpus LOSO mean PLCC 0.997, spread 0.001) and PR #424 (full-corpus ONNX export + production flip).
The plumbing already exists in tree:
scripts/dev/hw_encoder_corpus.py— the Phase-A producer.ai/scripts/run_ensemble_v2_real_corpus_loso.sh— the LOSO wrapper.ai/scripts/train_fr_regressor_v2_ensemble_loso.py— the trainer.ai/scripts/validate_ensemble_seeds.py— the gate verdict emitter.ai/scripts/export_ensemble_v2_seeds.py— the per-seed ONNX exporter (full-corpus fit; LOSO trainer discards models after computing PLCC).docs/ai/ensemble-v2-real-corpus-retrain-runbook.md— the operator doc.
But the runbook still requires the operator to chain ten files and five env-var conventions by hand. The same set of scripts lives at five different paths under three different roots (scripts/dev/, ai/scripts/, scripts/ci/). Asking a collaborator to clone the whole vmaf fork (including .workingdir2/netflix/*.yuv gitignored — they don't exist on his disk) and read the runbook front- to-back to figure out invocation order is hostile to the goal of "send him a tarball, he runs one command, he sends back a verdict".
Decision¶
We will ship a self-contained kit at tools/ensemble-training-kit/ with:
- A one-command orchestrator (
run-full-pipeline.sh) that chains prereqs → corpus generation → LOSO training → verdict → bundling with sane error handling and--ref-dir/--encoders/--cqs/--out-dirflags. - Five numbered step scripts (
01-prereqs.sh…05-bundle-results.sh) the operator can invoke individually for retries. - An operator-facing
README.mdwith the prereq table, troubleshooting matrix, and the verdict-file shape. - A
make-distribution-tarball.shthat bundles the kit + every in-repo script it invokes + the runbook into a portable tar.gz the collaborator can untar without cloning the fork. - Pinned Python deps (
pyproject.toml+requirements-frozen.txt).
The bundling on PROMOTE re-uses the existing ai/scripts/export_ensemble_v2_seeds.py (which already fits the final-production weights on the FULL corpus and writes ONNX + sidecar artefacts) — no changes to _train_one_seed are required.
Status update 2026-05-06: kit extended for multi-platform¶
The original kit assumed an NVIDIA-only Linux box (lead user's hardware). Lawrence's collaborator workstation has four boxes that need to participate: an NVIDIA CUDA card, an Intel Arc 310, an Intel iGPU, and a Mac. The kit now supports four host families:
linux-x86_64-cuda—h264_nvenc,hevc_nvenc,av1_nvenc(existing)linux-x86_64-sycl—h264_qsv,hevc_qsv,av1_qsv(Intel Arc / iGPU)linux-x86_64-vulkan—libx264CPU baseline (no GPU detected)darwin-{arm64,x86_64}-cpu—h264_videotoolbox,hevc_videotoolbox
A new _platform_detect.sh helper auto-defaults the encoder list per box; 01-prereqs.sh skips NVIDIA-specific gates on non-CUDA platforms and probes ffmpeg-VideoToolbox availability on Darwin instead. A new build-libvmaf-binaries.sh script lets each operator build a libvmaf binary for their box once and rsync it into binaries/<platform>/; binaries are not bundled in source control. The scripts/dev/hw_encoder_corpus.py encoder vocabulary now covers {h264,hevc}_videotoolbox with the canonical adapter's argv shape (-c:v <name> -q:v <quality> -realtime 0) verified against tools/vmaf-tune/src/vmaftune/codec_adapters/_videotoolbox_common.py.
Each box generates its own canonical-6 corpus shard; the lead user merges shards via ai/scripts/merge_corpora.py (de-duplicates by src_sha256) before the cross-platform LOSO retrain.
Alternatives considered¶
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
In-tree kit at tools/ensemble-training-kit/ (chosen) | Single place to look; ships in PRs alongside the trainer; lints / changelog / ADR entry land together; tarball generator produces a self-contained bundle for collaborators with no checkout. | Adds ~500 LOC of bash + markdown to the tree. | — |
Git submodule pointing at a separate VMAFx/vmafx-ensemble-kit repo | Cleanly separates the operator-facing surface from the engine. | Submodule overhead (extra clone step, separate CI, version-pin drift). The kit's scripts must stay in lockstep with ai/scripts/ — a separate repo guarantees drift when the trainer's argv changes (already happened twice: PR #422 dropped --corpus-root, PR #424 added --out-dir). | Drift risk dwarfs the cleanliness benefit. |
| Docker image bundling the toolchain | One-step setup; reproducible Python + CUDA env. | NVENC inside a container needs --device /dev/nvidia* + matching driver versions; libvmaf-CUDA build inside the image costs 10+ minutes; image is multi-GiB to ship. The CLAUDE.md constraint explicitly states "Docker is NOT a requirement". | Higher friction than a tarball + prereq check. |
Consequences¶
- Positive: collaborators can run the retrain in one command after unpacking a < 50 MiB tarball. The kit lives next to the trainer, so CLI drift between LOSO wrapper / exporter / kit fails CI within a single PR rather than silently. The README's troubleshooting matrix doubles as an FAQ for future operators.
- Negative: ~500 LOC of bash + markdown to maintain. If the LOSO wrapper changes argv conventions, both the wrapper and the kit's pass-through must update.
- Neutral / follow-ups: The kit assumes seeds 0..4 because the LOSO wrapper hard-codes them; if a future PR parameterises the wrapper's seed list, the kit's
--seedsflag (currently advisory) becomes authoritative.
References¶
docs/ai/ensemble-v2-real-corpus-retrain-runbook.md— the operator doc this kit extends.- ADR-0303 — the gate this kit's verdict step applies.
- ADR-0309 — the real-corpus retrain workflow this kit packages.
- ADR-0319 — LOSO trainer real impl + the wrapper's argv contract.
- ADR-0321 — the full-corpus ONNX export script the bundle step invokes on PROMOTE.
- Source:
req— direct user request to ship a portable kit so a collaborator on different hardware can run the retrain end-to-end.
Status update 2026-05-08: Accepted¶
Audited as part of the 2026-05-08 ADR Proposed sweep (Research-0086).
Acceptance criteria verified in tree at HEAD 0a8b539e:
tools/ensemble-training-kit/— present, withrun-full-pipeline.sh, the five numbered step scripts (01-prereqs.shthrough05-bundle-results.sh),make-distribution-tarball.sh,_platform_detect.sh,build-libvmaf-binaries.sh,extract-corpus.sh, and abinaries/per-platform staging directory.- The multi-platform extension (Linux CUDA / Linux SYCL / Linux Vulkan-baseline / Darwin CPU) was already documented in the original ADR's
### Status update 2026-05-06block; nothing to re-verify here beyond the four families remaining live. - Verification command:
ls tools/ensemble-training-kit/.