ADR-0319: fr_regressor_v2 ensemble LOSO trainer - real loader + per-fold training
- Status: Accepted
- Date: 2026-05-06
- Deciders: Lusoris, Claude (Anthropic)
- Tags: ai, fr-regressor, ensemble, loso, fork-local
- Related: ADR-0303 (gate definition + LOSO trainer scaffold), ADR-0309 (real-corpus retrain harness), ADR-0291 (deterministic v2 prod flip — defines the 0.95 LOSO PLCC ship gate), Research-0075 (LOSO protocol + JSON schema)
Context
ADR-0303 (PR #399) merged the ensemble production-flip scaffold — ai/scripts/train_fr_regressor_v2_ensemble_loso.py — but intentionally left two functions stubbed because the Phase A canonical-6 corpus did not exist on master at the time:
- _load_corpus(corpus_path) raised NotImplementedError.
- _train_one_seed(seed, corpus, args) raised NotImplementedError.
ADR-0309 (PR #405) merged the real-corpus retrain harness — the wrapper script ai/scripts/run_ensemble_v2_real_corpus_loso.sh and the validator ai/scripts/validate_ensemble_seeds.py. The wrapper forwarded --corpus-root / --output argv that did not match the trainer's --corpus / --out-dir interface, but the mismatch was masked by the trainer's smoke-mode no-op exit when the corpus was absent.
The Phase A canonical-6 corpus (runs/phase_a/full_grid/per_frame_canonical6.jsonl, ~5,640 NVENC per-frame rows × 9 Netflix sources × 4 CQs) is now generated locally via scripts/dev/hw_encoder_corpus.py. This ADR closes both deferrals and the wrapper-mismatch in one PR.
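A pre-flight check on that file can be sketched as follows. This is a minimal illustration, not the wrapper's or trainer's actual code: the function name `check_corpus` and its parameters are hypothetical, the canonical-6 feature names are not listed in this ADR and so are left to the caller, and the 100-row floor is borrowed from the wrapper prereq check described under Decision.

```python
import json
from pathlib import Path

# Metadata columns named in this ADR. The canonical-6 feature names are not
# listed here, so the caller passes them in (hypothetical parameter).
REQUIRED_META = ("vmaf", "src", "encoder", "cq", "frame_index")


def check_corpus(path, feature_cols=(), min_rows=100):
    """Return a small summary of a JSONL corpus, or raise ValueError."""
    path = Path(path)
    if not path.is_file():
        raise ValueError(f"corpus not found: {path}")

    required = set(REQUIRED_META) | set(feature_cols)
    rows = 0
    sources, cqs = set(), set()
    with path.open(encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            row = json.loads(line)
            missing = required.difference(row)
            if missing:
                raise ValueError(f"line {lineno}: missing {sorted(missing)}")
            rows += 1
            sources.add(row["src"])
            cqs.add(row["cq"])

    if rows < min_rows:
        raise ValueError(f"only {rows} rows; need at least {min_rows}")
    return {"rows": rows, "sources": len(sources), "cqs": len(cqs)}
```

For the corpus described above, the summary would be expected to report 9 sources and 4 CQ values; a different count is a hint that the grid run was incomplete.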
Decision
Implement the real loader + per-fold trainer body, fix the wrapper argv, and ship the runbook update — all in the same PR. No registry flip; that stays a separate follow-up per ADR-0309's invariant.
Concretely:
- _load_corpus reads the JSONL via pandas.read_json(..., lines=True), validates the canonical-6 columns + vmaf / src / encoder / cq / frame_index, fits a corpus-wide StandardScaler (ADR-0291 recipe), and pre-computes the 14-D codec block: 12-slot ENCODER_VOCAB v2 one-hot + preset_norm (constant 0.5, because hw_encoder_corpus.py does not record preset) + crf_norm (cq normalised over the corpus's observed cq range).
- _train_one_seed runs 9-fold LOSO over the unique src values: per fold, fit a fold-local StandardScaler on the training rows only (mirrors eval_loso_vmaf_tiny_v3.py), train an FRRegressor(in_features=6, num_codecs=14) for args.epochs (default 200) with Adam (lr=5e-4, weight_decay=1e-5), evaluate PLCC / SROCC / RMSE on the held-out source. The returned dict carries mean_plcc (the field scripts/ci/ensemble_prod_gate.py parses) plus the per-fold list, min_plcc, max_plcc, std_plcc, wall_time_s, and full hyperparameters, matching Research-0075 §JSON schema.
- The wrapper script passes --corpus "$CORPUS_JSONL" (default the canonical Phase A path) and --out-dir "$out_dir"; drops the obsolete --corpus-root / --output argv. Adds a prereq check that the JSONL exists and has ≥100 rows.
- The runbook gains a "Step 0: Generate Phase A canonical-6 corpus" section before the verification step, documenting the hw_encoder_corpus.py invocation pattern and the QSV-optional note.
- ai/AGENTS.md's "Ensemble registry invariant" section gains a note pinning the canonical-6 schema as load-bearing: schema changes require an ENCODER_VOCAB version bump.
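The per-fold procedure in the second item can be illustrated with a standard-library sketch. It is not the trainer's code: a one-feature closed-form least-squares fit stands in for FRRegressor, SROCC is omitted for brevity, the `folds` key name and the use of a population standard deviation for `std_plcc` are assumptions, and the row field `x` is a hypothetical single feature. What it does show is the leave-one-source-out split with scaler statistics taken from the training rows only.

```python
import math
import statistics


def pearson(xs, ys):
    """Pearson linear correlation coefficient (PLCC)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def loso_eval(rows, feature="x", target="vmaf", group="src"):
    """Leave-one-source-out evaluation with a fold-local scaler.

    Stand-in model: 1-D least squares, NOT the FRRegressor named in the ADR.
    """
    folds = []
    for held_out in sorted({r[group] for r in rows}):
        train = [r for r in rows if r[group] != held_out]
        test = [r for r in rows if r[group] == held_out]

        # Scaler statistics come from the training rows only, so nothing
        # about the held-out source leaks into the fit.
        raw = [r[feature] for r in train]
        mu = statistics.fmean(raw)
        sigma = statistics.pstdev(raw) or 1.0  # guard a constant feature

        xs = [(v - mu) / sigma for v in raw]
        ys = [r[target] for r in train]
        mx, my = statistics.fmean(xs), statistics.fmean(ys)
        slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
                 / sum((x - mx) ** 2 for x in xs))
        intercept = my - slope * mx

        # The held-out rows are scaled with the training statistics.
        preds = [slope * ((r[feature] - mu) / sigma) + intercept for r in test]
        truth = [r[target] for r in test]
        sq_err = [(p - t) ** 2 for p, t in zip(preds, truth)]
        folds.append({
            "held_out": held_out,
            "plcc": pearson(preds, truth),
            "rmse": math.sqrt(statistics.fmean(sq_err)),
        })

    plccs = [f["plcc"] for f in folds]
    return {
        "folds": folds,  # hypothetical key name for the per-fold list
        "mean_plcc": statistics.fmean(plccs),
        "min_plcc": min(plccs),
        "max_plcc": max(plccs),
        "std_plcc": statistics.pstdev(plccs),  # population std: an assumption
    }
```

With nine distinct `src` values this yields nine folds, one per held-out source, which is the shape the summary fields above aggregate over.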
The num_codecs=14 choice (full codec block width, including preset_norm + crf_norm) matches train_fr_regressor_v2.py's FRRegressor(num_codecs=num_codec_dims) pattern. The fr_regressor_v2_ensemble_v1 smoke ONNX shipped under model/tiny/ was trained with num_codecs=NUM_CODECS=6 against vmaf_train.codec.CODEC_VOCAB (the v1 codec vocab); the registry flip PR will need to retrain or accept the architecture mismatch. The trainer here optimises for the v2 deterministic baseline that already cleared 0.9681 mean LOSO PLCC (ADR-0291).
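The width arithmetic (12 one-hot slots + preset_norm + crf_norm = 14) can be made concrete with a small sketch. The vocabulary below is a placeholder, since the real ENCODER_VOCAB v2 entries are not listed in this ADR; min-max scaling for crf_norm and the 0.5 fallback for a single-valued cq range are likewise assumptions, and `codec_block` is a hypothetical helper name.

```python
# Placeholder vocabulary: the real ENCODER_VOCAB v2 entries are not listed in
# this ADR, so these 12 names are purely illustrative.
PLACEHOLDER_VOCAB = tuple(f"encoder_{i}" for i in range(12))

# The corpus records no preset, so the ADR fixes this slot at 0.5.
PRESET_NORM_CONSTANT = 0.5


def codec_block(encoder, cq, cq_min, cq_max, vocab=PLACEHOLDER_VOCAB):
    """Build one row of the codec block: one-hot + preset_norm + crf_norm."""
    one_hot = [0.0] * len(vocab)
    one_hot[vocab.index(encoder)] = 1.0  # raises ValueError on unknown encoder

    # Assumption: min-max scaling over the observed cq range. A degenerate
    # range (one cq value in the corpus) falls back to the midpoint.
    if cq_max > cq_min:
        crf_norm = (cq - cq_min) / (cq_max - cq_min)
    else:
        crf_norm = 0.5

    return one_hot + [PRESET_NORM_CONSTANT, crf_norm]


block = codec_block("encoder_3", cq=30, cq_min=20, cq_max=40)
print(len(block))  # 14
```

A model built with num_codecs=6 expects a 6-wide block, so rows shaped like the one above cannot be fed to it unchanged; that is the mismatch the registry flip PR has to resolve.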
Alternatives considered
| Option | Pros | Cons | Verdict |
|---|---|---|---|
| pandas + torch (chosen) | Mirrors eval_loso_vmaf_tiny_v3.py loader pattern; FRRegressor reuse keeps the v2 ONNX-export path open; pandas is already a training-time dep | Adds ~30 MB pandas footprint to the trainer venv (already required by other LOSO scripts) | Selected — minimum-deviation from existing v2 + LOSO patterns |
| Pure numpy / json loader | No pandas dep; smaller install footprint | Re-implements the source-grouping + column projection that pandas.groupby does in two lines; loses parity with eval_loso_vmaf_tiny_v3.py reproducer | Rejected — trainer is dev-time, not runtime; pandas is already on the path for the other LOSO scripts |
| PyTorch Lightning Trainer | Matches FRRegressor Lightning class; gets logging / checkpointing for free | 200-epoch loops over 9 folds × 5 seeds inside Lightning add per-fold setup cost; not necessary for a fixed-budget LOSO pass | Rejected — manual Adam + MSELoss loop is ~50 LOC, Lightning's overhead would dominate on this corpus size |
Consequences
- Visible behaviour change: train_fr_regressor_v2_ensemble_loso.py now produces real loso_seed{N}.json artefacts when given a corpus. The CI gate scripts/ci/ensemble_prod_gate.py can apply its two-part check (mean PLCC ≥ 0.95 AND spread ≤ 0.005) against real numbers.
- Wrapper now end-to-end: bash ai/scripts/run_ensemble_v2_real_corpus_loso.sh runs without argv errors against the canonical Phase A corpus.
- Registry untouched: per ADR-0309's invariant, registry-flip stays a separate PR. This ADR ships the trainer + wrapper + runbook only.
- Wall-time: ~5 min per seed on RTX 4090 (verified end-to-end); ~25 min for the full 5-seed × 9-fold run. Slower CPUs scale linearly.
- Preset-conditioning is a no-op for now: the canonical-6 corpus doesn't record preset, so preset_norm is the constant 0.5. A future corpus regen with explicit preset metadata would activate the column without retraining (the constant-column model just ignores it).
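The two-part gate from the first consequence can be sketched as below. This is an illustration, not the contents of scripts/ci/ensemble_prod_gate.py, which remains the source of truth for the thresholds. Two assumptions are baked in: "mean" is taken across the per-seed mean_plcc values, and "spread" is max minus min of those values. The function name `ensemble_gate` is hypothetical.

```python
import statistics

# Threshold values as stated in this ADR; duplicated here only for the sketch.
MEAN_PLCC_MIN = 0.95
SPREAD_MAX = 0.005


def ensemble_gate(seed_results):
    """Apply the two-part check to a list of per-seed result dicts."""
    plccs = [r["mean_plcc"] for r in seed_results]
    mean = statistics.fmean(plccs)
    spread = max(plccs) - min(plccs)  # assumption: spread = max - min
    return {
        "mean": mean,
        "spread": spread,
        "passed": mean >= MEAN_PLCC_MIN and spread <= SPREAD_MAX,
    }
```

In practice the input would be the parsed loso_seed{N}.json files, one dict per seed. Both conditions must hold: an ensemble with a high mean but widely scattered seeds fails, as does a tight ensemble below 0.95.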
References
- req (2026-05-06, operator): "Operator on the fork's primary GPU host has already generated the canonical-6 corpus locally; this PR plugs in the real implementations + fixes the wrapper." (paraphrased from the dispatcher prompt requesting closure of the ADR-0303 / ADR-0309 deferrals.)
- ADR-0303 — gate definition (mean ≥ 0.95 AND spread ≤ 0.005).
- ADR-0309 — retrain harness; this PR closes the deferred trainer-body + wrapper argv-mismatch tracked there.
- ADR-0291 — deterministic v2 prod flip; source of the 0.95 LOSO PLCC ship gate that the ensemble inherits.
- Research-0075 §JSON schema — emitted-JSON contract.
- scripts/ci/ensemble_prod_gate.py: single source of truth for the threshold constants.
- ai/scripts/eval_loso_vmaf_tiny_v3.py: pandas-based LOSO loader pattern this trainer mirrors.