ADR-0321: fr_regressor_v2_ensemble_v1 - full production flip (real ONNX + sidecars)
- Status: Accepted
- Date: 2026-05-06
- Deciders: Lusoris, Claude (Anthropic)
- Tags: ai, tinyai, models, registry, prod-flip
Context
ADR-0303 defined the production-flip workflow for the deep-ensemble probabilistic head over fr_regressor_v2. ADR-0309 hardened the workflow against rebase-time foot-guns: PROMOTE.json is emitted by the LOSO validator, but the registry flip itself happens in a dedicated follow-up PR. ADR-0319 landed the real LOSO trainer + corpus loader.
The first flip attempt (PR #423) tried a pure metadata edit on model/tiny/registry.json — only flipping smoke: true → false on the five seed rows. It tripped core/test/dnn/test_registry.sh, which requires a sidecar JSON next to every non-smoke ONNX. Only the shared ensemble manifest fr_regressor_v2_ensemble_v1.json existed; no per-seed sidecars. PR #423 was closed for redo.
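The sidecar contract that tripped PR #423 can be sketched as a small check. This is a minimal illustration, not the logic of core/test/dnn/test_registry.sh itself: the registry layout (a top-level `models` list whose rows carry `onnx` and `smoke` fields) and the helper name `missing_sidecars` are assumptions made for the example.

```python
import json
from pathlib import Path


def missing_sidecars(registry_path: Path) -> list[str]:
    """Return the ONNX paths of non-smoke rows that lack a sibling .json sidecar.

    Assumed (hypothetical) registry layout:
        {"models": [{"id": ..., "onnx": "name.onnx", "smoke": bool}, ...]}
    with ONNX paths relative to the directory that holds the registry.
    """
    registry = json.loads(registry_path.read_text())
    missing = []
    for row in registry["models"]:
        if row.get("smoke", False):
            continue  # smoke rows are exempt from the sidecar requirement
        onnx_path = registry_path.parent / row["onnx"]
        # The sidecar is expected next to the ONNX file, same stem, .json suffix.
        if not onnx_path.with_suffix(".json").exists():
            missing.append(row["onnx"])
    return missing
```

Under these assumptions, a metadata-only flip of the five seed rows would make this check return all five seed ONNX paths, because only the shared ensemble manifest existed at the time.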
A second concern surfaced at the same time: the 3025-byte ONNX seed files committed in ADR-0303's scaffold PR were synthetic-corpus weights (1 epoch each), not the LOSO-validated production weights the gate had cleared. Flipping the registry without a re-export would ship stale weights that do not correspond to the numbers recorded in PROMOTE.json (mean_plcc=0.9973, spread=0.00095, per-seed >= 0.9968).
This ADR defines the proper production flip: re-train each seed on the FULL Phase A canonical-6 corpus (5,640 rows over 9 sources + h264_nvenc), export real ONNX weights, generate per-seed sidecars with full provenance, update the registry sha256s, and only then flip smoke: true → false.
Decision
We will produce the production checkpoints via a new driver ai/scripts/export_ensemble_v2_seeds.py that:
- Reuses `train_fr_regressor_v2_ensemble_loso._load_corpus` so the codec block layout (12-slot ENCODER_VOCAB v2 one-hot + `preset_norm` + `crf_norm`, total 14 cols) is identical to what the LOSO gate validated.
- Fits one `FRRegressor(in_features=6, hidden=64, depth=2, dropout=0.1, num_codecs=14)` per seed on the full corpus (no held-out fold). The LOSO PLCC was the gate for whether to ship at all; the production checkpoint should see every available row.
- Exports each seed as `model/tiny/fr_regressor_v2_ensemble_v1_seed{N}.onnx` (opset 17, two-input contract: `features:[N,6]` + `codec_onehot:[N,14]` → `score:[N]`).
- Writes a per-seed sidecar `model/tiny/fr_regressor_v2_ensemble_v1_seed{N}.json` mirroring the canonical `fr_regressor_v2.json` shape (encoder vocab, codec_block_layout, feature_mean/std, training_recipe) plus ensemble-specific fields (`seed`, `loso_mean_plcc`, `gate.this_seed_loso_plcc`, `corpus.sha256`, `parent_adrs: [ADR-0303, ADR-0309, ADR-0319, ADR-0321]`).
- Patches the five `fr_regressor_v2_ensemble_v1_seed{0..4}` rows in `model/tiny/registry.json` with the new sha256 and `smoke: false`.
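The final registry-patching step might look roughly like the sketch below. It is not the code of export_ensemble_v2_seeds.py: the registry layout (a top-level `models` list with `id`, `sha256`, and `smoke` fields) and the function names are assumptions for illustration. The point it shows is that the sha256 is recomputed from the freshly exported ONNX bytes and that `smoke` is only cleared when the matching sidecar is present.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def flip_seed_rows(registry_path: Path, model_dir: Path, seeds=range(5)) -> None:
    """Update sha256 and clear the smoke flag for each ensemble seed row.

    Assumed (hypothetical) registry layout:
        {"models": [{"id": ..., "sha256": ..., "smoke": bool}, ...]}
    """
    registry = json.loads(registry_path.read_text())
    rows = {row["id"]: row for row in registry["models"]}
    for seed in seeds:
        model_id = f"fr_regressor_v2_ensemble_v1_seed{seed}"
        onnx_path = model_dir / f"{model_id}.onnx"
        sidecar_path = model_dir / f"{model_id}.json"
        # Refuse to flip a row whose sidecar is missing; nothing has been
        # written yet at this point, so the registry file stays untouched.
        if not sidecar_path.exists():
            raise FileNotFoundError(f"refusing to flip {model_id}: no sidecar")
        rows[model_id]["sha256"] = sha256_of(onnx_path)
        rows[model_id]["smoke"] = False
    # Write once, after every seed has been validated.
    registry_path.write_text(json.dumps(registry, indent=2) + "\n")
```

Writing the registry only after all five seeds pass keeps the flip all-or-nothing, which matches the intent that ONNX bytes, sidecars, and registry rows change together.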
The shared manifest fr_regressor_v2_ensemble_v1.json is not modified by this ADR — it tracks the ensemble-mean entry point and is regenerated only via the trainer. The sidecars are fresh artefacts.
Alternatives considered
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Full flip with real weights + sidecars (chosen) | Sidecars satisfy test_registry.sh; ONNX bytes match the gate-validated training recipe; sidecars carry per-seed provenance for future audits. | Requires ~2 min of full-corpus training per seed at PR-time; introduces a new driver script. | n/a — this is the chosen option. |
| Metadata-only flip (PR #423 approach) | One-line diff per row; no training compute. | Fails test_registry.sh (no sidecars); ONNX bytes are scaffold-era synthetic weights, not the LOSO-validated artefacts PROMOTE.json describes. | Rejected: PR #423 was closed precisely because of this. |
| Wait for the BVI-DVC corpus | Bigger, multi-codec corpus would yield stronger production weights. | BVI-DVC ingestion is not started; PROMOTE.json's gate already passed on the Phase A corpus by a wide margin (mean PLCC 0.997, threshold 0.95); blocking the flip on BVI-DVC indefinitely defeats the purpose of having the gate. | Rejected: the gate is what governs ship/no-ship, not corpus aspiration. |
| Skip the per-seed sidecars; loosen test_registry.sh | Smaller diff. | Removes the per-non-smoke-ONNX sidecar contract that protects every other shipped tiny model. | Rejected: the test is the contract; weakening it is forbidden by the no-test-weakening rule. |
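The "gate governs ship/no-ship" argument in the table can be made concrete with a small sketch. Only the 0.95 threshold on mean PLCC comes from this ADR; the definition of spread as max minus min across seeds, the `HOLD` verdict name, and the function name are assumptions, and the real LOSO validator may apply further checks.

```python
from statistics import fmean


def gate_verdict(per_seed_plcc: list[float], threshold: float = 0.95) -> dict:
    """Summarise per-seed LOSO PLCC values into a ship/no-ship verdict.

    Assumptions: spread is max - min across seeds, and the only pass
    criterion is mean PLCC >= threshold.
    """
    mean_plcc = fmean(per_seed_plcc)
    spread = max(per_seed_plcc) - min(per_seed_plcc)
    return {
        "verdict": "PROMOTE" if mean_plcc >= threshold else "HOLD",
        "mean_plcc": mean_plcc,
        "spread": spread,
        "min_seed_plcc": min(per_seed_plcc),
    }


# Made-up inputs for illustration only; these are not the real per-seed results.
print(gate_verdict([0.96, 0.97, 0.98])["verdict"])
```

A verdict computed this way depends only on the corpus the gate was run on, which is why waiting for a larger corpus is a separate decision from the flip itself.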
Consequences
- Positive: The five ensemble seeds now ship real LOSO-gated weights with full provenance. Inference-time consumers can now load these models in non-smoke mode and trust the per-seed score semantics. The sidecar shape is identical to `fr_regressor_v2.json`, so downstream loaders that already understand v2 sidecars Just Work.
- Positive: The training recipe (full-corpus fit, fold-local scaler upstream of LOSO, fold-global scaler for the production fit) is now baked into each sidecar; future audits can reproduce it exactly.
- Negative: ~10 min of training compute is added to the PR. The driver script must stay numerically deterministic across `torch`/`numpy` versions; this is already covered by `_set_seed_all`.
- Neutral / follow-ups: Future re-flips (corpus refresh, recipe change) must re-run `export_ensemble_v2_seeds.py` so ONNX bytes + sidecars regenerate together; flipping rows by hand is now forbidden by the AGENTS.md invariant added in this PR.
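The difference between the fold-local scaler used during LOSO and the full-corpus scaler used for the production fit can be sketched as follows. This is an illustration under assumptions, not the trainer's code: the function names are hypothetical, and the use of population standard deviation with a fallback of 1.0 for constant columns is a choice made for the example.

```python
from statistics import fmean, pstdev


def fit_scaler(rows: list[list[float]]) -> tuple[list[float], list[float]]:
    """Per-column mean and std over the given rows.

    Assumption: population std; constant columns fall back to std 1.0
    so that standardisation never divides by zero.
    """
    cols = list(zip(*rows))
    mean = [fmean(c) for c in cols]
    std = [pstdev(c) or 1.0 for c in cols]
    return mean, std


def loso_scalers(rows: list[list[float]], sources: list[str]):
    """Yield (held_out_source, mean, std), fitting on the training fold only."""
    for held_out in sorted(set(sources)):
        train = [r for r, s in zip(rows, sources) if s != held_out]
        mean, std = fit_scaler(train)
        yield held_out, mean, std


# Toy feature rows, one source each (illustrative values only).
rows = [[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]]
sources = ["a", "b", "c"]

production_mean, production_std = fit_scaler(rows)  # full-corpus statistics
fold_stats = list(loso_scalers(rows, sources))      # one scaler per held-out source
```

During LOSO the held-out source never contributes to the statistics, so the PLCC estimate is not inflated by leakage; the production fit has no held-out fold, so its scaler is fitted on every row, and those are the `feature_mean`/`feature_std` values a sidecar would record.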
References
- Parent ADRs: ADR-0303, ADR-0309, ADR-0319.
- Closed PR: #423 (the metadata-only attempt; closed for redo).
- Gate verdict: `runs/ensemble_v2_real/PROMOTE.json` (verdict=PROMOTE, mean_plcc=0.9973, spread=0.00095, per-seed >= 0.9968).
- Source: `req` (operator request to do "the proper production flip: train + export real ONNX weights per seed, generate per-seed sidecars, update registry sha256s, flip smoke=false.")