CHUG HDR MOS head: held-out test validator
ai/scripts/validate_chug_hdr_mos_head.py evaluates a trained CHUG HDR MOS head ONNX against the 552-row held-out test partition of the CHUG corpus. This partition was never used during training or model selection.
Purpose
The CHUG corpus has explicit train / val / test splits in every feature JSONL row. The trainer and seed-sweep use only train and val. The test partition is kept strictly held-out so that the final production gate can be evaluated without optimistic bias from hyperparameter tuning.
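As a minimal sketch of how the held-out rows might be selected, the snippet below streams one feature JSONL file and keeps only the rows whose split label matches. The key name `split` and the helper `iter_split_rows` are assumptions for illustration; the real validator relies on `_normalise_split` from the training code, whose behaviour is not reproduced here.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_split_rows(jsonl_path: Path, wanted: str = "test") -> Iterator[dict]:
    """Yield rows from a feature JSONL whose split label equals `wanted`.

    Assumption: each row stores its partition under a "split" key.
    """
    with jsonl_path.open("r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between rows
            row = json.loads(line)
            # Compare case-insensitively so "Test" and "test" are treated alike.
            if str(row.get("split", "")).strip().lower() == wanted:
                yield row
```

Filtering at read time keeps train and val rows out of the evaluation arrays entirely, so they cannot leak into the reported metrics.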
Held-out production gate
| Metric | Threshold |
|---|---|
| PLCC | ≥ 0.85 |
| SROCC | ≥ 0.82 |
| RMSE | ≤ 0.45 MOS units |
All three must pass for exit 0. Exit 2 means the gate failed; exit 1 means an input or inference error (not a gate verdict). Thresholds mirror the ADR-0325 training gate and are never lowered on a miss.
Usage
python ai/scripts/validate_chug_hdr_mos_head.py \
--onnx .workingdir2/training/models/chug_hdr_mos_head_v1_wide_seed20260521.onnx \
--out-json .workingdir2/training/validation/chug_held_out_test_YYYYMMDD.json \
--out-md .workingdir2/training/validation/chug_held_out_test_YYYYMMDD.md
The default ONNX path is .workingdir2/training/models/chug_hdr_mos_head_v1_wide_seed20260521.onnx. Override it with --onnx or the VMAF_CHUG_HDR_ONNX environment variable.
The default shard directory is .corpus/chug/training/fr_canonical_shards/output/. Override it with --shard-dir, or pass explicit --feature-jsonl paths instead.
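The defaults and overrides above could be resolved along these lines. This is a sketch, not the validator's implementation: the helper names are hypothetical, and the precedence order (flag, then environment variable, then built-in default) is an assumption that the text does not spell out.

```python
import os
from pathlib import Path

DEFAULT_ONNX = Path(
    ".workingdir2/training/models/chug_hdr_mos_head_v1_wide_seed20260521.onnx"
)
DEFAULT_SHARD_DIR = Path(".corpus/chug/training/fr_canonical_shards/output")


def resolve_onnx_path(cli_value=None, environ=None) -> Path:
    """Pick the ONNX path.

    Assumed precedence: --onnx value, then VMAF_CHUG_HDR_ONNX, then the default.
    """
    environ = os.environ if environ is None else environ
    if cli_value:
        return Path(cli_value)
    env_value = environ.get("VMAF_CHUG_HDR_ONNX")
    return Path(env_value) if env_value else DEFAULT_ONNX


def resolve_shards(feature_jsonl=None, shard_dir=DEFAULT_SHARD_DIR) -> list:
    """Return explicit --feature-jsonl paths if given, else glob the shard dir."""
    if feature_jsonl:
        return [Path(p) for p in feature_jsonl]
    # Sort so the evaluation order is stable across runs and filesystems.
    return sorted(Path(shard_dir).glob("shard_*.features.jsonl"))
```

Sorting the globbed shards is a design choice in this sketch: it makes the row order, and therefore the sample rows written to the report, reproducible.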
Arguments
| Flag | Default | Description |
|---|---|---|
| --onnx | see above | CHUG MOS head ONNX path |
| --shard-dir | .corpus/chug/…/output | Dir searched for shard_*.features.jsonl |
| --feature-jsonl | (from shard-dir) | Explicit shard path; may be repeated |
| --out-json | .workingdir2/training/validation/…json | JSON report + run-manifest |
| --out-md | .workingdir2/training/validation/…md | Markdown summary |
| --gate-plcc | 0.85 | Override PLCC threshold (for testing) |
| --gate-srocc | 0.82 | Override SROCC threshold |
| --gate-rmse | 0.45 | Override RMSE threshold |
Outputs
The validator emits three outputs:
- JSON report (--out-json): full metrics, per-field gate verdict, sample predictions and MOS for the first 5 rows, and an ai-run-provenance-v1 sidecar block recording the ONNX path, shard paths, argv, and git HEAD.
- Markdown report (--out-md): human-readable table for PR descriptions.
- Console summary: PLCC / SROCC / RMSE and PASS / FAIL verdict.
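To make the provenance block concrete, here is a hedged sketch of how such a record could be assembled. The real validator uses the in-repo aiutils.run_manifest helper, whose API is not shown here; the function names and every dictionary key below are hypothetical. Only the ai-run-provenance-v1 label and the four recorded items come from the description above.

```python
import subprocess
import sys
from typing import Optional


def git_head() -> Optional[str]:
    """Return the current commit hash, or None outside a git checkout."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return out.stdout.strip() or None


def build_provenance(onnx_path, shard_paths, argv=None) -> dict:
    """Assemble a JSON-serialisable provenance record (hypothetical key names)."""
    return {
        "schema": "ai-run-provenance-v1",
        "onnx_path": str(onnx_path),
        "shard_paths": [str(p) for p in shard_paths],
        "argv": list(sys.argv if argv is None else argv),
        "git_head": git_head(),
    }
```

Storing paths as strings keeps the record directly serialisable with `json.dumps`, and a missing git checkout degrades to `None` instead of aborting the run.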
Current result (2026-05-27)
Model chug_hdr_mos_head_v1_wide_seed20260521.onnx:
| Metric | Val | Test (held-out) | Gate |
|---|---|---|---|
| PLCC | 0.8733 | 0.8468 | FAIL (< 0.85) |
| SROCC | 0.8528 | 0.8188 | FAIL (< 0.82) |
| RMSE | 0.2512 | 0.2639 | OK |
Model status: Proposed (not promoted). See docs/research/0715-chug-hdr-held-out-test-validation-2026-05-27.md for interpretation and next steps.
Feature schema
The validator uses the chug-hdr-wide-v1 34-column schema, identical to the training path in train_chug_hdr_mos_head.py. It reuses _row_to_features and _normalise_split from train_konvid_mos_head.py to ensure the feature projection is byte-identical to the training path.
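A minimal sketch of the inference step follows, using the standard onnxruntime Python API on the CPU provider. It assumes the exported head takes a single float32 input of shape (N, 34) and returns one prediction per row; neither detail is confirmed by this page, and the helper names are hypothetical. The projection from JSONL rows to the 34 columns is left to `_row_to_features` and is not reproduced.

```python
import numpy as np

N_FEATURES = 34  # width of the chug-hdr-wide-v1 schema


def check_feature_matrix(features) -> np.ndarray:
    """Validate shape and finiteness, returning a contiguous float32 array."""
    arr = np.ascontiguousarray(features, dtype=np.float32)
    if arr.ndim != 2 or arr.shape[1] != N_FEATURES:
        raise ValueError(f"expected shape (N, {N_FEATURES}), got {arr.shape}")
    if not np.isfinite(arr).all():
        raise ValueError("feature matrix contains NaN or inf")
    return arr


def predict_mos(onnx_path: str, features) -> np.ndarray:
    """Run the ONNX head on the CPU provider and return a flat prediction array.

    Assumption: the model has exactly one input and its first output holds
    one MOS prediction per input row.
    """
    import onnxruntime as ort  # imported lazily; only needed for inference

    arr = check_feature_matrix(features)
    session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name
    outputs = session.run(None, {input_name: arr})
    return np.asarray(outputs[0]).reshape(-1)
```

Rejecting a wrongly shaped or non-finite matrix before inference lets such problems surface as an input error (exit 1) instead of a misleading gate failure.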
Dependencies
- onnxruntime (CPU provider)
- numpy
- aiutils.cli_helpers, aiutils.run_manifest (in-repo helpers)