CHUG HDR MOS head — held-out test validator

ai/scripts/validate_chug_hdr_mos_head.py evaluates a trained CHUG HDR MOS head ONNX against the 552-row held-out test partition of the CHUG corpus. This partition was never used during training or model selection.

Purpose

The CHUG corpus has explicit train / val / test splits in every feature JSONL row. The trainer and seed-sweep use only train and val. The test partition is kept strictly held-out so that the final production gate can be evaluated without optimistic bias from hyperparameter tuning.
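As a rough illustration of how the held-out partition can be isolated from a feature JSONL shard, the sketch below filters rows by their split label. The field name `split` and the literal value `test` are assumptions made for this example; the validator itself relies on `_normalise_split` from the training code, whose exact behaviour is not reproduced here.

```python
import json
from typing import Iterable, Iterator


def iter_test_rows(lines: Iterable[str], split_key: str = "split") -> Iterator[dict]:
    """Yield only the rows labelled as held-out test rows.

    `split_key` and the "test" label are hypothetical names for this sketch.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between JSONL records
        row = json.loads(line)
        if str(row.get(split_key, "")).lower() == "test":
            yield row
```

Rows labelled train or val are dropped here, which mirrors the intent that the test partition is the only data the validator scores.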

Held-out production gate

Metric	Threshold
PLCC	≥ 0.85
SROCC	≥ 0.82
RMSE	≤ 0.45 MOS units

All three must pass for exit 0. Exit 2 means the gate failed; exit 1 means an input or inference error (not a gate verdict). Thresholds mirror the ADR-0325 training gate and are never lowered on a miss.

Usage

python ai/scripts/validate_chug_hdr_mos_head.py \
    --onnx .workingdir2/training/models/chug_hdr_mos_head_v1_wide_seed20260521.onnx \
    --out-json  .workingdir2/training/validation/chug_held_out_test_YYYYMMDD.json \
    --out-md    .workingdir2/training/validation/chug_held_out_test_YYYYMMDD.md

The default ONNX path is .workingdir2/training/models/chug_hdr_mos_head_v1_wide_seed20260521.onnx. Override it with --onnx or the VMAF_CHUG_HDR_ONNX environment variable.

The default shard directory is .corpus/chug/training/fr_canonical_shards/output/. Override it with --shard-dir, or pass explicit --feature-jsonl paths instead.
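One way the defaults and overrides described above could be resolved is sketched below. The helper names are hypothetical, and the precedence order (command-line value, then environment variable, then built-in default) is an assumption; the script may resolve them differently.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Sequence

DEFAULT_ONNX = ".workingdir2/training/models/chug_hdr_mos_head_v1_wide_seed20260521.onnx"
DEFAULT_SHARD_DIR = ".corpus/chug/training/fr_canonical_shards/output/"


def resolve_onnx(cli_value: Optional[str] = None,
                 env: Optional[Mapping[str, str]] = None) -> Path:
    """Pick the ONNX path: CLI value, else env var, else default (assumed order)."""
    env = os.environ if env is None else env
    return Path(cli_value or env.get("VMAF_CHUG_HDR_ONNX") or DEFAULT_ONNX)


def resolve_shards(feature_jsonl: Optional[Sequence[str]] = None,
                   shard_dir: str = DEFAULT_SHARD_DIR) -> list:
    """Use explicit shard paths when given, else glob the shard directory."""
    if feature_jsonl:
        return [Path(p) for p in feature_jsonl]
    return sorted(Path(shard_dir).glob("shard_*.features.jsonl"))
```

Sorting the globbed shards keeps the row order stable between runs, which makes the sample predictions in the report easier to compare.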

Arguments

Flag	Default	Description
`--onnx`	see above	CHUG MOS head ONNX path
`--shard-dir`	`.corpus/chug/…/output`	Dir searched for `shard_*.features.jsonl`
`--feature-jsonl`	(from shard-dir)	Explicit shard path; may be repeated
`--out-json`	`.workingdir2/training/validation/…json`	JSON report + run-manifest
`--out-md`	`.workingdir2/training/validation/…md`	Markdown summary
`--gate-plcc`	0.85	Override PLCC threshold (for testing)
`--gate-srocc`	0.82	Override SROCC threshold
`--gate-rmse`	0.45	Override RMSE threshold

Outputs

The validator produces three outputs (two files and a console summary):

JSON report (--out-json): full metrics, per-field gate verdict, first-5-row sample predictions/MOS, and an ai-run-provenance-v1 sidecar block recording the ONNX path, shard paths, argv, and git HEAD.
Markdown report (--out-md): human-readable table for PR descriptions.
Console summary: PLCC / SROCC / RMSE and PASS / FAIL verdict.

Current result (2026-05-27)

Model chug_hdr_mos_head_v1_wide_seed20260521.onnx:

Metric	Val	Test (held-out)	Gate
PLCC	0.8733	0.8468	FAIL (< 0.85)
SROCC	0.8528	0.8188	FAIL (< 0.82)
RMSE	0.2512	0.2639	OK

Model status: Proposed (not promoted). See docs/research/0715-chug-hdr-held-out-test-validation-2026-05-27.md for interpretation and next steps.
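To make the size of each miss concrete, the short sketch below computes the signed margin between the reported test metrics and the gate thresholds. The values are copied from the tables in this document; the dictionary and function names are illustrative only.

```python
# Gate thresholds and held-out test metrics, as reported in this document.
GATES = {"PLCC": (">=", 0.85), "SROCC": (">=", 0.82), "RMSE": ("<=", 0.45)}
TEST = {"PLCC": 0.8468, "SROCC": 0.8188, "RMSE": 0.2639}


def margin(metric: str) -> float:
    """Signed distance to the threshold; negative means the gate is missed."""
    op, threshold = GATES[metric]
    value = TEST[metric]
    return round(value - threshold if op == ">=" else threshold - value, 4)


margins = {name: margin(name) for name in GATES}
# margins == {"PLCC": -0.0032, "SROCC": -0.0012, "RMSE": 0.1861}
```

Both correlation metrics fall short by a few thousandths while RMSE clears its threshold with room to spare; because all three must pass, the overall verdict is still a gate failure.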

Feature schema

The validator uses the chug-hdr-wide-v1 34-column schema, identical to the training path in train_chug_hdr_mos_head.py. It reuses _row_to_features and _normalise_split from train_konvid_mos_head.py to ensure the feature projection is byte-identical to the training path.

Dependencies

onnxruntime (CPU provider)
numpy
aiutils.cli_helpers, aiutils.run_manifest (in-repo helpers)