# Training Discovery Synthesis — 2026-05-14

## Scope
This note answers the operator question: "we already have trained a lot, I wonder if we already can make discoveries by what we learned so far?"
The answer is yes, but only for claims backed by committed model sidecars or model cards. This synthesis intentionally excludes gitignored local run directories and uncommitted corpora so the evidence can be reproduced from a clean checkout.
Reproducer:
## Actionable findings

### 1. Canonical-6 FR prediction is saturated for the current corpus

`fr_regressor_v3` uses the canonical-6 libvmaf feature block plus an 18-D codec block and clears the LOSO gate by a wide margin. The `fr_regressor_v2` row is in-sample, so it is not directly comparable with the two LOSO rows:
| Model | Rows | PLCC | SROCC | RMSE | Evidence |
|---|---|---|---|---|---|
| fr_regressor_v2 | 216 | 0.9794 | 0.9640 | 3.0143 | in-sample |
| fr_regressor_v3 | 5640 | 0.9975 | 0.9691 | 1.0883 | LOSO |
| fr_regressor_v2_ensemble_v1 | - | 0.9973 | - | - | LOSO ensemble, spread=0.000951 |
Action: stop spending effort on deeper MLPs over exactly the same canonical-6 feature space. The v3 / v4 history already shows that the next gains need a regime change (richer feature columns, more diverse corpora, or uncertainty/ensemble use), not a larger fully-connected network over the same canonical-6 features.
What to do next:
- Keep `fr_regressor_v3` as the strong baseline for future retrain comparisons.
- Prioritise the `v3plus`/richer-feature path and corpus expansion over architecture-only experiments.
- Use the weaker folds from the v3 sidecar (`FoxBird`, `ElFuente2`, `Tennis`) as the first content set for residual analysis.
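Ranking folds by held-out quality is how a weak-fold list like the one above can be produced. This sketch assumes LOSO means leave-one-source-out with one fold per source clip, as the named folds suggest, and that each row carries hypothetical keys `source`, `y_true`, and `y_pred`, where `y_pred` comes from the model trained with that source held out.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var)


def weakest_folds(rows, k=3):
    """Return the k (source, PLCC) pairs with the lowest held-out PLCC."""
    by_source = defaultdict(lambda: ([], []))
    for r in rows:
        targets, preds = by_source[r["source"]]
        targets.append(r["y_true"])
        preds.append(r["y_pred"])
    per_fold = {s: pearson(t, p) for s, (t, p) in by_source.items()}
    # Lowest correlation first: these sources are the residual-analysis targets.
    return sorted(per_fold.items(), key=lambda kv: kv[1])[:k]
```

Per-fold PLCC is used rather than pooled PLCC so that one badly predicted source cannot hide behind the well-predicted rest of the corpus.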
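### 2. QSV is easier to predict than NVENC in the current hardware corpus

The real hardware predictor cards show QSV ahead of NVENC for every shared codec family on PLCC, SROCC, and RMSE. The h264 gap is marginal (+0.0037 PLCC), the HEVC gap is moderate, and the AV1 gap (+0.2216 PLCC, roughly 4 RMSE points) is large enough to be operationally interesting rather than measurement noise. The two corpora differ in size (NVENC N=2592, QSV N=1620), so this compares separately trained cards, not a matched pair of encodes.

| Codec family | NVENC PLCC | QSV PLCC | PLCC delta (QSV - NVENC) | NVENC RMSE | QSV RMSE |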
|---|---|---|---|---|---|
| h264 | 0.7908 | 0.7945 | +0.0037 | 13.7288 | 12.9497 |
| hevc | 0.7439 | 0.8302 | +0.0863 | 12.0813 | 9.7754 |
| av1 | 0.6561 | 0.8777 | +0.2216 | 12.4922 | 8.5336 |
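One way to check whether a PLCC gap is larger than split noise is Fisher's z-test for two independent correlations. The sketch assumes the seeded 80/20 split leaves about 20% of each corpus as the test set (the actual held-out sizes are not in the cards) and treats rows as independent, which rows sharing a source clip are not, so the p-values are optimistic.

```python
import math
from statistics import NormalDist


def fisher_z_compare(r1: float, n1: int, r2: float, n2: int):
    """Two-sided Fisher z-test that two independent Pearson correlations differ.

    Returns (z, p); positive z means r2 > r1.
    """
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (math.atanh(r2) - math.atanh(r1)) / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


# Assumed held-out sizes: 20% of each card's corpus under an 80/20 split.
N_NVENC_TEST = round(0.2 * 2592)  # 518
N_QSV_TEST = round(0.2 * 1620)    # 324

# PLCC values copied from the hardware predictor cards above.
for family, r_nvenc, r_qsv in [("h264", 0.7908, 0.7945),
                               ("hevc", 0.7439, 0.8302),
                               ("av1", 0.6561, 0.8777)]:
    z, p = fisher_z_compare(r_nvenc, N_NVENC_TEST, r_qsv, N_QSV_TEST)
    print(f"{family}: z={z:.2f} p={p:.3g}")
```

Under these assumptions the h264 gap is indistinguishable from noise while the AV1 gap is not; a per-source bootstrap would be the more honest follow-up once the real rows are in hand.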
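Action: treat NVENC predictor quality as the next hardware-model debugging target. The working hypothesis is that the current 14-D predictor feature vector does not capture enough of NVENC's rate-control behaviour, especially for AV1, but the corpus-side explanations listed under the blockers have not been ruled out yet.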
What to do next:
- Add per-card/per-driver/device metadata to the training sidecar audit, but keep it out of static hardware-capability priors until it comes from measured corpus rows.
- Run permutation/residual analysis on the real NVENC rows first, with slices by codec, source, resolution, CQ, and bitrate bucket.
- Test whether adding first-pass encode statistics or GOP-shape features closes the AV1 NVENC residual before collecting a larger corpus.
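The slicing step of the residual analysis can be sketched as grouping absolute residuals by metadata and ranking slices worst-first. The row keys (`y_true`, `y_pred`, `codec`, `resolution`, `cq`) and the 8-wide CQ bucketing are hypothetical choices for illustration, not the corpus schema.

```python
from collections import defaultdict
from statistics import mean


def cq_bucket(cq: int, width: int = 8) -> str:
    """Bucket CQ values into fixed-width bands, e.g. 24 -> '24-31' for width 8."""
    lo = (cq // width) * width
    return f"{lo}-{lo + width - 1}"


def residual_slices(rows, keys=("codec", "resolution"), min_rows=20):
    """Mean absolute residual per slice, worst first.

    Slices with fewer than min_rows rows are dropped so tiny slices
    do not dominate the ranking by chance.
    """
    groups = defaultdict(list)
    for r in rows:
        groups[tuple(r[k] for k in keys)].append(abs(r["y_true"] - r["y_pred"]))
    table = [(key, len(res), mean(res))
             for key, res in groups.items() if len(res) >= min_rows]
    return sorted(table, key=lambda row: row[2], reverse=True)
```

A derived field such as `r["cq_bucket"] = cq_bucket(r["cq"])` can be added before calling `residual_slices(rows, keys=("codec", "cq_bucket"))`; bitrate and source slices work the same way.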
### 3. Resize+Conv is a real saliency-student improvement

The saliency-student v2 ablation changed only the decoder upsampler (ConvTranspose replaced by resize + conv) and raised best validation IoU over v1 by 0.0547 absolute:
| Model | Best val IoU | Params | Decoder |
|---|---|---|---|
| saliency_student_v1 | 0.6558 | 112841 | ConvTranspose decoder |
| saliency_student_v2 | 0.7105 | 123721 | Resize+Conv decoder |
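The sidecars do not say exactly how "Best val IoU" is computed. A common definition for saliency masks is thresholded IoU, sketched below under the assumptions of a 0.5 threshold and an IoU of 1.0 when both masks are empty; the training code may instead use a soft or batch-averaged variant.

```python
def binary_iou(pred, target, threshold=0.5):
    """IoU between two saliency maps after thresholding.

    pred, target: 2-D lists of floats in [0, 1] with the same shape.
    Returns 1.0 when both thresholded masks are empty (assumed convention).
    """
    inter = union = 0
    for prow, trow in zip(pred, target):
        for p, t in zip(prow, trow):
            pb, tb = p >= threshold, t >= threshold
            inter += pb and tb  # bools add as 0/1
            union += pb or tb
    return 1.0 if union == 0 else inter / union
```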
Action: v2 is good enough to justify an ROI encode validation pass, but not a production flip by itself. The next gate is whether the better saliency map improves bitrate allocation in real encodes.
What to do next:
- Run matched ROI encodes using v1 vs v2 on the existing saliency-aware `vmaf-tune` surfaces.
- Compare bitrate at fixed VMAF, saliency-weighted VMAF, and visible artifacts in high-saliency regions.
- Promote v2 only if encode-level validation agrees with the DUTS IoU improvement.
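"Bitrate at fixed VMAF" can be read off each rate sweep by interpolating between the two measured points that bracket the target score. This sketch interpolates linearly in log-bitrate and assumes VMAF rises monotonically with bitrate; it is a single-point comparison, whereas BD-rate would integrate over the whole curve.

```python
import math


def bitrate_at_vmaf(points, target_vmaf):
    """Bitrate needed to reach target_vmaf, or None outside the measured range.

    points: (bitrate_kbps, vmaf) pairs from one rate-control sweep.
    """
    pts = sorted(points, key=lambda bv: bv[1])
    for (b0, v0), (b1, v1) in zip(pts, pts[1:]):
        if v0 <= target_vmaf <= v1 and v1 > v0:
            frac = (target_vmaf - v0) / (v1 - v0)
            return math.exp(math.log(b0) + frac * (math.log(b1) - math.log(b0)))
    return None


def bitrate_saving_pct(baseline_points, candidate_points, target_vmaf):
    """Percent bitrate the candidate saves at equal VMAF (positive = candidate wins)."""
    b_base = bitrate_at_vmaf(baseline_points, target_vmaf)
    b_cand = bitrate_at_vmaf(candidate_points, target_vmaf)
    if b_base is None or b_cand is None:
        return None
    return 100.0 * (b_base - b_cand) / b_base
```

With v1 ROI encodes as `baseline_points` and v2 ROI encodes as `candidate_points`, a positive result means the v2 saliency map reached the same VMAF at a lower bitrate.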
### 4. CHUG is the immediate HDR subjective-corpus target

CHUG ("Crowdsourced User-Generated HDR Video Quality Dataset") is now the highest-leverage HDR corpus to add before drawing HDR-specific training conclusions. The CHUG repository describes 5,992 UGC-HDR videos from 856 HDR references, 211,848 AMT ratings, bitrate-ladder encodes, portrait/landscape coverage, and a CSV manifest with `Video`, `mos_j`, `sos_j`, `ref`, `bitladder`, `resolution`, `bitrate`, `orientation`, `framerate`, `height`, and `width` columns.

Action: use the CHUG manifest adapter before making further HDR discovery claims. It is a metadata/manifest loader first, with downloads kept in `.workingdir2` and no video redistribution.
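A manifest-only loader needs nothing beyond the CSV and the public video URL pattern. The sketch below is illustrative, not the code in `ai/scripts/chug_to_corpus_jsonl.py`: it assumes the `Video` column holds the ID without the `.mp4` extension, and the type coercions (floats for MOS/SOS, integers for geometry) are assumptions about the CSV contents.

```python
import csv
import io
import json
from typing import Optional

# Public URL pattern from the CHUG repository; VIDEO is replaced by the ID.
VIDEO_URL = "https://ugchdrmturk.s3.us-east-2.amazonaws.com/videos/{video}.mp4"


def load_chug_manifest(csv_text: str, max_rows: Optional[int] = None):
    """Yield normalised manifest rows; no video is downloaded.

    max_rows mimics a smoke path that stops after the first N rows.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    for i, row in enumerate(reader):
        if max_rows is not None and i >= max_rows:
            break
        yield {
            "video": row["Video"],
            "url": VIDEO_URL.format(video=row["Video"]),
            "mos": float(row["mos_j"]),
            "sos": float(row["sos_j"]),
            "ref": row["ref"],  # kept raw; its encoding is not assumed here
            "bitladder": row["bitladder"],
            "resolution": row["resolution"],
            "bitrate": row["bitrate"],
            "orientation": row["orientation"],
            "framerate": row["framerate"],
            "height": int(float(row["height"])),
            "width": int(float(row["width"])),
        }


def to_jsonl(rows) -> str:
    """Serialise rows as one JSON object per line with stable key order."""
    return "".join(json.dumps(r, sort_keys=True) + "\n" for r in rows)
```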
What to do next:
- Run `ai/scripts/chug_to_corpus_jsonl.py`: parse `chug.csv`, expose video IDs, MOS/SOS, reference flag, resolution, bitrate ladder label, orientation, FPS, height, and width.
- Use the script's `--max-rows` smoke path to validate CSV parsing and S3 URL construction without downloading the whole dataset.
- Materialise FR feature rows by pairing each distorted ladder row with its `chug_content_name` reference row, scaling the distorted side to the reference geometry before libvmaf extraction. This is recorded in ADR-0427.
- Gate all committed CHUG-derived weights as non-commercial research artefacts unless the license ambiguity below is resolved in a more permissive direction.
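The pairing step can be sketched as grouping rows by content name and emitting one (reference, distorted) pair per ladder rung. The keys `is_ref`, `width`, and `height` are hypothetical adapter fields, and the bicubic ffmpeg `scale` filter string is an illustrative choice; the scaler ADR-0427 actually specifies is not reproduced here.

```python
from collections import defaultdict


def pair_with_references(rows):
    """Yield (ref, dist, scale_filter) for every distorted row.

    rows: dicts with hypothetical keys 'video', 'chug_content_name',
    'is_ref', 'width', 'height'. scale_filter resizes the distorted clip
    to the reference geometry before feature extraction.
    """
    by_content = defaultdict(list)
    for r in rows:
        by_content[r["chug_content_name"]].append(r)
    for name, group in sorted(by_content.items()):
        refs = [r for r in group if r["is_ref"]]
        if len(refs) != 1:
            raise ValueError(f"{name}: expected one reference row, found {len(refs)}")
        ref = refs[0]
        for dist in group:
            if dist["is_ref"]:
                continue
            # Target geometry always comes from the reference, so portrait
            # and landscape content are handled the same way.
            scale = f"scale={ref['width']}:{ref['height']}:flags=bicubic"
            yield ref, dist, scale
```

Failing loudly on a missing or duplicated reference keeps a silently mismatched pair from producing plausible-looking but meaningless FR features.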
## Blockers for the remaining claims

### Synthetic predictor cards are not evidence
The AMF, libx264, libx265, libaom-av1, libsvtav1, and libvvenc predictor cards are still synthetic-stub cards. Their metrics are expected to look high because the regression target is the analytical fallback, not a held-out measured corpus.
Blocker: no committed artefact contains a real measured corpus for these adapters yet. Until one does, these cards validate the load path only.

### MOS-head discoveries need committed gate metrics
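A report can enforce this mechanically by only citing cards whose corpus label names a measured row count, as the `real-N=<rows>` labels in the generated report do. The sketch treats every other label as load-path-only; the `synthetic-stub` label used in the usage line is hypothetical.

```python
import re

REAL_CORPUS = re.compile(r"^real-N=(\d+)$")


def citable_cards(cards):
    """Split (codec, corpus_label) pairs into evidence and load-path-only.

    Only labels of the form 'real-N=<rows>' with a positive row count
    count as measured evidence.
    """
    evidence, load_path_only = [], []
    for codec, label in cards:
        m = REAL_CORPUS.match(label)
        if m and int(m.group(1)) > 0:
            evidence.append((codec, int(m.group(1))))
        else:
            load_path_only.append(codec)
    return evidence, load_path_only
```

For example, `citable_cards([("av1_nvenc", "real-N=2592"), ("libx264", "synthetic-stub")])` keeps the NVENC card as evidence and routes `libx264` to the load-path-only list.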
konvid_mos_head_v1 is structurally present and the invariants are documented, but the sidecar does not expose the same compact metric block that the FR and saliency sidecars expose.
Blocker: the MOS-head model card / sidecar needs a committed summary of PLCC, SROCC, RMSE, spread, corpus split, and gate verdict before we can cite it in discovery claims.
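The required summary block can be checked before any citation. The key names, the `summary` nesting, and the `"pass"` verdict string below are hypothetical; they illustrate the gate, not the current sidecar schema.

```python
REQUIRED_GATE_KEYS = ("plcc", "srocc", "rmse", "spread", "corpus_split", "gate_verdict")


def missing_gate_metrics(sidecar: dict) -> list:
    """Return required summary keys that are absent or null in a sidecar dict."""
    summary = sidecar.get("summary") or {}
    return [k for k in REQUIRED_GATE_KEYS if summary.get(k) is None]


def citable(sidecar: dict) -> bool:
    """A model is citable only with a complete summary and a passing gate."""
    return not missing_gate_metrics(sidecar) and sidecar["summary"]["gate_verdict"] == "pass"
```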
### We do not yet know whether NVENC needs features or just rows
The real hardware cards identify NVENC as the weak family, but they do not explain the cause. The plausible causes are separable:
- corpus imbalance across content / CQ / resolution;
- insufficient probe features for NVENC rate control;
- device / driver behaviour hidden behind a single encoder label;
- train/test split leakage or mismatch from the seeded 80/20 split.
Blocker: residual analysis over the underlying real rows is needed before changing the predictor architecture.
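The split-leakage hypothesis is cheap to test: compare a seeded row-level 80/20 split with a split grouped by source and count sources that land on both sides. The `source` key is a hypothetical field name, and this is not the split code the cards were trained with.

```python
import random


def row_split(rows, seed=0, test_frac=0.2):
    """Seeded row-level split: rows from one source can land on both sides."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(idx) * test_frac)
    cut = len(idx) - n_test
    return [rows[i] for i in idx[:cut]], [rows[i] for i in idx[cut:]]


def group_split(rows, seed=0, test_frac=0.2, key="source"):
    """Seeded split by source so no source appears in both train and test."""
    sources = sorted({r[key] for r in rows})
    random.Random(seed).shuffle(sources)
    n_test = max(1, round(len(sources) * test_frac))
    test_sources = set(sources[:n_test])
    train = [r for r in rows if r[key] not in test_sources]
    test = [r for r in rows if r[key] in test_sources]
    return train, test


def leaked_sources(train, test, key="source"):
    """Sources present on both sides of a split."""
    return {r[key] for r in train} & {r[key] for r in test}
```

If retraining the NVENC predictor on a grouped split drops its PLCC noticeably, the current numbers were partly measuring content memorisation rather than rate-control prediction.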
### HDR conclusions are blocked on the external model
The current FR and predictor evidence is SDR / existing-model evidence. Netflix's future HDR model can change score distributions and the feature-response profile. CHUG closes part of the data gap for subjective UGC-HDR/MOS learning, but it does not replace a committed HDR-FR teacher model.
Blockers:
- no HDR-FR teacher model artefact is in-tree yet;
- no CHUG feature-extraction pass has completed yet;
- CHUG's README badge says CC BY-NC 4.0, while `license.txt` contains Creative Commons Attribution-NonCommercial-ShareAlike 4.0 text. Treat the stricter non-commercial/share-alike terms as the working license until clarified;
- CHUG videos are externally hosted on S3 and must remain out of git.
HDR-specific discoveries should stay out of the production report until the HDR-FR teacher model and/or the CHUG adapter land and a fresh corpus pass has run.

## Generated sidecar report

The following table is generated from committed sidecars/cards by `scripts/dev/training_discovery_report.py`.
```text
Training Discovery Report

Generated from committed model sidecars and model cards.

Tiny FR Regressors

| Model | Rows | PLCC | SROCC | RMSE | Evidence |
| --- | --- | --- | --- | --- | --- |
| fr_regressor_v2 | 216 | 0.9794 | 0.9640 | 3.0143 | in-sample |
| fr_regressor_v3 | 5640 | 0.9975 | 0.9691 | 1.0883 | LOSO |
| fr_regressor_v2_ensemble_v1 | - | 0.9973 | - | - | LOSO ensemble, spread=0.000951 |

Saliency Students

| Model | Best val IoU | Params | Decoder |
| --- | --- | --- | --- |
| saliency_student_v1 | 0.6558 | 112841 | ConvTranspose decoder |
| saliency_student_v2 | 0.7105 | 123721 | F.interpolate(scale_factor=2.0, mode='bilinear', align_corners=False) + nn.Conv2d(kernel=3, padding=1, no bias) |

Real Hardware Predictor Cards

| Codec | Corpus | PLCC | SROCC | RMSE | Card |
| --- | --- | --- | --- | --- | --- |
| av1_nvenc | real-N=2592 | 0.6561 | 0.6154 | 12.4922 | model/predictor_av1_nvenc_card.md |
| h264_nvenc | real-N=2592 | 0.7908 | 0.7837 | 13.7288 | model/predictor_h264_nvenc_card.md |
| hevc_nvenc | real-N=2592 | 0.7439 | 0.7374 | 12.0813 | model/predictor_hevc_nvenc_card.md |
| av1_qsv | real-N=1620 | 0.8777 | 0.8424 | 8.5336 | model/predictor_av1_qsv_card.md |
| h264_qsv | real-N=1620 | 0.7945 | 0.8555 | 12.9497 | model/predictor_h264_qsv_card.md |
| hevc_qsv | real-N=1620 | 0.8302 | 0.8322 | 9.7754 | model/predictor_hevc_qsv_card.md |

QSV vs NVENC Predictor Delta

| Codec family | NVENC PLCC | QSV PLCC | Delta | NVENC RMSE | QSV RMSE |
| --- | --- | --- | --- | --- | --- |
| h264 | 0.7908 | 0.7945 | +0.0037 | 13.7288 | 12.9497 |
| hevc | 0.7439 | 0.8302 | +0.0863 | 12.0813 | 9.7754 |
| av1 | 0.6561 | 0.8777 | +0.2216 | 12.4922 | 8.5336 |
```
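The delta section is a pure function of the per-card metrics. This is a minimal sketch of how such rows could be derived, not the code in `scripts/dev/training_discovery_report.py`; the input dict shape and the fixed family order are assumptions chosen to match the table above.

```python
def delta_rows(cards, order=("h264", "hevc", "av1")):
    """Render QSV-vs-NVENC markdown rows from per-card metrics.

    cards: {'<family>_nvenc' | '<family>_qsv': {'plcc': float, 'rmse': float}}.
    Families missing either card are skipped.
    """
    lines = [
        "| Codec family | NVENC PLCC | QSV PLCC | Delta | NVENC RMSE | QSV RMSE |",
        "| --- | --- | --- | --- | --- | --- |",
    ]
    for fam in order:
        nv, qs = cards.get(f"{fam}_nvenc"), cards.get(f"{fam}_qsv")
        if nv is None or qs is None:
            continue
        delta = qs["plcc"] - nv["plcc"]  # positive = QSV predicted better
        lines.append(
            f"| {fam} | {nv['plcc']:.4f} | {qs['plcc']:.4f} | {delta:+.4f} "
            f"| {nv['rmse']:.4f} | {qs['rmse']:.4f} |"
        )
    return "\n".join(lines)
```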
## External corpus reference — CHUG
- Repository: https://github.com/shreshthsaini/CHUG
- Paper DOI: https://doi.org/10.1109/ICIP55913.2025.11084488
- CSV manifest: https://raw.githubusercontent.com/shreshthsaini/CHUG/master/chug.csv
- Video ID list: https://raw.githubusercontent.com/shreshthsaini/CHUG/master/chug-video.txt
- Video URL pattern: `https://ugchdrmturk.s3.us-east-2.amazonaws.com/videos/VIDEO.mp4`