Research-0647: fr_regressor_v1 refresh on the 2026-05-20 Netflix table
Question
Does the refreshed Netflix full-feature table still clear the ADR-0249 C1 ship gate, and how much do the shipped fr_regressor_v1 metrics move?
Inputs
- Feature table: runs/full_features_netflix_refresh_20260520.parquet
- Table shape: 11190 rows, 30 columns
- Trainer: PYTHONPATH=ai/src .venv/bin/python ai/scripts/train_fr_regressor.py
- Model recipe: unchanged ADR-0249 FRRegressor(hidden=64, depth=2, dropout=0.1, canonical-6 input)
- Ship gate: mean LOSO PLCC >= 0.95
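The ship gate is defined over LOSO folds. Reading LOSO as leave-one-source-out (which matches the one-row-per-source result table), the split can be sketched as below. This is an illustration only: `loso_folds` and the toy `rows` list are hypothetical, and the trainer's actual split code is not shown in this note.

```python
from collections import defaultdict


def loso_folds(sources):
    """Yield (held_out, train_idx, test_idx) once per distinct source label.

    `sources` is a per-row list of source names (hypothetical input layout).
    """
    by_source = defaultdict(list)
    for i, name in enumerate(sources):
        by_source[name].append(i)
    for held_out in sorted(by_source):
        test_idx = by_source[held_out]
        # Train on every row whose source differs from the held-out one.
        train_idx = sorted(
            i for name, idx in by_source.items() if name != held_out for i in idx
        )
        yield held_out, train_idx, test_idx


# Toy per-row source labels, not taken from the real feature table.
rows = ["CrowdRun", "Tennis", "CrowdRun", "FoxBird", "Tennis"]
folds = list(loso_folds(rows))
for held_out, train_idx, test_idx in folds:
    print(held_out, train_idx, test_idx)
```

Under this reading, each fold's PLCC is measured only on rows from a source the model never saw during training, and the gate is applied to the mean over folds.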
Result
| Source | PLCC | SROCC | RMSE |
|---|---|---|---|
| BigBuckBunny | 0.9962 | 0.6277 | 4.466 |
| BirdsInCage | 0.9993 | 0.9997 | 1.854 |
| CrowdRun | 0.9997 | 0.9998 | 1.005 |
| ElFuente1 | 0.9992 | 0.9963 | 1.635 |
| ElFuente2 | 0.9964 | 0.9969 | 3.169 |
| FoxBird | 0.9967 | 0.9954 | 2.352 |
| OldTownCross | 0.9992 | 0.9998 | 1.931 |
| Seeking | 0.9989 | 0.9962 | 1.982 |
| Tennis | 0.9982 | 0.9982 | 1.356 |
Summary:
- Mean PLCC: 0.9982 ± 0.0014
- Mean SROCC: 0.9567 ± 0.1234
- Mean RMSE: 2.194 ± 1.049
- Final all-source in-sample PLCC: 0.9993
- Exported ONNX sha256: b57dee2509290d77c7980f8f23aa1380f64937c485d1b1d1e5f78c13a3a54c63
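The summary figures can be cross-checked from the per-fold table with the standard library alone. The sketch below recomputes mean and spread from the rounded table values and applies the 0.95 gate. It assumes the ± figures are sample standard deviations (n - 1), which is consistent with the numbers above but is not confirmed by the trainer source; `FOLDS`, `summarize`, and `SHIP_GATE_PLCC` are illustrative names.

```python
from statistics import mean, stdev

# Per-fold values copied from the result table: (PLCC, SROCC, RMSE).
FOLDS = {
    "BigBuckBunny": (0.9962, 0.6277, 4.466),
    "BirdsInCage": (0.9993, 0.9997, 1.854),
    "CrowdRun": (0.9997, 0.9998, 1.005),
    "ElFuente1": (0.9992, 0.9963, 1.635),
    "ElFuente2": (0.9964, 0.9969, 3.169),
    "FoxBird": (0.9967, 0.9954, 2.352),
    "OldTownCross": (0.9992, 0.9998, 1.931),
    "Seeking": (0.9989, 0.9962, 1.982),
    "Tennis": (0.9982, 0.9982, 1.356),
}
SHIP_GATE_PLCC = 0.95  # mean LOSO PLCC threshold listed under Inputs


def summarize(column):
    """Return (mean, sample standard deviation) for one metric column."""
    values = [row[column] for row in FOLDS.values()]
    return mean(values), stdev(values)  # stdev divides by n - 1 (assumption)


plcc_mean, plcc_std = summarize(0)
srocc_mean, srocc_std = summarize(1)
rmse_mean, rmse_std = summarize(2)
gate_passed = plcc_mean >= SHIP_GATE_PLCC

print(f"PLCC  {plcc_mean:.4f} ± {plcc_std:.4f}")
print(f"SROCC {srocc_mean:.4f} ± {srocc_std:.4f}")
print(f"RMSE  {rmse_mean:.3f} ± {rmse_std:.3f}")
print("ship gate passed:", gate_passed)
```

Run on the rounded table values, this reproduces the three summary lines to their displayed precision. Most of the SROCC spread comes from the single BigBuckBunny fold.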
Interpretation
The refreshed model clears the unchanged PLCC ship gate by a wide margin (mean LOSO PLCC 0.9982 against a threshold of 0.95) and improves PLCC variance relative to the previous sidecar. The low BigBuckBunny SROCC (0.6277, against at least 0.9954 on every other fold) is a real caveat: rank ordering within that fold is weaker even though linear agreement remains high (PLCC 0.9962). This does not block the C1 checkpoint because ADR-0249 defines PLCC-vs-teacher as the ship gate, but the BigBuckBunny SROCC should be checked again when the aggregate 4/5-corpus models are refreshed.
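A high PLCC alongside a low SROCC is not contradictory. One way it can arise is when scores fall into widely separated clusters and the ordering inside each cluster is scrambled. The sketch below shows this on synthetic numbers only. It is not derived from the BigBuckBunny fold and does not claim to explain it; `teacher` and `student` are made-up score lists.

```python
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; assumes no ties, which holds for the toy data below."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    for rank, idx in enumerate(order, start=1):
        out[idx] = float(rank)
    return out


def spearman(x, y):
    """Spearman rank correlation (SROCC) as Pearson over the ranks."""
    return pearson(ranks(x), ranks(y))


# Synthetic scores: two tight clusters, each with reversed internal order.
teacher = [10, 11, 12, 13, 90, 91, 92, 93]
student = [13, 12, 11, 10, 93, 92, 91, 90]

plcc = pearson(teacher, student)
srocc = spearman(teacher, student)
print(f"PLCC={plcc:.4f} SROCC={srocc:.4f}")
```

On this toy data the linear fit is dominated by the gap between the clusters, so PLCC stays above 0.99 while SROCC drops to roughly 0.52. Whether something similar drives the BigBuckBunny result would need a look at that fold's score distribution.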
Reproducer
PYTHONPATH=ai/src .venv/bin/python ai/scripts/train_fr_regressor.py \
--parquet runs/full_features_netflix_refresh_20260520.parquet \
--metrics-out runs/fr_regressor_v1_refresh_20260520_metrics.json
PYTHONPATH=ai/src .venv/bin/python - <<'PY'
import onnx
onnx.checker.check_model(onnx.load("model/tiny/fr_regressor_v1.onnx"))
PY
bash core/test/dnn/test_registry.sh