ADR-0559: Feature Coverage Audit - Add speed_chroma + speed_temporal to Extraction Scripts (HDR-model prep)
- Status: Accepted
- Date: 2026-05-18
- Deciders: lusoris, Claude Code agent
- Tags: ai, feature-extraction, speed, hdr, corpus, fork-local
Context
The cross-backend parity matrix (PR #1328 / ADR-0555) flagged speed_chroma and speed_temporal as CPU-only extractors with no GPU twin. Separately, the Netflix speed_ported upstream branch signals that these two extractors are the most likely inputs to a future Netflix HDR VMAF model. To be in a position to evaluate such a model against this fork's corpora — or to train a fork-owned HDR surrogate — the extraction scripts must include these features.
A systematic audit (see research digest docs/research/feature-coverage-audit-2026-05-18.md) found:
- speed_chroma and speed_temporal exist as CPU-only extractors in core/src/feature/speed.c and are registered in feature_extractor_list[] under #if VMAF_FLOAT_FEATURES.
- Only ai/scripts/extract_k150k_features.py already included them (added 2026-05-15). All other extraction scripts (chug_extract_features.py, extract_full_features.py, bvi_dvc_to_full_features.py) omit them.
- All current corpus JSONL files lack populated speed-feature columns: either the columns are absent entirely, or they are present but all-NaN from a run that predated the extractor addition.
- No shipped SVM or tiny-AI ONNX model currently consumes speed features.
- No HDR VMAF model exists in the Netflix upstream tree as of 2026-05-18.
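
The "absent versus all-NaN" distinction in the audit can be checked mechanically. The following is a minimal sketch, not the audit's actual tooling: the function name and the column names in SPEED_COLUMNS are assumptions made for illustration, and the real corpus schema may use different keys (for example per-plane chroma columns).

```python
import json
import math

# Hypothetical column names; the real corpus JSONL schema may differ.
SPEED_COLUMNS = ("speed_chroma", "speed_temporal")


def classify_speed_columns(jsonl_lines, columns=SPEED_COLUMNS):
    """Classify each column as "absent", "all_nan", or "populated"."""
    seen = {c: False for c in columns}       # key appeared in at least one row
    populated = {c: False for c in columns}  # at least one finite value seen
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        row = json.loads(line)  # Python's json accepts the NaN token by default
        for c in columns:
            if c not in row:
                continue
            seen[c] = True
            value = row[c]
            if isinstance(value, (int, float)) and math.isfinite(value):
                populated[c] = True
    return {
        c: "populated" if populated[c] else ("all_nan" if seen[c] else "absent")
        for c in columns
    }
```

Passing an open file handle works as well as a list of strings, since the function only iterates over lines. A null value is treated the same as NaN.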
Decision
- Add speed_chroma and speed_temporal to the FULL_FEATURES tuple in ai/data/feature_extractor.py and to the _METRIC_TO_EXTRACTOR lookup map, so all scripts that import from that module pick up the change.
- Add speed features explicitly to bvi_dvc_to_full_features.py's local FULL_FEATURES copy and EXTRACTORS tuple (this script does not import from the shared module).
- Add speed features to chug_extract_features.py's FULL_FEATURES set via the shared module update (the script uses FULL_FEATURES from the module; no local copy to patch).
- Do NOT trigger any actual re-extraction in this PR. Corpus re-extraction is expensive and is the responsibility of the corpus-reextraction-assessment agent (a8f22d538ea137ac0); this PR only ensures new runs pick up the features.
- Add a coverage-gap note to the konvid_mos_head_v1 model card noting that speed features were not part of its training feature set.
- Column ordering: speed features are appended at the END of the feature tuples in all scripts to preserve the parquet schema version lock described in ai/AGENTS.md §K150K-A corpus extraction invariants.
Alternatives Considered
| Option | Pros | Cons |
|---|---|---|
| Add to every script individually (no shared module change) | Surgical, no shared-state risk | Drift risk — future scripts won't inherit them |
| Add only to CHUG script (highest corpus priority) | Minimal scope | BVI-DVC / Netflix scripts remain stale |
| Trigger a full corpus re-extract in this PR | Corpora immediately populated | Hours of GPU time; outside this PR's scope; re-extract agent already tasked |
| Wait for GPU twins before adding to scripts | Clean GPU/CPU parity | Blocking: GPU twins are in parallel PRs; scripts should not be gated on GPU |
Chosen option: update shared module + patch scripts individually where local copies exist. Re-extract deferred to the corpus agent.
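
Because the chosen option leaves at least one script with its own local feature tuple, a small comparison helper can make divergence visible. This is a minimal sketch; `feature_drift` is a hypothetical helper, not something the repository is known to contain.

```python
def feature_drift(shared, local):
    """Compare a shared feature tuple with a script-local copy."""
    shared_set, local_set = set(shared), set(local)
    return {
        # Names in the shared tuple that the local copy has not picked up yet.
        "missing_in_local": [f for f in shared if f not in local_set],
        # Names only the local copy defines.
        "extra_in_local": [f for f in local if f not in shared_set],
        # True only when both tuples match element for element.
        "same_order": tuple(shared) == tuple(local),
    }
```

An empty `missing_in_local` together with `same_order` being true would indicate that the two copies are in sync.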
Consequences
Positive
- All extraction scripts that consume FULL_FEATURES from the shared module will automatically include speed features on the next run.
- Future corpus runs produce populated speed_temporal / speed_chroma_u/v/uv columns without requiring another script-level patch.
- When the Netflix HDR model lands (or the fork trains one), the corpus extraction infrastructure is already correct.
Negative
- Existing corpora remain stale until re-extract. Training code must tolerate NaN in speed columns for backward compatibility (already the case per ai/data/feature_extractor.py NaN-handling contract).
- The bvi_dvc_to_full_features.py local FULL_FEATURES copy drifts one update behind until someone consolidates the two copies. The comment "Keep in sync" is already present; this ADR documents the divergence.
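
One way a consumer could tolerate stale corpora is to train only on columns that are finite in every row. The sketch below illustrates that idea and is not the NaN-handling contract in ai/data/feature_extractor.py, whose details are not reproduced here; the function name and the row layout (one dict per clip) are assumptions.

```python
import math


def finite_feature_columns(rows, columns):
    """Return the columns whose value is a finite number in every row.

    A column that is missing, null, or NaN in any row is excluded, so a
    stale corpus falls back to the features it actually has.
    """
    def is_finite(row, column):
        value = row.get(column)
        return isinstance(value, (int, float)) and math.isfinite(value)

    return [c for c in columns if all(is_finite(r, c) for r in rows)]
```

On a re-extracted corpus the same call returns the speed columns as well, so the training code needs no separate code path for old and new data. For an empty list of rows every column is returned, so callers may want to guard against that case.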
Neutral / Follow-ups
- GPU twins for speed_chroma and speed_temporal are tracked in ADR-0557 (CUDA) and ADR-0558 (HIP) by parallel agents; this ADR is independent.
- vmaf_tiny_v1 / vmaf_tiny_v1_medium lack model cards; that gap is noted in the research digest but not addressed here (pre-dates ADR-0042; low risk since v1 is superseded by v2–v4).
References
- docs/research/feature-coverage-audit-2026-05-18.md (full audit)
- ADR-0555 (cross-backend parity matrix, identified the GPU twin gap)
- ADR-0557 (speed_temporal CUDA port, parallel agent)
- ADR-0558 (speed_chroma GPU port, parallel agent)
- ADR-0346 (FR-from-NR adapter, K150K extraction context)
- ADR-0382 (K150K-A parallelism, where speed features were first added)
- core/src/feature/speed.c (extractor implementation)
- Netflix upstream speed_ported branch (upstream context)
- req: feature-coverage-audit-2026-05-18 (task directive)