ADR-0559: Feature Coverage Audit — Add speed_chroma + speed_temporal to Extraction Scripts (HDR-model prep)

Status: Accepted
Date: 2026-05-18
Deciders: lusoris, Claude Code agent
Tags: ai, feature-extraction, speed, hdr, corpus, fork-local

Context

The cross-backend parity matrix (PR #1328 / ADR-0555) flagged speed_chroma and speed_temporal as CPU-only extractors with no GPU twin. Separately, the Netflix speed_ported upstream branch signals that these two extractors are the most likely inputs to a future Netflix HDR VMAF model. To be in a position to evaluate such a model against this fork's corpora — or to train a fork-owned HDR surrogate — the extraction scripts must include these features.

A systematic audit (see research digest docs/research/feature-coverage-audit-2026-05-18.md) found:

speed_chroma and speed_temporal exist as CPU-only extractors in core/src/feature/speed.c and are registered in feature_extractor_list[] under #if VMAF_FLOAT_FEATURES.
Only ai/scripts/extract_k150k_features.py already included them (added 2026-05-15). All other extraction scripts (chug_extract_features.py, extract_full_features.py, bvi_dvc_to_full_features.py) omit them.
All current corpus JSONL files lack populated speed-feature columns: either the columns are absent entirely, or they are present but all-NaN from a run that predated the extractor addition.
No shipped SVM or tiny-AI ONNX model currently consumes speed features.
No HDR VMAF model exists in the Netflix upstream tree as of 2026-05-18.
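
The corpus finding above (columns absent, or present but all-NaN) can be reproduced with a small check over a JSONL file. This is a minimal sketch, not the audit's actual tooling: the helper name classify_columns and the column names in SPEED_COLUMNS are assumptions for illustration, and the real corpus schema may use different column names.

```python
import json
import math

# Column names are assumptions for illustration; the real corpus schema may differ.
SPEED_COLUMNS = ("speed_chroma", "speed_temporal")


def _is_populated(value):
    """Return True if value is neither None nor a float NaN."""
    if value is None:
        return False
    if isinstance(value, float) and math.isnan(value):
        return False
    return True


def classify_columns(jsonl_lines, columns=SPEED_COLUMNS):
    """Classify each column as 'absent', 'all_nan', or 'populated'."""
    seen = {c: False for c in columns}
    populated = {c: False for c in columns}
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        # Python's json module accepts bare NaN tokens by default.
        row = json.loads(line)
        for c in columns:
            if c in row:
                seen[c] = True
                if _is_populated(row[c]):
                    populated[c] = True
    return {
        c: "populated" if populated[c] else ("all_nan" if seen[c] else "absent")
        for c in columns
    }


sample = [
    '{"clip": "a", "speed_temporal": NaN}',
    '{"clip": "b", "speed_temporal": NaN}',
]
print(classify_columns(sample))
# {'speed_chroma': 'absent', 'speed_temporal': 'all_nan'}
```

In practice the lines would come from iterating over an open corpus file; a column reported as 'absent' or 'all_nan' marks a corpus that needs re-extraction before speed features can be used.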

Decision

Add speed_chroma and speed_temporal to the FULL_FEATURES tuple in ai/data/feature_extractor.py and to the _METRIC_TO_EXTRACTOR lookup map, so all scripts that import from that module pick up the change.
Add speed features explicitly to bvi_dvc_to_full_features.py's local FULL_FEATURES copy and EXTRACTORS tuple (this script does not import from the shared module).
Add speed features to chug_extract_features.py's FULL_FEATURES set via the shared module update (the script uses FULL_FEATURES from the module; no local copy to patch).
Do NOT trigger any actual re-extraction in this PR — corpus re-extraction is expensive and is the responsibility of the corpus-reextraction-assessment agent (a8f22d538ea137ac0). This PR only ensures new runs pick up the features.
Add a coverage-gap note to the konvid_mos_head_v1 model card stating that speed features were not part of its training feature set.
Column ordering: speed features are appended at the END of the feature tuples in all scripts to preserve the parquet schema version lock described in ai/AGENTS.md §K150K-A corpus extraction invariants.

Alternatives Considered

Option	Pros	Cons
Add to every script individually (no shared module change)	Surgical, no shared-state risk	Drift risk — future scripts won't inherit them
Add only to CHUG script (highest corpus priority)	Minimal scope	BVI-DVC / Netflix scripts remain stale
Trigger a full corpus re-extract in this PR	Corpora immediately populated	Hours of GPU time; outside this PR's scope; re-extract agent already tasked
Wait for GPU twins before adding to scripts	Clean GPU/CPU parity	Blocking: GPU twins are in parallel PRs; scripts should not be gated on GPU

Chosen option: update the shared module and patch individual scripts only where local copies exist. Re-extraction is deferred to the corpus agent.

Consequences

Positive

All extraction scripts that consume FULL_FEATURES from the shared module will automatically include speed features on the next run.
Future corpus runs produce populated speed_temporal and speed_chroma_u / speed_chroma_v / speed_chroma_uv columns without requiring another script-level patch.
When the Netflix HDR model lands (or the fork trains one), the corpus extraction infrastructure is already correct.

Negative

Existing corpora remain stale until they are re-extracted. Training code must tolerate NaN in the speed columns for backward compatibility (already the case per the NaN-handling contract in ai/data/feature_extractor.py).
The local FULL_FEATURES copy in bvi_dvc_to_full_features.py remains a duplicate of the shared tuple and can fall one update behind again until someone consolidates the two copies. A "Keep in sync" comment is already present; this ADR documents the divergence.
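
Until the two copies are consolidated, a small comparison in a test could catch drift between the shared tuple and the local copy. This is a minimal sketch under assumptions: the helper name diff_feature_lists is hypothetical, and the two sample tuples are stand-ins for the shared FULL_FEATURES and the script-local copy, not their real contents.

```python
def diff_feature_lists(shared, local):
    """Return (missing_from_local, extra_in_local, same_order)."""
    missing = [f for f in shared if f not in local]
    extra = [f for f in local if f not in shared]
    same_order = tuple(shared) == tuple(local)
    return missing, extra, same_order


# Stand-in values; a real test would import or copy both tuples.
shared_features = ("adm2", "motion2", "speed_chroma", "speed_temporal")
local_copy = ("adm2", "motion2")

missing, extra, same_order = diff_feature_lists(shared_features, local_copy)
print(missing)     # ['speed_chroma', 'speed_temporal']
print(extra)       # []
print(same_order)  # False
```

Comparing order as well as membership matters here because column position is part of the schema, so two copies holding the same names in a different order would still count as drift.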

Neutral / Follow-ups

GPU twins for speed_chroma and speed_temporal are tracked in ADR-0557 (CUDA) and ADR-0558 (HIP) by parallel agents; this ADR is independent.
vmaf_tiny_v1 / vmaf_tiny_v1_medium lack model cards; that gap is noted in the research digest but not addressed here (the models pre-date ADR-0042, and the risk is low because v1 is superseded by v2–v4).

References

docs/research/feature-coverage-audit-2026-05-18.md — full audit
ADR-0555 (cross-backend parity matrix, identified the GPU twin gap)
ADR-0557 (speed_temporal CUDA port, parallel agent)
ADR-0558 (speed_chroma GPU port, parallel agent)
ADR-0346 (FR-from-NR adapter, K150K extraction context)
ADR-0382 (K150K-A parallelism, where speed features were first added)
core/src/feature/speed.c — extractor implementation
Netflix upstream speed_ported branch — upstream context
req: feature-coverage-audit-2026-05-18 (task directive)