ADR-0431: Split CUDA and CPU Feature Passes for FR-from-NR Extraction¶
- Status: Accepted
- Date: 2026-05-15
- Deciders: Lusoris, Codex
- Tags: ai, cuda, training-data, corpus, fork-local
Context¶
ai/scripts/extract_k150k_features.py is the local FR-from-NR extractor used for K150K and ad-hoc CHUG FULL_FEATURES materialisation. CHUG experiments exposed that the current all-feature --backend cuda invocation can fail on 10-bit clips with duplicate feature-key warnings followed by context could not be synchronized.
The CUDA binary does expose twins for most requested FULL_FEATURES, so the failure is not a missing-feature problem. The unstable path is the generic feature-name bundle plus backend auto-selection in a single libvmaf context.
Decision¶
We will split CUDA extraction into two explicit passes. The first pass uses explicit CUDA extractor names for the stable CUDA-backed feature set; the second pass runs residual CPU extractors (float_ssim, cambi) through --cpu-vmaf-bin; the script merges per-frame metric dictionaries before aggregation so the parquet schema remains unchanged.
Alternatives considered¶
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
Keep one generic --backend cuda invocation | Fastest command shape, no script complexity | Reproduces CHUG failures on 10-bit clips and loses rows | Rejected because the current local HDR run cannot complete reliably |
| Force CPU-only extraction | Stable and already documented | Leaves CUDA hardware idle and makes CHUG/K150K passes much slower | Rejected because explicit CUDA feature names work for most of the bundle |
| Add feature-specific retry after failure | Maximises fallback coverage | Wastes time failing first and makes progress accounting noisy | Rejected in favour of deterministic split routing |
Consequences¶
- Positive: CHUG and K150K local extraction can use CUDA where it is stable without losing the existing 22-feature output schema.
- Negative: CUDA mode launches two libvmaf subprocesses per clip and needs both
--vmaf-binand--cpu-vmaf-binto exist locally. - Neutral / follow-ups: CHUG and K150K feature materialisation should be deduplicated behind a shared FR-from-NR helper so backend routing is not tied to one corpus-specific script name.