ADR-0527: Accept pre-extracted BVI-DVC YUVs via --bvi-dir
- Status: Accepted
- Date: 2026-05-18
- Deciders: lusoris, Claude (Anthropic)
- Tags: ai, corpus, cli, docs
Context
ai/scripts/bvi_dvc_to_full_features.py previously accepted only a single input mode: --bvi-zip, pointing at the full BVI-DVC Part 1.zip archive (~84 GiB). For each clip it streamed the .mp4 out of the zip, decoded to raw YUV, encoded a distorted side, ran libvmaf, and deleted the temporary .mp4.
The user has 192 GB of raw BVI-DVC YUV files already extracted on disk. Requiring the zip forces an unnecessary streaming-extraction step on every invocation and prevents using the pre-decoded YUV files directly. A second input mode that points at a directory removes this friction and enables faster iteration on the feature-extraction step.
Decision
We will add a --bvi-dir PATH argument to bvi_dvc_to_full_features.py, mutually exclusive with --bvi-zip. When --bvi-dir is supplied, the script enumerates .mp4 and .yuv files in the directory and uses the BVI-DVC filename convention to derive resolution, frame rate, and bit depth; the tier is then inferred from the resolution. For .yuv inputs the decode step is skipped entirely: the file is fed to libvmaf directly as the reference. For .mp4 inputs the existing decode path is reused. No files are deleted after processing. When neither flag is provided, the legacy --bvi-zip default (env-var fallback) is preserved.
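The directory enumeration described above could look roughly like the following sketch. It is illustrative only and not the script's actual code: the filename pattern `<name>_<W>x<H>_<fps>fps_<N>bit_<chroma>` is an assumption about the BVI-DVC naming convention, and `ClipMeta`, `parse_clip`, and `iter_clips` are hypothetical names.

```python
import re
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator, Optional

# Assumed filename shape (not confirmed by this ADR):
#   <name>_<W>x<H>_<fps>fps_<N>bit_<chroma>.<ext>
_NAME_RE = re.compile(
    r"^(?P<name>.+)_(?P<w>\d+)x(?P<h>\d+)_(?P<fps>\d+)fps"
    r"_(?P<bits>\d+)bit_(?P<chroma>\d+)$"
)


@dataclass(frozen=True)
class ClipMeta:
    """Hypothetical container for metadata parsed from one filename."""

    path: Path
    width: int
    height: int
    fps: int
    bit_depth: int
    needs_decode: bool  # True for .mp4, False for raw .yuv


def parse_clip(path: Path) -> Optional[ClipMeta]:
    """Return parsed metadata, or None if the name does not match."""
    suffix = path.suffix.lower()
    if suffix not in {".mp4", ".yuv"}:
        return None
    match = _NAME_RE.match(path.stem)
    if match is None:
        return None
    return ClipMeta(
        path=path,
        width=int(match["w"]),
        height=int(match["h"]),
        fps=int(match["fps"]),
        bit_depth=int(match["bits"]),
        needs_decode=suffix == ".mp4",
    )


def iter_clips(bvi_dir: Path) -> Iterator[ClipMeta]:
    """Yield parseable clips in sorted order; other files are skipped."""
    for path in sorted(bvi_dir.iterdir()):
        if not path.is_file():
            continue
        meta = parse_clip(path)
        if meta is not None:
            yield meta
```

In this sketch a caller would branch on `needs_decode` to decide whether to run the existing MP4 decode path or hand the file to libvmaf as-is. Nothing here deletes inputs, which matches the decision that pre-extracted files are left in place.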
Alternatives considered
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| --bvi-dir as implemented | Zero extraction overhead for pre-decoded YUV; MP4 inputs reuse existing decode path; backward-compatible (neither flag → zip default) | Tier is inferred from resolution rather than filename prefix, so clips with non-standard resolutions are skipped with a warning | Chosen: the four BVI-DVC canonical resolutions are a closed set; the warning + skip is the right safety valve |
| Auto-scan a well-known search path (e.g. .workingdir2/bvi-dvc-extracted/) | No extra flag required | Silently wrong if the user's extraction landed somewhere else; non-obvious from --help; not composable with other input sources | Rejected: an explicit flag is clearer and safer |
| Single --bvi-input flag that auto-detects zip vs. directory | Fewer flags; slightly simpler --help | Ambiguous if the path points at a directory named *.zip; makes the mutual-exclusion constraint implicit rather than explicit | Rejected: explicit add_mutually_exclusive_group() is the right argparse pattern here |
| Modify --bvi-zip to also accept a directory | Minimal surface change | Semantically misleading name; breaks any caller that checks args.bvi_zip.is_file() | Rejected |
Consequences
- Positive: Users with pre-extracted corpora can skip the streaming extraction step entirely; .yuv inputs also skip the decode step, saving further wall time.
- Positive: The four-tier resolution map (_RES_TO_TIER) makes the tier assignment explicit and auditable; unknown resolutions emit a clear warning.
- Positive: Backward-compatible: callers that omit both flags continue to use the zip-based path via the env-var / hard-coded default.
- Negative: The --bvi-dir path infers the bit-depth from the filename (_Nbit_ component). Files with non-standard naming (e.g. no _10bit_ component) are skipped rather than probed.
- Neutral / follow-ups: The _DirEntry class mirrors zipfile.ZipInfo just enough for the shared iteration loop; if the script gains more entry attributes (e.g. frame count) both classes will need updating.
References
- User direction: the user has 192 GB of raw BVI-DVC YUVs on disk and requested a --bvi-dir mode so the zip extraction step can be skipped.
- Related ADR: ADR-0310, the original BVI-DVC ingestion pipeline decision.
- Implementation PR: feat/bvi-dvc-pre-extracted-input.