ADR-0577: vmaf-tune bisect decode concurrency cap and aggressive workdir cleanup
- Status: Accepted
- Date: 2026-05-18
- Deciders: lusoris
- Tags: vmaf-tune, compare, bisect, disk-space, concurrency, fork-local
Context
The BBB v13 1080p vmaf-tune compare run failed every CPU cell with ENOSPC (rc=228). The root cause: compare dispatches each codec bisect to a thread pool. With three codecs running concurrently, each bisect materialised the 110 GB BBB 1080p reference YUV decode independently and in parallel, producing a 330 GB peak on the 420 GB /probes volume. There was no mechanism to serialise or cap the number of concurrent reference-YUV decode operations, and the per-iteration encoded MKVs and decoded YUVs accumulated until end-of-run rather than being cleaned up immediately.
The existing ADR-0549 preflight check fires before the bisect starts; it does not protect against mid-run ENOSPC when multiple threads decode simultaneously or when scratch files accumulate across iterations.
Decision
Three complementary fixes ship together:
- Decode concurrency cap (`--max-concurrent-decodes N`, default 1). A `threading.Semaphore` is installed at the module level in `vmaftune.bisect` and acquired before each reference-YUV decode inside `bisect_target_vmaf`. The default (serial decodes) caps peak disk usage to one reference YUV decode at a time regardless of thread-pool size. The CLI flag is wired on `compare`, `ladder`, and `tune-per-shot`; operators with large `--workdir` volumes can raise `N` to trade peak disk for throughput.
- Aggressive workdir cleanup. After each bisect iteration completes (encode + decode + score for one CRF probe), the iteration's encoded `.mkv` and decoded distorted `.yuv` are deleted immediately. This behaviour was already implemented in `_encode_and_score`; the new addition is the per-bisect cleanup of the decoded reference YUV in `bisect_target_vmaf`'s `finally` block. Previously the reference YUV persisted until the bisect completed; now it is deleted as soon as the bisect for that (codec, target) pair finishes, before the next codec's bisect acquires the semaphore.
- Mid-run disk-space monitoring. Before each iteration's decode, `bisect_target_vmaf` re-checks `shutil.disk_usage(workdir).free`. The guard requires 2x the estimated YUV size to be free (the extra headroom covers the encoded MKV coexisting with the decoded YUV). On failure the bisect returns `BisectResult(ok=False, error=...)` with a context string naming the codec and target VMAF, replacing the opaque ffmpeg rc=228.
Alternatives considered
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Raise the /probes volume size | Simple operational fix | Does not scale; next source or next run hits the same wall | Not a code fix; symptom not cause |
| Single shared workdir per codec (share the reference YUV across all targets) | One decode per codec instead of one per (codec, target) | Complicates lifecycle; reference YUV cannot be deleted until all targets for that codec complete | The semaphore + per-bisect cleanup already achieves the same peak (one decode at a time) without shared state |
| Delete reference YUV between codec iterations only (not per bisect) | Slightly fewer deletions | Does not help when a single bisect already hits ENOSPC on the first decode | Too coarse |
| Reduce default thread-pool workers | Simpler | Slows encoder runs which are not disk-bound | Worsens throughput without proportional disk benefit |
Consequences
- Positive: Peak workdir disk usage drops from `N_codecs x yuv_size` (e.g. 330 GB for 3 codecs x 110 GB BBB) to `yuv_size` (110 GB) at the default `--max-concurrent-decodes 1`. The mid-run check surfaces ENOSPC with a structured `BisectResult(ok=False)` error instead of a partial write and corrupt JSON.
- Negative: With `--max-concurrent-decodes 1`, reference-YUV decodes serialise across threads. On a typical 110 GB BBB source at ~2 GB/s sustained disk read this adds ~55 s per codec; on the production `/probes` NVMe array the wall-time impact is negligible compared to the encode and score time.
- Neutral / follow-ups:
  - The `--max-concurrent-decodes` flag on `ladder` is accepted but currently a no-op (ladder uses corpus sweeps, not bisect decodes). It is wired for operator consistency and future use.
  - `set_decode_semaphore` and `DEFAULT_MAX_CONCURRENT_DECODES` are exported from `vmaftune.bisect.__all__` so library callers can configure the cap before spawning a thread pool.
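The peak-disk and wall-time figures above follow from simple arithmetic, sketched below. The helper `peak_reference_bytes` is hypothetical, GB is taken as 10^9 bytes, and the model assumes a reference YUV exists on disk only while its bisect holds a decode slot.

```python
GB = 10**9  # assumption: decimal gigabytes


def peak_reference_bytes(n_codecs: int, yuv_bytes: int,
                         max_concurrent_decodes: int = 1) -> int:
    """Peak bytes held by reference YUVs when at most
    max_concurrent_decodes of them may exist at once."""
    live = min(n_codecs, max_concurrent_decodes)
    return live * yuv_bytes


# Before the cap: all three codec bisects decode in parallel.
uncapped = peak_reference_bytes(3, 110 * GB, max_concurrent_decodes=3)

# With the default cap of 1: one reference YUV at a time.
capped = peak_reference_bytes(3, 110 * GB)

# Serialised decode cost per codec at ~2 GB/s sustained throughput.
serial_delay_s = (110 * GB) / (2 * GB)

print(uncapped // GB, capped // GB, serial_delay_s)  # 330 110 55.0
```

Raising the cap to 2 in this model gives a 220 GB peak, which is the disk-for-throughput trade an operator makes when increasing `N`.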
References
- ADR-0549: workdir relocation and preflight disk-space check (the ADR-0577 mid-run check extends this to intra-run space monitoring).
- req: the BBB v13 1080p compare failed every CPU cell with ENOSPC because the thread pool ran 3 concurrent bisects, each materialising the 110 GB reference YUV decode in parallel, overflowing the 420 GB /probes volume.
- req: fix scope: `--max-concurrent-decodes N` CLI flag (default 1); aggressive workdir cleanup (per-iteration MKV and YUV, per-bisect ref YUV); mid-run disk-space monitoring with 2x headroom before each decode.