ADR-0977: core/tools input-reader safety — Y4M malloc-NULL check, YUV/Y4M size_t cast, bench GPU-state leaks¶
- Status: Accepted
- Date: 2026-05-31
- Deciders: Lusoris
- Tags:
security,bug,tools,audit
Context¶
A deep-dive audit of core/tools/ surfaces (CLI binary + bench binary + the vendored Daala YUV / Y4M readers) found three real defects:
-
y4m_input_open_implignoredmalloc()return values (lines 815-816).dst_buf = malloc(dst_buf_sz)and the conditionalaux_buf = malloc(aux_buf_sz)both stored NULL into the reader struct on OOM and the function still returned 0. The caller chain (y4m_input_open→video_input_open) then surfaced the reader as a validvideo_input. The very nextvideo_input_fetch_frame()calledfread(_y4m->dst_buf, …)withdst_buf == NULL, faulting inside libc as a SIGSEGV. Reachable by any header that demands a buffer the process can't allocate — practical viaulimit -vor against an attacker-controlled (or fuzzer-controlled) header that declares absurdly-large dimensions. -
y4m_input.c+yuv_input.ccomputeddst_buf_szinint/unsignedprecision (lines 804 / 162-174 respectively). Both files then assigned the result to asize_tfield, where the high bits had already wrapped for headers near the 32-bit ceiling.mallocthen succeeded with a wrap-to-small size and the firstfreadoverflowed the heap allocation. The 4:4:4 paths iny4m_input.calready cast tosize_t(lines 736 / 743 / 779), proving the precedent. -
vmaf_bench.c::bench_featureleakedVmafCudaState/VmafSyclStateon every success and on most error paths.cu_state/sycl_statewere declared local to their#ifdefblocks, sovmaf_*_state_freecould never be called from later exit paths. The companion routinerun_feature_collectin the same file was already fixed under the T5 state-leak audit (2026-05-30) and carries the hoist-and-free pattern explicitly in a code comment.bench_featurewas not updated at the same time.
The audit deliberately did not modify any Netflix golden-data assertion (CLAUDE.md §8) and did not weaken any test threshold or gate.
Decision¶
Apply three coordinated fixes in one PR:
- Y4M malloc-NULL check:
y4m_input_open_implnow returns -1 whenmalloc()fails, freeing any partial allocation.y4m_input_openzero-initialises the struct before calling impl so dst_buf/aux_buf are NULL on any other early-return path. - size_t-precision buffer sizing: cast pic_w / pic_h (Y4M: signed int) and width / height (YUV: unsigned) to
size_tbefore the multiplication. The intermediate arithmetic now proceeds insize_tprecision on every supported 64-bit host, matching the pattern the 4:4:4 paths iny4m_input.calready follow. - bench_feature GPU-state lifetime: hoist
cu_state/sycl_stateto function scope and route every exit through a singlebench_cleanuplabel that runs the per-state_state_freeplusvmaf_closeplusyuv_pair_closein reverse-init order.
A new regression test test_y4m_alloc_failure drives the parser against a header that declares 65535×65535 4:4:4 12-bit (≈ 25 GiB of buffer), with setrlimit(RLIMIT_AS) clamping address space below the required size — forcing the malloc inside y4m_input_open_impl to fail deterministically. Pre-fix the test fails (parser reports success then would crash on fetch); post-fix the test passes (parser rejects open with -1, no crash).
Alternatives considered¶
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Fix Y4M malloc-NULL only, defer overflow & bench leaks | Smaller surface, easier review | Three independent surfaces share the same root-cause (insufficient defensive coding around malloc + size arithmetic in vendored input parsers); splitting fragments the fix story and re-opens two of three holes on the next pass | Bundle matches the per-CLAUDE-feedback "bigger PRs over fragmented micro-PRs" preference; all three live in core/tools/ |
| Add an upper-bound check on pic_w/pic_h in the Y4M parser | Direct defence-in-depth against the absurd-dimension case | Doesn't address the underlying int overflow; a smaller header that still overflows when multiplied by chroma+bitdepth survives | Necessary upper bounds are a separate hardening pass; the size_t cast removes the wrap regardless of what the header declares |
| Replace the vendored Daala parser entirely | Eliminates the whole class of bugs | Out of scope; the parser is upstream-mirrored and any rewrite needs its own ADR + research digest | Out of scope; this ADR is a targeted defect-fix |
// NOLINT the affected lines | Zero behavioural change | Hides a real reachable crash | Violates CLAUDE.md §12 r12 (NOLINT only for load-bearing invariants) |
Consequences¶
- Positive: A malformed (or attacker-controlled) Y4M header with absurd dimensions no longer crashes the CLI; it now surfaces a clean error and a non-zero exit. The benchmark binary no longer leaks GPU state per invocation (every
bench_featurecall previously leaked the SYCL / CUDA state struct + its underlying USM / device buffers). - Negative:
bench_featuregained agoto bench_cleanupcleanup label. The function is slightly longer. - Neutral / follow-ups: One regression test added (
test_y4m_alloc_failure, fast suite, POSIX-only). The pre-existing CLI-fuzz harness exercise increases (the new error path is now fuzz-reachable as a clean rejection rather than as a SIGSEGV).
References¶
- Source: bug-audit task brief, 2026-05-31 session.
- Companion ADR for the pattern: T5 state-leak audit comment in
core/tools/vmaf_bench.c::run_feature_collect(2026-05-30). - Related: ADR-0316 (cli_parse banned-function audit), ADR-0438 (atoi removal in vmaf_bench), test_y4m_411_oob.c (precedent for fmemopen-based parser regression tests).