Research digest 0054: libFuzzer scaffold for the Y4M parser
- Date: 2026-05-03
- Author: Claude (Opus 4.7, 1M context) on behalf of lusoris
- Companion ADR: ADR-0270
- Companion PR: see CHANGELOG
Question
The OSSF Scorecard Fuzzing check is at 0/10 on this fork. What is the smallest patch that takes it off zero, and which surface should be the first harness?
Method
- Surveyed the libvmaf attack surface for parser-shaped inputs: anything that takes attacker-controllable bytes and runs sscanf, memcpy, or a size-derived malloc against them. Three candidates fell out:
  - core/tools/y4m_input.c: vendored Daala YUV4MPEG2 parser.
  - core/src/read_json_model.c: JSON model deserialiser.
  - core/tools/yuv_input.c: raw-YUV opener (driven by caller-supplied width, height, and pix_fmt, so it has fewer header-parsed code paths).
- Read the OSSF Scorecard Fuzzing definition: any committed fuzz target counts (libFuzzer, AFL, OSS-Fuzz, native go test -fuzz, etc.). The grading is binary: the check passes once a target is present.
- Compared the in-tree fuzz scaffolding patterns of peer C parsers and picked the libFuzzer pattern over AFL, because clang ships libFuzzer by default and the build-host tooling burden is therefore zero.
- Sized the harness: fmemopen wraps the fuzzer's input bytes as a FILE *, and the public video_input_open / video_input_fetch_frame / video_input_close triple drives the parser end-to-end without exposing the static-only y4m_input_open_impl.
- Scripted a 60-second smoke run on the seed corpus to validate the build before opening the PR.
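A minimal sketch of the fmemopen harness pattern, assuming a POSIX libc. Only LLVMFuzzerTestOneInput is the real libFuzzer entry point; toy_y4m_open is a hypothetical stand-in for the video_input_open / video_input_fetch_frame / video_input_close triple, whose exact signatures are not reproduced here. When built with clang -fsanitize=fuzzer, libFuzzer supplies main().

```c
#define _POSIX_C_SOURCE 200809L /* exposes fmemopen() */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the real parser entry points: accepts the input
 * only if its first line is newline-terminated and starts with the Y4M magic.
 * The real Daala parser goes on to parse every header tag and the frames. */
static int toy_y4m_open(FILE *fin) {
    char line[256];
    size_t len;
    if (fgets(line, sizeof line, fin) == NULL) return -1;
    len = strlen(line);
    if (len == 0 || line[len - 1] != '\n') return -1; /* header must end in '\n' */
    return strncmp(line, "YUV4MPEG2 ", 10) == 0 ? 0 : -1;
}

/* libFuzzer entry point: wrap the input bytes in a read-only FILE * so the
 * parser sees them exactly as it would see a file on disk. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    FILE *fin;
    if (size == 0) return 0; /* glibc < 2.22 fails fmemopen() when size is 0 */
    fin = fmemopen((void *)data, size, "r"); /* "r": the buffer is never written */
    if (fin == NULL) return 0;
    /* The real harness would run open / fetch_frame / close here. */
    (void)toy_y4m_open(fin);
    fclose(fin);
    return 0; /* libFuzzer expects 0; crashes are reported by the sanitizers */
}
```

The early return for size 0 matters for the empty seed: older glibc rejects a zero-length fmemopen buffer, while a crash-free harness must still accept that input.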
Findings
- Y4M is the right first target. The JSON model loader is also parser-shaped, but the model files it reads normally ship alongside the binary rather than arriving from the network or user input. The Y4M parser is invoked on vmaf --reference path.y4m, a path under attacker control if the attacker can write to a watched directory, drop a hostile file in a CI artefact share, or get a transcoding pipeline to feed the binary a crafted file.
- The Y4M parser is opted out of SEI CERT rule ERR34-C at file scope (clang-tidy checks bugprone-unchecked-string-to-number-conversion and its alias cert-err34-c) via a NOLINTBEGIN/NOLINTEND pair. The suppression keeps the upstream code verbatim, but it also papers over exactly the class of bug a fuzzer is best at exercising: a short sscanf return count on malformed input goes unnoticed unless checked, and a %d conversion of an out-of-range value is undefined behaviour. The harness restores the coverage that disabling the lint rule gave up.
- libFuzzer's instrumentation flag (-fsanitize=fuzzer) conflicts with b_lundef=true on clang (meson emits a warning at setup time). The fuzz-only build therefore sets b_lundef=false, and this is documented in the fuzz README.
- The 60-second smoke run on the hand-crafted seed corpus (six seeds: 420, 422, 420p10, mono, 411, empty) hit a real heap-buffer-overflow within the first few seconds: y4m_convert_411_422jpeg writes _dst[1] unconditionally in its first sub-loop when OC_MINI(c_w, 1) == 1 and the destination chroma width dst_c_w == 1. The third sub-loop guards the same write with if ((x << 1 | 1) < dst_c_w) {…}; the first does not. Reproducer: YUV4MPEG2 W2 H4 F30:1 Ip C411\n followed by a FRAME\n header and ~120 bytes of payload (full reproducer at core/test/fuzz/y4m_input_known_crashes/y4m_411_w2_h4_oob_dst.y4m).
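For illustration, a hedged sketch of the kind of number conversion ERR34-C asks for in place of sscanf. parse_y4m_dim is a hypothetical helper, not a function in y4m_input.c, and the accepted terminators (NUL, space, newline) are an assumption based on Y4M's space-separated header tags.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse the digits after a Y4M "W" or "H" tag.
 * strtol reports overflow through errno and tells us where parsing stopped,
 * whereas sscanf("%d") has undefined behaviour on overflow. */
static int parse_y4m_dim(const char *s, int *out) {
    char *end;
    long v;
    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s) return -1;                       /* no digits at all */
    if (errno == ERANGE || v > INT_MAX) return -1; /* does not fit in int */
    if (v <= 0) return -1;                         /* dimensions are positive */
    if (*end != '\0' && *end != ' ' && *end != '\n') return -1; /* trailing junk */
    *out = (int)v;
    return 0;
}
```

Every rejection path returns -1 without touching *out, so a caller can bail out on the first bad tag instead of carrying a half-parsed header forward.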
Conclusion
Land the libFuzzer scaffold and the Y4M harness now. Park the 411-chroma reproducer as a known-crash artefact (excluded from nightly CI so the gate doesn't crash on PR-1), and open a follow-up PR that ports the third sub-loop's dst_c_w guard to the first and second sub-loops, then re-enables the reproducer as a seed under y4m_input_corpus/.
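The follow-up fix can be modelled in a few lines. This is a simplified, hypothetical sketch: upsample_411_row and chroma_width are illustrative names, the upstream filter taps are replaced by a plain copy, and the chroma widths assume round-up division. It shows why the W2 reproducer overflows (both chroma widths come out as 1, so destination slot 1 does not exist) and where the dst_c_w guard belongs.

```c
/* Chroma width for a horizontal decimation factor, rounding up (assumed). */
static int chroma_width(int pic_w, int dec_h) {
    return (pic_w + dec_h - 1) / dec_h;
}

/* Simplified model of the 4:1:1 -> 4:2:2 horizontal step: each source chroma
 * sample x feeds destination slots x<<1 and x<<1|1. With round-up widths,
 * dst_c_w >= 2*c_w - 1 always holds, so only the odd slot can fall past the
 * end of the row; that is the write the guard protects. */
static void upsample_411_row(const unsigned char *src, int c_w,
                             unsigned char *dst, int dst_c_w) {
    for (int x = 0; x < c_w; x++) {
        dst[x << 1] = src[x];
        if ((x << 1 | 1) < dst_c_w) { /* the guard the first sub-loop lacks */
            dst[x << 1 | 1] = src[x];
        }
    }
}
```

With W2, chroma_width(2, 4) and chroma_width(2, 2) are both 1, so the unguarded version would write dst[1] past a one-byte row, matching the heap-buffer-overflow the smoke run reported.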
Smoke-test commands
CC=clang CXX=clang++ \
meson setup build-fuzz libvmaf \
--buildtype=debug \
-Db_sanitize=address \
-Db_lundef=false \
-Dfuzz=true \
-Denable_cuda=false -Denable_sycl=false -Denable_vulkan=disabled
ninja -C build-fuzz test/fuzz/fuzz_y4m_input
mkdir -p /tmp/fuzz-smoke-y4m
./build-fuzz/test/fuzz/fuzz_y4m_input \
-max_total_time=60 \
-rss_limit_mb=2048 \
-malloc_limit_mb=1024 \
-timeout=10 \
/tmp/fuzz-smoke-y4m \
core/test/fuzz/y4m_input_corpus/
# Verify the known-crash reproducer triggers the same bug:
./build-fuzz/test/fuzz/fuzz_y4m_input \
core/test/fuzz/y4m_input_known_crashes/y4m_411_w2_h4_oob_dst.y4m
Once the 411 OOB fix lands, the first command ends with libFuzzer's "Done <N> runs in 60 second(s)" summary and no crashes; until then it is expected to hit the same crash within seconds, and the second command reproduces the OOB on every run.
References
- ADR-0270 (decision matrix + alternatives).
- OSSF Scorecard Fuzzing check.
- libFuzzer guide (LLVM).