ADR-0912: Pixel-format edge coverage at the libvmaf unit-test layer
- Status: Accepted
- Date: 2026-05-31
- Deciders: Lusoris, Claude (Anthropic)
- Tags: test, coverage, fork-local, pixel-format, hbd
Context
The CPU extractor surface ships smoke / regression tests under core/test/test_*.c for every registered extractor, but a sweep on 2026-05-31 found that nearly all of them allocate VmafPicture instances at (VMAF_PIX_FMT_YUV420P, 8), the Netflix golden gate's common case. The non-default pixel-format and bit-depth combinations are exercised in only three places:
- test_picture.c: pure allocation-shape checks for YUV422P / YUV444P 8-bit; does not run an extractor.
- test_pic_preallocation.c::test_picture_pool_yuv444: runs the full VMAF model on YUV444P 8-bit, but does not isolate a single extractor and so cannot localise a chroma-stride regression.
- test_psnr.c::test_16b_large_diff: runs psnr_hbd on a 2×2 YUV420P 16-bit input by #include-ing integer_psnr.c directly, bypassing the public extractor surface.
Net effect: PSNR / SSIM / CIEDE on 4:2:2 input, PSNR / SSIM on 10/12-bit input, and CIEDE's scale_chroma_planes upscale path on any non-444 input had zero end-to-end test coverage. Bugs in those paths would only be caught by the (slow) Python harness or the cross-backend parity gate.
The chroma-stride and bit-depth-shift code paths are where bugs historically hide:
- The Research-0094 / commit b4f7a96e fix to picture.c::picture_compute_geometry corrected a floor→ceiling division for odd-dimension YUV420P chroma planes that had silently under-allocated by one row for years.
- ADR-0186 / Vulkan PSNR chroma-geometry derived the same ceiling formula independently; test_psnr_vulkan_chroma_geom.c pins the math in isolation but not against an extractor.
- The CIEDE init() chroma-upscale scratch allocation (a pair of full YUV444P pictures + 6 aligned float scratch buffers) is taken on every non-444 input but has no smoke beyond CI's integration paths.
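
The floor-versus-ceiling difference behind the picture_compute_geometry fix can be illustrated with a minimal sketch. The helper names chroma_dim_floor and chroma_dim_ceil are hypothetical and are not the fork's actual code; the sketch only assumes that a subsampled axis uses a shift of 1 and a full-resolution axis a shift of 0 (so 4:2:0 is ss_hor=1, ss_ver=1 and 4:2:2 is ss_hor=1, ss_ver=0).

```c
#include <assert.h>

/* Hypothetical helpers, NOT the fork's picture_compute_geometry.
 * luma: luma width or height in samples.
 * ss:   subsampling shift for that axis (1 = halved, 0 = full resolution). */

/* Floor division: drops the trailing chroma row/column when luma is odd. */
static unsigned chroma_dim_floor(unsigned luma, unsigned ss)
{
    assert(ss <= 1);
    return luma >> ss;
}

/* Ceiling division: rounds up so the trailing luma sample still has a
 * chroma sample covering it. */
static unsigned chroma_dim_ceil(unsigned luma, unsigned ss)
{
    assert(ss <= 1);
    return (luma + (1u << ss) - 1u) >> ss;
}
```

For an odd luma height of 241 with vertical subsampling, the floor form yields 120 chroma rows while the ceiling form yields 121, which is the one-row under-allocation described above. For even dimensions, or for an axis with a shift of 0, the two forms agree.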
Decision
Add core/test/test_pixel_format_edge_coverage.c, a single file holding five end-to-end smoke tests that run the public extractor surface (vmaf_get_feature_extractor_by_name + vmaf_feature_extractor_context_create / _extract / _close / _destroy, vmaf_feature_collector_get_score) against identical ref / dist pairs at the high-value gap intersections:
| test | extractor | pix_fmt | bpc | exercises |
|---|---|---|---|---|
| psnr_yuv422p_8bit_identical | psnr | 4:2:2 | 8 | ss_hor=1, ss_ver=0 8-bit chroma stride |
| psnr_yuv444p_10bit_identical | psnr | 4:4:4 | 10 | full-resolution HBD chroma scoring |
| psnr_yuv420p_12bit_identical | psnr | 4:2:0 | 12 | 12-bit peak math (no other 12-bit test) |
| ssim_yuv422p_8bit_identical | ssim | 4:2:2 | 8 | integer_ssim 4:2:2 init / decimation |
| ciede_yuv422p_8bit_identical | ciede | 4:2:2 | 8 | scratch 4:4:4 scale_chroma_planes path |
Each test allocates a 320×240 identical ref / dist pair, fills both pictures with the same deterministic rolling pattern (defensive against any future extractor that special-cases flat input), and runs one extract pass. It then asserts the expected identical-input score: per-plane PSNR equals 6 * bpc + 12 (the default psnr_max ceiling for mse=0), SSIM equals 1.0, and the CIEDE2000 score is positive (identical-pair de00=0 → score=+inf via 45 - 20*log10(de00)). The numeric expectations are not the point; the Netflix golden gate enforces them more rigorously. The point is that init / extract complete without an -EINVAL / -ENOMEM / OOB-read fault for these combinations.
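
As a rough illustration of the two ingredients described above, the sketch below shows one way to compute the PSNR ceiling and to fill a plane with a deterministic non-flat pattern. Both helper names (psnr_ceiling, fill_rolling) are hypothetical, the multipliers 7 and 13 are arbitrary choices, and the sketch assumes a row pitch given in bytes with 8-bit samples stored as uint8_t and deeper samples as uint16_t. The actual pattern in test_pixel_format_edge_coverage.c may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* PSNR ceiling for mse == 0, following the 6 * bpc + 12 rule quoted in
 * the text. Hypothetical helper name. */
static double psnr_ceiling(unsigned bpc)
{
    assert(bpc >= 8 && bpc <= 16);
    return 6.0 * (double)bpc + 12.0;
}

/* Hypothetical fill: the sample value depends only on (x, y), so filling
 * ref and dist with it produces an identical, non-flat pair.
 * stride_bytes is the row pitch in bytes (assumed layout). */
static void fill_rolling(void *data, size_t stride_bytes,
                         unsigned w, unsigned h, unsigned bpc)
{
    assert(bpc >= 8 && bpc <= 16);
    const unsigned mask = (1u << bpc) - 1u;   /* keep values within bpc bits */
    for (unsigned y = 0; y < h; y++) {
        uint8_t *row = (uint8_t *)data + (size_t)y * stride_bytes;
        for (unsigned x = 0; x < w; x++) {
            const unsigned v = (x * 7u + y * 13u) & mask;
            if (bpc == 8)
                row[x] = (uint8_t)v;
            else
                ((uint16_t *)row)[x] = (uint16_t)v;
        }
    }
}
```

Under this rule the expected per-plane PSNR is 60 for the 8-bit case, 72 for the 10-bit case, and 84 for the 12-bit case. Masking the pattern to the bit depth keeps every sample inside the legal range, so a 10-bit or 12-bit picture never carries values that an 8-bit-only code path would have produced.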
Alternatives considered
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Status quo (no extra coverage) | zero effort | hides chroma-stride / HBD regressions until Python harness or cross-backend gate catches them | rejected: defeats the purpose of the unit-test layer |
| Extend test_psnr.c / test_ciede.c / etc. in place | minimum file count | mixes scopes (existing files exercise inner-helper math via #include); existing tests are organised by extractor not by pixel-format axis | rejected: keeping the new file scoped to the cross-cutting axis makes the gap visible and avoids inflating the existing tests' line count past the readability-function-size threshold |
| One file per (extractor × pix_fmt) combination | maximum locality | 5+ near-duplicate files for 5 cases; each one would need a copy of the alloc+pattern+extract scaffold | rejected: duplication outweighs locality at this scale |
| Generate YUV422P / YUV444P / 10-bit YUV fixtures via ffmpeg | reuses existing benchmark patterns | adds ~5 MB of binary fixtures, requires ffmpeg at build time, and the tests still need a pattern fill to defeat all-zero short-circuits | rejected: synthetic in-memory patterns are deterministic, sub-millisecond, and bit-exact across architectures |
Consequences
- Positive: every CPU extractor that emits per-plane chroma scores (PSNR, SSIM) and the only extractor with a pixel-format-conditional init path (CIEDE) now has at least one fast-suite test on YUV422P input; PSNR also has one test at every supported bit depth (8 / 10 / 12; 16-bit was already covered by test_psnr.c::test_16b_large_diff).
- Positive: pre-push gate (meson test -C build --suite=fast) catches any future regression in the chroma-stride (picture_compute_geometry) or HBD scoring loops within ~50 ms.
- Negative: adds one more test executable to the fast suite (~40 ms wall-clock for all five cases combined, dominated by picture allocation).
- Neutral / follow-ups: the matrix can be extended over time — MS-SSIM, ADM, VIF, motion, psnr_hvs at non-default formats are still uncovered. Each is a small follow-up bundle once the pattern in this file is in tree.
References
- See ADR-0108 for the six deliverables rule.
- See ADR-0186 for the Vulkan PSNR chroma-geometry derivation that motivates the same ceiling-division concern on the CPU path.
- See core/src/picture.c (picture_compute_geometry) for the chroma plane dimension formula exercised by every test in this file.
- See core/src/feature/ciede.c (init, extract) for the YUV444P fast-path / scale_chroma_planes upscale branch exercised by the ciede 4:2:2 test.
- Source: user request 2026-05-31 ("audit test coverage for non-default pixel formats: 4:2:2, 4:4:4, 10-bit, 12-bit, 16-bit; add focused tests for high-value uncovered combinations").