vmaf_bench: micro-benchmark & validation harness
vmaf_bench is a fork-added binary that times individual feature extractors on pre-staged YUV data and optionally cross-validates GPU output against CPU. It is not a score-producing tool — use the vmaf CLI (cli.md) for quality assessment. vmaf_bench exists purely to:
- compare CPU vs CUDA vs SYCL vs HIP timings per feature,
- validate GPU↔CPU numerical agreement before merging a backend change,
- profile GPU shader breakdowns (SYCL only).
Snapshot benchmark JSONs produced by vmaf_bench live under testdata/ (see ../architecture/index.md) and are not Netflix goldens — they are fork-owned. Regenerate with /regen-snapshots if you intentionally moved a baseline.
Build
# build with CUDA + SYCL so vmaf_bench can compare all three backends
meson setup build -Denable_cuda=true -Denable_sycl=true
ninja -C build
# binary lives at: build/core/tools/vmaf_bench
vmaf_bench compiles in every build configuration; CUDA / SYCL rows are automatically omitted when the respective backend is disabled.
Test data layout
vmaf_bench expects a staging directory (default /tmp/vmaf_test/, override with --data-dir or VMAF_TEST_DATA):
/tmp/vmaf_test/
├── ref_576x324.yuv # 48 frames of YUV420P-8
├── dis_576x324.yuv
├── ref_640x480.yuv
├── dis_640x480.yuv
├── ref_1280x720.yuv
├── dis_1280x720.yuv
├── ref_1920x1080.yuv
├── dis_1920x1080.yuv
├── ref_3840x2160.yuv
└── dis_3840x2160.yuv
Generate from Big Buck Bunny (or any clip):
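# reference: scale and dump 48 raw frames
ffmpeg -i bbb.mp4 -frames:v 48 -vf scale=1920:1080 -pix_fmt yuv420p /tmp/vmaf_test/ref_1920x1080.yuv
# distorted: round-trip the reference through x264 (crf 28), then decode back to raw YUV
ffmpeg -f rawvideo -pix_fmt yuv420p -s 1920x1080 -i /tmp/vmaf_test/ref_1920x1080.yuv \
  -c:v libx264 -crf 28 /tmp/dis_1920x1080.mp4
ffmpeg -i /tmp/dis_1920x1080.mp4 -pix_fmt yuv420p /tmp/vmaf_test/dis_1920x1080.yuv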
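Because vmaf_bench reads headerless YUV, a truncated or mis-scaled file is easy to miss. The following is a minimal sketch of a pre-flight size check for the layout above. It assumes planar 4:2:0 data, exactly 48 frames per file, and two bytes per sample above 8 bpc (as ffmpeg's yuv420p10le writes); the helper names `yuv420p_frame_bytes` and `check_staging` are hypothetical and not part of vmaf_bench.

```python
from pathlib import Path

# Resolutions and frame count taken from the staging layout described above.
RESOLUTIONS = [(576, 324), (640, 480), (1280, 720), (1920, 1080), (3840, 2160)]
STAGED_FRAMES = 48


def yuv420p_frame_bytes(width: int, height: int, bpc: int = 8) -> int:
    """Bytes in one planar 4:2:0 frame: full-size luma plus two half-size chroma planes."""
    # Assumption: samples above 8 bits are stored in 2 bytes each.
    bytes_per_sample = 1 if bpc <= 8 else 2
    chroma_w, chroma_h = (width + 1) // 2, (height + 1) // 2
    return (width * height + 2 * chroma_w * chroma_h) * bytes_per_sample


def check_staging(data_dir: str, bpc: int = 8) -> list[str]:
    """Return a list of problems; an empty list means every staged file has the expected size."""
    problems = []
    for width, height in RESOLUTIONS:
        expected = yuv420p_frame_bytes(width, height, bpc) * STAGED_FRAMES
        for prefix in ("ref", "dis"):
            path = Path(data_dir) / f"{prefix}_{width}x{height}.yuv"
            if not path.is_file():
                problems.append(f"missing: {path}")
            elif path.stat().st_size != expected:
                problems.append(
                    f"{path}: {path.stat().st_size} bytes, expected {expected}"
                )
    return problems


if __name__ == "__main__":
    for problem in check_staging("/tmp/vmaf_test"):
        print(problem)
```

Under these assumptions an 8-bit 1920x1080 frame is 3,110,400 bytes, so a 48-frame file should be 149,299,200 bytes.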
Modes
Performance benchmark (default)
| Flag | Default | Notes |
|---|---|---|
| --resolution WxH | all staged | Restrict to one resolution. |
| --frames N | 10 | Max 48 (staged data cap). |
| --bpc N | 8 | Bits per component: 8, 10, 12, 16. |
| --data-dir PATH | /tmp/vmaf_test (or $VMAF_TEST_DATA) | Override stage directory. |
| --gpu-only | off | Skip CPU feature runs. |
| --gpu-profile | off (SYCL-only) | Emit per-shader GPU timing breakdown. |
| --device N | auto | Pick GPU device by ordinal (SYCL). |
| --list-devices | — | List detected SYCL devices and exit. |
Output is a per-feature, per-backend table of median ms/frame + throughput FPS.
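To make the two reported columns concrete, here is a small sketch of how a median ms/frame and an FPS figure relate. It assumes FPS is derived as 1000 divided by the median frame time, which this page does not confirm; the `summarize` helper and the sample timings are hypothetical.

```python
from statistics import median


def summarize(frame_times_ms: list[float]) -> tuple[float, float]:
    """Return (median ms/frame, frames per second implied by that median)."""
    if not frame_times_ms:
        raise ValueError("need at least one frame timing")
    med = median(frame_times_ms)
    # Assumption: throughput is the reciprocal of the median frame time.
    return med, 1000.0 / med


# Hypothetical per-frame timings with one slow warm-up frame.
timings = [5.0, 5.2, 4.9, 30.0, 5.1]
med_ms, fps = summarize(timings)
print(f"{med_ms:.1f} ms/frame, {fps:.1f} FPS")
```

The median of these samples is 5.1 ms, whereas the mean would be 10.04 ms. A single slow frame therefore barely moves the reported number, which is the usual reason for reporting a median in micro-benchmarks.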
Validation mode
Enabled with --validate. Runs every feature through every compiled backend on the staged data and prints CPU↔GPU ULP deltas per feature. Used by /cross-backend-diff and by reviewers checking SIMD/GPU PRs.
Target: max absolute difference ≤ 2 ULP for integer features, ≤ 1e-5 relative for float features. Larger deltas are a regression and should block merge unless justified inline (.github/PULL_REQUEST_TEMPLATE.md "Cross-backend numerical results").
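The two thresholds above can be illustrated with a short sketch. It assumes scores are compared as IEEE-754 doubles and that the relative error is normalised by the larger magnitude of the two values; neither detail is stated on this page, and the helper names `ulp_distance`, `relative_error`, and `within_target` are hypothetical rather than taken from vmaf_bench.

```python
import math
import struct
import sys


def _ordered_bits(x: float) -> int:
    """Map a double to an integer whose ordering matches the float ordering."""
    (bits,) = struct.unpack("<q", struct.pack("<d", x))
    # Negative doubles have the sign bit set; mirror them below zero.
    return bits if bits >= 0 else -(bits & 0x7FFFFFFFFFFFFFFF)


def ulp_distance(a: float, b: float) -> int:
    """Number of representable doubles between a and b (0 if identical)."""
    if math.isnan(a) or math.isnan(b):
        raise ValueError("ULP distance is undefined for NaN")
    return abs(_ordered_bits(a) - _ordered_bits(b))


def relative_error(a: float, b: float) -> float:
    """|a - b| scaled by the larger magnitude (an assumed normalisation)."""
    scale = max(abs(a), abs(b))
    return 0.0 if scale == 0.0 else abs(a - b) / scale


def within_target(cpu: float, gpu: float, integer_feature: bool) -> bool:
    """Apply the thresholds quoted above: 2 ULP for integer features, 1e-5 relative for float ones."""
    if integer_feature:
        return ulp_distance(cpu, gpu) <= 2
    return relative_error(cpu, gpu) <= 1e-5


if __name__ == "__main__":
    next_after_one = 1.0 + sys.float_info.epsilon  # adjacent double above 1.0
    print(ulp_distance(1.0, next_after_one))
    print(within_target(100.0, 100.0001, integer_feature=False))
```

Comparing bit patterns rather than subtracting values keeps the ULP check meaningful across magnitudes, since the gap between adjacent doubles grows with the exponent.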
Example: single 1080p benchmark
Expected output (abbreviated) for a run restricted to 1920x1080 with the default 10 frames, e.g. vmaf_bench --resolution 1920x1080:
=== VMAF Benchmark — 1920x1080, 10 frames ===
Feature CPU(scalar) CPU(AVX-512) CUDA SYCL
integer_vif 112.4 ms 5.1 ms 0.8 ms 1.1 ms
integer_adm 98.2 ms 4.7 ms 0.7 ms 0.9 ms
integer_motion 34.6 ms 1.9 ms 0.3 ms 0.4 ms
...
Example: cross-backend validation
Expected output of a vmaf_bench --validate run (1920x1080 shown):
=== Cross-backend validation — 1920x1080, 10 frames ===
Feature CPU vs CUDA CPU vs SYCL Verdict
integer_vif max |Δ| = 1 ULP max |Δ| = 1 ULP OK
integer_adm max |Δ| = 0 ULP max |Δ| = 0 ULP OK
integer_motion max |Δ| = 0 ULP max |Δ| = 0 ULP OK
psnr max |Δ| = 1e-9 max |Δ| = 1e-9 OK
Limitations
- Test data must be pre-staged. vmaf_bench does not download anything.
- Resolution list is hard-coded to 576x324, 640x480, 1280x720, 1920x1080, 3840x2160 (per core/tools/vmaf_bench.c:291-293). Pass --resolution WxH to restrict to a single size; adding new resolutions requires source changes.
- --gpu-profile requires a SYCL build (not wired for CUDA).
- 10 / 12 / 16 bpc (--bpc) requires matching test data; staged 8-bit YUVs are not auto-converted.
Related
- cli.md: the scoring CLI (vmaf).
- ../benchmarks.md: canonical fork benchmark numbers.
- ../backends/index.md: backend compile-time + runtime rules.
- /cross-backend-diff skill: wraps vmaf_bench --validate with PR-ready formatting.