Perf Snapshot Audit: 2026-05-29
Audit of all testdata/perf_*.json and testdata/netflix_benchmark_results.json snapshot files. Performed on branch research/cuda-13.3-impact-assessment-20260528 against HEAD 8ae0535f5b.
Inventory
| File | Git status | Last committed | Binary age |
|---|---|---|---|
| `testdata/perf_benchmark_results.json` | Checked in (not gitignored) | `39455e78ed` (2026-05-28) | `build/tools/vmaf_bench` mtime 2026-05-18, 11 days stale |
| `testdata/netflix_benchmark_results.json` | Checked in (not gitignored) | `39455e78ed` (2026-05-28) | Same binary |
| `testdata/scores_cpu_{576,640,720,1080,4k}.json` | Checked in | Same merge commit | Same binary |
| `testdata/scores_sycl_*.json` | Checked in | Same merge commit | Intel Arc A380/B580/UHD770 |
| `testdata/perf_multi_resolution.json` | Checked in, but only in unmerged branch `feat/perf-bench-multi-resolution-20260529` (ADR-0752); not on master, no open PR | `2d7ddf773c` (2026-05-29, worktree only) | RTX 4090, generated at `8930853864` |
Note: testdata/perf_multi_resolution.json was referenced in the task as a PR #92 baseline but PR #92 does not exist (branch not pushed, no PR opened yet). The file lives only in the local worktree agent-a5cf516ff5285e00e.
Staleness assessment
perf_benchmark_results.json
- Schema: Three BBB fixture entries (`BBB 1080p 48f (YUV)`, `BBB 4K 200f (YUV)`, `BBB 4K MP4 500f (decode+vmaf)`). CPU / CUDA / SYCL throughput + pooled score per entry.
- Fixture availability: `ref_1920x1080_48f.yuv` / `dis_1920x1080_48f.yuv` are not present on the host (gitignored upscaled fixtures from ADR-0752). Only the `BBB 4K 200f (YUV)` test can run; the other two would be skipped by `bench_perf.py`.
- Binary staleness: `build/tools/vmaf_bench` is from 2026-05-18. Since that date, the following score- or throughput-affecting commits merged onto this branch:
  - `37a638bdec`: P0 fix that removes a committed conflict marker in `integer_vif_cuda.c`. CUDA VIF scores were corrupt/unavailable from that breakage until this fix. The snapshot's CUDA pooled values (e.g. 95.098706 for 1080p) were generated before the corruption was introduced or after it was fixed, but the binary currently at `build/tools/vmaf_bench` predates the fix (May 18 vs the fix landing in the merge train for May 28-29).
  - `92ea978a41`: CUDA ciede `__ldg()` F3 fix (ADR-0762). Affects CIEDE kernel cache usage. Does not change VMAF v0.6.1 pooled scores (CIEDE is a side metric).
  - `f2e2c035f7`: CUDA ms_ssim `__ldg()` F3 fix (ADR-0757). Changes the CUDA ms_ssim kernel. Pooled VMAF unchanged; ms_ssim component may shift slightly.
  - `fe5bcb207a`: CUDA filter1d + ssim_vert_combine resolution dispatch (ADR-0753). Adds per-resolution kernel variant selection. Could affect SSIM/MS-SSIM throughput.
  - `70cb42a11b`: CUDA ssim_vert_combine `__ldg()` + pinned-host leak fix (ADR-0754).
  - `c1a8508cbf`: CUDA ms_ssim_decimate smem tiling + adm_cm register reduction (ADR-0744). Throughput improvement, scores unchanged.
- Verdict: CUDA throughput numbers in `perf_benchmark_results.json` are stale with respect to the current codebase (several CUDA kernel optimizations landed since). The pooled VMAF scores (cpu: 95.098707, cuda: 95.098706 for 1080p) match the expected cross-backend proximity per ADR-0138/0139. The binary in `build/` is newer than the snapshot's implied generation date; a re-run would be needed to confirm.
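The "11 days stale" figure in the inventory is plain date arithmetic between the binary's mtime and the audit date. A minimal sketch of that calculation follows; `days_between` and `binary_age_days` are hypothetical helpers written for illustration and are not part of the repository's tooling.

```python
import os
from datetime import date, datetime


def days_between(built: date, audited: date) -> int:
    """Whole days elapsed from the build date to the audit date."""
    return (audited - built).days


def binary_age_days(path: str, audit_date: date) -> int:
    """Age of a file in days, based on its mtime in local time."""
    mtime = datetime.fromtimestamp(os.path.getmtime(path)).date()
    return days_between(mtime, audit_date)


if __name__ == "__main__":
    # Dates from this audit: binary mtime 2026-05-18, audit date 2026-05-29.
    print(days_between(date(2026, 5, 18), date(2026, 5, 29)))  # 11
```

On a host where the binary exists, `binary_age_days("build/tools/vmaf_bench", date(2026, 5, 29))` would give the same kind of figure directly from the filesystem.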
netflix_benchmark_results.json
- Covers `src01_576x324`, `checker_1080p_mild`, `checker_1080p_heavy` across CPU / CUDA / SYCL with per-frame arrays and throughput.
- Same staleness as `perf_benchmark_results.json`: same merge commit, same binary gap.
- Per-frame VMAF values for CPU (e.g. src01 pooled = 76.667828) match the Netflix golden assertion range expected from `python/test/quality_runner_test.py`.
- CUDA pooled (76.668903) vs CPU (76.667828): delta = 0.001075, within the expected GPU non-bit-exact range (ADR-0138/0139 documents "close but not bit-identical").
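The cross-backend comparison above reduces to an absolute difference checked against a bound. The sketch below reproduces the two deltas quoted in this audit. The tolerance value is an assumption chosen only for illustration; ADR-0138/0139 is described here as "close but not bit-identical" and may define a different bound or none at all.

```python
# Hypothetical tolerance for illustration only; not taken from ADR-0138/0139.
ASSUMED_TOLERANCE = 0.01


def backend_delta(cpu_pooled: float, gpu_pooled: float) -> float:
    """Absolute difference between CPU and GPU pooled VMAF scores."""
    return abs(gpu_pooled - cpu_pooled)


def within_tolerance(cpu_pooled: float, gpu_pooled: float,
                     tolerance: float = ASSUMED_TOLERANCE) -> bool:
    """True when the GPU score is within `tolerance` of the CPU score."""
    return backend_delta(cpu_pooled, gpu_pooled) <= tolerance


if __name__ == "__main__":
    # Pooled scores quoted in this audit.
    pairs = {
        "src01_576x324": (76.667828, 76.668903),
        "BBB 1080p 48f (YUV)": (95.098707, 95.098706),
    }
    for name, (cpu, cuda) in pairs.items():
        delta = backend_delta(cpu, cuda)
        print(f"{name}: delta={delta:.6f} ok={within_tolerance(cpu, cuda)}")
```

Using an absolute rather than relative difference matches how the delta is reported above (0.001075 on a score of roughly 76.67).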
scores_cpu_*.json and scores_sycl_*.json
- Regenerated by `testdata/gen_cpu_golden.py` / `testdata/run_sycl_scores.py` respectively, both invokable via `/regen-snapshots`.
- Same merge-commit provenance as the perf files.
- No score-shifting CPU commits have landed on master since these were generated (master has only 2 commits; all feature work is in unmerged branches).
Tool functionality
/regen-snapshots
- Covers: `testdata/scores_cpu_*.json`, `testdata/netflix_benchmark_results.json`.
- Does NOT cover: `testdata/perf_benchmark_results.json` (that file is produced by `testdata/bench_perf.py`, not `gen_cpu_golden.py` / `benchmark_netflix.py`).
- Status: Functional. `gen_cpu_golden.py` and `benchmark_netflix.py` exist. Binary `build/tools/vmaf` (CPU path) exists.
/run-netflix-bench
- Invokes `testdata/bench_all.sh` then diffs with `testdata/netflix_benchmark_results.json` via `testdata/compare_combined.py`.
- `bench_all.sh` exists and has the correct backend-engagement semantics (fixed in ADR-0513 sweep).
- `compare_combined.py` exists and compares `scores_cpu_*.json` vs `scores_sycl_a380_*.json`. Note: the skill description references `netflix_benchmark_results.json` as the output, but `compare_combined.py` actually diffs the `scores_cpu_*` vs `scores_sycl_a380_*` files, a mismatch between the skill's step 3 and what the script does. The skill would need a separate comparator to diff `bench_all.sh` output against `netflix_benchmark_results.json`.
- Fixture gap: `testdata/ref_1920x1080_48f.yuv` is missing from the host tree (gitignored per ADR-0752). The 1080p benchmark row in `perf_benchmark_results.json` cannot be re-verified without regenerating the upscaled fixture.
bench_perf.py (produces perf_benchmark_results.json)
- `python3 testdata/bench_perf.py --list-tests` reports:
  - `BBB 1080p 48f (YUV)`: missing fixtures (`ref_1920x1080_48f.yuv`, `dis_1920x1080_48f.yuv`)
  - `BBB 4K 200f (YUV)`: available (`testdata/bbb/ref_3840x2160_200f.yuv` present)
- Only the 4K row can be re-run on the current host. The 1080p row requires regenerating the upscaled fixtures first (`ffmpeg -i ref_576x324_48f.yuv -vf scale=1920:1080:flags=bilinear`).
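The availability report above amounts to checking, per benchmark row, whether each required fixture file exists. The sketch below illustrates that check only; it is not how `bench_perf.py` is implemented. The fixture names come from this audit, while the `ASSUMED_FIXTURES` mapping and the `missing_fixtures` helper are hypothetical.

```python
from pathlib import Path

# Fixture names are quoted from this audit; the mapping itself is an
# assumption and only lists the files the audit mentions by name.
ASSUMED_FIXTURES = {
    "BBB 1080p 48f (YUV)": ["ref_1920x1080_48f.yuv", "dis_1920x1080_48f.yuv"],
    "BBB 4K 200f (YUV)": ["bbb/ref_3840x2160_200f.yuv"],
}


def missing_fixtures(root: Path, fixtures: dict) -> dict:
    """Map each test name to the list of fixture paths absent under `root`."""
    return {
        test: [rel for rel in paths if not (root / rel).exists()]
        for test, paths in fixtures.items()
    }


if __name__ == "__main__":
    for test, missing in missing_fixtures(Path("testdata"), ASSUMED_FIXTURES).items():
        status = "available" if not missing else "missing " + ", ".join(missing)
        print(f"{test}: {status}")
```

A row whose list comes back empty can be re-run; any other row needs its fixtures regenerated first.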
scripts/perf/bench-multi-resolution.sh (produces perf_multi_resolution.json)
- Exists only in the unmerged worktree `feat/perf-bench-multi-resolution-20260529`.
- Not on master. No open PR.
- The script and ADR-0752 are complete and ready; the branch just has not been pushed or PRed.
Issues found
- `perf_benchmark_results.json` not covered by `/regen-snapshots`: the skill references `gen_cpu_golden.py` and `benchmark_netflix.py` but has no step for `bench_perf.py`. The skill's `--files` parameter list should include an entry for `perf_benchmark_results.json` with the correct regeneration command.
- `/run-netflix-bench` skill description vs `compare_combined.py` mismatch: step 3 of the skill says to compare `bench_all.sh` output against `netflix_benchmark_results.json`, but `compare_combined.py` diffs `scores_cpu_*.json` vs `scores_sycl_a380_*.json` (no comparison against `netflix_benchmark_results.json`). The skill needs either a dedicated comparator or a corrected step description.
- `perf_multi_resolution.json` branch not pushed: the ADR-0752 work is complete in the local worktree but has not been pushed or PR'd. The branch `feat/perf-bench-multi-resolution-20260529` should be pushed and a PR opened.
- CUDA throughput numbers are stale: multiple CUDA kernel optimizations landed after the snapshot was generated. A re-run after a fresh CUDA-enabled build (`meson configure build -Denable_cuda=true`, then `ninja -C build`; the `-D` option belongs to Meson, not to `ninja`) would update throughput numbers while leaving pooled scores unchanged (the optimizations are throughput-only, not correctness changes).
- Missing `schema_version` key in `perf_benchmark_results.json` and `netflix_benchmark_results.json`: unlike `perf_multi_resolution.json` (which carries `schema_version: 1` per ADR-0752), the legacy files have no version field, making future format evolution harder to detect.
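The missing-version issue can be detected mechanically. The sketch below assumes `schema_version` is a top-level key of each JSON document; the audit does not say where the key sits in `perf_multi_resolution.json`, so that placement and both helper names are assumptions.

```python
import json
from pathlib import Path


def has_schema_version(doc) -> bool:
    """True when the parsed JSON is an object with a top-level schema_version."""
    return isinstance(doc, dict) and "schema_version" in doc


def files_missing_schema_version(paths) -> list:
    """Return the paths whose JSON content lacks a top-level schema_version."""
    missing = []
    for path in paths:
        doc = json.loads(Path(path).read_text(encoding="utf-8"))
        if not has_schema_version(doc):
            missing.append(str(path))
    return missing


if __name__ == "__main__":
    # In-memory stand-ins; real snapshot contents are not reproduced here.
    print(has_schema_version({"schema_version": 1, "results": []}))  # True
    print(has_schema_version({"results": []}))  # False
```

Running `files_missing_schema_version` over the snapshot files in `testdata/` would list the legacy files that still need the field added on their next regeneration.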
Recommendations (not actioned here; audit only)
- Add `testdata/perf_benchmark_results.json` to the `/regen-snapshots` skill with `bench_perf.py` as the generator.
- Fix the `/run-netflix-bench` skill step 3 to match what `compare_combined.py` actually does.
- Push `feat/perf-bench-multi-resolution-20260529` and open PR for ADR-0752.
- Regenerate `perf_benchmark_results.json` (4K row only, with justification) once the CUDA kernel optimization PRs are merged to master and a fresh build is ready.
- Add `schema_version: 1` to the legacy JSON files on next regeneration.