Perf Snapshot Audit — 2026-05-29¶

Audit of all testdata/perf_*.json and testdata/netflix_benchmark_results.json snapshot files. Performed on branch research/cuda-13.3-impact-assessment-20260528 against HEAD 8ae0535f5b.

Inventory¶

File	Git status	Last committed	Binary age / generating hardware
`testdata/perf_benchmark_results.json`	Checked in (not gitignored)	`39455e78ed` (2026-05-28)	`build/tools/vmaf_bench` mtime 2026-05-18 — 11 days stale
`testdata/netflix_benchmark_results.json`	Checked in (not gitignored)	`39455e78ed` (2026-05-28)	Same binary
`testdata/scores_cpu_{576,640,720,1080,4k}.json`	Checked in	Same merge commit	Same binary
`testdata/scores_sycl_*.json`	Checked in	Same merge commit	Intel Arc A380/B580/UHD770
`testdata/perf_multi_resolution.json`	Checked in, but only in unmerged branch `feat/perf-bench-multi-resolution-20260529` (ADR-0752) — not on master, no open PR	`2d7ddf773c` (2026-05-29, worktree only)	RTX 4090, generated at `8930853864`

Note: testdata/perf_multi_resolution.json was referenced in the task as a PR #92 baseline but PR #92 does not exist (branch not pushed, no PR opened yet). The file lives only in the local worktree agent-a5cf516ff5285e00e.
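The "11 days stale" figure in the inventory is plain date arithmetic between the binary's mtime and the audit date. A minimal sketch, assuming the audit date (2026-05-29) is the reference point; `binary_age_days` and `mtime_as_date` are hypothetical helpers, not part of the repository's tooling:

```python
from datetime import date, datetime
from pathlib import Path


def binary_age_days(binary_mtime: date, audit_date: date) -> int:
    """Return how many whole days the binary predates the audit."""
    return (audit_date - binary_mtime).days


def mtime_as_date(path: Path) -> date:
    """Read a file's modification time as a local calendar date."""
    return datetime.fromtimestamp(path.stat().st_mtime).date()


# Dates taken from the inventory table: binary mtime vs. audit date.
age = binary_age_days(date(2026, 5, 18), date(2026, 5, 29))
print(f"build/tools/vmaf_bench is {age} days stale")
```

On a host with the build tree present, `mtime_as_date(Path("build/tools/vmaf_bench"))` could supply the first argument instead of the hard-coded date.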

Staleness assessment¶

`perf_benchmark_results.json`¶

Schema: Three BBB fixture entries (BBB 1080p 48f (YUV), BBB 4K 200f (YUV), BBB 4K MP4 500f (decode+vmaf)). CPU / CUDA / SYCL throughput + pooled score per entry.
Fixture availability: ref_1920x1080_48f.yuv / dis_1920x1080_48f.yuv are not present on the host (gitignored upscaled fixtures from ADR-0752). Only the BBB 4K 200f (YUV) test can run — the other two would be skipped by bench_perf.py.
Binary staleness: build/tools/vmaf_bench is from 2026-05-18. Since that date, the following CUDA commits, which affect either scores or throughput, merged onto this branch:
37a638bdec: P0 fix that removes a committed conflict marker in integer_vif_cuda.c. CUDA VIF scores were corrupt or unavailable from that breakage until this fix, so the snapshot's CUDA pooled values (e.g. 95.098706 for 1080p) must have been generated either before the corruption was introduced or after it was fixed. The binary currently at build/tools/vmaf_bench predates the fix (mtime May 18, whereas the fix landed in the merge train of May 28–29).
92ea978a41 — CUDA ciede __ldg() F3 fix (ADR-0762): affects CIEDE kernel cache usage. Does not change VMAF v0.6.1 pooled scores (CIEDE is a side metric).
f2e2c035f7 — CUDA ms_ssim __ldg() F3 fix (ADR-0757): changes CUDA ms_ssim kernel. Pooled VMAF unchanged; ms_ssim component may shift slightly.
fe5bcb207a — CUDA filter1d + ssim_vert_combine resolution dispatch (ADR-0753): adds per-resolution kernel variant selection. Could affect SSIM/MS-SSIM throughput.
70cb42a11b — CUDA ssim_vert_combine __ldg() + pinned-host leak fix (ADR-0754).
c1a8508cbf — CUDA ms_ssim_decimate smem tiling + adm_cm register reduction (ADR-0744): throughput improvement, scores unchanged.
Verdict: CUDA throughput numbers in perf_benchmark_results.json are stale with respect to the current codebase (several CUDA kernel optimizations landed since). The pooled VMAF scores (cpu: 95.098707, cuda: 95.098706 for 1080p) match the expected cross-backend proximity per ADR-0138/0139. The binary in build/ is newer than the snapshot's implied generation date — a re-run would be needed to confirm.

`netflix_benchmark_results.json`¶

Covers src01_576x324, checker_1080p_mild, checker_1080p_heavy across CPU / CUDA / SYCL with per-frame arrays and throughput.
Same staleness as perf_benchmark_results.json — same merge commit, same binary gap.
Per-frame VMAF values for CPU (e.g. src01 pooled = 76.667828) match the Netflix golden assertion range expected from python/test/quality_runner_test.py.
CUDA pooled (76.668903) vs CPU (76.667828): delta = 0.001075 — within the expected GPU non-bit-exact range (ADR-0138/0139 documents "close but not bit-identical").
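The cross-backend deltas quoted in this audit can be reproduced with a few lines of arithmetic. This is a sketch only: `backend_delta` is a hypothetical helper, and `ASSUMED_TOLERANCE` is an illustrative placeholder, since the audit does not state the numeric bound that ADR-0138/0139 allow.

```python
def backend_delta(cpu_pooled: float, gpu_pooled: float) -> float:
    """Absolute difference between two pooled VMAF scores."""
    return abs(cpu_pooled - gpu_pooled)


# Placeholder tolerance for illustration; the real bound is whatever
# ADR-0138/0139 specify, which this audit does not quote.
ASSUMED_TOLERANCE = 0.01

# (CPU pooled, CUDA pooled) pairs quoted in the audit text.
pairs = {
    "src01 (netflix_benchmark_results.json)": (76.667828, 76.668903),
    "BBB 1080p (perf_benchmark_results.json)": (95.098707, 95.098706),
}

for name, (cpu, cuda) in pairs.items():
    delta = backend_delta(cpu, cuda)
    status = "ok" if delta <= ASSUMED_TOLERANCE else "CHECK"
    print(f"{name}: delta={delta:.6f} [{status}]")
```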

`scores_cpu_*.json` and `scores_sycl_*.json`¶

Regenerated by testdata/gen_cpu_golden.py / testdata/run_sycl_scores.py respectively, both invokable via /regen-snapshots.
Same merge-commit provenance as the perf files.
No score-shifting CPU commits have landed on master since these were generated (master has only 2 commits; all feature work is in unmerged branches).

Tool functionality¶

`/regen-snapshots`¶

Covers: testdata/scores_cpu_*.json, testdata/netflix_benchmark_results.json.
Does NOT cover: testdata/perf_benchmark_results.json (that file is produced by testdata/bench_perf.py, not gen_cpu_golden.py / benchmark_netflix.py).
Status: Functional. gen_cpu_golden.py and benchmark_netflix.py exist. Binary build/tools/vmaf (CPU path) exists.
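The coverage gap can be checked mechanically by matching snapshot paths against the patterns the skill lists under "Covers". A minimal sketch, assuming the skill's coverage is fully described by those two patterns; `is_covered` is a hypothetical helper, not something the skill provides:

```python
from fnmatch import fnmatch

# Patterns listed under "Covers" for /regen-snapshots.
COVERED_PATTERNS = [
    "testdata/scores_cpu_*.json",
    "testdata/netflix_benchmark_results.json",
]


def is_covered(path: str) -> bool:
    """True if the path matches any pattern the skill regenerates."""
    return any(fnmatch(path, pattern) for pattern in COVERED_PATTERNS)


for candidate in (
    "testdata/scores_cpu_1080.json",
    "testdata/netflix_benchmark_results.json",
    "testdata/perf_benchmark_results.json",
):
    label = "covered" if is_covered(candidate) else "NOT covered"
    print(f"{candidate}: {label}")
```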

`/run-netflix-bench`¶

Invokes testdata/bench_all.sh then diffs with testdata/netflix_benchmark_results.json via testdata/compare_combined.py.
bench_all.sh exists and has the correct backend-engagement semantics (fixed in ADR-0513 sweep).
compare_combined.py exists and compares scores_cpu_*.json vs scores_sycl_a380_*.json. Note: the skill description references netflix_benchmark_results.json as the output, but compare_combined.py actually diffs the scores_cpu_* vs scores_sycl_a380_* files — a mismatch between the skill's step 3 and what the script does. The skill would need a separate comparator to diff bench_all.sh output against netflix_benchmark_results.json.
Fixture gap: testdata/ref_1920x1080_48f.yuv is missing from the host tree (gitignored per ADR-0752). The 1080p benchmark row in perf_benchmark_results.json cannot be re-verified without regenerating the upscaled fixture.
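A dedicated comparator for step 3 could look roughly like the sketch below. Everything about the JSON layout here is an assumption: the audit only says the file holds per-frame arrays and throughput per clip and backend, so the `{clip: {backend: {"pooled": value}}}` shape, the `load_pooled` and `diff_pooled` names, and the tolerance are all hypothetical and would need adapting to the real schema.

```python
import json
from pathlib import Path


def load_pooled(path: Path) -> dict:
    """Flatten an ASSUMED {clip: {backend: {"pooled": x}}} layout
    into {(clip, backend): x}. The real schema may differ."""
    data = json.loads(path.read_text())
    return {
        (clip, backend): entry["pooled"]
        for clip, backends in data.items()
        for backend, entry in backends.items()
    }


def diff_pooled(fresh: dict, snapshot: dict, tolerance: float) -> list:
    """Return (key, fresh_value, snapshot_value) for every snapshot entry
    that is missing from the fresh run or differs by more than tolerance."""
    drifted = []
    for key, expected in snapshot.items():
        actual = fresh.get(key)
        if actual is None or abs(actual - expected) > tolerance:
            drifted.append((key, actual, expected))
    return drifted


# In-memory demonstration using the src01 pooled values from the audit;
# the "fresh" CUDA value is invented to show what a drift report looks like.
snapshot = {("src01_576x324", "cpu"): 76.667828,
            ("src01_576x324", "cuda"): 76.668903}
fresh = {("src01_576x324", "cpu"): 76.667828,
         ("src01_576x324", "cuda"): 76.9}
for key, actual, expected in diff_pooled(fresh, snapshot, tolerance=0.01):
    print(f"{key}: fresh={actual} snapshot={expected}")
```

Keeping the loader separate from the diff means only `load_pooled` has to change once the actual layout of netflix_benchmark_results.json is confirmed.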

`bench_perf.py` (produces `perf_benchmark_results.json`)¶

python3 testdata/bench_perf.py --list-tests reports:
BBB 1080p 48f (YUV): missing fixtures (ref_1920x1080_48f.yuv, dis_1920x1080_48f.yuv)
BBB 4K 200f (YUV): available (testdata/bbb/ref_3840x2160_200f.yuv present)
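Only the 4K row can be re-run on the current host. The 1080p row requires regenerating the upscaled fixtures first (ffmpeg -i ref_576x324_48f.yuv -vf scale=1920:1080:flags=bilinear, shown here without the input frame-size options that a headerless .yuv file needs and without the output path).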
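The availability report above amounts to a per-test existence check on fixture files. The sketch below illustrates that idea and is not bench_perf.py's actual logic: `REQUIRED_FIXTURES` and `missing_fixtures` are hypothetical names, the fixture list covers only the files this audit mentions, and the directory of the 1080p "dis" file is assumed to match the "ref" file.

```python
from pathlib import Path

# Fixture names come from the audit text. The directory of the 1080p
# "dis" file is an assumption (taken to sit next to the "ref" file).
REQUIRED_FIXTURES = {
    "BBB 1080p 48f (YUV)": [
        "testdata/ref_1920x1080_48f.yuv",
        "testdata/dis_1920x1080_48f.yuv",
    ],
    "BBB 4K 200f (YUV)": [
        "testdata/bbb/ref_3840x2160_200f.yuv",
    ],
}


def missing_fixtures(root: Path) -> dict:
    """Map each test name to the fixture paths that are absent under root."""
    return {
        test: [f for f in files if not (root / f).is_file()]
        for test, files in REQUIRED_FIXTURES.items()
    }


for test, missing in missing_fixtures(Path(".")).items():
    status = "available" if not missing else "missing " + ", ".join(missing)
    print(f"{test}: {status}")
```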

`scripts/perf/bench-multi-resolution.sh` (produces `perf_multi_resolution.json`)¶

Exists only in the unmerged worktree feat/perf-bench-multi-resolution-20260529.
Not on master. No open PR.
The script and ADR-0752 are complete and ready; the branch simply has not been pushed, and no PR has been opened for it.

Issues found¶

perf_benchmark_results.json not covered by /regen-snapshots — the skill references gen_cpu_golden.py and benchmark_netflix.py but has no step for bench_perf.py. The skill's --files parameter list should include an entry for perf_benchmark_results.json with the correct regeneration command.
/run-netflix-bench skill description vs compare_combined.py mismatch — step 3 of the skill says to compare bench_all.sh output against netflix_benchmark_results.json, but compare_combined.py diffs scores_cpu_*.json vs scores_sycl_a380_*.json (no comparison against netflix_benchmark_results.json). The skill needs either a dedicated comparator or a corrected step description.
perf_multi_resolution.json branch not pushed — the ADR-0752 work is complete in the local worktree but has not been pushed or PR'd. The branch feat/perf-bench-multi-resolution-20260529 should be pushed and a PR opened.
CUDA throughput numbers are stale: multiple CUDA kernel optimizations landed after the snapshot was generated. A re-run after a fresh CUDA-enabled build (configure the build directory with -Denable_cuda=true, which is a Meson option rather than a ninja flag, then run ninja -C build) would update throughput numbers while leaving pooled scores unchanged (the optimizations are throughput-only, not correctness changes).
Missing schema_version key in perf_benchmark_results.json and netflix_benchmark_results.json — unlike perf_multi_resolution.json (which carries schema_version: 1 per ADR-0752), the legacy files have no version field, making future format evolution harder to detect.
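The missing version field is easy to detect automatically. A minimal sketch, assuming `schema_version` is a top-level key as the reference to ADR-0752 suggests; the helper names are hypothetical, and files that do not exist on the host are skipped rather than reported:

```python
import json
from pathlib import Path


def lacks_schema_version(path: Path) -> bool:
    """True if the JSON document has no top-level "schema_version" key."""
    data = json.loads(path.read_text())
    return not (isinstance(data, dict) and "schema_version" in data)


def audit_schema_versions(paths: list) -> list:
    """Return the existing files among paths that carry no version field."""
    return [p for p in paths if p.is_file() and lacks_schema_version(p)]


# Snapshot files named in this audit; absent files are silently skipped.
snapshots = [
    Path("testdata/perf_benchmark_results.json"),
    Path("testdata/netflix_benchmark_results.json"),
    Path("testdata/perf_multi_resolution.json"),
]
for path in audit_schema_versions(snapshots):
    print(f"{path}: no schema_version key")
```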

Recommendations (not actioned here — audit only)¶

Add testdata/perf_benchmark_results.json to the /regen-snapshots skill with bench_perf.py as the generator.
Fix the /run-netflix-bench skill step 3 to match what compare_combined.py actually does.
Push feat/perf-bench-multi-resolution-20260529 and open PR for ADR-0752.
Regenerate perf_benchmark_results.json (4K row only, with justification) once the CUDA kernel optimization PRs are merged to master and a fresh build is ready.
Add schema_version: 1 to the legacy JSON files on next regeneration.
