ADR-0517: Repair MCP run_benchmark tool — drop per-call args, inject VMAF_BIN, guard set -u in bench_all.sh¶
- Status: Accepted
- Date: 2026-05-18
- Deciders: lusoris, Claude Sonnet 4.6
- Tags: mcp, bugfix, fork-local, benchmark
Context¶
The run_benchmark MCP tool consistently returned an error response with no benchmark output. The E2E test matrix (v9, 2026-05-18, item 5e) captured the symptom:
bench_all.sh exited 1 — likely aborted by set -euo pipefail
Three independent root causes were found:
Root cause 1 — spurious positional arguments corrupt $@ inside sourced setvars.sh
_run_benchmark() called bench_all.sh with -r ref -d dis --width W --height H positional arguments. bench_all.sh uses set -euo pipefail and sources Intel oneAPI setvars.sh inside its body. setvars.sh saves $@ at the top of its body and interprets it as its own flag set; the unknown flags (-r, -d, --width, --height) propagated into each per-component env/vars.sh script, which either hung waiting for input or exited non-zero. Because the source call was inside a set -euo pipefail context, the first component failure silently aborted bench_all.sh before any echo had fired — producing empty stdout and stderr and exit code 1.
Root cause 2 — VMAF_BIN not injected into subprocess environment
bench_all.sh defaults VMAF to core/build/tools/vmaf (a relative in-tree path) when VMAF_BIN is not set. In the vmaf-dev-mcp container the binary is installed at /usr/local/bin/vmaf and the in-tree path does not exist. _run_benchmark() computed the correct binary via _vmaf_binary() but did not export VMAF_BIN to the subprocess, so bench_all.sh always attempted the missing in-tree path.
Root cause 3 — set -u (nounset) aborts when setvars.sh references unset variables
Even after removing the spurious args, source setvars.sh >/dev/null 2>&1 || true still caused a silent abort. The || true guards against a non-zero return, but set -u in the calling shell aborts immediately when the sourced script references an unset variable (e.g., SETVARS_ARGS, ia32). This exit is not a return-code event; it terminates the shell process directly and bypasses || true. The fix is to temporarily disable -u with set +u around the source call, then restore it with set -u after.
Root cause 4 — OUTDIR points to a relative path that does not exist in worktrees
bench_all.sh creates its output JSON files at testdata/bbb/results/ (relative to VMAF_ROOT). When the MCP server is installed as an editable package from a git worktree, _repo_root() resolves to the worktree directory, which shares the git objects but does not have the large YUV fixtures or the testdata/bbb/ directory. Additionally, in the vmaf-dev-mcp container, /workspace is mounted read-only, so writing to the workspace tree is not possible. The fix is to default OUTDIR to /tmp/vmaf-bench-$$ (writable) and let callers override via VMAF_BENCH_OUTDIR.
Decision¶
We apply four targeted fixes:
-
Remove per-call arguments from
run_benchmark: the MCP tool schema no longer acceptsref,dis,width,height. The tool takes no arguments.bench_all.shis a fixed-fixture suite; per-pair scoring already has thevmaf_scoretool. -
Inject
VMAF_BINinto the subprocess environment:_run_benchmark()addsVMAF_BIN: str(_vmaf_binary())to the environment dict sobench_all.shuses the discovered binary path regardless of its own default. -
Add
set +u/set -uguard aroundsource setvars.shinbench_all.sh: this isolates any nounset references insidesetvars.shand its sub-scripts from the outerset -euo pipefailcontext. -
Default
OUTDIRto/tmp/vmaf-bench-$$: writable on every platform and in every container configuration. Callers who want persistent artefacts may setVMAF_BENCH_OUTDIR._run_benchmark()also adds data-root discovery that prefers a root that actually contains the fixture YUVs (checked via a canary file probe).
Additionally, run() in bench_all.sh is hardened to treat missing output files as SKIP (not abort), so unavailable backends (Vulkan without ICD, HIP scaffold-only) do not kill the whole harness under set -euo pipefail.
Alternatives considered¶
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
Parse -r/-d in bench_all.sh with getopts | Keeps per-pair flexibility | bench_all.sh is a full-suite runner; per-pair scoring is vmaf_score; significant script complexity | Unnecessary complexity — not the right tool |
Source setvars.sh in a subshell ( source ... ) | Isolates exit calls completely | Env vars set by setvars.sh (oneAPI paths) are lost to the parent; breaks SYCL runs that rely on oneAPI PATH | Defeats the purpose of sourcing |
| Skip setvars.sh entirely when running as MCP subprocess | Simple | Breaks SYCL runs that genuinely need oneAPI env setup | Not robust across all backends |
| Set SETVARS_COMPLETED=1 to skip re-sourcing | Documented setvars.sh flag | Still triggers the prep_for_exit 3; return path which returns 3 (non-zero), and -u abort on that path is also possible | Fragile; -u problem not solved |
Consequences¶
Positive:
run_benchmarkreturns a complete benchmark JSON response with per-backend VMAF scores.- Backends that are unavailable (Vulkan without ICD) produce a SKIP line rather than aborting.
- The fix is backward-compatible: existing calls with
{}arguments work without changes.
Negative:
- Callers who previously passed
ref/dis/width/heighttorun_benchmarkwill get a validation error ("unknown argument"). This is acceptable: those callers were receiving an error response anyway, so no regression in functionality. - The output JSONs are now ephemeral (
/tmp/vmaf-bench-$$) rather than persistent attestdata/bbb/results/. Callers that wanted to read the per-backend JSON artefacts after the fact must setVMAF_BENCH_OUTDIR.
Neutral follow-ups:
- The vmaf binary's "could not open file" on Vulkan fallback (exit 0, no JSON written) is a pre-existing issue not introduced by this fix. Tracked separately.
docs/mcp/run_benchmark.md(new, this PR) documents the corrected tool surface.
References¶
- req: "Fix MCP server tool
run_benchmarkwhich currently fails with bench_all.sh exited 1" (user task 2026-05-18) - E2E Test Matrix v9 Finding 5e:
run_benchmarkFAIL —.workingdir/bbb_reports/E2E_TEST_MATRIX_v9.md - ADR-0009 — MCP tool surface governance
- Intel oneAPI
setvars.shset -uinteraction: confirmed bybash -xtrace in container (2026-05-18)