ADR-0511: MCP backend probe, default allowlist, and vmaf-tune ladder --score-backend (2026-05-18)
- Status: Accepted
- Date: 2026-05-18
- Deciders: lusoris
- Tags: mcp, vmaf-tune, ai, dx, bugfix
Context
A triage session against the vmaf-dev-mcp container surfaced three small but compounding defects on the user-facing developer surfaces:
- MCP `list_backends` mis-reports CUDA. The historical implementation grepped the output of `vmaf --version` for the substrings `"cuda"`, `"sycl"`, `"vulkan"`, ... The fork's `vmaf` banner does not advertise compiled-in GPU backends there, so on a container where `vmaf --backend cuda -r src01_hrc00 -d src01_hrc01` produces a valid 76.66783 score (5-place bit-exact vs CPU on the Netflix golden 576×324 pair) the MCP server confidently returns `{"cuda": false}`. Any downstream MCP client that orchestrates a cross-backend run picks the wrong arm.

- MCP default allowlist excludes the Netflix golden YUVs. The default allowlist resolved `python/test/resource` relative to the host repo root (the fork's actual tree on disk). Inside the `vmaf-dev-mcp` container the repo lives at `/workspace/`, so the canonical fixture path `/workspace/python/test/resource/yuv/src01_hrc00_576x324.yuv` landed outside every allowlisted root and `vmaf_score` rejected it with `"path not under an allowlisted root; set VMAF_MCP_ALLOW to extend."` Every container-side MCP demo therefore had to set `VMAF_MCP_ALLOW=/workspace/python/test/resource` before the first tool call.

- `vmaf-tune ladder` is missing `--score-backend`. The sibling subcommands `compare` and `tune-per-shot` both accept `--score-backend {cpu,cuda,sycl,vulkan,auto}` and thread it down to the underlying `vmaf` invocation as `--backend $name`. `ladder` silently runs the corpus sampler with whatever libvmaf chooses itself, defeating the purpose of the dev-machine CPU/CUDA/SYCL parity matrix when staging a new VMAF model on a multi-GPU host.
All three are tiny edits in isolation; the value is in shipping them together with the documentation, ADR, and tests that make the guarantees explicit going forward.
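The allowlist rejection described in the second item can be illustrated with a minimal sketch of a root-containment check. This is not the MCP server's actual code: the helper name `is_allowed` and the host checkout path `/home/dev/vmaf` are hypothetical, and the check is purely lexical (a real implementation would presumably resolve symlinks and `..` segments first).

```python
from pathlib import PurePosixPath


def is_allowed(path, roots):
    """Return True if `path` sits under at least one allowlisted root.

    Hypothetical helper: the comparison is component-wise and lexical,
    so no filesystem access is needed.
    """
    candidate = PurePosixPath(path)
    for root in roots:
        try:
            # relative_to raises ValueError when `root` is not a prefix.
            candidate.relative_to(PurePosixPath(root))
            return True
        except ValueError:
            continue
    return False


fixture = "/workspace/python/test/resource/yuv/src01_hrc00_576x324.yuv"

# Assumed host checkout location, for illustration only.
host_only = ["/home/dev/vmaf/python/test/resource"]
with_container = host_only + ["/workspace/python/test/resource"]

print(is_allowed(fixture, host_only))       # False: container path is outside the host root
print(is_allowed(fixture, with_container))  # True once the container mount is allowlisted
```

Because the comparison works on whole path components, a sibling directory such as `/workspace/python/test/resource_evil` is not accepted by the `/workspace/python/test/resource` root.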
Decision
- Replace the `--version`-grep backend probe with a `--help`-based `_probe_backends()` helper that looks for the documented `--no_<backend>` disable flags. The vmaf CLI always advertises a `--no_<backend>` line for every compiled-in backend, so presence of `--no_cuda` in `--help` output is a sufficient and necessary condition for CUDA being live. Cache the result per-binary-path for the lifetime of the MCP server process.

- Extend the MCP default allowlist to include the absolute container-side mount path (`/workspace/python/test/resource`). The host-relative `_repo_root() / "python/test/resource"` entry stays so a host-side `vmaf-mcp` invocation (used by `make test`) still works. The `VMAF_MCP_ALLOW` env-var override is unchanged.

- Add `--score-backend {cpu,cuda,sycl,vulkan,auto}` to `vmaf-tune ladder` with `auto` as the default (which preserves current behaviour). On any non-`auto` value, `_run_ladder` resolves the choice via `score_backend.select_backend()` up-front so an unavailable backend errors out before any encodes start, then threads the resolved value through `make_default_sampler` → `CorpusOptions.score_backend` → `vmaf --backend $name`. The sibling `--vmaf-bin` flag is added to the same subparser to give `select_backend()` a probe target (the rest of the ladder code path already shells out to `vmaf` by name).

- Deliberately do NOT touch `tune-per-shot`. Its `_build_per_shot_bisect_predicate` already converts `args.score_backend == "auto"` to `None`, so `bisect_target_vmaf` receives `None` for auto and lets libvmaf pick the fastest live runtime. Resolving the backend up-front there would change the "auto → None → libvmaf-picks" contract that downstream tests assert. The asymmetry is intentional: the user spec for ADR-0509 explicitly called it out as a regression to avoid.
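The `--help`-based probe from the first decision could look roughly like the sketch below. It is an illustration under assumptions, not the shipped `_probe_backends()`: the function names `parse_backends` and `probe_backends`, the `KNOWN_BACKENDS` tuple, and the sample help text are hypothetical. Only the `--no_<backend>` flag convention and the per-binary-path caching come from this ADR.

```python
import functools
import re
import subprocess

# Backends named in this ADR; the tuple itself is an assumption.
KNOWN_BACKENDS = ("cuda", "sycl", "vulkan", "hip", "metal")


def parse_backends(help_text):
    """Map each known backend to True if a `--no_<backend>` flag appears."""
    # \b rejects longer flags such as a hypothetical `--no_cuda_graphs`.
    found = set(re.findall(r"--no_([a-z0-9]+)\b", help_text))
    return {name: name in found for name in KNOWN_BACKENDS}


@functools.lru_cache(maxsize=None)
def probe_backends(binary="vmaf"):
    """Run `<binary> --help` once per binary path and cache the result.

    The cached dict is shared between callers, so treat it as read-only.
    """
    proc = subprocess.run(
        [binary, "--help"], capture_output=True, text=True, check=False
    )
    # Some CLIs print help to stderr, so scan both streams.
    return parse_backends(proc.stdout + proc.stderr)


# Hypothetical help excerpt, not copied from a real vmaf build.
sample = "  --no_cuda        disable the CUDA backend\n  --no_prediction  skip prediction\n"
print(parse_backends(sample)["cuda"])  # True
print(parse_backends(sample)["sycl"])  # False
```

Keeping the text parsing separate from the subprocess call means the flag-matching logic can be unit-tested against canned help output without a `vmaf` binary on the test host.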
Alternatives considered
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Keep `--version`-grep but add CUDA/SYCL substrings to the banner upstream | One-line patch to `vmaf.c` | Won't help users who run the previously-shipped binary; banner-grep is fragile (every backend rename breaks it again); requires upstream coordination | `--help` flag-presence is a stable contract already enforced by the CLI's argparse table |
| Hard-code `/workspace/...` as the only container allowlist | Smallest patch | Breaks host-side use of the MCP server (`make test`, dev-host smoke runs); user would have to set `VMAF_MCP_ALLOW` on every invocation | Add the container path in addition to the existing host-relative roots |
| Make `ladder` reject `--score-backend` as "unsupported, use compare" | Smallest patch | Diverges DX from the sibling subcommands; users have asked for it three times in MCP-probe sessions | Add the flag and thread it; the cost of the plumbing is one extra kwarg on `make_default_sampler` and one extra branch in `_run_ladder` |
| Pre-resolve `tune-per-shot` backend up-front via `select_backend` too (symmetric with `ladder`) | API symmetry between subcommands; "fail loudly" semantics for both | Breaks the `None if score_backend == "auto" else score_backend` contract the predicate already relies on; would require also touching `_build_per_shot_bisect_predicate` and three downstream tests | User spec explicitly forbade regressing `tune-per-shot`; keep the existing behaviour |
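The asymmetry discussed in the last row can be made concrete with a small sketch. It is an illustration under assumptions, not the real `vmaf-tune` code: `build_parser`, `resolve_for_ladder`, and `resolve_for_per_shot` are hypothetical names, the `available` set stands in for whatever `select_backend()` actually probes, and mapping `auto` to `None` on the ladder side is an assumption about how "current behaviour" is preserved.

```python
import argparse

BACKENDS = ("cpu", "cuda", "sycl", "vulkan", "auto")


def build_parser():
    """Hypothetical parser exposing only the two new ladder flags."""
    parser = argparse.ArgumentParser(prog="vmaf-tune")
    sub = parser.add_subparsers(dest="command", required=True)
    ladder = sub.add_parser("ladder")
    ladder.add_argument("--score-backend", choices=BACKENDS, default="auto")
    ladder.add_argument("--vmaf-bin", default="vmaf")
    return parser


def resolve_for_ladder(choice, available):
    """Fail fast: reject an unavailable backend before any encode starts."""
    if choice == "auto":
        return None  # assumed: no --backend is passed, libvmaf picks
    if choice not in available:
        raise SystemExit(f"score backend {choice!r} is not available")
    return choice


def resolve_for_per_shot(choice):
    """Existing contract: auto becomes None, anything else passes through."""
    return None if choice == "auto" else choice


args = build_parser().parse_args(["ladder", "--score-backend", "cuda"])
print(resolve_for_ladder(args.score_backend, {"cpu", "cuda"}))  # cuda
print(resolve_for_per_shot("auto"))                             # None
```

With this split, `ladder` surfaces a missing backend immediately, while the per-shot path never checks availability and leaves any failure to score time.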
Consequences
- Positive:
  - Container MCP demos work out of the box: no `VMAF_MCP_ALLOW` boilerplate for the canonical fixture path.
  - `list_backends` correctly reports CUDA / SYCL / Vulkan / HIP / Metal on any host where the binary advertises the corresponding `--no_<backend>` flag. Cross-backend parity scripts no longer skip CUDA silently.
  - `vmaf-tune ladder --score-backend cuda` finally honours the user's intent the same way `compare` and `tune-per-shot` do.
- Negative:
  - `tune-per-shot` and `ladder` now have slightly asymmetric backend-handling semantics (ladder: fail-fast on unavailable; per-shot: defer to libvmaf at score time). The asymmetry is documented inline in `_run_tune_per_shot` so the next reader does not "fix" it.
- Neutral / follow-ups:
  - Future work: align `tune-per-shot` to the `select_backend` fail-fast path when we also rev the predicate contract; tracked as a follow-up in `docs/state.md`.
References
- Related: ADR-0495, the original MCP probe-driven bug-fix cluster; this ADR is a follow-on surfaced by the same probe harness.
- Related: ADR-0127 for the unified `--backend NAME` selector this work depends on.
- Related: ADR-0175 for the Vulkan backend that participates in the probe matrix.
- Source: `req`, paraphrased from the 2026-05-18 triage brief that bundled the three bugs into a single fix PR.