ADR-0634: MCP P0 fixes — isError spec bug, probe_backend, vmaf_version, vmaf_score_encoded¶
- Status: Accepted
- Date: 2026-05-19
- Deciders: lusoris, Claude (Anthropic)
- Tags:
mcp,bugfix,spec-correctness,fork-local
Context¶
The MCP capability audit of 2026-05-19 (docs/research/mcp-capability-audit-2026-05-19.md) identified four P0 (operator-blocking) findings in the Python standalone MCP server (mcp-server/vmaf-mcp/src/vmaf_mcp/server.py):
C-P0-1 / E-1 — isError=False spec-correctness bug (server.py:983-984). The _call_tool handler caught all exceptions and returned [TextContent({"error": ...})] without setting isError=True on the CallToolResult. Conformant MCP clients (mcp/client/session.py:L394) branch on result.isError — with the old code they silently treated every tool error as a success, then failed downstream trying to parse {"error": "..."} as a VMAF score result. The MCP 2024-11 spec designates isError=True as the canonical signal for tool-level errors.
C-P0-1 (tool gap) — No probe_backend tool. list_backends reported compile-in status only. On the dev-mcp container, backends have historically been compiled in but non-functional at runtime (SYCL NEO version mismatch, HIP KFD ioctl failure, Vulkan ICD ordering). An operator asking "is CUDA healthy right now?" got true from list_backends even when the driver was absent.
C-P0-3 — No vmaf_version tool. An operator debugging a score regression had no way to confirm which fork version was running or which backends were compiled in without calling multiple tools and correlating out of band.
C-P0-2 — vmaf_score accepted only raw YUV. Real operator workflows use MP4/MKV/Y4M files. No MCP tool decoded encoded video before scoring. Operators had to pre-decode manually or use the CLI directly.
Additionally, two documentation correctness issues from the audit:
README.md:L12listed only 4 backends (omitting vulkan, metal).tools.md:L170described the--version-grep probe, which was replaced with the--help-flag probe in ADR-0511.
Decision¶
Four fixes shipped in one PR:
-
isErrorfix: remove thetry/exceptcatch in_call_tooland let exceptions propagate. Themcplibrary's outer handler (_make_error_result) then correctly setsisError=True. This is the only spec-compliant approach that works with all conformant MCP clients without monkey-patching the library internals. -
probe_backendtool: new async function_probe_backend(backend)that writes a 32×32 mid-grey synthetic 4:2:0 8-bit YUV pair to aTemporaryDirectoryand runs a 1-frame vmaf score against it. Returns{compiled_in, runtime_healthy, latency_ms, score, error}. Unhealthy backends return a success-shaped result (not a raise) because "backend not functional" is an expected operator query result, not a protocol error. -
vmaf_versiontool: new synchronous function_vmaf_version()that runsvmaf --versionto extract the version string and delegates to_probe_backends()(the cached--helpprobe) for build flags. Returns{binary_path, version, build_flags, error}. -
vmaf_score_encodedtool: new async function_run_vmaf_score_encoded(ref, dis, ...)that ffprobes the reference stream for geometry, decodes both inputs in parallel to aTemporaryDirectoryviaffmpeg -f rawvideo, then calls_run_vmaf_score. Returns the standardvmaf_scorepayload plusreference_encodedanddistorted_encodedfields. Geometry detection covers 8/10/12-bit YUV 4:2:0, 4:2:2, and 4:4:4.
Two stale documentation corrections:
README.mdtools table updated to all ten tools and six backends.tools.mdlist_backendsdescription corrected from--versionto--helpprobe; response body updated to show all six backends;isErrorconventions updated to reflect the new raise-on-error behavior.
Pre-existing test test_call_tool_unknown_name_returns_error_json (which asserted the old catch-and-return behavior) updated to test_call_tool_unknown_name_raises asserting the new raise behavior. test_list_tools_returns_expected_names updated from 7 to 10 tools.
Alternatives considered¶
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
Return CallToolResult(content=..., isError=True) from the catch block | Keeps the old error-shape for backward compat | Requires importing CallToolResult from mcp.types; still catches and re-wraps; doesn't work if the mcp library version changes its internal result type | Re-raising is simpler, idiomatic, and is the approach documented in the mcp library's own examples (_make_error_result is the intended mechanism) |
Keep probe_backend returning a plain error TextContent on failure | Consistent with old error shape | Doesn't help: runtime_healthy: false is not an error — it's a valid health answer | N/A — the distinction between "probe ran, backend unhealthy" and "probe itself failed" is captured in runtime_healthy vs error field |
Use subprocess process substitution instead of temp YUV for vmaf_score_encoded | No disk write | Requires shell=True or a FIFO, introduces OS portability issues | Temp file in TemporaryDirectory is safe, cross-platform, and already the pattern used by _run_vmaf_score for its output JSON |
Consequences¶
- Positive: Conformant MCP clients (Claude Desktop, Cursor, any client that branches on
result.isError) now correctly distinguish tool errors from successes.probe_backendgives operators a runtime health signal before long runs.vmaf_versionenables build identification.vmaf_score_encodedcovers the most common real-world VMAF input format. - Negative: The
isErrorchange is a breaking behavior change for any caller that currently catches the{"error": ...}TextContent and treatsisError=Falseerrors as data. Such callers were already broken (treating errors as successes); this change makes the breakage visible. No such callers exist in the current test suite. - Neutral / follow-ups:
vmaf_score_encodedrequiresffmpeg+ffprobeon PATH; this is documented. Thesubsampleparameter is accepted but not yet wired to a vmaf--frame_stepflag (the CLI may not support it for all backends) — the parameter is validated and stored but has no effect in this PR; a follow-up can wire it when needed.
References¶
docs/research/mcp-capability-audit-2026-05-19.md— source audit, findings C-P0-1, C-P0-2, C-P0-3, D-6, E-1, F-1, F-2.- ADR-0495 — prior MCP probe bug-fix cluster.
- ADR-0511 —
--helpprobe replacing--versiongrep (thetools.mddescription had not been updated). - ADR-0556 — prior MCP audit fixes.
- req: "Fix
isError=Falsespec-correctness bug ... Addprobe_backendtool ... Addvmaf_versiontool ... Addvmaf_score_encodedtool ... in ONE DRAFT PR."