ADR-0128: Embedded MCP server in libvmaf — SSE + UDS + stdio transports, build-flag-gated¶
- Status: Accepted
- Status update 2026-05-15: implemented;
core/include/libvmaf/libvmaf_mcp.handcore/src/mcp/present on master;vmaf_mcp_start()symbol in tree. - Date: 2026-04-20
- Deciders: Lusoris, Claude (Anthropic)
- Tags: mcp, agents, api, build, docs
Context¶
The fork already ships an external Model Context Protocol (MCP) server at mcp-server/vmaf-mcp/. It is a Python process that wraps the VMAF CLI, exposes tools to AI agents (Claude Desktop, Cursor, Zed) over stdio JSON-RPC, and translates structured requests into CLI invocations. The external server is the right design when the user workflow is "run a VMAF comparison and hand the result back to the agent".
A different workflow is emerging: agents that want to steer a running measurement. Examples:
- An agent notices mid-encode that a scene shifted to archival SD footage and asks libvmaf to hot-swap from the 4K VMAF model to an SD-specific tiny-AI model without dropping the FFmpeg pipeline.
- An agent running a BD-Rate sweep wants to query the current frame's per-feature scores while a long-form analysis is in flight.
- An automated CI harness wants to pause / resume a measurement based on external signals (budget, queue pressure) without killing and re-spawning the CLI.
None of these fit the external-server shape — the external server talks to the CLI, and the CLI has already exited by the time the agent wants to influence anything.
The user-facing ask from issue #66 is explicit: a "Tiny MCP inside libvmaf" that exposes agentic-tool JSON-RPC from the C library itself. The engineering question is how to do that without violating the NASA Power of 10 invariants the fork enforces, and without bloating the ABI of a library whose standard consumers are ffmpeg, GStreamer, and the vmaf CLI — none of whom need MCP.
The stock MCP transports are:
- stdio — JSON-RPC over the process's own stdin/stdout; canonical; how Claude Desktop spawns local MCP servers.
- SSE over loopback — JSON-RPC over a Server-Sent-Events HTTP stream on localhost; the "remote" transport MCP added in 2024-11.
- Unix domain socket — not in the MCP spec; a fork-local extension matching how systemd, Docker, and GPU hwcontext ecosystems already work.
Stdio is awkward for a library: libvmaf is loaded into someone else's process (ffmpeg, the CLI, a Python harness). Stdio is already owned by the host; libvmaf can't claim it without colliding with the host's own I/O. Loopback-socket transports (SSE or UDS) fit the library shape naturally — they open a socket the agent can connect to without any stdio contention.
Decision¶
We will ship an embedded MCP server inside libvmaf, gated behind a build flag, with three transport backends selectable at runtime and independently buildable. The design:
- Build flag:
-Denable_mcp_embedded=falsedefault. Standard release builds do NOT link in the MCP surface. Users opt in explicitly; bug risk for the 99% of users who don't run an agentic pipeline is zero. - Per-transport sub-flags:
-Denable_mcp_embedded_sse,-Denable_mcp_embedded_uds,-Denable_mcp_embedded_stdio— each defaultauto(on when the parent flag is on, off otherwise). Lets distro maintainers ship a subset cleanly. - Public API: one new header
core/include/libvmaf/libvmaf_mcp.h. Functionsvmaf_mcp_start(VmafContext *, VmafMcpConfig *, VmafMcpServer **)andvmaf_mcp_stop(VmafMcpServer *). Idempotent; thread-safe; can be called multiple times to run several transports concurrently (e.g., SSE for Claude Desktop + UDS for a cron job). - Threading model: the MCP server runs on a dedicated pthread owned by
VmafMcpServer. The measurement thread is untouched; all MCP work (JSON parsing, socket I/O, tool dispatch) happens on the MCP thread and enqueues work into a lock-free SPSC queue drained at measurement-thread frame boundaries. This keeps the NASA Power of 10 no-dynamic-alloc-after-init invariant on the hot path. - Tool surface (v1): read-only introspection + queued hot-swap commands.
vmaf.status— current model, active features, frames processed, last score.vmaf.features— list of active feature extractors + options.vmaf.score— the most recent per-frame + aggregate score.vmaf.request_model_swap(name)— enqueues a model swap; executes at the next frame boundary if safe, returnsMODEL_SWAP_ENQUEUEDotherwise. Write / mutating tools beyond model-swap stay out of v1 until there's a concrete user need; each new mutating tool gets its own follow-up ADR.- JSON library:
cJSONvendored undersubprojects/cJSON/. Single-file, MIT, no deps, already used by one of our test helpers. Heavier options (jansson, simdjson) give nothing this use case needs. - SSE server:
mongoosevendored as a single-file dependency. MIT-or-commercial dual-license; we use the MIT version. Mongoose is the smallest credible embeddable HTTP server that supports SSE out of the box. - Lifecycle:
vmaf_mcp_startis called by the host (CLI, test harness) aftervmaf_initcompletes and before the firstvmaf_read_pictures.vmaf_mcp_stopis called beforevmaf_close. The server refuses to start after measurement has begun; the CLI exposes--enable-mcp-sse=<port>and--enable-mcp-uds=<path>flags to trigger startup at the right moment.
Alternatives considered¶
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Three transports, build-flag-gated (chosen) | Library remains clean by default; each deployment chooses the subset that fits; matches canonical MCP transport options | Largest surface area of the alternatives; three transport backends to maintain | User explicitly selected "All three" in the 2026-04-20 question round; maintenance cost is bounded (each transport is ~300 LOC) |
| SSE only | Smallest v1 footprint; natural library shape; matches Claude Desktop remote-MCP support | Leaves stdio-native tooling (Cursor shells, CI scripts piping JSON) without a path | SSE alone doesn't cover headless / pipe-based workflows |
| UDS only (Linux-only) | Simplest; fits systemd world | Windows / macOS users lose the feature; contradicts the cross-platform backend policy | We don't ship Linux-only features in the core library |
| Thread-queue callback to the existing Python MCP server | Avoids C JSON-RPC parsing | Couples the C library to a Python process; violates the fork's "C library stands alone" principle | Architectural regression |
| Keep MCP external-only | Zero new code | Leaves the "steer a running measurement" workflow unreachable; fails issue #66's explicit ask | User has asked for this and the workflow value is real |
Consequences¶
Positive
- Unblocks the agentic-VMAF workflow. Agents can introspect and (in v1, narrowly) steer measurements.
- Build-flag default-off means no impact on the 99% of consumers who don't want MCP — they don't pay for it in code size, startup time, or attack surface.
- Three transports give deployment flexibility: SSE for desktop-agent users, UDS for headless Linux, stdio for pipe-into-a-subprocess cases.
- Clean public header (
libvmaf_mcp.h) keeps the MCP surface out of the corelibvmaf.hABI contract.
Negative
- New optional dependencies (cJSON, mongoose) in
subprojects/. Both single-file, but still something to track. - Thread-safety audit cost: the measurement thread must be proven safe to share per-feature state with the MCP thread via the SPSC queue pattern. Verified by TSan tests in the implementation PR.
- MCP semantics evolve. We may need to re-revisit the transport choice if a future MCP version deprecates SSE or adds a new transport.
- Three-transport CI: each transport needs its own smoke test. Gate only the default-off / off-by-default matrix leg.
Neutral
- No change to the Netflix CPU golden gate or any numerical output — MCP is an I/O surface, not a measurement surface.
- The existing
mcp-server/vmaf-mcp/stays as the recommended default for "I just want to score a video via an agent"; the embedded server is additive.
Operational guardrails¶
- Power of 10 compliance: no dynamic allocation on the measurement thread after
vmaf_mcp_start. All MCP-thread allocs happen before the first frame or in response to agent requests, never in the measurement hot loop. - No heap allocation in the SPSC queue — fixed-size ring buffer sized at
vmaf_mcp_start; overflow returnsVMAF_ERR_MCP_QUEUE_FULLto the agent and the measurement thread continues untouched. - No unbounded loops — every MCP tool has an explicit iteration cap proved at review time (per Power of 10 rule 2).
- Authentication: SSE loopback is bound to
127.0.0.1only; UDS uses filesystem permissions (mode 0700); stdio is trusted by construction. No network-accessible MCP server ships in v1.
References¶
- [req] AskUserQuestion popup answered 2026-04-20: embedded MCP "Yes, build it"; follow-up: "Given that reasoning, final MCP transport call?" → "All three (SSE + UDS + stdio)".
- Research-0005 — design digest: threading model, JSON library selection, SSE library selection, Power of 10 compatibility analysis.
- Anthropic MCP specification
- mongoose HTTP server
- cJSON
- ADR-0122 — precedent for optional-dependency dlopen loader pattern; comparable shape for the MCP sub-flag gating.
mcp-server/vmaf-mcp/— the external Python MCP server this one complements.- CLAUDE.md §12 r10 — per-surface docs rule (new API header gets a doc entry under
docs/api/libvmaf_mcp.mdanddocs/mcp/embedded.md).
Status update 2026-05-08: stays Proposed — runtime in flight (T5-2b)¶
Audited as part of the 2026-05-08 ADR Proposed sweep (Research-0086).
The strategic decision in this ADR is shipped in stages:
- The public C-API surface (
core/include/libvmaf/libvmaf_mcp.h) is shipped via the audit-first scaffold in ADR-0209 (Accepted). - Build wiring (
enable_mcp/enable_mcp_sse/enable_mcp_uds/enable_mcp_stdio) is shipped. - The stub TU at
core/src/mcp/mcp.creturns-ENOSYSfor every entry point.
What remains unshipped (tracked as backlog item T5-2b):
cJSONandmongoosevendoring (subprojects/).- Dedicated MCP
pthreadand SPSC ring drain at frame boundaries. - SSE / UDS / stdio transport bodies.
- Tool-surface expansion past the contract pinned by
core/test/test_mcp_smoke.c.
Because the stub returns -ENOSYS rather than the runtime behaviour the Decision section promised, this ADR stays Proposed until T5-2b lands. ADR-0209 (Accepted) is the audit-first companion — it explicitly defers the runtime, so it does not supersede this ADR; both stay live. When T5-2b lands, the closing PR flips this ADR to Accepted via a follow-up status-update appendix.
Verification command: