# ADR-0931: MCP server: replace subprocess delegation with direct cgo (Phase 1)
- Status: Proposed
- Date: 2026-05-31
- Deciders: lusoris
- Tags: mcp, go, cgo, libvmaf, performance, vmafx, modernization
## Context
`cmd/vmafx-mcp/impl.go` is 1306 lines of `exec.Command(vmaf-bin, ...)` plus stdout parsing. Every MCP tool handler shells out to the `vmaf` CLI binary, mirroring the Python server's pre-Go behaviour. The pattern was deliberate when the Go server was introduced (see ADR-0703, "Direct cgo API calls (bypass vmaf binary)" alternative): it kept the first Go cut tractable while the Python parity matrix was still in flight.
The subprocess model now has three measurable costs:
- **Per-call latency floor.** Every `vmaf_score` invocation forks/execs a ~10 MB binary, runs the `--help`-style flag parsing inside `vmaf.c`, opens the model JSON, and re-allocates the entire scorer state. The floor is 80–200 ms even on a no-op pair. This matters for IDE/MCP clients that issue many small scoring calls (per-shot bisects, ladder sweeps).
- **Error surface.** Errors are recovered by parsing `stderr` and exit codes. The signal-to-noise ratio is low: a missing model file and a malformed YUV emit nearly identical strings. Programmatic clients (the gRPC server in `cmd/vmafx-server/`, the MCP IDE clients) get strictly less detail than `libvmaf` already returns via negative errno.
- **Operational coupling.** The Go binary must locate the `vmaf` binary at runtime (`pkg/libvmaf/paths.go::FindBinary`). Deployments need both the Go binary and the C binary on disk. They also need a search-path heuristic that breaks under distroless / read-only-root k8s pods unless the operator pins `VMAF_BIN`.
`pkg/libvmaf/libvmaf.go` already advertises a `ScoreDirect` extension point (comments only, no implementation). The wrapper links `libvmaf.so` at build time (cgo) for symbol resolution, but it never calls into the library. All scoring still goes through `exec.Command` and the `vmaf` binary.
## Decision
We will implement a direct-cgo scoring path, `ScoreDirect` in `pkg/libvmaf/`, gated by `VMAFX_MCP_DIRECT=1`. As Phase 1, we will migrate the two simplest MCP tools (`vmaf_score`, `describe_model`) to use it. The existing subprocess path remains the default and is not removed in this PR.
### C-function contract
The direct path calls libvmaf in the following order, mirroring `core/tools/vmaf.c`:
1. `setlocale(LC_NUMERIC, "C")` (ADR-0137: locale-leaked decimal parsing).
2. `vmaf_init(&ctx, cfg)` with `cfg = {VMAF_LOG_LEVEL_WARNING, n_threads=0, n_subsample=1, cpumask=0, gpumask=0}` (auto-thread, all SIMD, CPU only in Phase 1).
3. `vmaf_model_load_from_path(&model, &mcfg, model_path)` with `mcfg = {name, flags=VMAF_MODEL_FLAGS_DEFAULT}`.
4. `vmaf_use_features_from_model(ctx, model)`.
5. For each frame index `i`:
   - `vmaf_picture_alloc(&ref_pic, pix_fmt, bpc, w, h)`
   - `vmaf_picture_alloc(&dis_pic, pix_fmt, bpc, w, h)`
   - read `w*h*1.5` bytes (8-bit yuv420p) into each picture's `data[0..2]` planes
   - `vmaf_read_pictures(ctx, &ref_pic, &dis_pic, i)`: transfers ownership of both pictures; libvmaf calls `vmaf_picture_unref` internally.
6. Flush: `vmaf_read_pictures(ctx, NULL, NULL, 0)`.
7. `vmaf_score_pooled(ctx, model, VMAF_POOL_METHOD_MEAN, &score, 0, frame_count-1)`.
8. `vmaf_model_destroy(model)` + `vmaf_close(ctx)`.
### Ownership semantics
- `VmafContext *`: owned by Go; freed in `defer vmaf_close()`.
- `VmafModel *`: owned by Go; freed in `defer vmaf_model_destroy()`.
- `VmafPicture`: allocated by Go (stack struct, heap planes via `vmaf_picture_alloc`). Ownership transfers to libvmaf on `vmaf_read_pictures`. On error before the read call, Go calls `vmaf_picture_unref` to free the planes.
- All `C.CString` allocations are paired with `C.free` via `defer`.
### Error mapping
libvmaf returns negative errno (`-EINVAL`, `-ENOMEM`, `-ENOENT`, ...). The Go wrapper maps these to typed errors:
| C return | Go error |
|---|---|
| `-EINVAL` | `ErrInvalidArgument` (wraps `os.ErrInvalid`) |
| `-ENOMEM` | `ErrOutOfMemory` |
| `-ENOENT` | `ErrModelNotFound` (wraps `os.ErrNotExist`) |
| `-EIO` | `ErrPictureRead` |
| other `< 0` | `fmt.Errorf("libvmaf %s returned %d", call, rc)` |
The mapping lives in `pkg/libvmaf/errors.go` (new file). Callers can branch via `errors.Is(err, libvmaf.ErrModelNotFound)`.
### Fallback flag
`VMAFX_MCP_DIRECT=1` opts into the direct path per process. Leaving it unset, or setting any other value, falls back to the subprocess path. On the first call, the server logs at INFO level which path was taken, so operators can confirm the choice. This opt-in behaviour is Phase 1 only: Phase 3 flips the default once parity coverage is broad and CI gates pass.
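Below is a sketch of the flag check. It assumes the decision is resolved once per process with `sync.Once` and logged through `log/slog` (Go 1.21+). The helper names and the `direct-cgo` / `subprocess` labels are hypothetical. Only the env-var semantics come from this ADR: exactly `1` opts in, and anything else falls back.

```go
package main

import (
	"log/slog"
	"os"
	"sync"
)

// Hypothetical labels for the two scoring paths.
const (
	pathDirect     = "direct-cgo"
	pathSubprocess = "subprocess"
)

// pathFor maps the raw VMAFX_MCP_DIRECT value to a path. Only the exact
// string "1" opts in; unset, empty, or any other value falls back.
func pathFor(envValue string) string {
	if envValue == "1" {
		return pathDirect
	}
	return pathSubprocess
}

var (
	pathOnce sync.Once
	chosen   string
)

// currentPath resolves the path once per process and logs the choice at
// INFO level on the first call only.
func currentPath() string {
	pathOnce.Do(func() {
		chosen = pathFor(os.Getenv("VMAFX_MCP_DIRECT"))
		slog.Info("vmafx-mcp scoring path selected", "path", chosen)
	})
	return chosen
}
```

Resolving once means that changing the environment after the first call has no effect. That matches the "per-process" wording above.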
### Tools migrated in this PR (Phase 1)
- `vmaf_score` (simplest: ref + dis + geometry → mean VMAF). The direct path reads raw YUV, calls libvmaf, and returns the same JSON shape as the subprocess path (`pooled_metrics.vmaf.mean` + `backend_used` + `backend_requested`).
- `describe_model` (no scoring; metadata only). The direct path calls `vmaf_model_load_from_path` and parses the file's own JSON (existing code) to emit `name`, `path`, `format`, `size_bytes`, `model_type`, and `feature_names`. The cgo call here is essentially a syntactic validation. It confirms the model parses through libvmaf, which catches schema drift that the file-only parser silently ignores.
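Both paths must emit the same JSON, so one option is a shared response type that keeps them from drifting. In this sketch the type names and the `cpu` backend value are hypothetical. Placing `backend_used` and `backend_requested` next to `pooled_metrics` at the top level is an assumption: the ADR names the keys but not their nesting.

```go
package main

import "encoding/json"

// pooledStat holds one pooled metric; only "mean" is named in the ADR.
type pooledStat struct {
	Mean float64 `json:"mean"`
}

// scoreResult sketches the vmaf_score response both paths would share.
// Top-level placement of the backend_* keys is an assumption.
type scoreResult struct {
	PooledMetrics    map[string]pooledStat `json:"pooled_metrics"`
	BackendUsed      string                `json:"backend_used"`
	BackendRequested string                `json:"backend_requested"`
}

// newScoreResult serializes a mean VMAF score in the shared shape.
func newScoreResult(mean float64, used, requested string) ([]byte, error) {
	return json.Marshal(scoreResult{
		PooledMetrics:    map[string]pooledStat{"vmaf": {Mean: mean}},
		BackendUsed:      used,
		BackendRequested: requested,
	})
}
```

A parity test could then compare the bytes produced by the direct and subprocess handlers for the same input pair.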
### Out of scope (deferred to Phase 2+)
- GPU backends (`cuda`, `sycl`, `vulkan`, `hip`, `metal`): Phase 1 is CPU only. Backend selection via `cpumask`/`gpumask` lands in Phase 2.
- Encoded-input tools (`vmaf_score_encoded`): Phase 2 adds an ffmpeg-decode pipe into the direct scorer.
- Bootstrap / model-collection scoring: Phase 3.
- Removing the subprocess path: Phase 3 closeout PR.
## Alternatives considered
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Keep subprocess delegation indefinitely | Zero migration risk; mirrors Python parity matrix | 80–200 ms per-call floor; poor error surface; deploy-time binary-search heuristic | Phase-4 modernization explicitly targets in-process scoring (memory `project_vmafx_phase4_language_modernization`) |
| Migrate all 16 tools in one PR | One-shot transition | A ~1300 LOC handler rewrite plus a 16-handler test matrix in one PR violates ADR PR-size guidance; high revert cost on regression | Phase 1 ships the 2 simplest tools behind a flag; Phase 2/3 extend |
| Rewrite the C `vmaf.c` pipeline in Go without cgo | Pure Go binary; no `libvmaf.so` dependency at runtime | Reimplements the entire feature-extractor pipeline, SIMD dispatch, and GPU runtime: a multi-month rewrite, infeasible | The C library IS the source of truth; cgo is the right surface |
| Use Go-only ffmpeg bindings to score | Removes one subprocess (ffmpeg) too | Doesn't solve the libvmaf subprocess; pulls in goav/cgo for ffmpeg, doubling the FFI surface | Orthogonal; Phase 2 ffmpeg integration is a separate decision |
| Default-on `VMAFX_MCP_DIRECT` in Phase 1 | Single code path | New code path with no production miles; risk of regression on rare YUV layouts (yuv422p, 10/12-bit) | Phase 1 keeps it opt-in until the CI parity sweep is broad |
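Part of the yuv422p and 10/12-bit risk in the last row comes from the read step's `w*h*1.5` byte count, which holds only for 8-bit yuv420p. A sketch of a general per-frame size helper for raw planar input follows. The `pixFmt` constants are hypothetical stand-ins for libvmaf's `enum VmafPixelFormat`. Rounding odd chroma dimensions up is an assumption that matches common raw-YUV tooling.

```go
package main

import "fmt"

// pixFmt is a hypothetical stand-in for libvmaf's enum VmafPixelFormat.
type pixFmt int

const (
	yuv420p pixFmt = iota
	yuv422p
	yuv444p
)

// frameBytes returns the size of one raw planar YUV frame: a full-size luma
// plane plus two chroma planes. Samples above 8 bpc take 2 bytes each
// (little-endian in typical raw files).
func frameBytes(f pixFmt, bpc, w, h int) (int, error) {
	if w <= 0 || h <= 0 || bpc < 8 || bpc > 16 {
		return 0, fmt.Errorf("invalid geometry %dx%d @ %d bpc", w, h, bpc)
	}
	var cw, ch int // chroma plane width/height, odd sizes rounded up
	switch f {
	case yuv420p:
		cw, ch = (w+1)/2, (h+1)/2
	case yuv422p:
		cw, ch = (w+1)/2, h
	case yuv444p:
		cw, ch = w, h
	default:
		return 0, fmt.Errorf("unsupported pixel format %d", f)
	}
	bytesPerSample := 1
	if bpc > 8 {
		bytesPerSample = 2
	}
	return (w*h + 2*cw*ch) * bytesPerSample, nil
}
```

For 1920x1080 this gives 3,110,400 bytes at 8-bit yuv420p (the `w*h*1.5` case) and 6,220,800 bytes at 10-bit. Copying into a `VmafPicture` should still go row by row, because the allocated plane `stride` may exceed the visible row width.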
## Consequences
- **Positive:** The per-call overhead floor is expected to drop from 80–200 ms to under 5 ms on the direct path (no fork/exec, no `vmaf` argv parsing, no temp-file JSON round-trip). Typed errors enable programmatic recovery (`errors.Is(err, ErrModelNotFound)`). Distroless deployments can drop `/usr/local/bin/vmaf` from the image once Phase 3 completes.
- **Negative:** A cgo build is required for the MCP server. This is already true via `pkg/libvmaf`'s existing cgo import, so it adds no new constraint. Reference-counted picture ownership is a foot-gun: a panic between `vmaf_picture_alloc` and `vmaf_read_pictures` leaks heap. The Go wrapper uses `defer vmaf_picture_unref` until the read transfers ownership, with an explicit nil-out to avoid a double free.
- **Neutral / follow-ups:** Phase 2 migrates `vmaf_score_encoded`, `probe_backend`, and `run_benchmark`. After a full parity sweep against Netflix golden scores, Phase 3 makes the direct path the default (equivalent to `VMAFX_MCP_DIRECT=1`). The Phase 3 closeout PR then removes the subprocess path and the `FindBinary` heuristic. The `docs/architecture/mcp-cgo-direct-migration.md` document tracks the rollout per tool.
## References
- ADR-0703: `vmafx-server` Go gRPC + HTTP service. The "Direct cgo API calls (bypass vmaf binary)" row in that ADR's Alternatives table is this ADR's parent decision.
- ADR-0137: `setlocale(LC_NUMERIC, "C")` is mandatory before any `strtod`-touching libvmaf call.
- ADR-0700: `core/` path layout, headers at `core/include/libvmaf/`.
- ADR-0686: VMAFX modernization umbrella; this is item 1 ("MCP cgo direct") under that rebrand.
- `req`: "Implement modernization #1: replace MCP subprocess delegation with direct cgo via `pkg/libvmaf`. Phase 1: ADR + extension-point design + first 1-2 tool handlers as proof-of-concept, behind `VMAFX_MCP_DIRECT=1` flag." (per user task specification, 2026-05-31).