dev-MCP Docker Container¶
The dev-MCP container runs the full VMAF fork inside Docker with all four GPU backends enabled (CUDA, SYCL, Vulkan, HIP) plus the embedded MCP stdio server. It is the standard environment for:
- Live probing of VMAF scores across all backends from a single shell.
- Running the continuous smoke-probe cron (`smoke-probe-cron` service).
- Reproducing build regressions on GPU paths other than the host's primary GPU (for example: catching HIP toolchain regressions on an NVIDIA-only host).
The design decision is recorded in ADR-0435.
Prerequisites¶
Required¶
| Component | Version | Notes |
|---|---|---|
| Docker Engine | 26+ | docker compose v2 plugin required |
| NVIDIA Container Toolkit | latest | Enables --gpus all / runtime: nvidia for CUDA kernel execution. The container builds and runs without it; CUDA feature extractors return -ENOSYS at runtime. |
Optional¶
| Component | Purpose |
|---|---|
| AMD ROCm runtime on host | Run HIP kernels inside the container. Without it, HIP compiles but returns an error at kernel dispatch. |
| Intel oneAPI runtime on host | Run SYCL kernels via Level Zero. Without it, SYCL falls back to the OpenCL CPU device or returns an error. |
| `jq` | Pretty-print probe JSON output on the host. Install with `apt install jq`. |
How to build¶
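Use the provided wrapper from the repository root:
./dev/scripts/dev-mcp-up.sh
Or, to build without starting:
docker compose -f dev/docker-compose.yml --project-directory . build dev-mcp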
Important: always pass `--project-directory`. Without it, Docker Compose v2 sets the project directory to the compose file's parent (`dev/`), causing `context: .` to resolve to `dev/` instead of the repo root. This bypasses the root `.dockerignore` and, on developer machines that hold `.corpus/` (up to 781 GB), sends the entire corpus into the build context, accumulating copies in `/var/lib/docker/overlay2/` on every failed build. The `dev-mcp-up.sh` wrapper always passes `--project-directory`; the bare `docker compose -f` form is unsafe unless run from the repo root with the flag explicit.
The first build downloads all GPU SDK layers and compiles libvmaf from source. Expect 20–40 minutes on a typical workstation; subsequent builds use the layer cache and take 1–3 minutes when only Python packages change.
How to start¶
# CPU + Vulkan/lavapipe only (no GPU passthrough)
./dev/scripts/dev-mcp-up.sh
# With NVIDIA GPU passthrough
NVIDIA_VISIBLE_DEVICES=all CONTAINER_RUNTIME=nvidia \
./dev/scripts/dev-mcp-up.sh
To ensure the container joins the correct host video and render groups (GIDs differ across distributions), export HOST_GID_VIDEO and HOST_GID_RENDER before starting. The wrapper reads them automatically; you can also pass them inline:
HOST_GID_VIDEO=$(getent group video | cut -d: -f3) \
HOST_GID_RENDER=$(getent group render | cut -d: -f3) \
CONTAINER_RUNTIME=nvidia \
docker compose -f dev/docker-compose.yml --project-directory . up -d dev-mcp
The defaults baked into docker-compose.yml (44 for video, 109 for render) match common Ubuntu installations. Override whenever getent group video returns a different GID (for example, Arch Linux uses 985/986).
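The same GID lookup can be done from Python, for example in a launcher script. This is a minimal sketch, assuming the fallback values 44 and 109 quoted above; the helper names `host_gid` and `gid_env` are hypothetical and not part of the repository.

```python
import grp

# Compose defaults quoted on this page (common Ubuntu values).
DEFAULT_GIDS = {"video": 44, "render": 109}


def host_gid(group: str) -> int:
    """Return the host GID for `group`, or the documented default if absent."""
    try:
        return grp.getgrnam(group).gr_gid
    except KeyError:
        # Group does not exist on this host; fall back to the compose default.
        return DEFAULT_GIDS[group]


def gid_env() -> dict:
    """Build the two variables the wrapper reads before starting the stack."""
    return {
        "HOST_GID_VIDEO": str(host_gid("video")),
        "HOST_GID_RENDER": str(host_gid("render")),
    }


if __name__ == "__main__":
    for key, value in gid_env().items():
        print(f"export {key}={value}")
```

The printed `export` lines can be evaluated in the shell that later runs `./dev/scripts/dev-mcp-up.sh`.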
The dev-mcp-up.sh wrapper builds (if needed) then starts:
- `vmaf-dev-mcp`: primary container; runs `vmaf-mcp` via `docker exec -i` stdio when requested. The service healthcheck is `vmaf --version`, not a socket check.
- `vmaf-smoke-probe-cron`: waits for the primary to be healthy, then probes every 15 minutes.
Both services write probe files to .workingdir/dev-mcp-probes/ on the host.
How to attach¶
# Interactive bash shell inside the running dev-mcp container
./dev/scripts/dev-mcp-shell.sh
# Run a specific command
./dev/scripts/dev-mcp-shell.sh vmaf-dev-mcp vmaf --version
./dev/scripts/dev-mcp-shell.sh vmaf-dev-mcp vmaf --list-features
Inside the container the full environment is initialised:
- `vmaf` CLI: `/usr/local/bin/vmaf`
- `vmaf-mcp-server`: `/opt/vmaf-venv/bin/vmaf-mcp-server`
- GPU SDKs: `nvcc`, `icpx`, `hipcc` in `PATH`
- testdata: `/workspace/testdata/` (read-only bind mount from host repo)
- models: `/workspace/model/` (read-only)
How to manually probe¶
Run a single smoke probe outside the cron cycle:
This executes smoke-probe-loop.sh --once inside the running container and writes probe-<timestamp>.json to .workingdir/dev-mcp-probes/. If jq is installed on the host the result is pretty-printed to stdout.
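To pick up the result of a manual probe from a script, something like the following could load the newest probe file. It is a sketch only: it assumes the files match `probe-*.json` as described above and orders them by modification time, because the exact timestamp format in the filename is not specified here. The function name `latest_probe` is hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def latest_probe(probe_dir: str = ".workingdir/dev-mcp-probes") -> Optional[dict]:
    """Load the most recently modified probe-*.json, or None if there is none."""
    files = sorted(Path(probe_dir).glob("probe-*.json"),
                   key=lambda p: p.stat().st_mtime)
    if not files:
        return None
    with files[-1].open(encoding="utf-8") as fh:
        return json.load(fh)
```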
How to stop¶
# Stop, keep volumes (probe history preserved)
./dev/scripts/dev-mcp-down.sh
# Stop and remove volumes (clears socket volume; probe bind-mount preserved)
./dev/scripts/dev-mcp-down.sh --volumes
How to interpret probe outputs¶
Each probe file follows this schema:
{
  "ts": "2026-05-15T14:30:00Z",
  "host_id": "myhostname:abc123def456",
  "backend_results": {
    "cpu": { "score": 76.45, "duration_ms": 3200, "error": null },
    "cuda": { "score": 76.44, "duration_ms": 820, "error": null },
    "sycl": { "score": null, "duration_ms": 0, "error": "ENOSYS: no SYCL device" },
    "vulkan": { "score": 76.45, "duration_ms": 1100, "error": null }
  },
  "mcp_results": {
    "list_features": { "feature_count": 14, "duration_ms": 45, "error": null },
    "compute_vmaf": { "score": 76.45, "duration_ms": 3250, "error": null }
  }
}
| Field | Meaning |
|---|---|
| `score` | Aggregate VMAF score for the 48-frame 576×324 golden pair. `null` = backend failed. |
| `duration_ms` | Wall-clock time for the full scoring run. |
| `error` | Error message string, or `null` for success. |
| `feature_count` | Number of features returned by the MCP `list_features` tool. |
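The schema above can be reduced to a one-line-per-backend status with a few lines of Python. This is an illustrative sketch that relies only on the `backend_results`, `score`, and `error` fields shown above; the function name `backend_status` is hypothetical.

```python
import json


def backend_status(probe: dict) -> dict:
    """Map each backend to 'ok' or its error string, following the schema above."""
    status = {}
    for name, result in probe["backend_results"].items():
        # A backend failed when `error` is set or `score` is null.
        if result["error"] is not None or result["score"] is None:
            status[name] = result["error"] or "no score"
        else:
            status[name] = "ok"
    return status


if __name__ == "__main__":
    sample = json.loads("""
    {"backend_results": {
        "cpu":  {"score": 76.45, "duration_ms": 3200, "error": null},
        "sycl": {"score": null, "duration_ms": 0, "error": "ENOSYS: no SYCL device"}
    }}
    """)
    print(backend_status(sample))
```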
Expected values¶
- CPU score: ~76.45 (matches the Netflix golden pair; exact value varies by model version).
- CUDA / Vulkan scores: within ±0.01 of CPU (numeric parity is not bit-exact — see ADR-0214).
- SYCL: `ENOSYS` on hosts without Intel GPU or oneAPI runtime; normal.
- HIP: error on NVIDIA-only hosts; normal.
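The ±0.01 parity expectation can be checked mechanically against a probe's `backend_results`. The sketch below is one possible way to do it; the function name `out_of_tolerance` is hypothetical, and rounding the difference to six decimals is a choice made here to absorb binary floating-point noise, not something the probe itself is known to do.

```python
def out_of_tolerance(backend_results: dict, tol: float = 0.01) -> list:
    """List backends whose score differs from the CPU score by more than `tol`."""
    cpu = backend_results["cpu"]["score"]
    if cpu is None:
        raise ValueError("CPU score missing; nothing to compare against")
    drifted = []
    for name, result in backend_results.items():
        score = result["score"]
        if name == "cpu" or score is None:
            continue  # failed backends are reported via `error`, not here
        # Round the difference so that floating-point noise
        # (76.45 - 76.44 is slightly above 0.01 in binary) does not trip the check.
        if round(abs(score - cpu), 6) > tol:
            drifted.append(name)
    return drifted
```

With the sample probe shown earlier, no backend is flagged: CUDA sits exactly at the tolerance boundary and SYCL is skipped because it has no score.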
Common error patterns¶
| Error | Cause | Action |
|---|---|---|
| `ENOSYS: no CUDA device` | No NVIDIA GPU or Container Toolkit not installed | Install Container Toolkit and set `NVIDIA_VISIBLE_DEVICES=all` |
| `ENOSYS: no SYCL device` | No Intel GPU / oneAPI runtime | Expected on non-Intel hosts; not a regression |
| `mcp stdio returned empty response` | `vmaf-mcp-server` not in `PATH` or build failed | Rebuild container; check `docker compose logs dev-mcp` |
| Score drift >0.1 from baseline | Code regression or model change | Run /validate-scores skill; check recent commits |
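A monitoring script could use the table above to separate expected errors from ones worth investigating. The following is a rough sketch: the error strings are the two `ENOSYS` messages quoted above, `has_hardware` is a caller-supplied flag (how it is determined is left open), and the name `triage` is hypothetical. HIP is omitted because its exact error text is not given on this page.

```python
# Error prefixes described above as expected on hosts
# that lack the matching hardware or runtime.
EXPECTED_WITHOUT_HARDWARE = {
    "cuda": "ENOSYS: no CUDA device",
    "sycl": "ENOSYS: no SYCL device",
}


def triage(backend: str, error, has_hardware: bool) -> str:
    """Return 'ok', 'expected', or 'investigate' for one backend result."""
    if error is None:
        return "ok"
    expected = EXPECTED_WITHOUT_HARDWARE.get(backend)
    if expected and error.startswith(expected) and not has_hardware:
        return "expected"
    return "investigate"
```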
Known limitations¶
| Limitation | Details |
|---|---|
| HIP kernels cannot run on NVIDIA-only hosts | The HIP toolchain in the container compiles and embeds HSACO fat binaries, but the AMD ROCm runtime is not available. Feature extractors return an error at kernel dispatch. The container is still valuable for catching compile-time regressions in HIP paths. |
| Metal is disabled on Linux | libvmaf is built with -Denable_metal=auto, which resolves to disabled on Linux. Metal kernels require macOS + Apple Silicon. |
| SYCL requires Intel GPU or software emulation | Without the oneAPI Level Zero runtime, SYCL falls back to the OpenCL CPU device (if available) or returns -ENOSYS. Performance is significantly lower than on a dedicated Intel GPU. |
| Vulkan lavapipe is CPU-backed | The lavapipe software Vulkan ICD shipped by mesa is enumerated last when no real ICD is available; it allows Vulkan correctness testing without a physical GPU, but throughput is 3–5× slower than real hardware. |
| First build takes 20–40 minutes | All four GPU SDK layers are fetched during docker compose build. Subsequent builds are fast (layer cache). |
| `vmaf-tune report` requires matplotlib | Baked into `/opt/vmaf-venv` by `dev/Containerfile` (added 2026-05-18 per ADR-0498). When the container is rebuilt, `vmaf-tune report --format both` produces a self-contained HTML+Markdown report with inline charts. If you see `ModuleNotFoundError: No module named 'matplotlib'` inside the container, your image predates the ADR-0498 commit; rebuild it with `docker compose -f dev/docker-compose.yml --project-directory . build dev-mcp`. |
Backend matrix (post-ADR-0514)¶
On a host with NVIDIA + Intel Arc + AMD silicon and the NVIDIA Container Toolkit installed, every libvmaf backend should run inside the container:
| Backend | Expected | Required host state |
|---|---|---|
| `cpu` | VMAF score, rc=0 | always |
| `cuda` | VMAF score, rc=0 (5-place-equal to CPU per ADR-0214) | NVIDIA GPU + Container Toolkit |
| `sycl` | VMAF score, rc=0 (5-place-equal to CPU) | Intel GPU exposed via `/dev/dri` bind-mount (ADR-0528) |
| `vulkan` | VMAF score, rc=0 (per-adapter device picker selects the first compatible ICD) | At least one Vulkan ICD: NVIDIA (via `NVIDIA_DRIVER_CAPABILITIES=graphics`), Intel/AMD (via `mesa-vulkan-drivers`), or lavapipe software fallback |
| `hip` | VMAF score, rc=0 (5-place-equal to CPU) | AMD GPU via `/dev/kfd` + `/dev/dri/renderD*` |
| `metal` | "built without metal support" on Linux containers | macOS host only |
Reproducer:
docker exec vmaf-dev-mcp bash -c '
  for B in cpu cuda sycl vulkan hip; do
    vmaf --reference /workspace/python/test/resource/yuv/src01_hrc00_576x324.yuv \
         --distorted /workspace/python/test/resource/yuv/src01_hrc01_576x324.yuv \
         --width 576 --height 324 --pixel_format 420 --bitdepth 8 \
         --backend $B --json --output /tmp/probe_$B.json
    echo "rc=$? backend=$B"
  done
'
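Once each backend's score has been read out of its `/tmp/probe_<backend>.json` file, the "5-place-equal" expectation in the matrix can be checked as sketched below. This assumes "5-place-equal" means the scores agree when printed to five decimal places; ADR-0214 is the authority on the exact definition. The function names and the scores in the usage note are hypothetical.

```python
def five_place_equal(a: float, b: float) -> bool:
    """True when both scores print identically at five decimal places."""
    return f"{a:.5f}" == f"{b:.5f}"


def parity_report(scores: dict, reference: str = "cpu") -> dict:
    """Compare every backend score against the reference backend."""
    ref = scores[reference]
    return {name: five_place_equal(value, ref)
            for name, value in scores.items() if name != reference}
```

For example, `parity_report({"cpu": 76.66890, "cuda": 76.668901, "hip": 76.7})` marks `cuda` as equal and `hip` as different.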
Environment-variable contract¶
The container pins one env-var family (HSA / ROCm) at compose-up time, rewrites one (VK_DRIVER_FILES) at entrypoint time based on what is visible on disk, and intentionally leaves everything else alone. Pinning the wrong subset silently hid one or more GPU backends in earlier image versions:
| Env var | Contract | Rationale |
|---|---|---|
| `VK_DRIVER_FILES` | Rewritten by `dev/scripts/dev-mcp-entrypoint.sh` at container start to the colon-separated list of every non-lavapipe ICD JSON visible under `/etc/vulkan/icd.d/` + `/usr/share/vulkan/icd.d/`. Unset when no real ICD is present (CPU-only fallback). | An earlier image left both env vars unset and relied on alphabetical search order; on multi-vendor hosts where mesa's `lvp_icd.json` sorted before NVIDIA's `nvidia_icd.json` (or Intel/AMD mesa ICDs), `vmaf --vulkan_device 0` silently landed on lavapipe. ADR-0542 closes the race by filtering lavapipe out whenever a real ICD exists. |
| `VK_ICD_FILENAMES` | Unset (deprecated by Khronos in favour of `VK_DRIVER_FILES`). | Setting it overrides the loader's allowlist semantics; the prior `lvp_icd.x86_64.json` typo (ADR-0509 / Research-0138) hid every real GPU. |
| `LD_LIBRARY_PATH` | Includes `${ONEAPI_ROOT}/{compiler,umf,tcm,tbb}/latest/lib`. | `tcm/latest/lib` carries `libhwloc.so.15`; the level-zero UR adapter dlopens it at load time, and dropping it causes SYCL "Platforms: 0" on Intel Arc. `tbb/latest/lib` carries `libtbb.so.12`; the Intel CPU OpenCL ICD dlopens it at platform enumeration, and dropping it silently removes the Intel CPU OpenCL platform (ADR-0543). |
| `NVIDIA_DRIVER_CAPABILITIES` | `compute,graphics,utility,video` (set in `dev/docker-compose.yml` common-env). | `graphics` is what makes the NVIDIA Container Toolkit bind-mount `nvidia_icd.json` into `/etc/vulkan/icd.d/`. Dropping `graphics` hides NVIDIA from Vulkan while leaving CUDA + `nvidia-smi` working, which makes the regression hard to spot. |
| `HSA_OVERRIDE_GFX_VERSION` | Pinned to `10.3.0` in common-env. | AMD gfx1036 (Raphael iGPU, RDNA2 IP rev 10.3.6) is not on the ROCm 6.x supported-GPU allowlist. Without the override, `hsa_init()` returns `HSA_STATUS_ERROR_OUT_OF_RESOURCES` and `rocminfo` reports "Unable to open /dev/kfd read-write: Invalid argument" even though `/dev/kfd` is bind-mounted. gfx1036 is binary-compatible enough with gfx1030 for the libvmaf HIP feature kernels (ADR-0530 / ADR-0538). ADR-0542. |
| `HSA_ENABLE_SDMA` | Pinned to `0` in common-env. | On RDNA2 iGPUs sharing system RAM with the CPU, the SDMA copy engine triggers VM faults on small device→host transfers (libvmaf collect path is dominated by such transfers). ADR-0543. |
| `ROCR_VISIBLE_DEVICES` | Pinned to `0` in common-env. | Pins HIP to the single AMD adapter on multi-iGPU + dGPU hosts so kernels cannot accidentally dispatch onto a non-RDNA2 device that needs a different `HSA_OVERRIDE_GFX_VERSION`. ADR-0542. |
Operators who need to force a single Vulkan ICD per invocation can still run `docker exec vmaf-dev-mcp env VK_DRIVER_FILES=/path/to/icd.json vmaf …`; the per-exec env var overrides the entrypoint-time pin. Operators on hosts with a ROCm-supported GPU on the allowlist (gfx1030 / gfx1100 / gfx1101 desktop / workstation parts) can set `HSA_OVERRIDE_GFX_VERSION` to the empty string at `docker compose up` time to drop the override.
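The entrypoint's `VK_DRIVER_FILES` rewrite described in the contract can be illustrated in Python. This is a sketch of the described behaviour, not the actual `dev-mcp-entrypoint.sh` logic: it assumes lavapipe manifests are recognisable by an `lvp_icd` filename prefix, and the function names are hypothetical.

```python
from pathlib import Path

# The two ICD search directories named in the contract above.
ICD_DIRS = ("/etc/vulkan/icd.d", "/usr/share/vulkan/icd.d")


def real_icd_list(dirs=ICD_DIRS) -> str:
    """Colon-separated non-lavapipe ICD manifests, or '' when none exist."""
    found = []
    for directory in dirs:
        for manifest in sorted(Path(directory).glob("*.json")):
            # Assumption: lavapipe manifests are named lvp_icd*.json.
            if not manifest.name.startswith("lvp_icd"):
                found.append(str(manifest))
    return ":".join(found)


def apply_vk_driver_files(env: dict, dirs=ICD_DIRS) -> dict:
    """Set VK_DRIVER_FILES when a real ICD exists, otherwise unset it."""
    env = dict(env)  # work on a copy; the caller's mapping is untouched
    value = real_icd_list(dirs)
    if value:
        env["VK_DRIVER_FILES"] = value
    else:
        env.pop("VK_DRIVER_FILES", None)
    return env
```

Leaving the variable unset when only lavapipe is present lets the Vulkan loader fall back to its normal search path, which matches the CPU-only fallback described in the table.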
FFmpeg encoder matrix (post-ADR-0543)¶
The in-image FFmpeg is built with the fork's full encoder set so that `vmaf-tune compare` sweeps can address every codec the project supports without skipping rows with `hardware encoder not available: ... not compiled into ffmpeg`. The matrix:
| Encoder | Compile-in source | Host runtime requirement |
|---|---|---|
| `libx264` | `libx264-dev` (apt) | none |
| `libx265` | `libx265-dev` (apt) | none |
| `libvpx-vp9` | `libvpx-dev` (apt) | none |
| `libsvtav1` | source build (SVT-AV1 pinned in `dev/Containerfile`) | none |
| `libaom-av1` | adapter exists, but the in-image FFmpeg intentionally omits libaom until patch 0007's ROI bridge targets released libaom fields | external FFmpeg with `--enable-libaom`, or wait for the patch-stack follow-up |
| `libvvenc` | source build (Fraunhofer VVenC v1.14.0) | none |
| `h264_nvenc` / `hevc_nvenc` / `av1_nvenc` | `--enable-nvenc` + nv-codec-headers | NVIDIA GPU + Container Toolkit; NVENC capability bit on host driver (`av1_nvenc` requires Ada or newer; RTX 4090 ok) |
| `h264_qsv` / `hevc_qsv` / `av1_qsv` | `--enable-libvpl` + `libvpl-dev` dispatcher + pinned `intel/vpl-gpu-rt` (`libmfx-gen.so`) source build installed under `/usr/lib/x86_64-linux-gnu/` | Intel GPU + `/dev/dri/renderD*` passthrough; the container auto-selects the Intel render node for QSV |
| `h264_amf` / `hevc_amf` / `av1_amf` | `--enable-amf` + AMF headers (source) | AMD GPU + `libamfrt64.so` from the proprietary amdgpu-pro userspace bind-mounted into the container. The open-source ROCm install in the image (`rocm-hip-runtime-dev`) does not include AMF. |
To verify the in-image listing after a rebuild:
docker exec vmaf-dev-mcp ffmpeg -hide_banner -encoders 2>&1 \
| grep -E "libsvtav1|libvvenc|libvpx-vp9|nvenc|qsv|amf|vpl" \
| head -20
Expected (assuming the build-time encoder probe in stage 3.5 logged no WARN ... missing):
V....D libsvtav1 SVT-AV1(Scalable Video Technology for AV1) encoder
V..... libvvenc libvvenc-based VVC encoder
V....D libvpx-vp9 libvpx VP9
V....D h264_nvenc NVIDIA NVENC H.264 encoder
V....D hevc_nvenc NVIDIA NVENC hevc encoder
V....D av1_nvenc NVIDIA NVENC av1 encoder
V....D h264_qsv H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 (Intel Quick Sync Video acceleration)
V....D hevc_qsv HEVC (Intel Quick Sync Video acceleration)
V....D av1_qsv AV1 (Intel Quick Sync Video acceleration)
V....D h264_amf AMD AMF H.264 Encoder
V....D hevc_amf AMD AMF HEVC encoder
V....D av1_amf AMD AMF AV1 encoder
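The compile-in check can be automated by parsing the `ffmpeg -encoders` listing. The sketch below works on the listing text only (it does not invoke FFmpeg) and assumes the row layout shown above, with a six-character flag field followed by the encoder name; the function names are hypothetical.

```python
def advertised_encoders(listing: str) -> set:
    """Extract encoder names from `ffmpeg -encoders` output."""
    names = set()
    for line in listing.splitlines():
        parts = line.split()
        # Encoder rows start with a six-character flag field such as "V....D".
        # Legend rows ("V..... = Video") have "=" in the name position; skip them.
        if (len(parts) >= 2 and len(parts[0]) == 6
                and parts[0][0] in "VAS" and parts[1] != "="):
            names.add(parts[1])
    return names


def missing_encoders(listing: str, wanted) -> list:
    """Return the wanted encoders that the listing does not advertise."""
    return sorted(set(wanted) - advertised_encoders(listing))
```

An empty result from `missing_encoders` only confirms the compile-in stage; whether an encoder actually works on a given host is a separate runtime question.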
Hardware-encoder runtime failure modes¶
The encoders above are split into "compile-in" (does the binary advertise the encoder?) and "runtime-ok" (does a 1-frame dummy encode succeed?). vmaf-tune compare's compare.py::probe_encoder_available runs both stages. The container locks down the compile-in promise; runtime failures produce stable row-level skip strings:
| Symptom | Cause | Action |
|---|---|---|
| `hardware encoder not available: h264_nvenc dummy encode failed (...): Cannot load libcuda.so.1` | Container started without `runtime: nvidia` | `CONTAINER_RUNTIME=nvidia ./dev/scripts/dev-mcp-up.sh` |
| `hardware encoder not available: av1_nvenc dummy encode failed: Cannot load library` | Host NVIDIA GPU has no AV1 NVENC engine (Turing/Ampere lack `av1_nvenc`), or the host driver is too old to expose it | Use `h264_nvenc` / `hevc_nvenc` on that host; `av1_nvenc` needs Ada or newer |
| `hardware encoder not available: h264_qsv dummy encode failed: Error creating a MFX session` | Stale image missing `libmfx-gen.so` in the dispatcher search path, or Intel iGPU not exposed (`/dev/dri/renderD*` missing) | Rebuild dev-mcp so the pinned `intel/vpl-gpu-rt` layer is present under `/usr/lib/x86_64-linux-gnu/`; verify `vainfo --display drm --device /dev/dri/renderD<N>` on the Intel node |
| `hardware encoder not available: h264_amf dummy encode failed: ... cannot open shared object libamfrt64.so` | amdgpu-pro userspace not bind-mounted | Install amdgpu-pro on the host and bind-mount `/opt/amdgpu-pro/lib/x86_64-linux-gnu/libamfrt64.so` into the container, or accept that AMF encode is unavailable on this host |
Reproducer — full cross-codec compare sweep¶
docker exec vmaf-dev-mcp bash -c '
cd /workspace && PYTHONPATH=/workspace/tools/vmaf-tune/src:$PYTHONPATH \
python -c "from vmaftune.cli import main; raise SystemExit(main())" compare \
--src /workspace/.corpus/bbb_e2e/bbb_sunflower_1080p_60fps_normal.mp4 \
--width 1920 --height 1080 --framerate 60 \
--target-vmafs 85,90,92,95 \
--encoders libx264,libx265,libsvtav1,libaom-av1,libvvenc,libvpx-vp9,h264_nvenc,hevc_nvenc,av1_nvenc,h264_qsv,hevc_qsv,av1_qsv,h264_amf,hevc_amf,av1_amf \
--duration 5 --sample-clip-seconds 3 --max-iterations 3 \
--score-backend cuda --format json --output /tmp/v11_1080p_cmp_full.json'
Encoders that are not runtime-available on the host produce per-row ok=false entries with the diagnostic strings above; the sweep does not abort.
Host-kernel ↔ container-userspace UAPI version pins (ADR-0543)¶
Intel NEO compute-runtime and ROCm KFD userspace are version-pinned via Containerfile ARGs to match the host kernel's i915 / xe / KFD ioctl ABI. A mismatch silently degrades vmaf --backend sycl|hip to CPU.
| Pin | Current value | Why pinned |
|---|---|---|
| `ARG NEO_VER` | `26.18.38308.1` | Intel's noble/unified APT repo's newest as of 2026-05-18 is 25.18.x, too old for kernel ≥ 7.0. NEO 25.18 returns `ZE_RESULT_ERROR_UNINITIALIZED` from `zeInit()` against kernel-7.x i915/xe. Pulled from github.com/intel/compute-runtime/releases. |
| `ARG IGC_VER` + `ARG GMMLIB_VER` | `2.34.4+21428` + `22.10.0` | NEO 26.18's release notes mandate IGC v2.34.4 + gmmlib 22.10.0. Pinned together. |
| `ARG ROCM_VER` | `7.2.3` | Matches Arch host hsa-rocr 7.2.3. ROCm 6.x KFD userspace returns `Unable to open /dev/kfd read-write: Invalid argument` against kernel-7.x KFD ioctls. |
`dev-mcp-entrypoint.sh` emits a runtime visibility probe on container start (ADR-0543). A `WARN: SYCL level_zero:gpu NOT detected` or `WARN: HIP HSA agent NOT detected` line means the host kernel has revved past the pinned userspace ABI: bump the ARG and rebuild rather than working around the fallback (CLAUDE.md §12 r15 sub-rule 4). The latest NEO release tag is at https://github.com/intel/compute-runtime/releases/latest; the latest ROCm noble channel is listed under https://repo.radeon.com/rocm/apt/.
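A startup log can be scanned for the two warnings quoted above to decide which pin to revisit. This sketch makes one assumption beyond the text: that the SYCL warning implicates `NEO_VER` and the HIP warning implicates `ROCM_VER`, following the pin table. The function name `stale_pins` is hypothetical.

```python
# Warning text quoted above, mapped to the Containerfile ARG to revisit.
# The warning-to-ARG pairing is an assumption drawn from the pin table.
WARN_TO_ARG = {
    "WARN: SYCL level_zero:gpu NOT detected": "NEO_VER",
    "WARN: HIP HSA agent NOT detected": "ROCM_VER",
}


def stale_pins(log_text: str) -> list:
    """Return the ARG names implicated by warnings found in the startup log."""
    return sorted({arg for warning, arg in WARN_TO_ARG.items()
                   if warning in log_text})
```

The input could be, for example, the captured output of `docker logs vmaf-dev-mcp`. When `NEO_VER` is reported, `IGC_VER` and `GMMLIB_VER` should be bumped with it, since the table notes they are pinned together.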