dev-MCP Docker Container

The dev-MCP container runs the full VMAF fork inside Docker with all four GPU backends enabled (CUDA, SYCL, Vulkan, HIP) plus the embedded MCP stdio server. It is the standard environment for:

Live probing of VMAF scores across all backends from a single shell.
Running the continuous smoke-probe cron (smoke-probe-cron service).
Reproducing build regressions on GPU paths other than the host's primary GPU (for example: catching HIP toolchain regressions on an NVIDIA-only host).

The design decision is recorded in ADR-0435.

Prerequisites

Required

Component	Version	Notes
Docker Engine	26+	`docker compose` v2 plugin required
NVIDIA Container Toolkit	latest	Enables `--gpus all` / `runtime: nvidia` for CUDA kernel execution. The container builds and runs without it; CUDA feature extractors return `-ENOSYS` at runtime.

Optional

Component	Purpose
AMD ROCm runtime on host	Run HIP kernels inside the container. Without it, HIP compiles but returns an error at kernel dispatch.
Intel oneAPI runtime on host	Run SYCL kernels via Level Zero. Without it, SYCL falls back to the OpenCL CPU device or returns an error.
`jq`	Pretty-print probe JSON output on the host. `apt install jq`.

How to build

Use the provided wrapper from the repository root:

./dev/scripts/dev-mcp-up.sh

Or, to build without starting:

docker compose --project-directory "$(pwd)" -f dev/docker-compose.yml build

Important: always pass --project-directory. Without it, Docker Compose v2 sets the project directory to the compose file's parent (dev/), so context: . resolves to dev/ instead of the repo root. That bypasses the root .dockerignore. On developer machines that hold .corpus/ (up to 781 GB), it also sends the entire corpus into the build context and accumulates copies in /var/lib/docker/overlay2/ on every failed build. The dev-mcp-up.sh wrapper always passes --project-directory. The bare docker compose -f form is safe only when run from the repo root with the flag passed explicitly.
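A cheap pre-flight guard can catch the wrong build context before any bytes are sent to the daemon. The sketch below is not part of dev/scripts/. The function name check_build_context is hypothetical, and it assumes the root .dockerignore names .corpus explicitly (as .corpus, /.corpus, .corpus/ or .corpus/**).

```shell
# Hypothetical pre-flight check (not shipped in dev/scripts/): refuse to build
# unless the directory Compose will use as its build context contains a
# .dockerignore that excludes .corpus/.
check_build_context() {
  local root="${1:-$(pwd)}"
  local ignore="$root/.dockerignore"
  if [ ! -f "$ignore" ]; then
    echo "no .dockerignore in $root; refusing to build" >&2
    return 1
  fi
  # Accept ".corpus", "/.corpus", ".corpus/", "/.corpus/" and ".corpus/**".
  if ! grep -Eq '^/?\.corpus(/\*\*|/)?[[:space:]]*$' "$ignore"; then
    echo "$ignore does not exclude .corpus/; refusing to build" >&2
    return 1
  fi
  return 0
}
```

Run it as check_build_context "$(pwd)" && docker compose --project-directory "$(pwd)" -f dev/docker-compose.yml build. The build then starts only when the ignore file is in place.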

The first build downloads all GPU SDK layers and compiles libvmaf from source. Expect 20–40 minutes on a typical workstation; subsequent builds use the layer cache and take 1–3 minutes when only Python packages change.

How to start

# CPU + Vulkan/lavapipe only (no GPU passthrough)
./dev/scripts/dev-mcp-up.sh

# With NVIDIA GPU passthrough
NVIDIA_VISIBLE_DEVICES=all CONTAINER_RUNTIME=nvidia \
    ./dev/scripts/dev-mcp-up.sh

To ensure the container joins the correct host video and render groups (GIDs differ across distributions), export HOST_GID_VIDEO and HOST_GID_RENDER before starting. The wrapper reads them automatically; you can also pass them inline:

HOST_GID_VIDEO=$(getent group video | cut -d: -f3) \
HOST_GID_RENDER=$(getent group render | cut -d: -f3) \
CONTAINER_RUNTIME=nvidia \
docker compose -f dev/docker-compose.yml --project-directory . up -d dev-mcp

The defaults baked into docker-compose.yml (44 for video, 109 for render) match common Ubuntu installations. Override whenever getent group video returns a different GID (for example, Arch Linux uses 985/986).
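If you start services by hand instead of through the wrapper, a helper along these lines resolves both GIDs. It falls back to the compose defaults when a group does not exist on the host. This is a sketch: resolve_gid is a hypothetical name, not a function shipped with the repository.

```shell
# Hypothetical helper: print a host group's numeric GID, or the supplied
# fallback when the group is missing or getent is unavailable.
resolve_gid() {
  local group="$1" fallback="$2" gid
  gid="$(getent group "$group" 2>/dev/null | cut -d: -f3)"
  if [ -n "$gid" ]; then
    printf '%s\n' "$gid"
  else
    printf '%s\n' "$fallback"
  fi
}

# Fallbacks mirror the defaults baked into dev/docker-compose.yml.
HOST_GID_VIDEO="$(resolve_gid video 44)"
HOST_GID_RENDER="$(resolve_gid render 109)"
export HOST_GID_VIDEO HOST_GID_RENDER
```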

The dev-mcp-up.sh wrapper builds (if needed) then starts:

vmaf-dev-mcp — primary container; serves vmaf-mcp over stdio through docker exec -i when requested. Its healthcheck is vmaf --version, not a socket check.
vmaf-smoke-probe-cron — waits for the primary to be healthy, then probes every 15 minutes.

Both services write probe files to .workingdir/dev-mcp-probes/ on the host.
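Scripts that start the stack themselves may want the same "wait until healthy" behaviour as the smoke-probe-cron service. Below is a minimal polling sketch built on docker inspect's health status. It assumes the container name vmaf-dev-mcp used above. The 120-second timeout and 5-second interval are arbitrary. DOCKER is a hypothetical override that exists only for dry runs.

```shell
# Hypothetical helper: poll the container's Docker healthcheck (vmaf --version)
# until it reports "healthy" or the timeout expires.
wait_healthy() {
  local name="${1:-vmaf-dev-mcp}" timeout="${2:-120}" interval="${3:-5}"
  local waited=0 status=""
  while [ "$waited" -lt "$timeout" ]; do
    status="$(${DOCKER:-docker} inspect --format '{{.State.Health.Status}}' "$name" 2>/dev/null)"
    if [ "$status" = "healthy" ]; then
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "$name not healthy after ${timeout}s (last status: ${status:-unknown})" >&2
  return 1
}
```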

How to attach

# Interactive bash shell inside the running dev-mcp container
./dev/scripts/dev-mcp-shell.sh

# Run a specific command
./dev/scripts/dev-mcp-shell.sh vmaf-dev-mcp vmaf --version
./dev/scripts/dev-mcp-shell.sh vmaf-dev-mcp vmaf --list-features

Inside the container the full environment is initialised:

vmaf CLI — /usr/local/bin/vmaf
vmaf-mcp-server — /opt/vmaf-venv/bin/vmaf-mcp-server
GPU SDKs — nvcc, icpx, hipcc in PATH
testdata — /workspace/testdata/ (read-only bind mount from host repo)
models — /workspace/model/ (read-only)

How to manually probe

Run a single smoke probe outside the cron cycle:

./dev/scripts/dev-mcp-probe.sh

This executes smoke-probe-loop.sh --once inside the running container and writes probe-<timestamp>.json to .workingdir/dev-mcp-probes/. If jq is installed on the host, the result is pretty-printed to stdout.
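To open the most recent probe without remembering its timestamp, a helper like the following works on the host. It is a sketch, and latest_probe is a hypothetical name. It picks the file with the newest modification time instead of parsing the timestamp in the file name, because that timestamp format is not specified here.

```shell
# Hypothetical helper: print the newest probe-*.json in the probe directory,
# judged by modification time rather than by the name's timestamp.
latest_probe() {
  local dir="${1:-.workingdir/dev-mcp-probes}" newest="" f
  for f in "$dir"/probe-*.json; do
    [ -e "$f" ] || continue          # glob matched nothing
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest="$f"
    fi
  done
  if [ -z "$newest" ]; then
    echo "no probe files in $dir" >&2
    return 1
  fi
  printf '%s\n' "$newest"
}
```

For example: jq . "$(latest_probe)".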

How to stop

# Stop, keep volumes (probe history preserved)
./dev/scripts/dev-mcp-down.sh

# Stop and remove volumes (clears socket volume; probe bind-mount preserved)
./dev/scripts/dev-mcp-down.sh --volumes

How to interpret probe outputs

Each probe file follows this schema:

{
  "ts": "2026-05-15T14:30:00Z",
  "host_id": "myhostname:abc123def456",
  "backend_results": {
    "cpu":    { "score": 76.45, "duration_ms": 3200, "error": null },
    "cuda":   { "score": 76.44, "duration_ms":  820, "error": null },
    "sycl":   { "score": null,  "duration_ms":    0, "error": "ENOSYS: no SYCL device" },
    "vulkan": { "score": 76.45, "duration_ms": 1100, "error": null }
  },
  "mcp_results": {
    "list_features": { "feature_count": 14, "duration_ms": 45, "error": null },
    "compute_vmaf":  { "score": 76.45, "duration_ms": 3250, "error": null }
  }
}

Field	Meaning
`score`	Aggregate VMAF score for the 48-frame 576×324 golden pair. `null` = backend failed.
`duration_ms`	Wall-clock time for the full scoring run.
`error`	Error message string, or `null` for success.
`feature_count`	Number of features returned by the MCP `list_features` tool.

Expected values

CPU score: ~76.45 (matches the Netflix golden pair; exact value varies by model version).
CUDA / Vulkan scores: within ±0.01 of CPU (numeric parity is not bit-exact — see ADR-0214).
SYCL: ENOSYS on hosts without Intel GPU or oneAPI runtime; normal.
HIP: error on NVIDIA-only hosts; normal.
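The ±0.01 parity rule can be checked without eyeballing numbers. This is a minimal sketch that assumes you extract the two scores yourself, for example with jq -r '.backend_results.cuda.score'. The function name and its 0.01 default are illustrative. The small epsilon matters because binary floating point makes 76.45 - 76.44 come out slightly above 0.01.

```shell
# Hypothetical helper: succeed when |score - reference| <= tolerance.
# awk does the floating-point arithmetic that shell arithmetic cannot.
within_tolerance() {
  awk -v s="$1" -v r="$2" -v t="${3:-0.01}" 'BEGIN {
    d = s - r
    if (d < 0) d = -d
    # 1e-9 absorbs binary rounding so a printed difference of 0.01 passes.
    exit ((d <= t + 1e-9) ? 0 : 1)
  }'
}
```

A null score means the backend failed, so check the error field first. Otherwise awk silently treats "null" as 0.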

Common error patterns

Error	Cause	Action
`ENOSYS: no CUDA device`	No NVIDIA GPU or Container Toolkit not installed	Install Container Toolkit and set `NVIDIA_VISIBLE_DEVICES=all`
`ENOSYS: no SYCL device`	No Intel GPU / oneAPI runtime	Expected on non-Intel hosts; not a regression
`mcp stdio returned empty response`	`vmaf-mcp-server` not in PATH or build failed	Rebuild container; check `docker compose logs dev-mcp`
Score drift >0.1 from baseline	Code regression or model change	Run `/validate-scores` skill; check recent commits
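When triaging many probe files, the table can be encoded as a lookup. The sketch below matches on substrings of the messages quoted above; classify_probe_error is a hypothetical name. Errors the table does not cover fall through to "investigate", including the HIP dispatch error that is normal on NVIDIA-only hosts. Score drift is not an error string and needs a separate comparison against a baseline.

```shell
# Hypothetical helper: map a probe "error" value to the action column above.
classify_probe_error() {
  case "$1" in
    ''|null)
      echo "ok" ;;
    *"ENOSYS: no SYCL device"*)
      echo "expected: no Intel GPU / oneAPI runtime" ;;
    *"ENOSYS: no CUDA device"*)
      echo "action: install Container Toolkit, set NVIDIA_VISIBLE_DEVICES=all" ;;
    *"mcp stdio returned empty response"*)
      echo "action: rebuild container, check docker compose logs dev-mcp" ;;
    *)
      echo "investigate: $1" ;;
  esac
}
```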

Known limitations

Limitation	Details
HIP kernels cannot run on NVIDIA-only hosts	The HIP toolchain in the container compiles and embeds HSACO fat binaries, but the AMD ROCm runtime is not available. Feature extractors return an error at kernel dispatch. The container is still valuable for catching compile-time regressions in HIP paths.
Metal is disabled on Linux	`libvmaf` is built with `-Denable_metal=auto`, which resolves to disabled on Linux. Metal kernels require macOS + Apple Silicon.
SYCL requires Intel GPU or software emulation	Without the oneAPI Level Zero runtime, SYCL falls back to the OpenCL CPU device (if available) or returns `-ENOSYS`. Performance is significantly lower than on a dedicated Intel GPU.
Vulkan lavapipe is CPU-backed	The lavapipe software Vulkan ICD shipped by mesa is used only when no real ICD is available; otherwise the entrypoint filters it out of `VK_DRIVER_FILES` (ADR-0542). It allows Vulkan correctness testing without a physical GPU, but throughput is 3–5× slower than real hardware.
First build takes 20–40 minutes	All four GPU SDK layers are fetched during `docker compose build`. Subsequent builds are fast (layer cache).
`vmaf-tune report` requires matplotlib	Baked into `/opt/vmaf-venv` by `dev/Containerfile` (added 2026-05-18 per ADR-0498). In a rebuilt image, `vmaf-tune report --format both` produces a self-contained HTML+Markdown report with inline charts. If you see `ModuleNotFoundError: No module named 'matplotlib'` inside the container, your image predates the ADR-0498 commit. Rebuild it from the repo root with `docker compose --project-directory "$(pwd)" -f dev/docker-compose.yml build dev-mcp`.

Backend matrix (post-ADR-0514)

On a host with NVIDIA + Intel Arc + AMD silicon and the NVIDIA Container Toolkit installed, every libvmaf backend should run inside the container:

Backend	Expected	Required host state
`cpu`	VMAF score, rc=0	always
`cuda`	VMAF score, rc=0 (5-place-equal to CPU per ADR-0214)	NVIDIA GPU + Container Toolkit
`sycl`	VMAF score, rc=0 (5-place-equal to CPU)	Intel GPU exposed via `/dev/dri` bind-mount (ADR-0528)
`vulkan`	VMAF score, rc=0 (per-adapter device picker selects the first compatible ICD)	At least one Vulkan ICD: NVIDIA (via NVIDIA_DRIVER_CAPABILITIES=graphics), Intel/AMD (via mesa-vulkan-drivers), or lavapipe software fallback
`hip`	VMAF score, rc=0 (5-place-equal to CPU)	AMD GPU via `/dev/kfd` + `/dev/dri/renderD*`
`metal`	"built without metal support" on Linux containers	macOS host only

Reproducer:

docker exec vmaf-dev-mcp bash -c '
  for B in cpu cuda sycl vulkan hip; do
    vmaf --reference /workspace/python/test/resource/yuv/src01_hrc00_576x324.yuv \
         --distorted /workspace/python/test/resource/yuv/src01_hrc01_576x324.yuv \
         --width 576 --height 324 --pixel_format 420 --bitdepth 8 \
         --backend $B --json --output /tmp/probe_$B.json
    echo "rc=$? backend=$B"
  done
'
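To predict which rows of the matrix should succeed before running the reproducer, you can map the host's render nodes to backends by PCI vendor ID (0x10de NVIDIA, 0x8086 Intel, 0x1002 AMD). This is a heuristic sketch, not part of the repository, and you should run it on the host. expected_backends, SYS_ROOT and DEV_ROOT are hypothetical names. It assumes NVIDIA exposes a render node (nvidia-drm loaded). CUDA still needs the Container Toolkit even when the GPU is present.

```shell
# Hypothetical helper: list backends the matrix says should produce a score,
# based on render-node PCI vendors and /dev/kfd. SYS_ROOT/DEV_ROOT allow
# pointing it at a fake tree; they default to /sys and /dev.
expected_backends() {
  local sys="${SYS_ROOT:-/sys}" dev="${DEV_ROOT:-/dev}" out="cpu vulkan" f v b
  for f in "$sys"/class/drm/renderD*/device/vendor; do
    [ -r "$f" ] || continue
    read -r v < "$f"
    case "$v" in
      0x10de) b=cuda ;;                                          # NVIDIA
      0x8086) b=sycl ;;                                          # Intel
      0x1002) if [ -e "$dev/kfd" ]; then b=hip; else b=""; fi ;; # AMD needs KFD
      *)      b="" ;;
    esac
    # Append each backend once, even on hosts with several render nodes.
    case " $out " in
      *" $b "*) ;;
      *) [ -n "$b" ] && out="$out $b" ;;
    esac
  done
  printf '%s\n' "$out"
}
```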

Environment-variable contract

The container pins one env-var family (HSA / ROCm) at compose-up time, rewrites one (VK_DRIVER_FILES) at entrypoint time based on what is visible on disk, and intentionally leaves everything else alone. Pinning the wrong subset silently hid one or more GPU backends in earlier image versions:

Env var	Contract	Rationale
`VK_DRIVER_FILES`	Rewritten by `dev/scripts/dev-mcp-entrypoint.sh` at container start to the colon-separated list of every non-lavapipe ICD JSON visible under `/etc/vulkan/icd.d/` + `/usr/share/vulkan/icd.d/`. Unset when no real ICD is present (CPU-only fallback).	An earlier image left both env vars unset and relied on alphabetical search order; on multi-vendor hosts where mesa's `lvp_icd.json` sorted before NVIDIA's `nvidia_icd.json` (or Intel/AMD mesa ICDs), `vmaf --vulkan_device 0` silently landed on lavapipe. ADR-0542 closes the race by filtering lavapipe out whenever a real ICD exists.
`VK_ICD_FILENAMES`	Unset (deprecated by Khronos in favour of `VK_DRIVER_FILES`).	Setting it overrides the loader's allowlist semantics; the prior `lvp_icd.x86_64.json` typo (ADR-0509 / Research-0138) hid every real GPU.
`LD_LIBRARY_PATH`	Includes `${ONEAPI_ROOT}/{compiler,umf,tcm,tbb}/latest/lib`.	`tcm/latest/lib` carries `libhwloc.so.15`, which the level-zero UR adapter dlopens at load time; dropping it causes SYCL "Platforms: 0" on Intel Arc. `tbb/latest/lib` carries `libtbb.so.12`, which the Intel CPU OpenCL ICD dlopens at platform enumeration; dropping it silently removes the Intel CPU OpenCL platform (ADR-0543).
`NVIDIA_DRIVER_CAPABILITIES`	`compute,graphics,utility,video` (set in `dev/docker-compose.yml` common-env).	`graphics` is what makes the NVIDIA Container Toolkit bind-mount `nvidia_icd.json` into `/etc/vulkan/icd.d/`. Dropping `graphics` hides NVIDIA from Vulkan while leaving CUDA + nvidia-smi working — a hard regression to spot.
`HSA_OVERRIDE_GFX_VERSION`	Pinned to `10.3.0` in `common-env`.	AMD `gfx1036` (Raphael iGPU, RDNA2 IP rev 10.3.6) is not on the ROCm 6.x supported-GPU allowlist. Without the override, `hsa_init()` returns `HSA_STATUS_ERROR_OUT_OF_RESOURCES` and `rocminfo` reports "Unable to open /dev/kfd read-write: Invalid argument" even though `/dev/kfd` is bind-mounted. `gfx1036` is binary-compatible enough with `gfx1030` for the libvmaf HIP feature kernels (ADR-0530 / ADR-0538). ADR-0542.
`HSA_ENABLE_SDMA`	Pinned to `0` in `common-env`.	On RDNA2 iGPUs sharing system RAM with the CPU, the SDMA copy engine triggers VM faults on small device→host transfers (libvmaf collect path is dominated by such transfers). ADR-0543.
`ROCR_VISIBLE_DEVICES`	Pinned to `0` in `common-env`.	Pins HIP to the single AMD adapter on multi-iGPU + dGPU hosts so kernels cannot accidentally dispatch onto a non-RDNA2 device that needs a different `HSA_OVERRIDE_GFX_VERSION`. ADR-0542.

Operators who need to force a single Vulkan ICD for one invocation can still run docker exec vmaf-dev-mcp env VK_DRIVER_FILES=/path/to/icd.json vmaf …; the per-exec env var overrides the entrypoint-time pin. Operators whose GPU is on the ROCm supported-GPU allowlist (gfx1030 / gfx1100 / gfx1101 desktop and workstation parts) can set HSA_OVERRIDE_GFX_VERSION to the empty string at docker compose up time to drop the spoofed gfx version.
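The entrypoint-time rewrite described in the VK_DRIVER_FILES row can be sketched as follows. This illustrates the rule and is not a copy of dev/scripts/dev-mcp-entrypoint.sh. real_vulkan_icds is a hypothetical name. The sketch assumes lavapipe manifests are named lvp_icd*.json, as mesa ships them (for example lvp_icd.x86_64.json).

```shell
# Hypothetical sketch of the ADR-0542 rule: collect ICD manifests from the
# loader search directories, drop lavapipe, join the rest with colons.
real_vulkan_icds() {
  local list="" dir f
  [ "$#" -gt 0 ] || set -- /etc/vulkan/icd.d /usr/share/vulkan/icd.d
  for dir in "$@"; do
    for f in "$dir"/*.json; do
      [ -e "$f" ] || continue               # directory empty or missing
      case "${f##*/}" in
        lvp_icd*) continue ;;               # mesa lavapipe (CPU rasteriser)
      esac
      list="${list:+$list:}$f"
    done
  done
  printf '%s\n' "$list"
}

# Empty result: leave VK_DRIVER_FILES unset so the loader can fall back to lavapipe.
icds="$(real_vulkan_icds)"
if [ -n "$icds" ]; then
  export VK_DRIVER_FILES="$icds"
else
  unset VK_DRIVER_FILES
fi
```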

FFmpeg encoder matrix (post-ADR-0543)

The in-image FFmpeg is built with the fork's full encoder set. This lets vmaf-tune compare sweeps address every codec the project supports instead of skipping rows with "hardware encoder not available: ... not compiled into ffmpeg". The matrix:

Encoder	Compile-in source	Host runtime requirement
`libx264`	`libx264-dev` (apt)	none
`libx265`	`libx265-dev` (apt)	none
`libvpx-vp9`	`libvpx-dev` (apt)	none
`libsvtav1`	source build (SVT-AV1 pinned in `dev/Containerfile`)	none
`libaom-av1`	adapter exists, but the in-image FFmpeg intentionally omits libaom until patch 0007's ROI bridge targets released libaom fields	external FFmpeg with `--enable-libaom`, or wait for the patch-stack follow-up
`libvvenc`	source build (Fraunhofer VVenC v1.14.0)	none
`h264_nvenc` / `hevc_nvenc` / `av1_nvenc`	`--enable-nvenc` + `nv-codec-headers`	NVIDIA GPU + Container Toolkit; NVENC capability bit on host driver (av1_nvenc requires Ada or newer — RTX 4090 ok)
`h264_qsv` / `hevc_qsv` / `av1_qsv`	`--enable-libvpl` + `libvpl-dev` dispatcher + pinned `intel/vpl-gpu-rt` (`libmfx-gen.so`) source build installed under `/usr/lib/x86_64-linux-gnu/`	Intel GPU + `/dev/dri/renderD*` passthrough; the container auto-selects the Intel render node for QSV
`h264_amf` / `hevc_amf` / `av1_amf`	`--enable-amf` + AMF headers (source)	AMD GPU + `libamfrt64.so` from the proprietary `amdgpu-pro` userspace bind-mounted into the container. The open-source ROCm install in the image (`rocm-hip-runtime-dev`) does not include AMF.

To verify the in-image listing after a rebuild:

docker exec vmaf-dev-mcp ffmpeg -hide_banner -encoders 2>&1 \
    | grep -E "libsvtav1|libvvenc|libvpx-vp9|nvenc|qsv|amf|vpl" \
    | head -20

Expected output, assuming the build-time encoder probe in stage 3.5 logged no "WARN ... missing" lines:

 V....D libsvtav1            SVT-AV1(Scalable Video Technology for AV1) encoder
 V..... libvvenc             libvvenc-based VVC encoder
 V....D libvpx-vp9           libvpx VP9
 V....D h264_nvenc           NVIDIA NVENC H.264 encoder
 V....D hevc_nvenc           NVIDIA NVENC hevc encoder
 V....D av1_nvenc            NVIDIA NVENC av1 encoder
 V....D h264_qsv             H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 (Intel Quick Sync Video acceleration)
 V....D hevc_qsv             HEVC (Intel Quick Sync Video acceleration)
 V....D av1_qsv              AV1 (Intel Quick Sync Video acceleration)
 V....D h264_amf             AMD AMF H.264 Encoder
 V....D hevc_amf             AMD AMF HEVC encoder
 V....D av1_amf              AMD AMF AV1 encoder
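Comparing that listing by eye is error-prone. A short filter can report exactly which expected encoders are absent. In this sketch, missing_encoders is a hypothetical name. It relies only on the encoder name being the second whitespace-separated column of ffmpeg -encoders output.

```shell
# Hypothetical helper: read `ffmpeg -encoders` output on stdin and print each
# requested encoder name that does not appear; non-zero exit if any are missing.
missing_encoders() {
  local listing name missing=0
  listing="$(awk 'NF >= 2 { print $2 }')"
  for name in "$@"; do
    if ! printf '%s\n' "$listing" | grep -qx -- "$name"; then
      printf '%s\n' "$name"
      missing=1
    fi
  done
  return "$missing"
}
```

For example: docker exec vmaf-dev-mcp ffmpeg -hide_banner -encoders 2>/dev/null | missing_encoders libsvtav1 libvvenc libvpx-vp9 h264_nvenc hevc_nvenc av1_nvenc. This checks compile-in availability only, not whether an encode succeeds on the host.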

Hardware-encoder runtime failure modes

The encoders above are split into "compile-in" (does the binary advertise the encoder?) and "runtime-ok" (does a 1-frame dummy encode succeed?). vmaf-tune compare's compare.py::probe_encoder_available runs both stages. The container locks down the compile-in promise; runtime failures produce stable row-level skip strings:

Symptom	Cause	Action
`hardware encoder not available: h264_nvenc dummy encode failed (...): Cannot load libcuda.so.1`	Container started without `runtime: nvidia`	`CONTAINER_RUNTIME=nvidia ./dev/scripts/dev-mcp-up.sh`
`hardware encoder not available: av1_nvenc dummy encode failed: Cannot load library`	The GPU has no AV1 NVENC unit (Turing and Ampere lack it), or the host NVIDIA driver is too old for AV1 NVENC	Use h264_nvenc / hevc_nvenc on that host; av1_nvenc needs Ada or newer
`hardware encoder not available: h264_qsv dummy encode failed: Error creating a MFX session`	Stale image missing `libmfx-gen.so` in the dispatcher search path, or Intel iGPU not exposed (`/dev/dri/renderD*` missing)	Rebuild `dev-mcp` so the pinned `intel/vpl-gpu-rt` layer is present under `/usr/lib/x86_64-linux-gnu/`; verify `vainfo --display drm --device /dev/dri/renderD<N>` on the Intel node
`hardware encoder not available: h264_amf dummy encode failed: ... cannot open shared object libamfrt64.so`	amdgpu-pro userspace not bind-mounted	Install `amdgpu-pro` on the host and bind-mount `/opt/amdgpu-pro/lib/x86_64-linux-gnu/libamfrt64.so` into the container, or accept that AMF encode is unavailable on this host
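When a row is skipped, you can reproduce the two stages by hand to see the raw FFmpeg error. The sketch below follows the same compile-in-then-dummy-encode idea as compare.py::probe_encoder_available, but it is not that function. probe_encoder and FFMPEG are hypothetical names, and its messages are not the stable skip strings above. The black 256x256 lavfi source and the null muxer are choices made here, since some hardware encoders reject very small frames.

```shell
# Hypothetical two-stage probe: (1) is the encoder compiled in, (2) can it
# encode one synthetic frame on this host? FFMPEG overrides the binary.
probe_encoder() {
  local enc="$1" ff="${FFMPEG:-ffmpeg}" err
  # Stage 1: does the binary advertise the encoder at all?
  if ! $ff -hide_banner -encoders 2>/dev/null | awk '{ print $2 }' | grep -qx -- "$enc"; then
    echo "$enc: not compiled in"
    return 1
  fi
  # Stage 2: encode a single black frame and discard the output.
  if ! err="$($ff -hide_banner -loglevel error \
        -f lavfi -i color=c=black:s=256x256:r=1 \
        -frames:v 1 -c:v "$enc" -f null - 2>&1)"; then
    echo "$enc: dummy encode failed: ${err:-no stderr}"
    return 1
  fi
  echo "ok: $enc"
}
```

For example: docker exec vmaf-dev-mcp bash -c "$(declare -f probe_encoder); probe_encoder h264_nvenc".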

Reproducer: full cross-codec compare sweep

docker exec vmaf-dev-mcp bash -c '
  cd /workspace && PYTHONPATH=/workspace/tools/vmaf-tune/src:$PYTHONPATH \
  python -c "from vmaftune.cli import main; raise SystemExit(main())" compare \
    --src /workspace/.corpus/bbb_e2e/bbb_sunflower_1080p_60fps_normal.mp4 \
    --width 1920 --height 1080 --framerate 60 \
    --target-vmafs 85,90,92,95 \
    --encoders libx264,libx265,libsvtav1,libaom-av1,libvvenc,libvpx-vp9,h264_nvenc,hevc_nvenc,av1_nvenc,h264_qsv,hevc_qsv,av1_qsv,h264_amf,hevc_amf,av1_amf \
    --duration 5 --sample-clip-seconds 3 --max-iterations 3 \
    --score-backend cuda --format json --output /tmp/v11_1080p_cmp_full.json'

Encoders that are not runtime-available on the host produce per-row ok=false entries with the diagnostic strings above; the sweep does not abort.

Host-kernel ↔ container-userspace UAPI version pins (ADR-0543)

Intel NEO compute-runtime and ROCm KFD userspace are version-pinned via Containerfile ARGs to match the host kernel's i915 / xe / KFD ioctl ABI. A mismatch silently degrades vmaf --backend sycl|hip to CPU.

Pin	Current value	Why pinned
`ARG NEO_VER`	`26.18.38308.1`	Intel's `noble/unified` APT repo's newest as of 2026-05-18 is `25.18.x`, too old for kernel ≥ 7.0. NEO 25.18 returns `ZE_RESULT_ERROR_UNINITIALIZED` from `zeInit()` against kernel-7.x i915/xe. Pulled from `github.com/intel/compute-runtime/releases`.
`ARG IGC_VER` + `ARG GMMLIB_VER`	`2.34.4+21428` + `22.10.0`	NEO 26.18's release notes mandate IGC v2.34.4 + gmmlib 22.10.0. Pinned together.
`ARG ROCM_VER`	`7.2.3`	Matches Arch host `hsa-rocr 7.2.3`. ROCm 6.x KFD userspace returns `Unable to open /dev/kfd read-write: Invalid argument` against kernel-7.x KFD ioctls.
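A quick host-side check tells you whether you are past the kernel threshold named in the NEO row. In this sketch, needs_new_userspace is a hypothetical name. Applying the single 7.0 cut-off to both NEO and ROCm KFD is a simplification of the table.

```shell
# Hypothetical helper: extract the major version from a `uname -r` string.
kernel_major() {
  printf '%s\n' "${1%%.*}"
}

# Succeeds when the kernel major version is >= 7 (the threshold at which the
# table says older NEO / ROCm 6.x userspace stops working).
needs_new_userspace() {
  local rel="${1:-$(uname -r)}" major
  major="$(kernel_major "$rel")"
  case "$major" in
    ''|*[!0-9]*) echo "cannot parse kernel release: $rel" >&2; return 2 ;;
  esac
  [ "$major" -ge 7 ]
}
```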

On container start, dev-mcp-entrypoint.sh emits a runtime visibility probe (ADR-0543). A "WARN: SYCL level_zero:gpu NOT detected" or "WARN: HIP HSA agent NOT detected" line means the host kernel has moved past the pinned userspace ABI. Bump the matching ARG and rebuild rather than working around the fallback (CLAUDE.md §12 r15 sub-rule 4). The latest NEO release tag is at https://github.com/intel/compute-runtime/releases/latest; the latest ROCm noble channel is listed under https://repo.radeon.com/rocm/apt/.
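You can check for the two warnings from the host after every start. This sketch (check_uapi_warnings is a hypothetical name) matches the exact warning text quoted above against the container log.

```shell
# Hypothetical helper: read container logs on stdin, print any UAPI-mismatch
# warnings, and return non-zero if at least one was found.
check_uapi_warnings() {
  if grep -E 'WARN: (SYCL level_zero:gpu|HIP HSA agent) NOT detected'; then
    return 1
  fi
  return 0
}
```

For example: docker logs vmaf-dev-mcp 2>&1 | check_uapi_warnings.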