ADR-0698: VMAFX Production Dockerfile - Multi-Arch, Image Signing, SBOM
- Status: Proposed
- Date: 2026-05-28
- Deciders: lusoris
- Tags: docker, ci, release, security, sbom, signing, vmafx, fork-local
Context
The VMAFX rebrand (ADR-0686) requires a production-grade container image separate from the development MCP container (dev/Containerfile). The dev container bakes in every GPU SDK, the full oneAPI toolchain, and the NVIDIA Container Toolkit runtime — making it unsuitable for production deployments where image size, attack surface, and reproducibility matter.
A production image must satisfy four requirements:
- Slim runtime. No build tools, no GPU SDKs, and no Python beyond what the MCP server needs. Target: under 150 MB for the default CPU CLI variant.
- Multi-arch. VMAFX runs on amd64 (x86_64 servers, cloud VMs) and arm64 (Apple Silicon dev machines, AWS Graviton, Ampere A1). Both must ship from the same CI workflow and the same source, with the same set of binaries in each image.
- Signed + auditable. Every image release must carry a Sigstore keyless cosign signature and a CycloneDX SBOM attached as a cosign attestation, so consumers can verify provenance without trusting the registry.
- GPU opt-in. CUDA / ROCm / SYCL / Vulkan runtimes are large and platform-specific. The default image ships CPU-only; GPU-augmented variants are separate tags so users never pay for GPU runtime overhead they do not need.
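The 150 MB target for the CPU CLI variant can be checked mechanically. The following is a minimal sketch, not the project's actual CI step: the function name check_size_budget is hypothetical, "MB" is assumed to mean 10^6 bytes, and the component figures are the estimates given under Consequences.

```shell
#!/bin/sh
# Sketch only: check_size_budget is a hypothetical helper.
# Assumption: "MB" means 10^6 bytes; adjust if the project counts MiB.

# Succeed when size_bytes is strictly under the limit (default 150 MB).
check_size_budget() {
    size_bytes="$1"
    limit_mb="${2:-150}"
    limit_bytes=$((limit_mb * 1000 * 1000))
    if [ "$size_bytes" -lt "$limit_bytes" ]; then
        printf 'ok: %s bytes is under %s MB\n' "$size_bytes" "$limit_mb"
    else
        printf 'over budget: %s bytes >= %s MB\n' "$size_bytes" "$limit_mb" >&2
        return 1
    fi
}

# Component estimates from this ADR, in MB:
# distroless base + libvmaf.so + vmaf binary + models.
estimate_mb=$((20 + 8 + 4 + 60))
check_size_budget $((estimate_mb * 1000 * 1000))
```

In a real pipeline the byte count would presumably come from the built image, for example via docker image inspect --format '{{.Size}}', which reports the uncompressed size of a local image.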
Decision
We will ship two Dockerfiles:
- docker/Dockerfile.production: CPU-only, multi-stage, dual-target (--target cli and --target server). Builder stage uses ubuntu:24.04; runtime stage uses gcr.io/distroless/cc-debian12 (glibc, no shell). Python venv stage uses python:3.13-slim for the MCP server target.
- docker/Dockerfile.production-gpu: same structure, parametrized into five build targets (final-cpu, final-cuda12, final-rocm6, final-oneapi2026, final-vulkan). GPU runtime libs are copied from their upstream base images rather than installing entire SDKs.
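As an illustration of the dual-target layout, the two targets of docker/Dockerfile.production could be built locally along these lines. This is a sketch assuming Docker Buildx is installed; the image name ghcr.io/example/vmafx, the dev tags, and the run and build_target helpers are hypothetical. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: IMAGE is a hypothetical placeholder, not the real registry path.
IMAGE="${IMAGE:-ghcr.io/example/vmafx}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Build one target (cli or server) of the CPU-only Dockerfile.
build_target() {
    target="$1"
    tag="$2"
    run docker buildx build \
        --file docker/Dockerfile.production \
        --target "$target" \
        --tag "$IMAGE:$tag" \
        .
}

build_target cli dev
build_target server dev-server
```

Because both images come from one file, the builder stage is shared and only the final stage differs between the two invocations.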
A new workflow .github/workflows/docker-publish-production.yml fires on v* tag pushes (release-please releases) and workflow_dispatch. It:
- Builds amd64 + arm64 for the CPU / Vulkan / server variants (QEMU emulation).
- Builds amd64-only for CUDA 12, ROCm 6, and oneAPI 2026 (GPU runtimes not portable to arm64 today; see Consequences).
- Signs every pushed digest via cosign sign --yes (Sigstore keyless OIDC).
- Generates a CycloneDX SBOM via syft and attaches it as a cosign attest predicate.
- Runs a smoke-test job that pulls the CPU CLI image and asserts --version exits 0.
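The signing, SBOM, and smoke-test steps listed above might look roughly like this when run against one pushed digest. This is a sketch, not the workflow's actual contents: the image reference, the sbom.cdx.json file name, and the helper functions are hypothetical, and the smoke test assumes the image entrypoint is the CLI binary so that arguments are passed straight to it. Commands are printed rather than executed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: image references used here are hypothetical placeholders.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Sign a pushed digest, generate a CycloneDX SBOM, attach it as an attestation.
sign_and_attest() {
    ref="$1"                # form: registry/name@sha256:<digest>
    sbom="sbom.cdx.json"    # hypothetical output file name
    run cosign sign --yes "$ref"
    run syft "$ref" -o "cyclonedx-json=$sbom"
    run cosign attest --yes --type cyclonedx --predicate "$sbom" "$ref"
}

# Smoke test: the CLI must exit 0 when asked for its version.
# Assumption: the image entrypoint is the CLI binary.
smoke_test() {
    run docker run --rm "$1" --version
}

sign_and_attest "ghcr.io/example/vmafx@sha256:PLACEHOLDER"
smoke_test "ghcr.io/example/vmafx:PLACEHOLDER"
```

Signing and attesting by digest rather than by tag ties the signature to the exact image content, since a tag can later be moved to a different digest.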
Tag matrix:
| Tag suffix | Platforms | Description |
|---|---|---|
| (none) | amd64, arm64 | CPU-only CLI (default, smallest) |
| -server | amd64, arm64 | CPU CLI + vmaf-mcp + vmaf-tune venv |
| -cuda12 | amd64 | CUDA 12 runtime added |
| -rocm6 | amd64 | ROCm 6 HIP runtime added |
| -oneapi2026 | amd64 | Intel oneAPI 2026 SYCL runtime added |
| -vulkan | amd64, arm64 | Vulkan ICD loader added |
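The tag matrix can be expressed as a small lookup from tag suffix to build platforms. The sketch below is illustrative only: platforms_for is a hypothetical helper, the version 1.0.0 is a placeholder, and the linux/ prefix is the form Docker Buildx expects for its --platform flag.

```shell
#!/bin/sh
# Sketch only: platforms_for is a hypothetical helper mirroring the tag matrix.

# Map a tag suffix to the value that would be passed to --platform.
platforms_for() {
    case "$1" in
        ""|-server|-vulkan)          printf '%s\n' "linux/amd64,linux/arm64" ;;
        -cuda12|-rocm6|-oneapi2026)  printf '%s\n' "linux/amd64" ;;
        *) printf 'unknown tag suffix: %s\n' "$1" >&2; return 1 ;;
    esac
}

# Print the full matrix for a placeholder version.
for suffix in "" -server -cuda12 -rocm6 -oneapi2026 -vulkan; do
    printf '%s -> %s\n' "1.0.0$suffix" "$(platforms_for "$suffix")"
done
```

Rejecting unknown suffixes keeps a typo in the workflow matrix from silently producing an image for the wrong platform set.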
Base image rationale: gcr.io/distroless/cc-debian12 was chosen over chainguard/wolfi-base and ubuntu:24.04 because it provides glibc (required by libvmaf.so), has no shell (reducing attack surface — no bash, no apt), and produces the smallest final layer footprint for a compiled C binary. Wolfi was considered but requires a separate package repo configuration; distroless is directly supported by cosign/syft tooling and maps cleanly to the Sigstore supply-chain story.
Alternatives considered
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| ubuntu:24.04 runtime stage | Familiar; apt-get available for debugging | ~75 MB base; shell present (attack surface); users apt-install random deps | Distroless preferred per supply-chain policy |
| chainguard/wolfi-base | ~20 MB; glibc; regularly patched | Requires Chainguard package repo for system libs; less tooling support in cosign/syft ecosystem today | Distroless is simpler and better integrated |
| Single Dockerfile + ARG GPU_RUNTIME (conditional COPY) | One file to maintain | BuildKit does not skip unused multi-stage chains from a single FROM — a cuda12 build still pulls the ubuntu CUDA toolkit layer even when not selected | Two Dockerfiles (production + production-gpu) are cleaner |
| Parametrize GPU SDK install via ARG in builder stage | Fewer files | Builder stage must install GPU SDK at build time; CI runners have no GPU SDK by default and the apt install for cuda-toolkit-12-0 alone exceeds 3 GB | COPY runtime libs from upstream base images is smaller + faster |
| Arm64 for CUDA/ROCm/SYCL | Full parity | CUDA for arm64 requires aarch64-linux-gnu toolkit; ROCm arm64 not officially distributed; oneAPI arm64 not supported | Documented limitation; revisit when upstream distributes arm64 GPU runtimes |
Consequences
- Positive: reproducible, signed, SBOM-attached production images on every release. Consumers can run cosign verify-attestation to confirm the SBOM was built by CI. Image size for the CPU CLI target is expected to be 80–150 MB (distroless base ~20 MB + libvmaf.so ~8 MB + vmaf binary ~4 MB + models ~60 MB).
- Positive: arm64 support unblocks Apple Silicon and AWS Graviton deployments.
- Negative: GPU variant arm64 support is deferred. The -cuda12, -rocm6, and -oneapi2026 tags are amd64-only until their upstream distributions ship arm64 packages (CUDA / ROCm / oneAPI all have amd64-first release cadences).
- Neutral: docker/Dockerfile.production is fork-only and does not conflict with upstream Netflix/vmaf, which ships no production Dockerfile.
- Follow-up: add a pixel-level golden-score assertion to the CI smoke-test once the YUV test fixtures are mounted into the GitHub Actions runner (requires a separate artifact-cache or fixture-embed step). The current smoke-test only checks --version.
- Follow-up: add T-DOCKER-SMOKE to the "Recently closed" block in docs/state.md once the smoke-step has been green for three consecutive master runs (per the existing deferred entry).
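On the consumer side, keyless verification of a release image could be sketched as follows. The image reference and the example/vmafx repository path are hypothetical placeholders, as are the run and verify_release helpers. The identity pattern assumes the signing certificate names the docker-publish-production.yml workflow at a v* tag ref; a workflow_dispatch run would presumably carry a different ref. Commands are printed rather than executed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: IMAGE and REPO are hypothetical placeholders.
IMAGE="${IMAGE:-ghcr.io/example/vmafx:1.0.0}"
REPO="${REPO:-example/vmafx}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Verify the image signature, then the CycloneDX SBOM attestation.
verify_release() {
    # Keyless verification pins the signer to the publishing workflow and
    # to GitHub's OIDC issuer instead of a long-lived public key.
    identity="^https://github.com/$REPO/\.github/workflows/docker-publish-production\.yml@refs/tags/v"
    issuer="https://token.actions.githubusercontent.com"
    run cosign verify \
        --certificate-identity-regexp "$identity" \
        --certificate-oidc-issuer "$issuer" \
        "$IMAGE"
    run cosign verify-attestation --type cyclonedx \
        --certificate-identity-regexp "$identity" \
        --certificate-oidc-issuer "$issuer" \
        "$IMAGE"
}

verify_release
```

Pinning both the certificate identity and the issuer is what lets a consumer confirm the SBOM came from CI without trusting the registry.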
References
- ADR-0686: VMAFX rebrand umbrella (docs/adr/0686-vmafx-rebrand-aggressive-modernization.md)
- dev/Containerfile: development / MCP container (separate, not replaced)
- docker/Dockerfile.production: this ADR's primary deliverable
- docker/Dockerfile.production-gpu: GPU-augmented variant
- .github/workflows/docker-publish-production.yml: CI publish workflow
- docs/development/docker-production.md: operator guide
- Sigstore cosign: https://docs.sigstore.dev/cosign/signing/overview/
- syft: https://github.com/anchore/syft
- distroless cc-debian12: https://github.com/GoogleContainerTools/distroless
- PR: feat(docker): production multi-arch Dockerfile + image signing + SBOM
- Parent PR: #1546 (VMAFX rebrand umbrella)
- Source: user direction (VMAFX Phase 3B brief, 2026-05-28)