Kubernetes Deployment (Helm)¶

VMAFX ships a Helm chart under deploy/helm/vmafx/ that supports three workload types (Deployment, Job, StatefulSet) and all three GPU device-plugin vendors (NVIDIA, AMD, Intel).

A values.schema.json (Draft 2020-12) sits next to values.yaml, and Helm validates against it automatically during helm install, helm upgrade, and helm lint --strict. The schema enforces enum constraints on the load-bearing fields (workload, gpu.vendor, storage.mode, service.type, image.pullPolicy, persistence.accessMode, operator.logLevel, statefulSet.podManagementPolicy, monitoring.serviceMonitor.scheme). It also sets additionalProperties: false on every typed sub-object, so misspelled keys (replicaCounts, repostiory, maxSurg) fail fast at install time instead of silently rendering a broken manifest. See ADR-0870 for the rationale.
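
As a rough illustration of the fail-fast behaviour, the sketch below lints the chart with a single override and reports whether validation accepted it. The helper name lint_with_override is hypothetical; it assumes helm is on the PATH and that the command runs from the repository root.

```shell
# Hypothetical helper: lint the chart with one --set override and report
# whether Helm (including schema validation) accepted it.
lint_with_override() {
  local override="$1"
  if helm lint --strict deploy/helm/vmafx/ --set "$override" >/dev/null 2>&1; then
    echo "accepted: $override"
  else
    echo "rejected: $override"
    return 1
  fi
}

# Usage:
#   lint_with_override deployment.replicaCount=3    # valid key
#   lint_with_override deployment.replicaCounts=3   # misspelled key
```

Given the additionalProperties: false rule described above, the misspelled key should be reported as rejected; drop the output redirection to see Helm's own error message.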

Prerequisites¶

Helm v3.12 or later
A Kubernetes cluster (1.26+) with at least one GPU node (or CPU-only for testing)
The relevant GPU device-plugin daemonset installed on GPU nodes — see GPU scheduling guide

Quick start¶

# Build chart dependencies (prometheus-pushgateway, optional)
helm dependency build deploy/helm/vmafx/

# Install with NVIDIA GPU (default)
helm upgrade --install vmafx deploy/helm/vmafx/ \
  --namespace vmafx --create-namespace

# Install CPU-only (no GPU required)
helm upgrade --install vmafx deploy/helm/vmafx/ \
  --namespace vmafx --create-namespace \
  --set gpu.enabled=false \
  --set gpu.vendor=cpu

# Install with AMD GPU (HIP backend)
helm upgrade --install vmafx deploy/helm/vmafx/ \
  --namespace vmafx --create-namespace \
  --set gpu.vendor=amd

# Install with Intel GPU (SYCL backend)
helm upgrade --install vmafx deploy/helm/vmafx/ \
  --namespace vmafx --create-namespace \
  --set gpu.vendor=intel

GPU vendor matrix¶

`gpu.vendor`	Kubernetes resource	VMAFX backend	Required device-plugin
`nvidia`	`nvidia.com/gpu`	`cuda`	NVIDIA device plugin
`amd`	`amd.com/gpu`	`hip`	AMD ROCm device plugin
`intel`	`gpu.intel.com/i915`	`sycl`	Intel GPU plugin
`cpu`	(none)	`cpu`	(none)

The chart automatically sets the VMAFX_BACKEND environment variable inside the container based on gpu.vendor, so the VMAFX runtime picks the correct backend without further configuration.
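
For wrapper scripts that need to predict what a given gpu.vendor will request, the matrix above can be mirrored as a small lookup. This is only a sketch of the table, not the chart's template logic; the function name vendor_lookup is hypothetical.

```shell
# Mirror of the GPU vendor matrix: prints "<Kubernetes resource> <VMAFX backend>".
# The cpu vendor requests no extended resource, shown here as "-".
vendor_lookup() {
  case "$1" in
    nvidia) echo "nvidia.com/gpu cuda" ;;
    amd)    echo "amd.com/gpu hip" ;;
    intel)  echo "gpu.intel.com/i915 sycl" ;;
    cpu)    echo "- cpu" ;;
    *)      echo "unknown gpu.vendor: $1" >&2; return 1 ;;
  esac
}

# Usage:
#   vendor_lookup amd    # prints "amd.com/gpu hip"
```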

Vulkan note: Vulkan is not a separate Kubernetes resource. It runs through whichever GPU device-plugin is allocated. See GPU scheduling guide.

Workload types¶

Select a workload type with --set workload=<type>.

Deployment (default) — long-running HTTP scoring server¶

helm upgrade --install vmafx deploy/helm/vmafx/ \
  --set workload=Deployment \
  --set deployment.replicaCount=3

The server exposes:

GET /healthz — liveness probe
GET /readyz — readiness probe
GET /metrics — Prometheus metrics (optional; enable monitoring.enabled=true)
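
A script that depends on the server can poll the readiness endpoint before sending work. The following is a minimal sketch, assuming curl is available and the service is reachable (for example through a port-forward); the function name wait_ready and the WAIT_READY_DELAY variable are hypothetical.

```shell
# Poll an HTTP endpoint until it answers with a success status or the
# attempt budget is exhausted. curl -f turns HTTP errors into a non-zero exit.
wait_ready() {
  local url="$1" attempts="${2:-30}" i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -fsS "$url" >/dev/null 2>&1; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    sleep "${WAIT_READY_DELAY:-2}"   # seconds between attempts
    i=$((i + 1))
  done
  echo "not ready after $attempts attempt(s)" >&2
  return 1
}

# Usage:
#   wait_ready http://localhost:8080/readyz
```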

Job — one-shot batch scoring¶

Suitable for CI pipelines, nightly ladder runs, and vmaf-tune compare jobs.

# batch-values.yaml
workload: Job
gpu:
  vendor: nvidia
  count: 1
job:
  command: ["vmaf-tune"]
  args: ["compare", "--config", "/corpus/batch.yaml"]
  ttlSecondsAfterFinished: 3600

helm upgrade --install vmafx-batch deploy/helm/vmafx/ \
  --namespace vmafx --create-namespace \
  --values batch-values.yaml
kubectl wait -n vmafx job/vmafx-batch --for=condition=complete --timeout=30m
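
Note that kubectl wait --for=condition=complete returns early only on success; a Job that fails keeps the command waiting until the timeout. A CI script may therefore want to inspect both conditions. The helper below is a sketch under that assumption; the name job_outcome is hypothetical.

```shell
# Report a Job's outcome from its status conditions.
# Exit status: 0 = complete, 1 = failed, 2 = still running (or no condition yet).
job_outcome() {
  local ns="$1" job="$2" complete failed
  complete=$(kubectl get job "$job" -n "$ns" \
    -o jsonpath='{.status.conditions[?(@.type=="Complete")].status}')
  failed=$(kubectl get job "$job" -n "$ns" \
    -o jsonpath='{.status.conditions[?(@.type=="Failed")].status}')
  if [ "$complete" = "True" ]; then
    echo "complete"
  elif [ "$failed" = "True" ]; then
    echo "failed"
    return 1
  else
    echo "running"
    return 2
  fi
}

# Usage:
#   job_outcome vmafx vmafx-batch || kubectl logs -n vmafx job/vmafx-batch
```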

StatefulSet — MCP server with sticky session state¶

Used when the MCP server requires stable identity and persistent state (e.g., session caches or a socket file).

helm upgrade --install vmafx-mcp deploy/helm/vmafx/ \
  --set workload=StatefulSet

Each pod gets a dedicated 1Gi PVC at /var/lib/vmafx.

Environment variable reference¶

Variable	Set by	Description
`VMAFX_BACKEND`	Chart (from `gpu.vendor`)	Backend selector: `cuda`, `hip`, `sycl`, `cpu`
`VMAFX_MODEL_DIR`	ConfigMap (`config.VMAFX_MODEL_DIR`)	Path to VMAF model JSON files
`VMAFX_OUTPUT_DIR`	ConfigMap (`config.VMAFX_OUTPUT_DIR`)	Path for scored output
Any `VMAFX_*`	`values.yaml` `env:` block	Override arbitrary env vars

To add extra variables:

# values.yaml override
env:
  VMAFX_LOG_LEVEL: debug
  VMAFX_THREADS: "8"

Persistence¶

All PVCs are opt-in:

persistence:
  enabled: true
  storageClass: standard    # leave empty for default StorageClass
  corpus:
    enabled: true
    size: 100Gi
    mountPath: /corpus
  output:
    enabled: true
    size: 20Gi
    mountPath: /output
  models:
    enabled: true
    size: 2Gi
    mountPath: /models
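
After enabling persistence it is worth confirming that the claims actually bind, since a missing or mismatched StorageClass leaves them Pending. The sketch below lists any claim that is not Bound; the function name unbound_pvcs is hypothetical. Storage classes that use WaitForFirstConsumer binding legitimately stay Pending until a pod mounts the claim.

```shell
# List PVCs in a namespace that are not in the Bound phase.
# Prints nothing when every claim is bound.
unbound_pvcs() {
  local ns="$1"
  kubectl get pvc -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.phase}{"\n"}{end}' \
    | awk '$2 != "Bound" { print $1 " (" $2 ")" }'
}

# Usage:
#   unbound_pvcs vmafx
```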

Scaling¶

# Horizontal scale (Deployment only)
kubectl scale -n vmafx deployment/vmafx --replicas=4

# Rolling update to a new image
kubectl set image -n vmafx deployment/vmafx \
  vmafx=ghcr.io/vmafx/vmafx:3.1.0

The controller Deployment and the vmafx-node worker Deployment both use RollingUpdate with maxUnavailable: 0 and maxSurge: 1 by default, ensuring zero-downtime updates and preventing GPU pod eviction before replacements are ready (ADR-1094). The grace period defaults to 60 s (terminationGracePeriodSeconds: 60), giving in-flight scoring jobs time to finish before SIGKILL. Raise this to 300 s or more for long CHUG extractions:

terminationGracePeriodSeconds: 300
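
With maxUnavailable: 0 and maxSurge: 1, each replacement pod is created before an old one is removed, so a rollout needs capacity for one extra pod. On a cluster with no spare GPU the surge pod would presumably stay Pending and the rollout would stall. The sketch below waits for a rollout and reverts it on failure or timeout; the function name rollout_or_undo is hypothetical.

```shell
# Wait for a Deployment rollout; undo it if it fails or times out.
rollout_or_undo() {
  local ns="$1" deploy="$2" timeout="${3:-5m}"
  if kubectl rollout status "deployment/$deploy" -n "$ns" --timeout="$timeout"; then
    echo "rollout complete"
  else
    echo "rollout failed; reverting" >&2
    kubectl rollout undo "deployment/$deploy" -n "$ns"
    return 1
  fi
}

# Usage, after the kubectl set image command above:
#   rollout_or_undo vmafx vmafx 10m
```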

Monitoring¶

Enable Prometheus scraping via ServiceMonitor (requires prometheus-operator):

monitoring:
  enabled: true
  serviceMonitor:
    labels:
      release: prometheus    # match your Prometheus operator selector
    interval: 30s

For Job workloads that cannot expose a scrape endpoint, use the Prometheus Pushgateway dependency:

pushgateway:
  enabled: true
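
A Job can then push its final metrics with a plain HTTP request to the Pushgateway's /metrics/job/<name> endpoint. The sketch below shows the shape of such a push; the function name push_metric, the gateway URL, and the metric name are illustrative assumptions, so substitute the Service name your release actually creates.

```shell
# Push a single metric sample to a Prometheus Pushgateway.
# The body uses the Prometheus text exposition format: "<name> <value>".
push_metric() {
  local gateway="$1" job="$2" name="$3" value="$4"
  printf '%s %s\n' "$name" "$value" \
    | curl -fsS --data-binary @- "$gateway/metrics/job/$job"
}

# Usage (URL and metric name are placeholders):
#   push_metric http://vmafx-prometheus-pushgateway:9091 vmafx-batch vmafx_batch_last_success 1
```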

Ingress¶

ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
  hosts:
    - host: vmafx.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: vmafx-tls
      hosts:
        - vmafx.example.com

Common operations¶

Check pod GPU allocation¶

kubectl describe pod -n vmafx -l app.kubernetes.io/name=vmafx \
  | grep -A 5 "Limits:"

Port-forward for local testing¶

kubectl port-forward -n vmafx svc/vmafx 8080:8080
curl http://localhost:8080/healthz

Run the built-in Helm test¶

helm test vmafx -n vmafx

Uninstall¶

helm uninstall vmafx -n vmafx
# PVCs are NOT deleted automatically — remove explicitly if desired:
kubectl delete pvc -n vmafx -l app.kubernetes.io/instance=vmafx

Pod security¶

Every pod the chart emits — controller Deployment, batch Job, sticky StatefulSet, vmafx-node worker Deployment, and the vmafx-operator Deployment — satisfies the Kubernetes Pod Security Admission "restricted" profile (ADR-0930):

Setting	Value	Why
`runAsNonRoot`	`true`	Required by `restricted`; matches the `USER nonroot:nonroot` directive in every production image (ADR-0878).
`runAsUser` / `runAsGroup`	`65532`	Distroless `gcr.io/distroless/cc-debian12` baked-in nonroot UID/GID — keeps file ownership consistent across `emptyDir`, PVCs, and rclone caches.
`readOnlyRootFilesystem`	`true`	Writes are restricted to explicitly-mounted `emptyDir` / PVC volumes (`/tmp`, the StatefulSet's `/var/lib/vmafx`). Catches privilege-escalation primitives that depend on overwriting on-disk binaries.
`allowPrivilegeEscalation`	`false`	Sets the `no_new_privs` flag on the container process, which closes the setuid-binary and file-capability escalation paths.
`capabilities.drop`	`[ALL]`	Distroless containers do not need `CAP_NET_BIND_SERVICE` etc.; everything is dropped.
`seccompProfile.type`	`RuntimeDefault`	Engages the container-runtime default syscall filter (Docker/containerd ship a reasonable allow-list). Required by `restricted` since k8s 1.25.

To enforce the profile cluster-side, label your install namespace (k8s docs):

kubectl label --overwrite namespace vmafx-prod \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/audit=restricted \
  pod-security.kubernetes.io/warn=restricted
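
Before switching a live namespace to enforce, a server-side dry run of the same label change reports which existing pods would violate the level, without modifying the namespace. The helper below is a small sketch of that check; the function name psa_dry_run is hypothetical.

```shell
# Ask the API server which existing pods would violate a Pod Security level.
# --dry-run=server evaluates the change without persisting the label.
psa_dry_run() {
  local ns="$1" level="${2:-restricted}"
  kubectl label --dry-run=server --overwrite namespace "$ns" \
    "pod-security.kubernetes.io/enforce=$level"
}

# Usage:
#   psa_dry_run vmafx-prod
```

Violations, if any, appear as warnings in the command output.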

If your image requires write access outside the mounted volumes, override podSecurityContext / securityContext in values.yaml. Be aware that relaxing any setting the restricted profile checks means the pods no longer satisfy it, and a namespace that enforces restricted will reject them.

NetworkPolicy¶

Disabled by default (networkPolicy.enabled=false) because many clusters either ship their own CNI-managed policies (Cilium ClusterwideNetworkPolicy, Calico GlobalNetworkPolicy) or do not install a NetworkPolicy controller — in the latter case the chart's NetworkPolicies render but are inert.

Opt in with --set networkPolicy.enabled=true. The chart then emits a default-deny baseline plus six narrow allow-rules:

Policy	Direction	Peer	Ports	Purpose
`default-deny`	both	(no allow)	(all)	Safety net — drops everything that is not explicitly allowed. Emitted per workload component (root / operator / node) so a new component without an allow-rule remains isolated.
`allow-http-ingress`	ingress	every pod in the release namespace	`service.targetPort`	Scoring server reachable from any in-namespace client.
`allow-controller-to-node`	ingress	controller pods (selector match)	`50051` (configurable)	gRPC dispatch from controller to `vmafx-node` workers.
`allow-node-egress-object-store`	egress	configurable CIDR list (default `0.0.0.0/0` minus RFC1918)	`443`	rclone egress from worker pods to S3 / GCS / Azure Blob. Tighten `networkPolicy.allow.nodeEgressObjectStore.cidrs` to your bucket VPC CIDR in production.
`allow-operator-to-apiserver`	egress	`0.0.0.0/0` (apiserver Service IP is not selectable by a NetworkPolicy peer)	`443`, `6443`	controller-runtime list/watch traffic for the `vmafx-operator`.
`allow-node-metrics-ingress`	ingress	any in-namespace pod (or a narrower `fromPodSelector`)	`9090`	Prometheus scraping of the vmafx-node metrics endpoint. Tighten `networkPolicy.allow.nodeMetrics.fromPodSelector` to `{app.kubernetes.io/name: prometheus}` in production.
`allow-dns-egress`	egress	`kube-system` / CoreDNS pods	`53/udp`, `53/tcp`	Cluster DNS resolution — required for the other allow-rules to function.

Override knobs live under networkPolicy.allow.* in values.yaml; each rule has its own enabled switch so you can disable specific flows when your topology already covers them.

A NetworkPolicy-aware CNI (Cilium, Calico, kube-router, Antrea, ...) is required for the policies to take effect. Verify with:

kubectl get networkpolicy -n vmafx-prod -l app.kubernetes.io/instance=vmafx
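
The production tightening suggested in the table can be collected into one override file. The sketch below writes such a file; the key paths are the ones named above, while the file name and the CIDR are placeholders, and the exact shape of fromPodSelector (a plain label map) is an assumption to check against values.yaml.

```shell
# Write a values override that enables the policies and narrows two allow-rules.
cat > netpol-values.yaml <<'EOF'
networkPolicy:
  enabled: true
  allow:
    nodeEgressObjectStore:
      cidrs:
        - 10.20.0.0/16          # placeholder: your bucket VPC CIDR
    nodeMetrics:
      fromPodSelector:
        app.kubernetes.io/name: prometheus
EOF

# Apply with:
#   helm upgrade --install vmafx deploy/helm/vmafx/ \
#     --namespace vmafx-prod --values netpol-values.yaml
```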

PodDisruptionBudget¶

Disabled by default (podDisruptionBudget.enabled=false). Enable for HA deployments to prevent Kubernetes from evicting all pods simultaneously during node drains, cluster upgrades, or voluntary disruptions.

podDisruptionBudget:
  enabled: true
  # Default: allow one voluntary disruption at a time.
  # Safe for all replica counts, including single-replica dev deployments.
  maxUnavailable: 1

The default strategy is maxUnavailable: 1. Do not use minAvailable: 1 with a single-replica Deployment: the PDB then allows zero voluntary disruptions, so the only pod can never be evicted and node drains block until the PDB is changed or removed. Switch to minAvailable only when replicaCount >= 2 and you need a hard lower bound on serving capacity, and keep minAvailable below replicaCount:

podDisruptionBudget:
  enabled: true
  minAvailable: 2   # requires replicaCount >= 3
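
The arithmetic behind this rule is simple: with every replica healthy, a minAvailable PDB permits replicas minus minAvailable voluntary disruptions, floored at zero. The sketch below computes that number so a values file can be sanity-checked before install; the function name pdb_allowed_disruptions is hypothetical.

```shell
# Voluntary disruptions a minAvailable PDB permits when all replicas are healthy:
# max(0, replicas - minAvailable).
pdb_allowed_disruptions() {
  local replicas="$1" min_available="$2" allowed
  allowed=$((replicas - min_available))
  if [ "$allowed" -lt 0 ]; then
    allowed=0
  fi
  echo "$allowed"
}

# Usage:
#   pdb_allowed_disruptions 1 1   # prints 0: node drains block
#   pdb_allowed_disruptions 3 2   # prints 1: one pod may be evicted at a time
```

On a running cluster, kubectl get pdb -n vmafx shows the live value in the ALLOWED DISRUPTIONS column.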

When enabled, the chart creates a policy/v1 PodDisruptionBudget for each active pool (controller, node, operator).

Requires Kubernetes >= 1.21 (for policy/v1). See ADR-1058, ADR-1094.

See also¶

GPU scheduling guide
Production Dockerfile — ADR-0698
Cloud-native redesign — ADR-0697
Helm chart ADR — ADR-0699
Security hardening ADR — ADR-1058
Rolling-update correctness ADR — ADR-1094