GPU Scheduling in Kubernetes¶
This guide explains how VMAFX maps GPU vendor device-plugins to Kubernetes resource limits, how Vulkan fits into the picture, and how to diagnose pending pods caused by insufficient GPU resources.
How GPU device-plugins work¶
A Kubernetes device-plugin is a daemonset that advertises custom extended resources (e.g. nvidia.com/gpu) to the kubelet. When a pod requests such a resource, the scheduler places it on a node that has enough of that resource available, and the kubelet allocates the physical device to the container.
VMAFX uses one device-plugin per GPU vendor:
| Vendor | Resource key | Backend | Plugin daemonset |
|---|---|---|---|
| NVIDIA | nvidia.com/gpu | CUDA | k8s-device-plugin |
| AMD | amd.com/gpu | HIP | k8s-device-plugin |
| Intel | gpu.intel.com/i915 | SYCL | intel-device-plugins-for-kubernetes |
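To make the request side concrete, the following sketch prints a minimal pod manifest that asks for one unit of a resource key from the table. It illustrates the mechanism only and is not what the VMAFX Helm chart renders; the function name, the pod name, and the `busybox:1.36` image are assumptions chosen for the example.

```shell
# Sketch: print a pod manifest that requests one GPU through an extended resource.
# Hypothetical names: gpu_request_pod, gpu-request-demo, busybox:1.36.
gpu_request_pod() {
  # Resource key from the table above; defaults to the NVIDIA key.
  local key="${1:-nvidia.com/gpu}"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: gpu-request-demo
spec:
  restartPolicy: Never
  containers:
    - name: demo
      image: busybox:1.36
      command: ["sh", "-c", "ls -l /dev"]
      resources:
        limits:
          ${key}: 1
EOF
}
```

Piping the output into the cluster, for example `gpu_request_pod amd.com/gpu | kubectl apply -f -`, leaves the pod Pending until a node advertises that key. Extended resources only need to appear under `limits`; Kubernetes sets the request to the same value.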
Vulkan and Kubernetes¶
Vulkan is NOT a separate Kubernetes resource. There is no vulkan.khronos.org/gpu or equivalent extended resource in any vendor's device-plugin. Vulkan runs through whichever GPU device-plugin is allocated:
- NVIDIA node with nvidia.com/gpu: 1 → Vulkan addresses the NVIDIA GPU via the NVIDIA Vulkan ICD.
- AMD node with amd.com/gpu: 1 → Vulkan addresses the AMD GPU via the AMDVLK / Mesa RADV ICD.
- Intel node with gpu.intel.com/i915: 1 → Vulkan addresses the Intel GPU via the Mesa ANV ICD.
The VMAFX container image ships all three Vulkan ICDs. The runtime selects the correct ICD based on which device is present in /dev/dri/ after the device-plugin allocation.
Consequence for the Helm chart: set gpu.vendor to the physical GPU vendor. The chart requests the vendor's device-plugin resource and sets VMAFX_BACKEND accordingly. Vulkan acceleration is available automatically on any allocated GPU node without a separate resource request.
Installing device-plugins¶
NVIDIA¶
kubectl apply -f \
https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.5/nvidia-device-plugin.yml
Verify:
kubectl get daemonset -n kube-system nvidia-device-plugin-daemonset
kubectl describe node <gpu-node> | grep -A 5 "nvidia.com/gpu"
AMD (ROCm)¶
kubectl apply -f \
https://raw.githubusercontent.com/RadeonOpenCompute/k8s-device-plugin/master/k8s-ds-amdgpu-dp.yaml
Verify:
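A sketch of the AMD check, mirroring the NVIDIA commands above. It assumes the manifest creates a daemonset named `amdgpu-device-plugin-daemonset` in `kube-system`; confirm the name against the YAML that was applied. The function name is hypothetical and only exists so the node name can be passed as an argument.

```shell
# Sketch: verify the AMD device-plugin and the resource it advertises.
# Assumption: daemonset amdgpu-device-plugin-daemonset in namespace kube-system.
verify_amd_plugin() {
  local node="$1"   # name of a node that carries an AMD GPU
  kubectl get daemonset -n kube-system amdgpu-device-plugin-daemonset
  kubectl describe node "$node" | grep -A 5 "amd.com/gpu"
}
```

Call it as `verify_amd_plugin <gpu-node>`.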
Intel¶
# Requires the Intel Device Plugins Operator or manual daemonset deploy.
# See: https://github.com/intel/intel-device-plugins-for-kubernetes/tree/main/cmd/gpu_plugin
kubectl apply -k \
https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/gpu_plugin/overlays/nfd_labeled_nodes
Verify:
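A sketch of the Intel check. It assumes the daemonset is named `intel-gpu-plugin`; because the namespace depends on how the plugin was deployed, the lookup searches all namespaces. The function name is hypothetical.

```shell
# Sketch: verify the Intel GPU plugin and the resource it advertises.
# Assumption: the daemonset name contains "intel-gpu-plugin".
verify_intel_plugin() {
  local node="$1"   # name of a node that carries an Intel GPU
  kubectl get daemonset --all-namespaces | grep intel-gpu-plugin
  kubectl describe node "$node" | grep -A 5 "gpu.intel.com/i915"
}
```

Call it as `verify_intel_plugin <gpu-node>`.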
Node capacity and allocatable¶
Check what GPU resources a node is advertising:
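One way to do this, sketched with a hypothetical helper name, is to print the node's `capacity` and `allocatable` maps through a jsonpath query:

```shell
# Sketch: print a node's capacity (first line) and allocatable (second line) maps.
show_node_resources() {
  local node="$1"
  kubectl get node "$node" \
    -o jsonpath='{.status.capacity}{"\n"}{.status.allocatable}{"\n"}'
}
```

Call it as `show_node_resources <gpu-node>` and look for the vendor's resource key in both lines.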
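On an NVIDIA node with a single GPU and the default plugin configuration, both capacity and allocatable should list nvidia.com/gpu: 1.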
If the capacity shows 0 or the key is absent, the device-plugin is either not installed or the node does not have a compatible GPU.
Troubleshooting pending pods¶
Insufficient nvidia.com/gpu¶
Causes and fixes:
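- Device-plugin not installed. Install the NVIDIA device-plugin daemonset.
- Node is tainted but the pod has no toleration. Add a matching toleration to the pod (the "Node affinity and tolerations" section has an example).
- All GPUs already allocated. Reduce gpu.count, free other pods, or add a GPU node.
- The pod requests more GPUs than any single node provides. Lower gpu.count or add a node with more GPUs.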
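To confirm that a pending pod is blocked on this resource, the scheduler's events for the pod can be listed. The helper below is a sketch; its name and the `default` namespace fallback are assumptions.

```shell
# Sketch: list FailedScheduling events for one pod.
show_scheduling_failures() {
  local pod="$1"
  local ns="${2:-default}"   # namespace; falls back to "default"
  kubectl get events -n "$ns" \
    --field-selector "involvedObject.name=${pod},reason=FailedScheduling"
}
```

Call it as `show_scheduling_failures <pod> <namespace>`. An event message that mentions `Insufficient nvidia.com/gpu` points to one of the causes listed above.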
Insufficient gpu.intel.com/i915¶
Same root causes as above, but for Intel. The Intel plugin additionally requires the NFD (Node Feature Discovery) operator to label nodes correctly. If the node is not labeled, the daemonset may not deploy onto it:
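A sketch of the label check. It assumes the `nfd_labeled_nodes` overlay selects nodes by the label `intel.feature.node.kubernetes.io/gpu=true`; verify the selector in the deployed daemonset before relying on it. The function name is hypothetical.

```shell
# Sketch: list nodes that carry the NFD label the Intel GPU plugin is assumed to select on.
list_intel_gpu_nodes() {
  kubectl get nodes -l intel.feature.node.kubernetes.io/gpu=true
}
```

An empty result from `list_intel_gpu_nodes` suggests NFD has not labeled any node, so the plugin daemonset has nowhere to run.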
Insufficient amd.com/gpu¶
The same causes apply to AMD nodes. Also check that the ROCm version installed on the node matches what the device-plugin expects.
GPU pod is running but VMAFX uses CPU¶
Check that VMAFX_BACKEND is set correctly:
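A minimal sketch that reads the variable from the running container, assuming `printenv` is available in the image. The function name is hypothetical.

```shell
# Sketch: print the backend the container was started with.
show_backend() {
  local pod="$1"
  kubectl exec "$pod" -- printenv VMAFX_BACKEND
}
```

Call it as `show_backend <pod>`; add `-n <namespace>` to the `kubectl` call if the pod is not in the current namespace.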
If the value is cpu but gpu.vendor is set to a GPU vendor, verify the device was actually allocated:
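The sketch below checks both sides of the allocation: the resource limits recorded in the pod spec and the device nodes visible inside the container. It assumes `ls` exists in the image; the function name is hypothetical.

```shell
# Sketch: show the pod's resource limits, then the devices under /dev/dri.
show_gpu_allocation() {
  local pod="$1"
  kubectl get pod "$pod" \
    -o jsonpath='{.spec.containers[*].resources.limits}{"\n"}'
  kubectl exec "$pod" -- ls /dev/dri
}
```

Call it as `show_gpu_allocation <pod>`. If the limits do not include the vendor's resource key, the chart did not request a GPU; if they do but no device is visible, the problem is more likely on the node or in the device-plugin.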
Checking node GPU feature labels¶
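A sketch for listing the GPU-related labels on a node, one per line. The filter pattern is an assumption that covers the vendor prefixes used in this guide; the function name is hypothetical.

```shell
# Sketch: list a node's labels and keep the GPU-related ones.
show_gpu_labels() {
  local node="$1"
  # "kubectl label --list" prints one key=value pair per line.
  kubectl label node "$node" --list \
    | grep -E 'nvidia\.com|amd\.com|intel\.|gpu'
}
```

Call it as `show_gpu_labels <gpu-node>`.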
Node affinity and tolerations¶
GPU nodes are commonly tainted to prevent non-GPU pods from landing on them. A typical NVIDIA taint: nvidia.com/gpu=present:NoSchedule.
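Before writing tolerations, it can help to see which taints a node actually carries. A minimal sketch, with a hypothetical function name:

```shell
# Sketch: print the taints set on a node (empty output means no taints).
show_node_taints() {
  local node="$1"
  kubectl get node "$node" -o jsonpath='{.spec.taints}{"\n"}'
}
```

Call it as `show_node_taints <gpu-node>`; the key and effect shown there are what the pod's toleration has to match.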
To ensure VMAFX is scheduled on GPU nodes:
# values.yaml
tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: nvidia.com/gpu.present
              operator: In
              values: ["true"]
Multi-GPU nodes¶
To request more than one GPU per pod:
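A sketch using Helm's `--set` flag with the `gpu.vendor` and `gpu.count` values referenced in this guide. The function name, the release name, the chart reference, and the literal vendor value `nvidia` are assumptions; substitute the ones used in your deployment.

```shell
# Sketch: install or upgrade a release with more than one GPU per pod.
# Assumption: gpu.vendor accepts the value "nvidia".
install_multi_gpu() {
  local release="$1"   # Helm release name
  local chart="$2"     # chart reference or local path
  local count="$3"     # GPUs per pod
  helm upgrade --install "$release" "$chart" \
    --set gpu.vendor=nvidia \
    --set gpu.count="$count"
}
```

Call it as `install_multi_gpu <release> <chart> 2`. The same values can be set in `values.yaml` instead.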
Note that VMAFX processes a single job per pod; multiple GPUs per pod are only useful if the VMAFX backend supports intra-node multi-GPU dispatch.
Related¶
- Kubernetes deployment guide
- Backend documentation
- ADR-0699 — Helm chart design