ADR-0529: Replace /dev/dri/by-path bind with whole /dev/dri bind in dev container
- Status: Accepted
- Date: 2026-05-18
- Deciders: lusoris, Claude (Anthropic)
- Tags: build, ci, cuda, sycl, agents
Context
PR #1275 (ADR-0514) added a read-only bind-mount of /dev/dri/by-path to both the dev-mcp and smoke-probe-cron services in dev/docker-compose.yml. The purpose was to expose the udev-managed pci-XXXX:YY:ZZ.W-{card,render} symlinks that the Intel level-zero runtime uses to discover Arc GPUs; Docker's devices: directive carries only leaf device nodes and silently drops subdirectory entries such as by-path/.
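For orientation, the ADR-0514 layout described above would look roughly like the fragment below. This is a sketch reconstructed from the description in this ADR, not a copy of dev/docker-compose.yml: the service body is abbreviated, only one of the two services is shown, and the exact ordering and any other keys are assumptions.

```yaml
# Sketch of the ADR-0514 layout (assumed shape; other service keys omitted).
services:
  dev-mcp:
    devices:
      # Carries the card*/renderD* leaf nodes only; by-path/ is silently dropped.
      - /dev/dri:/dev/dri
      - /dev/kfd:/dev/kfd
    volumes:
      # Read-only bind added by ADR-0514 to expose the udev symlinks.
      - /dev/dri/by-path:/dev/dri/by-path:ro
  # smoke-probe-cron carried the same devices/volumes entries.
```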
The bind target is a directory whose contents are generated by udevd from the host kernel's PCI enumeration. After any event that causes re-enumeration (reboot, suspend/resume cycle, GPU hotplug, or a kernel module reload) the symlink names inside by-path/ change to reflect the new bus addresses. Docker's compose config stores the bind source path at docker compose up time; if the directory content has changed or the directory is momentarily empty (race window during udev settle), the OCI hook fails to create the mount and the container exits before the entrypoint runs.
The immediate trigger was a BBB 4K v10 report regeneration run: the host had been rebooted between the last container start and the new run, and the by-path symlinks had changed.
Decision
We will replace the /dev/dri/by-path-only bind-mount with a bind-mount of the whole /dev/dri directory in both services. The whole-directory bind provides:
- The card* and renderD* device nodes (previously carried by devices:).
- The by-path/ and by-id/ subdirectories with their udev symlinks.
Because the bind source is /dev/dri itself (a stable tmpfs directory managed by the kernel's devtmpfs), it survives PCI re-enumeration without storing any PCI-address-specific path in the compose config. The devices: /dev/dri:/dev/dri entry is therefore removed from both services — the bind-mount subsumes it. /dev/kfd (AMD ROCm) remains under devices: because it is a single leaf node with no subdirectory dependency.
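A minimal sketch of the resulting compose layout, assuming the same abbreviated service bodies as elsewhere in dev/docker-compose.yml. Whether the whole-directory bind keeps the read-only flag from ADR-0514 is not stated in this ADR, so the fragment leaves the mount mode at the Compose default; treat that as an assumption.

```yaml
# Sketch of the layout after this decision (assumed shape; other keys omitted).
services:
  dev-mcp:
    devices:
      # /dev/kfd stays under devices: because it is a single leaf node.
      - /dev/kfd:/dev/kfd
    volumes:
      # Whole-directory bind: card*, renderD*, by-path/ and by-id/ in one mount.
      - /dev/dri:/dev/dri
  smoke-probe-cron:
    devices:
      - /dev/kfd:/dev/kfd
    volumes:
      - /dev/dri:/dev/dri
```

No PCI address appears anywhere in the fragment, which is the property the decision relies on.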
Alternatives considered
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Keep /dev/dri/by-path-only bind (status quo) | Minimal surface area; only symlinks exposed | Fails after any PCI re-enumeration; requires manual operator intervention to restart | Rejected: the core complaint of this ADR |
| Whole /dev/dri bind (chosen) | Stable bind source; no PCI-address dependency; subsumes devices: entry; simpler compose | Exposes all DRI nodes to the container (minor privilege expansion) | Chosen: the privilege expansion is acceptable; the dev container already has GPU passthrough via NVIDIA runtime + /dev/kfd |
| Enumerate card/renderD at run-time via entrypoint script | No directory-level bind required; minimal privilege | Complex script; still needs by-path/ for SYCL level-zero; adds startup latency | Rejected: complexity without benefit; whole-dir bind achieves the same with fewer moving parts |
| Bind /dev/dri and /dev/dri/by-path separately | Explicit surface | /dev/dri already contains by-path/; two mounts for the same subtree is redundant | Rejected: redundant |
Consequences
- Positive: Container starts reliably after reboot, suspend/resume, or GPU hotplug. No operator intervention required to fix symlink-rot. The compose file is simpler (one bind covers what was two separate entries).
- Negative: The container sees all DRI nodes on the host, not just the subset Docker's devices: would enumerate. On a host with unrelated DRI devices (e.g. an on-board iGPU the operator does not intend to use), those devices are also visible inside the container. This is a minor privilege expansion; it is acceptable given that the container already receives full NVIDIA GPU passthrough and /dev/kfd.
- Neutral / follow-ups: The dev/AGENTS.md invariant note for ADR-0514 must be updated to reflect that the bind is now /dev/dri rather than /dev/dri/by-path. The docs/development/dev-mcp.md backend matrix table must update the SYCL row's "Required host state" column.
References
- ADR-0514: dev/docker-compose.yml bind-mounts /dev/dri/by-path (superseded in scope by this decision for the bind-mount strategy).
- Research-0138: diagnosis of Intel level-zero symlink dependency.
- req: "Replace the /dev/dri/by-path bind with whole /dev/dri bind-mount — simpler, no symlink-rot vulnerability."