ADR-0529: Replace /dev/dri/by-path bind with whole /dev/dri bind in dev container

  • Status: Accepted
  • Date: 2026-05-18
  • Deciders: lusoris, Claude (Anthropic)
  • Tags: build, ci, cuda, sycl, agents

Context

PR #1275 (ADR-0514) added a read-only bind-mount of /dev/dri/by-path to both the dev-mcp and smoke-probe-cron services in dev/docker-compose.yml. The purpose was to expose the udev-managed pci-XXXX:YY:ZZ.W-{card,render} symlinks that the Intel level-zero runtime uses to discover Arc GPUs; Docker's devices: directive carries only leaf device nodes and silently drops subdirectory entries such as by-path/.
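
For orientation, the ADR-0514 arrangement described above could look roughly like the fragment below. This is a sketch reconstructed from the description in this ADR, not a copy of dev/docker-compose.yml: the container-side paths are assumed to mirror the host paths, and all unrelated keys are omitted.

```yaml
# Sketch of the ADR-0514 state (assumed shape; unrelated keys omitted).
services:
  dev-mcp:
    devices:
      - /dev/dri:/dev/dri                        # carries only the leaf card*/renderD* nodes
      - /dev/kfd:/dev/kfd                        # AMD ROCm
    volumes:
      - /dev/dri/by-path:/dev/dri/by-path:ro     # read-only bind for the udev symlinks
  smoke-probe-cron:
    devices:
      - /dev/dri:/dev/dri
      - /dev/kfd:/dev/kfd
    volumes:
      - /dev/dri/by-path:/dev/dri/by-path:ro
```

The volumes: entry is the part this ADR revisits; the devices: entries alone do not make by-path/ visible inside the container.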

The bind source is a host directory whose contents udevd generates from the kernel's PCI enumeration. After any event that causes re-enumeration (reboot, suspend/resume cycle, GPU hotplug, or a kernel module reload), the symlink names inside by-path/ change to reflect the new bus addresses. Docker's compose config stores the bind source path at docker compose up time; if the directory content has changed since then, or the directory is momentarily empty (a race window while udev settles), the OCI hook fails to create the mount and the container exits before the entrypoint runs.

The immediate trigger was a BBB 4K v10 report regeneration run: the host had been rebooted between the last container start and the new run, and the by-path symlinks had changed.

Decision

We will replace the /dev/dri/by-path-only bind-mount with a bind-mount of the whole /dev/dri directory in both services. The whole-directory bind provides:

  • The card* and renderD* device nodes (previously carried by devices:).
  • The by-path/ and by-id/ subdirectories with their udev symlinks.

Because the bind source is /dev/dri itself (a stable directory on the kernel-managed devtmpfs), it survives PCI re-enumeration, and the compose config no longer stores any PCI-address-specific path. The devices: /dev/dri:/dev/dri entry is therefore removed from both services, since the bind-mount subsumes it. /dev/kfd (AMD ROCm) remains under devices: because it is a single leaf node with no subdirectory dependency.
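
The decision can be sketched as the compose fragment below. It is illustrative only: the container-side path is assumed to mirror the host path, the mount mode (read-only or read-write) is not stated in this ADR and is left at the default here, and all unrelated keys are omitted. It also assumes that the container's device cgroup policy permits opening the DRI nodes once the devices: entry is gone; the ADR does not say how that is arranged.

```yaml
# Sketch of the state after this ADR (assumed shape; unrelated keys omitted).
services:
  dev-mcp:
    devices:
      - /dev/kfd:/dev/kfd        # single leaf node, stays under devices:
    volumes:
      - /dev/dri:/dev/dri        # whole-directory bind: card*, renderD*, by-path/, by-id/
  smoke-probe-cron:
    devices:
      - /dev/kfd:/dev/kfd
    volumes:
      - /dev/dri:/dev/dri
```

No PCI address appears anywhere in the fragment, which is the property the decision relies on.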

Alternatives considered

| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Keep /dev/dri/by-path-only bind (status quo) | Minimal surface area; only symlinks exposed | Fails after any PCI re-enumeration; requires manual operator intervention to restart | Rejected: this failure is the core complaint of this ADR |
| Whole /dev/dri bind (chosen) | Stable bind source; no PCI-address dependency; subsumes devices: entry; simpler compose | Exposes all DRI nodes to the container (minor privilege expansion) | Chosen: the privilege expansion is acceptable; the dev container already has GPU passthrough via NVIDIA runtime + /dev/kfd |
| Enumerate card/renderD at run-time via entrypoint script | No directory-level bind required; minimal privilege | Complex script; still needs by-path/ for SYCL level-zero; adds startup latency | Rejected: complexity without benefit; whole-dir bind achieves the same with fewer moving parts |
| Bind /dev/dri and /dev/dri/by-path separately | Explicit surface | /dev/dri already contains by-path/; two mounts for the same subtree are redundant | Rejected: redundant |

Consequences

  • Positive: Container starts reliably after reboot, suspend/resume, or GPU hotplug. No operator intervention required to fix symlink-rot. The compose file is simpler (one bind covers what was two separate entries).
  • Negative: The container sees all DRI nodes on the host, not just the subset Docker's devices: would enumerate. On a host with unrelated DRI devices (e.g. an on-board iGPU the operator does not intend to use), those devices are also visible inside the container. This is a minor privilege expansion; it is acceptable given that the container already receives full NVIDIA GPU passthrough and /dev/kfd.
  • Neutral / follow-ups: The dev/AGENTS.md invariant note for ADR-0514 must be updated to reflect that the bind is now /dev/dri rather than /dev/dri/by-path. The docs/development/dev-mcp.md backend matrix table must update the SYCL row's "Required host state" column.

References

  • ADR-0514: dev/docker-compose.yml bind-mounts /dev/dri/by-path (its bind-mount strategy is superseded by this decision).
  • Research-0138: diagnosis of Intel level-zero symlink dependency.
  • req: "Replace the /dev/dri/by-path bind with whole /dev/dri bind-mount — simpler, no symlink-rot vulnerability."