ADR-0996: eBPF FUSE bypass for rclone zero-copy path in vmafx-node
- Status: Proposed
- Date: 2026-06-03
- Deciders: Lusoris
- Tags: ci, go, ebpf, rclone, performance, security, supply-chain
Context
The vmafx-node Go worker fetches media clips from object storage via an rclone FUSE mount. Each clip open under the FUSE mount incurs a FUSE kernel round-trip plus an rclone network fetch even for clips that have been pulled to a local cache since the previous run. At 1080p clip sizes this round-trip adds 370 ms p50 latency per clip open (Research-0733). With a 150 k-clip dataset this dominates wall time.
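A back-of-envelope total shows why the per-open cost dominates. It assumes opens are fully serialized on one worker and uses the p50 as a stand-in for the mean. Both are simplifying assumptions, so treat the result as an order-of-magnitude figure, not a measurement:

$$
T_{\text{open}} \approx N \cdot t_{\text{p50}} = 150\,000 \times 0.370\ \text{s} = 55\,500\ \text{s} \approx 15.4\ \text{h}
$$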
A probe-only eBPF tracepoint program can track file-descriptor opens under the rclone FUSE mount prefix without the overhead of a full FUSE intercept. The program records bypass-eligible FDs in a BPF hash map and publishes open events to a ring buffer. The worker drains that ring buffer to keep its in-process cache warm without polling. This collapses warm-cache clip-open latency from 370 ms to ~10 ms p50 (a 37× improvement), as measured in Research-0733 on the fork's RTX 4090 dev machine.
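The following minimal sketch shows how the worker might turn ring-buffer samples into its in-process cache. Everything named here is hypothetical: the `openEvent` layout (PID, FD, and a 256-byte NUL-terminated path), the `/mnt/rclone/` prefix, and the `bypassCache` type are assumptions, not the layout defined in rclone_bypass.bpf.c. In a real loader the bytes would come from cilium/ebpf's ring buffer reader: `ringbuf.Reader.Read` returns a `Record` whose `RawSample` field holds them. The demo builds a sample with `encodeSample` instead, so it runs without a kernel.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"strings"
	"sync"
)

// openEvent mirrors a HYPOTHETICAL C struct emitted by the BPF program.
// Field order and sizes are assumptions; bpf2go can generate the real Go type.
type openEvent struct {
	Pid  uint32
	Fd   int32
	Path [256]byte // NUL-terminated path captured at open time
}

// fdKey identifies an open FD; FD numbers are only unique within one process.
type fdKey struct {
	Pid uint32
	Fd  int32
}

// bypassCache is a HYPOTHETICAL in-process view of bypass-eligible FDs,
// kept warm by draining ring-buffer events rather than by polling.
type bypassCache struct {
	mu      sync.RWMutex
	prefix  string
	entries map[fdKey]string
}

func newBypassCache(prefix string) *bypassCache {
	return &bypassCache{prefix: prefix, entries: make(map[fdKey]string)}
}

// handleSample decodes one raw sample (little-endian, as on x86-64 and arm64)
// and records the FD if its path lies under the mount prefix.
func (c *bypassCache) handleSample(raw []byte) error {
	var ev openEvent
	if err := binary.Read(bytes.NewReader(raw), binary.LittleEndian, &ev); err != nil {
		return fmt.Errorf("decode open event: %w", err)
	}
	path := ev.Path[:]
	if i := bytes.IndexByte(path, 0); i >= 0 {
		path = path[:i] // trim at the C string terminator
	}
	p := string(path)
	if !strings.HasPrefix(p, c.prefix) {
		return nil // not under the rclone mount; ignore
	}
	c.mu.Lock()
	c.entries[fdKey{ev.Pid, ev.Fd}] = p
	c.mu.Unlock()
	return nil
}

// lookup reports the cached path for a (pid, fd) pair, if any.
func (c *bypassCache) lookup(pid uint32, fd int32) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	p, ok := c.entries[fdKey{pid, fd}]
	return p, ok
}

// encodeSample builds a raw sample for the demo; production bytes would
// come from the ring buffer reader instead.
func encodeSample(pid uint32, fd int32, path string) []byte {
	ev := openEvent{Pid: pid, Fd: fd}
	copy(ev.Path[:], path)
	var buf bytes.Buffer
	_ = binary.Write(&buf, binary.LittleEndian, ev)
	return buf.Bytes()
}

func main() {
	cache := newBypassCache("/mnt/rclone/")
	_ = cache.handleSample(encodeSample(4242, 7, "/mnt/rclone/clips/a.mp4"))
	_ = cache.handleSample(encodeSample(4242, 8, "/tmp/other.log"))
	p, ok := cache.lookup(4242, 7)
	fmt.Println(p, ok)
	_, ok = cache.lookup(4242, 8)
	fmt.Println(ok)
}
```

Keying by (PID, FD) rather than by FD alone matters because FD numbers are reused across processes. The user-space prefix check duplicates the filtering the BPF program performs. Keeping both is a defensive choice of this sketch.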
The feature is gated behind VMAFX_EBPF_BYPASS=1 (default off) and requires Linux 5.15+ and CAP_BPF. A compile-time stub (rclone_bypass_stub.go) allows CI builds without a BPF toolchain, preserving cross-platform compatibility.
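The gate above can be checked before any BPF object is loaded. The following minimal sketch makes two assumptions that the ADR does not confirm: the worker reads the kernel release from /proc/sys/kernel/osrelease, and any value of VMAFX_EBPF_BYPASS other than exactly "1" counts as off. The CAP_BPF check is omitted. A missing capability surfaces as a permission error when the program is loaded, and this sketch assumes the worker treats that error as "fall back to the FUSE path" too.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// kernelAtLeast reports whether a release string such as "5.15.0-91-generic"
// is at least major.minor. Unparseable strings are treated as too old.
func kernelAtLeast(release string, major, minor int) bool {
	fields := strings.SplitN(strings.TrimSpace(release), ".", 3)
	if len(fields) < 2 {
		return false
	}
	maj, err := strconv.Atoi(fields[0])
	if err != nil {
		return false
	}
	// The minor field can carry a suffix (e.g. "15-rc1") when no patch level follows.
	minorField := fields[1]
	if i := strings.IndexFunc(minorField, func(r rune) bool { return r < '0' || r > '9' }); i >= 0 {
		minorField = minorField[:i]
	}
	mnr, err := strconv.Atoi(minorField)
	if err != nil {
		return false
	}
	return maj > major || (maj == major && mnr >= minor)
}

// bypassRequested applies the opt-in rule; only the exact value "1" enables it.
func bypassRequested(getenv func(string) string) bool {
	return getenv("VMAFX_EBPF_BYPASS") == "1"
}

func main() {
	release := "unknown"
	// On Linux this file holds the running kernel's release string.
	if b, err := os.ReadFile("/proc/sys/kernel/osrelease"); err == nil {
		release = strings.TrimSpace(string(b))
	}
	enabled := bypassRequested(os.Getenv) && kernelAtLeast(release, 5, 15)
	fmt.Printf("kernel %s, eBPF bypass enabled: %v\n", release, enabled)
}
```

Passing `getenv` as a function keeps the opt-in rule testable without mutating the process environment.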
Decision
We will ship a probe-only eBPF tracepoint program (rclone_bypass.bpf.c) plus a Go loader using cilium/ebpf v0.21.0 in cmd/vmafx-node/bpf/. The feature is opt-in via VMAFX_EBPF_BYPASS=1. BPF objects are generated artifacts regenerated with go generate ./cmd/vmafx-node/bpf/. A stub auto-selects on non-Linux hosts or when the BPF toolchain is absent.
Alternatives considered
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Full FUSE intercept (custom FUSE driver) | Complete control over all I/O paths | Requires kernel module; far higher complexity; breaks rclone compatibility | Too invasive; deployment requires root |
| Polling the rclone cache directory | Simple to implement; no kernel deps | High polling overhead; latency floor ~100 ms even at aggressive poll intervals; wastes CPU | Latency too high; CPU cost unacceptable at 150 k-clip scale |
| inotify watch on rclone cache dir | No BPF required; works on older kernels | inotify watches are per-inode; a file can be opened before its watch is added; does not reliably track the FD → file mapping | Race condition; inotify events arrive after the open, not before |
| Disable rclone FUSE; use direct S3 SDK | Eliminates FUSE overhead entirely | Requires S3 credentials in every worker pod; breaks the mount-abstraction that lets us swap object stores | Credential sprawl; mount abstraction is a design invariant (ADR-0709) |
Consequences
- Positive: 37× warm-cache clip-open latency improvement (370 ms → 10 ms p50); unlocks full 150 k-clip throughput on a single node without pre-staging; opt-in design means existing deploys are unaffected.
- Negative: Requires Linux 5.15+, CAP_BPF, and clang + bpf2go for regeneration; adds cilium/ebpf v0.21.0 as a runtime dependency; widens the Linux-only surface.
- Neutral / follow-ups: BPF objects must be regenerated (go generate ./cmd/vmafx-node/bpf/) when the tracepoint struct layout changes; this is an AGENTS.md invariant in cmd/vmafx-node/bpf/.
Supply-chain impact
- New dependencies: cilium/ebpf v0.21.0 (runtime, MIT, https://github.com/cilium/ebpf).
- Build-time fetches: go generate invokes bpf2go (installed via the Go toolchain); pinned by go.sum.
- Sigstore-signable: the Go module hash in go.sum provides the integrity anchor.
- CVE surface delta: adds a BPF loader. The BPF programs are kernel-verified and run in read-only probe mode, with no packet manipulation and no memory writes outside their maps.
References
- Research-0733: 37× latency improvement measurement data.
- Open draft PR: #137 (feat(node): eBPF FUSE bypass for rclone).
- ADR-0709: Phase 4b distributed platform (rclone mount-abstraction invariant).
- ADR-0719: vmafx-node rclone integration (the surface this optimizes).