ADR-0719: vmafx-node rclone Integration - Remote-Asset Streaming Without Disk Materialisation
- Status: Accepted
- Date: 2026-05-28
- Deciders: Lusoris
- Tags: architecture, go, node, rclone, storage, ffmpeg, phase4b, fork-local
Context
Phase 4b (ADR-0709) introduces vmafx-node as the scoring worker in the distributed VMAFX platform. Each node pulls jobs from the controller, resolves the reference and distorted video URIs, runs ffmpeg encoding, and scores via libvmaf.
A key requirement from Phase 4b is that the node must not materialise the full source video to disk or RAM before scoring. At CHUG / K150K / BVI-DVC scale, a single reference sequence can exceed 10 GB; writing it to ephemeral node storage before scoring would:
- Saturate cluster I/O bandwidth.
- Require large ephemeral disk allocations in the k8s PodSpec.
- Eliminate the per-job cost benefit of horizontal scaling.
rclone (https://rclone.org) provides a single binary that can access 70+ storage backends (S3, GCS, Azure Blob, SFTP, HTTP, local filesystem, etc.) and expose them to consumer processes either as an HTTP server (rclone serve http) or as a POSIX FUSE mount (rclone mount). The user identified rclone as the correct tool for this use case.
The architecture dispatch for Phase 4b.5 evaluated three rclone streaming modes:
- HTTP-serve: `rclone serve http remote: --addr :PORT` + `ffmpeg -i http://localhost:PORT/path`. Zero-copy, multi-input capable, no FUSE kernel dependency. Selected as primary mode.
- FUSE-mount: `rclone mount remote: /mnt/<job-id>` + ffmpeg reads from local path. Requires `fuse3` in the container; adds FUSE kernel round-trip overhead. Selected as fallback.
- Pipe: `rclone cat | ffmpeg -i pipe:0`. Single input only; two-input pipe juggling is fragile. Not selected.
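To make the HTTP-serve data path concrete, the following is a minimal Go sketch of how the two command lines might be assembled. The helper names `serveArgs` and `scoreArgs`, the port 8080, and the file names are hypothetical; binding to 127.0.0.1 instead of all interfaces is an assumption, and the scoring command assumes an ffmpeg build with the `libvmaf` filter, which takes the distorted input first and the reference second.

```go
package main

import (
	"fmt"
	"strings"
)

// serveArgs builds the argument list for `rclone serve http`.
// Binding to 127.0.0.1 is an assumption; the ADR only shows `--addr :PORT`.
func serveArgs(remote string, port int) []string {
	return []string{"serve", "http", remote, "--addr", fmt.Sprintf("127.0.0.1:%d", port)}
}

// scoreArgs builds an ffmpeg argument list that reads both inputs over HTTP
// and feeds them to the libvmaf filter (distorted first, reference second).
// The null muxer discards the output so nothing is written to disk.
func scoreArgs(disURL, refURL string) []string {
	return []string{"-i", disURL, "-i", refURL, "-lavfi", "libvmaf", "-f", "null", "-"}
}

func main() {
	fmt.Println("rclone", strings.Join(serveArgs("remote:", 8080), " "))
	fmt.Println("ffmpeg", strings.Join(scoreArgs(
		"http://127.0.0.1:8080/dis.mp4",
		"http://127.0.0.1:8080/ref.mp4"), " "))
}
```

Both inputs are served by the same rclone process here, which is what makes this mode workable for two-input scoring where the pipe mode is not.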
Decision
rclone is bundled into the vmafx-node distroless image via a multi-stage Docker build (docker/Dockerfile.node). The primary storage mode is HTTP-serve; FUSE-mount is the fallback. The storage layer is exposed as the pkg/storage.Storage interface:
type Storage interface {
	Prepare(ctx context.Context, sourceURI string) (readableURL string, cleanup func(), err error)
	Mode() Mode // "http-serve" | "mount" | "auto"
}
Three implementations ship:
- `HTTPServeStorage`: spawns `rclone serve http` per-job; returns `http://127.0.0.1:PORT/<path>`.
- `FUSEMountStorage`: spawns `rclone mount` per-job to a tmpdir; returns local path.
- `LocalStorage`: passthrough for `file://` or bare paths.
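As an illustration of the simplest of the three, here is a minimal sketch of what a passthrough implementation could look like. The `Storage` interface is copied from above; the `Mode` type definition, the `"local"` mode value, and the empty-URI check are assumptions made for the example, not the shipped code.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strings"
)

// Mode is assumed to be a string type; the ADR only lists its values.
type Mode string

// Storage mirrors the interface quoted in the ADR.
type Storage interface {
	Prepare(ctx context.Context, sourceURI string) (readableURL string, cleanup func(), err error)
	Mode() Mode
}

// LocalStorage passes file:// URIs and bare paths straight through.
type LocalStorage struct{}

// Compile-time check that LocalStorage satisfies Storage.
var _ Storage = LocalStorage{}

// Prepare strips a leading file:// scheme if present. No subprocess is
// spawned, so the returned cleanup is always a safe no-op.
func (LocalStorage) Prepare(_ context.Context, sourceURI string) (string, func(), error) {
	noop := func() {}
	if sourceURI == "" {
		return "", noop, errors.New("empty source URI")
	}
	return strings.TrimPrefix(sourceURI, "file://"), noop, nil
}

// Mode returns "local"; this value is hypothetical and not one of the
// three modes listed in the ADR.
func (LocalStorage) Mode() Mode { return "local" }

func main() {
	var s Storage = LocalStorage{}
	path, cleanup, err := s.Prepare(context.Background(), "file:///data/ref.y4m")
	if err != nil {
		panic(err)
	}
	defer cleanup()
	fmt.Println(path)
}
```

Returning a no-op `cleanup` even on error lets callers defer it unconditionally, which keeps the three implementations interchangeable from the executor's point of view.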
The cmd/vmafx-node/executor.go Executor.Execute() method calls store.Prepare(ref) and store.Prepare(dis) before the ffmpeg subprocess and defers cleanup() for both.
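The prepare-then-defer flow described above can be sketched as follows. This is not the actual `Executor.Execute()` body: the `execute` function, the `run` callback standing in for the ffmpeg subprocess, and the `fakeStorage` test double are hypothetical, and the interface is trimmed to `Prepare` for brevity.

```go
package main

import (
	"context"
	"fmt"
)

// Storage is the ADR interface, trimmed to the one method used here.
type Storage interface {
	Prepare(ctx context.Context, sourceURI string) (readableURL string, cleanup func(), err error)
}

// execute prepares both inputs and hands the readable URLs to run.
// Deferred cleanups fire in reverse order on every return path,
// including when the second Prepare or run itself fails.
func execute(ctx context.Context, store Storage, ref, dis string, run func(refURL, disURL string) error) error {
	refURL, cleanupRef, err := store.Prepare(ctx, ref)
	if err != nil {
		return fmt.Errorf("prepare ref: %w", err)
	}
	defer cleanupRef()

	disURL, cleanupDis, err := store.Prepare(ctx, dis)
	if err != nil {
		return fmt.Errorf("prepare dis: %w", err) // cleanupRef still runs
	}
	defer cleanupDis()

	return run(refURL, disURL)
}

// fakeStorage is a test double that records each cleanup call.
type fakeStorage struct{ log []string }

func (f *fakeStorage) Prepare(_ context.Context, uri string) (string, func(), error) {
	cleanup := func() { f.log = append(f.log, "cleanup "+uri) }
	return "http://127.0.0.1:8080/" + uri, cleanup, nil
}

func main() {
	f := &fakeStorage{}
	err := execute(context.Background(), f, "ref.y4m", "dis.y4m", func(r, d string) error {
		fmt.Println("scoring", d, "against", r)
		return nil
	})
	fmt.Println(err, f.log)
}
```

Registering each `defer` immediately after its successful `Prepare` is the design point: a failure while preparing the distorted input cannot leak the rclone process already started for the reference.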
The rclone configuration is provided via a Kubernetes Secret mounted at /etc/vmafx/rclone.conf. The Helm chart (deploy/helm/vmafx/) adds:
- `templates/secret-rclone-config.yaml`: creates the Secret from `values.storage.rclone.config`.
- `templates/node-deployment.yaml`: the vmafx-node Deployment, mounting the Secret read-only.
- `values.yaml` additions: `storage.mode`, `storage.rclone.config`, and `node.*` block.
Alternatives considered
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| `rclone serve http` (HTTP-serve), chosen primary | Zero-copy, concurrent multi-input, no FUSE, native HTTP input via ffmpeg's libavformat | Per-job subprocess spawn; port selection | Best fit: multi-input (ref + dis), no kernel dependency |
| `rclone mount` (FUSE) | POSIX-transparent random access, works with any libav input | FUSE kernel dependency, slower than HTTP-serve, SIGKILL-to-unmount lifecycle complexity | Ships as fallback mode; not primary |
| `rclone cat` pipe | No HTTP server, no FUSE | Single input only; `ffmpeg -i pipe:0` cannot handle both ref+dis simultaneously | Not viable for two-input scoring |
| k8s PVC + CSI driver | Native k8s; no rclone binary | Forces materialisation to PVC; cloud-provider-specific CSI; no zero-copy | Per ADR-0709: "Copy files to node ephemeral storage... not zero-copy" |
| S3-FUSE (goofys, s3fs) | Lightweight | S3-only; no unified multi-backend support | rclone covers 70+ backends with one binary |
Consequences
Positive:
- Zero-copy streaming: reference and distorted videos stream from S3 / GCS / Azure Blob / SFTP / HTTP without any intermediate disk write.
- Unified storage abstraction: the `pkg/storage.Storage` interface decouples the executor from any specific remote backend.
- 70+ rclone backends supported out of the box; no per-backend node code required.
- Credentials are managed as Kubernetes Secrets; no credentials baked into the image.
- HTTP-serve mode adds no FUSE kernel dependency to the node container.
Negative:
- rclone binary (~55 MB uncompressed) increases the node image size.
- Per-job `rclone serve http` subprocess adds ~100 ms startup latency before ffmpeg can read the first byte. The readiness poller in `waitForHTTP` compensates.
- FUSE-mount fallback requires `fuse3` and `/dev/fuse` device access in the k8s PodSpec (add `securityContext.capabilities.add: [SYS_ADMIN]` when using mount mode).
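A readiness poller of the kind mentioned for `waitForHTTP` could look roughly like the sketch below. Only the function name comes from this ADR; the signature, the polling interval, and the choice to probe with a plain TCP connect instead of an HTTP request are assumptions. The `probeLocalListener` helper is hypothetical and exists only so the example runs without rclone.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// waitForHTTP polls addr until a TCP connection succeeds or ctx expires.
// A successful connect is treated as "server is up"; a stricter probe
// could issue an HTTP request instead.
func waitForHTTP(ctx context.Context, addr string, interval time.Duration) error {
	var d net.Dialer
	for {
		conn, err := d.DialContext(ctx, "tcp", addr)
		if err == nil {
			return conn.Close()
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("%s not ready: %w", addr, ctx.Err())
		case <-time.After(interval):
			// Not listening yet; retry.
		}
	}
}

// probeLocalListener opens a listener on a free loopback port as a
// stand-in for the rclone server, then waits for it.
func probeLocalListener() error {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err
	}
	defer ln.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	return waitForHTTP(ctx, ln.Addr().String(), 20*time.Millisecond)
}

func main() {
	fmt.Println("ready:", probeLocalListener() == nil)
}
```

Bounding the wait with a context deadline means a misconfigured remote surfaces as a job error and cannot hang the worker.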
Neutral / follow-ups:
- Phase 4b.6 (eBPF) will investigate whether eBPF can reduce FUSE round-trip overhead for the mount-mode fallback (Research-0733 target: FUSE bypass).
- The HTTP-serve port is ephemeral (OS-assigned free port); no port reservation in the PodSpec is needed.
- When a job mixes remote and local inputs (e.g., ref via HTTP-serve, dis via local path), the `LocalStorage` passthrough short-circuits the rclone spawn for the local input.
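One common way to obtain an OS-assigned free port in Go is to bind to port 0 and read back the port the kernel chose. The sketch below shows that pattern; `freePort` is a hypothetical helper, and whether vmafx-node selects its port this way is an assumption.

```go
package main

import (
	"fmt"
	"net"
)

// freePort asks the kernel for an unused loopback TCP port by binding to
// port 0, then releases it so a child process can bind it.
// There is a small window in which another process could claim the port
// before the child does; callers should treat a bind failure as retryable.
func freePort() (int, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer ln.Close()
	return ln.Addr().(*net.TCPAddr).Port, nil
}

func main() {
	port, err := freePort()
	if err != nil {
		panic(err)
	}
	fmt.Printf("rclone serve http remote: --addr 127.0.0.1:%d\n", port)
}
```

Because the port is chosen at runtime and bound to loopback, nothing needs to be declared in the PodSpec and concurrent jobs on one node do not collide on a fixed port.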
References
- ADR-0709 — Phase 4b umbrella; item 4b.5.
- ADR-0711 — vmafx-controller Phase 4b.1.
- Research-0733 — eBPF FUSE-bypass research (target: FUSE round-trip overhead in mount mode).
- `req` (user direction, paraphrased): "I hope it was rclone that can stream directly into the encoder without materialising files to disk or RAM." (2026-05-28)
- `req` (architecture dispatch, paraphrased): use rclone to stream files without copying to disk or RAM first; HTTP-serve mode recommended as primary mode. (2026-05-28)
- https://rclone.org/commands/rclone_serve_http/
- https://rclone.org/commands/rclone_mount/