ADR-0781: Sidecar online training - SGD + EMA + replay buffer
- Status: Proposed
- Date: 2026-05-29
- Deciders: Lusoris, Claude
- Tags: ai, sidecar, online-learning, k8s, vmafx-node, phase4b, fork-local
Context
VMAFX Phase 4b (ADR-0709) converts the platform into a distributed cloud-native scoring service. Every job that vmafx-node executes produces a (features, predicted_score) pair alongside the ground-truth VMAF score from libvmaf. Together they form a training triple: a free signal that is currently discarded after each job.
Research-0733 evaluated three architectures for closing the encode-score-train loop:
- Option A — Python sidecar container co-located in the same pod
- Option B — Dedicated training-node pool receiving batched triples from the controller
- Option C — Pure-Go SGD (Gorgonia or hand-rolled)
The research digest recommended Option A for v1. Option B over-engineers at current cluster scale (< 1000 triples/hour). Option C is blocked by the Go ML ecosystem's immaturity for multi-layer networks.
The user direction that motivates this work (per the ADR-0709 references) is continual model improvement during live encoding, without waiting for the next offline batch-training round trip.
The existing tools/vmaf-tune/sidecar.py (ADR-0394) implements on-host online ridge regression for the vmaf-tune predictor surface — a related but separate concern (single-host bias correction, no k8s, no ONNX export). This ADR covers the distributed k8s sidecar that fine-tunes the tiny-AI ONNX models consumed by libvmaf's DNN path.
Decision
We will ship a Python sidecar container (ai/sidecar/) that runs co-located with each vmafx-node pod, connected via a Unix domain socket at /tmp/vmafx-sidecar.sock (shared emptyDir volume).
The sidecar implements:
- Replay buffer (replay_buffer.py): bounded ring buffer, capacity 10 000 samples (FIFO eviction). Chosen because 10 000 × 80 float32 features (4 bytes each) ≈ 3.2 MB in-process; at one sample per encode that is enough to represent ~200 hours of a 50-encode/hour workload without unbounded growth.
- Online SGD + EMA (sgd_ema.py): SGD with momentum (default) or Adam; EMA shadow with beta=0.999; gradient clipping at max_norm=1.0; 100-step linear LR warmup. EMA decay 0.999 follows the Mean Teacher paper (Tarvainen & Valpola, 2017) and is the standard for online VMAF work (Research-0733 §3.3).
- Checkpoint export: EMA model exported to ONNX (opset 17, matching ADR-0249) with an atomic rename + SHA-256 sidecar file. Dual-condition trigger: >= 10 min AND >= 1000 new samples since last checkpoint.
- Socket server (online_trainer.py): newline-delimited JSON, one thread per connection. Non-blocking from the Go side: FeedbackClient enqueues into a 1000-entry ring buffer; a background goroutine drains it.
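The replay buffer and the SGD + EMA update listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the contents of replay_buffer.py or sgd_ema.py: a toy linear model in pure Python stands in for the PyTorch network, plain SGD without momentum stands in for the real optimizer, and the names ReplayBuffer, warmup_lr, clip_by_norm, ema_update and train_step are hypothetical. Only the constants (capacity 10 000, beta=0.999, max_norm=1.0, 100 warmup steps) come from this ADR; the exact shape of the warmup ramp is an assumption.

```python
import math
import random
from collections import deque

CAPACITY = 10_000     # replay buffer capacity from this ADR
EMA_BETA = 0.999      # EMA decay from this ADR
MAX_NORM = 1.0        # gradient clipping threshold from this ADR
WARMUP_STEPS = 100    # linear LR warmup length from this ADR


class ReplayBuffer:
    """Bounded FIFO buffer: the oldest sample is evicted once capacity is reached."""

    def __init__(self, capacity=CAPACITY):
        self._items = deque(maxlen=capacity)

    def add(self, features, target):
        self._items.append((features, target))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement from the current contents.
        return rng.sample(list(self._items), min(batch_size, len(self._items)))

    def __len__(self):
        return len(self._items)


def warmup_lr(step, base_lr, warmup_steps=WARMUP_STEPS):
    """Linear ramp up to base_lr over warmup_steps, then constant (assumed shape)."""
    return base_lr * min(1.0, (step + 1) / warmup_steps)


def clip_by_norm(grad, max_norm=MAX_NORM):
    """Rescale grad so that its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def ema_update(shadow, weights, beta=EMA_BETA):
    """shadow <- beta * shadow + (1 - beta) * weights, element-wise."""
    return [beta * s + (1.0 - beta) * w for s, w in zip(shadow, weights)]


def train_step(weights, shadow, batch, step, base_lr=0.01):
    """One plain-SGD step on mean squared error for a toy linear model."""
    grad = [0.0] * len(weights)
    for features, target in batch:
        err = sum(w * x for w, x in zip(weights, features)) - target
        for i, x in enumerate(features):
            grad[i] += 2.0 * err * x / len(batch)
    grad = clip_by_norm(grad)
    lr = warmup_lr(step, base_lr)
    weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights, ema_update(shadow, weights)
```

With beta=0.999 the shadow weights average over roughly the last 1/(1 - beta) = 1000 steps, so a short burst of unusual content moves the exported EMA model far less than it moves the live weights.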
The Go-side client (cmd/vmafx-node/online_feedback.go) calls FeedbackClient.Send() from the scoring path after scorer.Score() returns. The call is non-blocking; dropped messages are counted via FeedbackClient.Dropped().
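The non-blocking send and the newline-delimited JSON framing can be illustrated with a short Python analogue. This is a sketch of the behaviour described above, not a port of cmd/vmafx-node/online_feedback.go: the class FeedbackQueue, the function parse_feedback, and the JSON field names (features, predicted_score, vmaf_score) are all hypothetical, since this ADR fixes only the framing and the 1000-entry bound. It also assumes that the newest message is the one dropped when the queue is full and that there is a single producer.

```python
import json
import queue


class FeedbackQueue:
    """Bounded queue whose send() never blocks; overflow is counted, not awaited."""

    def __init__(self, capacity=1000):
        self._q = queue.Queue(maxsize=capacity)
        self._dropped = 0  # single producer assumed, so no lock around the counter

    def send(self, features, predicted_score, vmaf_score):
        # One JSON object per line; field names are illustrative only.
        line = json.dumps({
            "features": features,
            "predicted_score": predicted_score,
            "vmaf_score": vmaf_score,
        }) + "\n"
        try:
            self._q.put_nowait(line)
            return True
        except queue.Full:
            self._dropped += 1
            return False

    def dropped(self):
        return self._dropped

    def drain(self):
        """Return everything queued so far; a background worker would write it to the socket."""
        lines = []
        while True:
            try:
                lines.append(self._q.get_nowait())
            except queue.Empty:
                return lines


def parse_feedback(stream_bytes):
    """Sidecar side: split received bytes on newlines and decode each JSON message."""
    return [json.loads(raw) for raw in stream_bytes.splitlines() if raw.strip()]
```

Dropping on overflow trades a small loss of training data for a hard guarantee that the scoring path never waits on the trainer; the drop counter keeps that loss observable.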
The Helm sidecar container spec is rendered by deploy/helm/vmafx/templates/sidecar-trainer.yaml via the vmafx.sidecarContainer named template and is gated by .Values.sidecar.trainer.enabled (default false).
Alternatives considered
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Python sidecar per pod (chosen) | Reuses ai/ PyTorch stack. Sub-100 ms data-to-training latency. Pod-level isolation from scoring. Resource limits declared separately. | Heavier pod image (~1.5 GB PyTorch CPU / ~5 GB CUDA). Sidecar lifecycle managed by Operator. | — |
| Dedicated training-node pool (Research-0733 Option B) | Clean separation of concerns. Independently scaled. Smaller scoring pod images. | Cross-pod data transport (1–10 s RTT). Requires a fault-tolerant training queue. Two new node types in Helm + Operator. Overkill at v1 scale. | Deferred to v2 when cluster scale demands dedicated GPU training pools. |
| Pure-Go SGD (Research-0733 Option C) | Single binary. No Python dep. Lowest per-sample overhead. | Go ML ecosystem immature for multi-layer nets. Gorgonia effectively unmaintained. Cannot load ONNX natively. Blocks future improvements (attention, LoRA). | Not viable for full neural network training at any scale. |
| gRPC loopback instead of Unix socket | Strong typing. Schema evolution. Native flow control. | Second RPC framework in the pod. Extra container port declaration. Measurably higher per-message overhead on loopback than UDS for < 1 KB payloads. | Unix socket + newline-delimited JSON is simpler, lower overhead, and sufficient for the message rate (< 100/s). The Research-0733 preference for gRPC is overridden by the user's implementation spec. |
| Shared-volume file watch | No socket needed. | Worst latency (seconds for inotify). Not suitable for near-real-time training. | Rejected. |
Consequences
- Positive:
  - Encode-score-train loop closes to minutes instead of hours/days.
  - The Go scoring path is never blocked: FeedbackClient.Send() is non-blocking with graceful drop under back-pressure.
  - EMA stabilises checkpoints against noisy gradient updates from short-content bursts (e.g., an hour of HDR animation encodes).
  - Replay buffer mitigates catastrophic forgetting when content distribution shifts.
  - Sidecar container failure does not affect the scoring container (separate process, different restart policy).
- Negative:
  - Pod image grows by ~1.5 GB (CPU PyTorch) or ~5 GB (CUDA PyTorch).
  - Resource contention if both containers attempt to use the same CUDA device simultaneously. Mitigation: sidecar trains on CPU (default); CUDA training enabled only via VMAFX_SIDECAR_CUDA=1 when the node pod has excess GPU budget.
  - New per-pod state (checkpoint directory mount) that must be managed by the operator.
- Neutral / follow-ups:
  - VmafxModelTraining CRD + operator reconciler (Research-0733 §4) are not in scope for this PR. The Helm values gate (sidecar.trainer.enabled) and the FeedbackClient wiring in vmafx-node are the v1 surface; the Operator is a separate ADR.
  - Stability gate (PLCC regression check before marking a checkpoint latest-stable) is a follow-up per Research-0733 §3.4.
  - EWC regularisation deferred to v2 (requires Fisher information matrix from the base corpus, which is not available at sidecar startup).
  - LoRA-style per-tenant adapters deferred to v2.
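The per-pod checkpoint state noted under the negative consequences is produced by the export step from the Decision section. A minimal sketch of its dual-condition trigger and atomic write follows, assuming the serialized ONNX model is already available as bytes. The function names, the file names, and the layout of the digest file (hex digest, two spaces, file name) are hypothetical; only the thresholds (10 minutes, 1000 new samples) and the atomic rename + SHA-256 idea come from this ADR.

```python
import hashlib
import os
import tempfile

MIN_INTERVAL_S = 10 * 60   # ">= 10 min" from this ADR
MIN_NEW_SAMPLES = 1000     # ">= 1000 new samples" from this ADR


def should_checkpoint(now_s, last_checkpoint_s, new_samples):
    """Both conditions must hold: enough time elapsed AND enough fresh samples."""
    return (now_s - last_checkpoint_s >= MIN_INTERVAL_S
            and new_samples >= MIN_NEW_SAMPLES)


def write_checkpoint(directory, name, payload):
    """Write payload to directory/name atomically, plus a name.sha256 digest file."""
    digest = hashlib.sha256(payload).hexdigest()
    final_path = os.path.join(directory, name)
    # The temp file lives in the target directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, final_path)  # atomic rename on POSIX
    except BaseException:
        # Clean up the partial temp file, then re-raise the original error.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    # Publish the digest the same way: write to a temp name, then rename.
    sha_tmp = final_path + ".sha256.tmp"
    with open(sha_tmp, "w", encoding="ascii") as f:
        f.write(f"{digest}  {name}\n")
    os.replace(sha_tmp, final_path + ".sha256")
    return digest
```

In this sketch the model file is renamed before its digest file, so a reader can briefly see a new model next to an old digest. A consumer that verifies the SHA-256 and retries on mismatch tolerates that window.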
References
- Research-0733 — VMAFX sidecar online training architecture
- ADR-0709 — Phase 4b umbrella, item 4b.7
- ADR-0394 — local sidecar for vmaf-tune predictor (separate surface)
- ADR-0249 — ONNX export opset constraints
- ADR-0042 — tiny-AI per-PR docs rule
- Tarvainen & Valpola (2017) "Mean teachers are better role models": EMA decay beta=0.999
- Source: req, user direction to implement sidecar online training per Research-0733 (Python sidecar, SGD + EMA + replay buffer, Phase 4b distributed platform piece).