Sidecar Online Training

The VMAFX sidecar trainer continuously fine-tunes a tiny-AI ONNX model while vmafx-node processes encoding jobs. Every scored job contributes a (features, true_score) training pair that is shipped to a co-located Python sidecar container over a Unix socket. No round-trip to a training cluster is required; with the default settings, checkpoints can appear as often as every 10 minutes.

This surface is part of the VMAFX Phase 4b distributed platform (ADR-0781, ADR-0709). It is separate from the on-host bias-correction sidecar used by vmaf-tune (ADR-0394), which is a single-host, non-k8s, non-ONNX surface.

Quick start

Enable the sidecar in your Helm values:

sidecar:
  trainer:
    enabled: true
    image:
      repository: ghcr.io/vmafx/vmafx-sidecar
      tag: latest
    baseModelPath: /mnt/vmafx-models/base/fr_regressor_v3.onnx
    checkpointDir: /mnt/vmafx-models/online
    nFeatures: 80
    resources:
      requests:
        cpu: "0.5"
        memory: "512Mi"
      limits:
        cpu: "2"
        memory: "1Gi"

persistence:
  models:
    enabled: true
    mountPath: /mnt/vmafx-models

The sidecar container starts alongside vmafx-node in the same pod, binds /tmp/vmafx-sidecar.sock, and begins accepting training pairs immediately.

Architecture

vmafx-node (Go)               vmafx-sidecar (Python)
─────────────────             ──────────────────────────
scorer.Score(ref, dis)   ──→  ReplayBuffer (10 000 cap)
  │ FeedbackClient.Send()         │
  │ (non-blocking, 1000-entry     │ batch (32 samples, 50% replay)
  │  ring buffer)                 ↓
  │                          SGDEMATrainer.step()
  │                              │ EMA update (β=0.999)
  │                              │
  │                        every 10 min and ≥ 1000 new samples:
  │                          export_onnx() → /mnt/vmafx-models/online/
  │                          write SHA-256 sidecar
  │
  ↓ next restart: node loads new ONNX checkpoint
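The export trigger in the diagram can be read as two conditions that must both hold, matching the "minimum" wording of checkpointIntervalSeconds and minSamplesPerCheckpoint. The following is a minimal sketch under that assumption; CheckpointGate and its methods are hypothetical names, not the sidecar's actual code.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CheckpointGate:
    """Hypothetical helper that decides when a checkpoint export is due."""

    interval_s: float = 600.0  # default of checkpointIntervalSeconds
    min_samples: int = 1000    # default of minSamplesPerCheckpoint
    last_export: float = field(default_factory=time.monotonic)
    new_samples: int = 0

    def record(self, n: int = 1) -> None:
        """Count training pairs received since the last export."""
        self.new_samples += n

    def due(self, now: Optional[float] = None) -> bool:
        """True only when BOTH the time and the sample thresholds are met."""
        now = time.monotonic() if now is None else now
        enough_time = now - self.last_export >= self.interval_s
        enough_samples = self.new_samples >= self.min_samples
        return enough_time and enough_samples

    def mark_exported(self, now: Optional[float] = None) -> None:
        """Reset both counters after a successful export."""
        self.last_export = time.monotonic() if now is None else now
        self.new_samples = 0
```

Using a monotonic clock keeps the interval check immune to wall-clock adjustments inside the container.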

Why Unix socket? The vmafx-node and vmafx-sidecar containers share an emptyDir volume mounted at /tmp. Unix domain sockets over a shared volume have lower per-message overhead than TCP loopback for the expected message rate (< 100 pairs/s) and avoid an extra container port declaration.

Why SGD + EMA? Plain SGD has lower per-step overhead than Adam for the tiny two-layer regression heads used here, because it keeps no per-parameter moment estimates. The EMA shadow (Polyak averaging, beta=0.999) smooths out noisy gradient steps from short-content bursts (ADR-0781 §Decision; Mean Teacher paper, 2017).
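To make the SGD + EMA combination concrete, here is a framework-free sketch of one training step on a flat parameter list. It illustrates the general technique only; the function names are hypothetical and the real SGDEMATrainer presumably operates on model tensors rather than Python lists.

```python
from typing import List


def sgd_step(params: List[float], grads: List[float], lr: float = 1e-4) -> List[float]:
    """Plain SGD: move each parameter against its gradient."""
    return [p - lr * g for p, g in zip(params, grads)]


def ema_update(shadow: List[float], params: List[float], beta: float = 0.999) -> List[float]:
    """Polyak/EMA averaging: shadow <- beta * shadow + (1 - beta) * params."""
    return [beta * s + (1.0 - beta) * p for s, p in zip(shadow, params)]


def train_step(params, shadow, grads, lr=1e-4, beta=0.999):
    """One SGD step on the live weights, then one EMA step on the shadow copy."""
    params = sgd_step(params, grads, lr)
    shadow = ema_update(shadow, params, beta)
    return params, shadow
```

With beta=0.999 the shadow weights average over roughly 1 / (1 - beta) = 1000 recent steps, so a short burst of noisy gradients moves the exported model far less than it moves the live weights.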

Why a replay buffer? Without replay, a sudden wave of narrow-content jobs (e.g., one hour of HDR animation) overwrites the model's knowledge of other content types. The 10 000-sample ring buffer (≈ 3.2 MB at 80 float32 features/sample) retains ~200 hours of a 50-encode/hour workload.
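The buffer sizing and the mixed-batch idea can be sketched as follows. This is an illustrative sketch, not the sidecar's ReplayBuffer: the class below, mixed_batch, and the choice of uniform random sampling are assumptions.

```python
import random
from collections import deque

# Sizing check from the text: 10 000 samples x 80 float32 features x 4 bytes.
BUFFER_BYTES = 10_000 * 80 * 4    # 3 200 000 bytes, about 3.2 MB
HOURS_RETAINED = 10_000 / 50      # 200 hours at 50 encodes/hour, one pair per encode


class ReplayBuffer:
    """Fixed-capacity ring buffer; the oldest samples are evicted first."""

    def __init__(self, capacity: int = 10_000):
        self._items = deque(maxlen=capacity)

    def add(self, features, score) -> None:
        self._items.append((features, score))

    def __len__(self) -> int:
        return len(self._items)

    def sample(self, k: int, rng=random):
        """Uniformly sample up to k stored pairs without replacement."""
        k = min(k, len(self._items))
        return rng.sample(list(self._items), k)


def mixed_batch(fresh, buffer, batch_size: int = 32, replay_mix: float = 0.5, rng=random):
    """Build a batch with a replay_mix fraction of replayed samples."""
    n_replay = min(int(batch_size * replay_mix), len(buffer))
    n_fresh = batch_size - n_replay  # fresh samples fill the remainder
    return list(fresh[:n_fresh]) + buffer.sample(n_replay, rng)
```

When the buffer holds fewer samples than the replay share requires, this sketch fills the gap with fresh samples so the batch size stays constant.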

Configuration reference

All settings are Helm values under sidecar.trainer.* and are passed to the sidecar container as environment variables.

Helm key	Env var	Default	Description
`enabled`	—	`false`	Enable the sidecar container.
`image.repository`	—	—	Sidecar container image.
`image.tag`	—	chart AppVersion	Image tag.
`baseModelPath`	`VMAFX_BASE_MODEL_PATH`	`/mnt/vmafx-models/base/model.onnx`	ONNX or PyTorch state-dict to fine-tune.
`checkpointDir`	`VMAFX_SIDECAR_CHECKPOINT_DIR`	`/mnt/vmafx-models/online`	Directory for versioned ONNX checkpoint output.
`replayBufferSize`	`VMAFX_SIDECAR_REPLAY_CAPACITY`	`10000`	Replay buffer capacity (samples).
`batchSize`	`VMAFX_SIDECAR_BATCH_SIZE`	`32`	Mini-batch size per gradient step.
`replayMixRatio`	`VMAFX_SIDECAR_REPLAY_MIX`	`0.5`	Fraction of each batch drawn from replay buffer.
`learningRate`	`VMAFX_SIDECAR_LR`	`0.0001`	SGD/Adam learning rate.
`emaDecay`	`VMAFX_SIDECAR_EMA_DECAY`	`0.999`	EMA decay beta.
`checkpointIntervalSeconds`	`VMAFX_SIDECAR_CKPT_INTERVAL_S`	`600`	Minimum seconds between checkpoints.
`minSamplesPerCheckpoint`	`VMAFX_SIDECAR_MIN_SAMPLES_CKPT`	`1000`	Minimum new samples before checkpoint.
`nFeatures`	`VMAFX_SIDECAR_N_FEATURES`	`80`	Feature vector dimension from vmafx-node.

The socket path (VMAFX_SIDECAR_SOCKET) defaults to /tmp/vmafx-sidecar.sock and should not be changed unless the volume mount path also changes.
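As an illustration of how a process could consume these variables, the sketch below reads them with the defaults listed in the table. TrainerConfig, load_config, and the range checks are hypothetical; only the variable names and default values come from this page.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TrainerConfig:
    base_model_path: str
    checkpoint_dir: str
    replay_capacity: int
    batch_size: int
    replay_mix: float
    learning_rate: float
    ema_decay: float
    ckpt_interval_s: int
    min_samples_ckpt: int
    n_features: int
    socket_path: str


def load_config(env: Optional[Mapping[str, str]] = None) -> TrainerConfig:
    """Read settings from the environment, falling back to documented defaults."""
    env = os.environ if env is None else env
    cfg = TrainerConfig(
        base_model_path=env.get("VMAFX_BASE_MODEL_PATH", "/mnt/vmafx-models/base/model.onnx"),
        checkpoint_dir=env.get("VMAFX_SIDECAR_CHECKPOINT_DIR", "/mnt/vmafx-models/online"),
        replay_capacity=int(env.get("VMAFX_SIDECAR_REPLAY_CAPACITY", "10000")),
        batch_size=int(env.get("VMAFX_SIDECAR_BATCH_SIZE", "32")),
        replay_mix=float(env.get("VMAFX_SIDECAR_REPLAY_MIX", "0.5")),
        learning_rate=float(env.get("VMAFX_SIDECAR_LR", "0.0001")),
        ema_decay=float(env.get("VMAFX_SIDECAR_EMA_DECAY", "0.999")),
        ckpt_interval_s=int(env.get("VMAFX_SIDECAR_CKPT_INTERVAL_S", "600")),
        min_samples_ckpt=int(env.get("VMAFX_SIDECAR_MIN_SAMPLES_CKPT", "1000")),
        n_features=int(env.get("VMAFX_SIDECAR_N_FEATURES", "80")),
        socket_path=env.get("VMAFX_SIDECAR_SOCKET", "/tmp/vmafx-sidecar.sock"),
    )
    # Assumed sanity checks; the real sidecar may validate differently.
    if not 0.0 <= cfg.replay_mix <= 1.0:
        raise ValueError("VMAFX_SIDECAR_REPLAY_MIX must be in [0, 1]")
    if not 0.0 < cfg.ema_decay < 1.0:
        raise ValueError("VMAFX_SIDECAR_EMA_DECAY must be in (0, 1)")
    return cfg
```

Passing a plain dict instead of os.environ makes the loader easy to exercise in tests.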

Checkpoint format

Each checkpoint export writes two files atomically:

/mnt/vmafx-models/online/model_v000042.onnx         # EMA model
/mnt/vmafx-models/online/model_v000042.onnx.sha256  # SHA-256 digest

The ONNX file uses opset 17 (matching ADR-0249 and the rest of the tiny-AI export stack). Input shape: (batch, n_features). Output shape: (batch, 1).
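One common way to write such a file pair safely is to write each file to a temporary name and rename it into place. The sketch below shows that pattern together with a digest check. It is an assumption about how the export could be done, not the sidecar's export_onnx(); in particular, the digest file is assumed to hold the hex digest as its first whitespace-separated token.

```python
import hashlib
import os
import tempfile


def write_atomic(path: str, data: bytes) -> None:
    """Write data to a temp file in the same directory, then rename over path."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".", prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic within a single filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def write_checkpoint(checkpoint_dir: str, version: int, onnx_bytes: bytes) -> str:
    """Write model_vNNNNNN.onnx first, then its .sha256 companion."""
    path = os.path.join(checkpoint_dir, f"model_v{version:06d}.onnx")
    digest = hashlib.sha256(onnx_bytes).hexdigest()
    write_atomic(path, onnx_bytes)
    write_atomic(path + ".sha256", (digest + "\n").encode("ascii"))
    return path


def verify_checkpoint(path: str) -> bool:
    """Recompute the SHA-256 of the model file and compare it to the digest file."""
    with open(path, "rb") as f:
        actual = hashlib.sha256(f.read()).hexdigest()
    with open(path + ".sha256", "r", encoding="ascii") as f:
        expected = f.read().split()[0]
    return actual == expected
```

Writing the digest file last means a reader that finds a .sha256 file can assume the model file next to it is already complete.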

To pick up a new checkpoint, set VMAFX_BASE_MODEL_PATH (or the Helm baseModelPath value) to the checkpoint path and restart vmafx-node. Automated live hot-reload is a follow-up that requires the VmafxModelTraining CRD (Research-0733 §3.4).
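Until hot-reload exists, an operator script has to find the newest checkpoint itself. A minimal sketch, assuming file names follow the model_vNNNNNN.onnx pattern shown above and that a checkpoint should be used only if its .sha256 companion is present; latest_checkpoint is a hypothetical helper.

```python
import os
import re
from typing import Optional

_CKPT_RE = re.compile(r"^model_v(\d+)\.onnx$")


def latest_checkpoint(checkpoint_dir: str) -> Optional[str]:
    """Return the path of the highest-versioned checkpoint that has a digest file."""
    best = None  # (version, path) of the newest complete checkpoint seen so far
    for name in os.listdir(checkpoint_dir):
        match = _CKPT_RE.match(name)
        if not match:
            continue
        path = os.path.join(checkpoint_dir, name)
        if not os.path.exists(path + ".sha256"):
            continue  # skip exports that lack their digest companion
        version = int(match.group(1))
        if best is None or version > best[0]:
            best = (version, path)
    return None if best is None else best[1]
```

The returned path is the value to place in baseModelPath before the next restart.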

Kubernetes sidecar lifecycle (k8s 1.29+)

On clusters running Kubernetes 1.29 or later the sidecar container is declared with restartPolicy: Always (native sidecar KEP-753 semantics). This means:

The sidecar starts before the main vmafx-node container.
The pod's Ready condition waits for the sidecar's readiness probe to pass (Unix socket bound).
If the sidecar crashes it is restarted without restarting vmafx-node.

On older clusters, where the container-level restartPolicy: Always is ignored, the sidecar behaves as a regular container; a crash restarts the whole pod.
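The "Unix socket bound" readiness condition can be checked by attempting a connection. The sketch below is one possible probe, assuming the sidecar listens on a stream-type Unix socket; the chart's actual probe definition is not shown on this page and socket_ready is a hypothetical name.

```python
import socket


def socket_ready(path: str = "/tmp/vmafx-sidecar.sock", timeout: float = 1.0) -> bool:
    """Return True if something is accepting connections on the Unix socket."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return True
    except OSError:
        # Covers a missing socket file, connection refused, and timeouts.
        return False
    finally:
        sock.close()
```

A connect-based check is stricter than testing for the socket file alone, because a stale file left behind by a crashed process refuses connections.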

Go-side metrics

FeedbackClient exposes two counters available via the node's Prometheus metrics endpoint (when instrumented):

Counter	Description
`vmafx_training_feedback_dropped_total`	Messages dropped due to queue overflow or sidecar unavailability.
`vmafx_training_feedback_delivered_total`	Messages successfully delivered to the sidecar.

A growing dropped counter means the sidecar is either unavailable or falling behind the scoring rate. If the sidecar is running, increase resources.limits.cpu for the sidecar or reduce batchSize.
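The relationship between the bounded send queue and the two counters can be illustrated with a small Python analogue. The real FeedbackClient is written in Go, so everything below is a hypothetical sketch: the class name, the drop-newest policy on overflow, and the single-producer, single-consumer assumption are not taken from the implementation.

```python
import queue
from typing import Callable


class FeedbackQueue:
    """Non-blocking bounded queue that counts dropped and delivered messages."""

    def __init__(self, capacity: int = 1000):
        self._q = queue.Queue(maxsize=capacity)
        self.dropped_total = 0
        self.delivered_total = 0

    def send(self, pair) -> bool:
        """Enqueue without blocking the scorer; drop the message if full."""
        try:
            self._q.put_nowait(pair)
            return True
        except queue.Full:
            self.dropped_total += 1  # queue overflow
            return False

    def drain(self, deliver: Callable[[object], bool]) -> None:
        """Hand queued pairs to deliver(); a False result counts as a drop."""
        while True:
            try:
                pair = self._q.get_nowait()
            except queue.Empty:
                return
            if deliver(pair):
                self.delivered_total += 1
            else:
                self.dropped_total += 1  # sidecar unavailable

    def drop_ratio(self) -> float:
        """Fraction of all accounted messages that were dropped."""
        total = self.dropped_total + self.delivered_total
        return self.dropped_total / total if total else 0.0
```

Tracking the ratio of dropped to total messages, rather than the raw counter, gives a load-independent signal for alerting.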

Limitations (v1)

No live hot-reload: the node loads the new checkpoint only on restart. The VmafxModelTraining CRD + controller (atomic session swap via atomic.Pointer) is the follow-up.
CPU training only by default: CUDA training is available via VMAFX_SIDECAR_CUDA=1 but requires the node pod to have spare GPU capacity.
Single base model per sidecar: per-tenant adapter heads (LoRA) are deferred to v2.
No stability gate: a checkpoint that regresses PLCC by more than 0.005 is not automatically quarantined in v1. Planned per Research-0733 §3.4.