ADR-0783: Kubernetes end-to-end integration test harness — kind + kuttl
- Status: Proposed
- Date: 2026-06-03
- Deciders: Lusoris
- Tags: ci, testing, k8s, github
Context
The Phase 4b platform (ADR-0709) ships a Kubernetes Operator, Node worker, and sidecar trainer that interact through CRDs and a gRPC control plane. Unit tests exercise individual components in isolation, but there is no test that validates the full controller → node → trainer loop on real Kubernetes. This gap means regressions in CRD reconciliation, job dispatch, or cross-component communication go undetected until a live deployment.
A lightweight end-to-end harness using kind (Kubernetes in Docker) eliminates the need for a permanent cloud cluster while still exercising real Kubernetes APIs. kuttl (KUbernetes Test TooL) provides a declarative YAML-based assertion layer that is easier to maintain than raw Go integration tests.
The five test cases cover: CRD installation, VmafxJob pod lifecycle, VmafxNode heartbeat, rclone-sourced scoring, and the sidecar trainer checkpoint flow.
A nightly CI workflow (.github/workflows/e2e-k8s.yml) runs the harness at 03:47 UTC and is also opt-in on PRs via a run-e2e-k8s label. An 8-frame 64×64 YUV420p fixture pair in test/e2e/fixtures/ allows deterministic scoring without network access.
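For reference, a raw 8-bit YUV420p frame stores one full-resolution luma plane plus two quarter-resolution chroma planes, so a 64×64 frame takes 64 · 64 · 3/2 = 6144 bytes and an 8-frame clip takes 49152 bytes. The sketch below shows one way a deterministic pair of that size could be written with coreutils alone. It assumes 8-bit samples; the flat fill values, file names, and function name are hypothetical, and the real gen-tiny-yuv.sh may generate its clips quite differently (for example with textured content).

```shell
# Sketch only: writes two flat raw YUV420p clips of the fixture's dimensions.
# File names, fill values and the function name are hypothetical.

WIDTH=64
HEIGHT=64
FRAMES=8

# YUV420p: one full-size Y plane plus U and V planes at a quarter of its size.
FRAME_BYTES=$(( WIDTH * HEIGHT * 3 / 2 ))
CLIP_BYTES=$(( FRAME_BYTES * FRAMES ))

# write_flat_clip <path> <octal byte value>
# Fills every Y, U and V sample with the same byte, so the output is
# identical on every run and needs no network access.
write_flat_clip() {
  local path="$1" octal="$2"
  head -c "$CLIP_BYTES" /dev/zero | LC_ALL=C tr '\0' "\\$octal" > "$path"
}

OUT_DIR=$(mktemp -d)
write_flat_clip "$OUT_DIR/ref_64x64_8f.yuv" 200   # every sample 0x80
write_flat_clip "$OUT_DIR/dis_64x64_8f.yuv" 170   # every sample 0x78
```

A size check such as `wc -c < "$OUT_DIR/ref_64x64_8f.yuv"` is a cheap way to confirm that a committed clip still matches the expected geometry.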
Decision
We will add a test/e2e/ directory containing:
- kind-cluster.sh: idempotent cluster bootstrap with real-GPU (NVIDIA device plugin) or simulated-GPU (fake-device-plugin DaemonSet) paths, cert-manager, and CRD installation via the existing deploy/helm/vmafx chart.
- kuttl-tests/, containing five ordered kuttl test cases:
  - 01-operator-installs: CRD establishment + operator Deployment available.
  - 02-vmafxjob-creates-pod: VmafxJob CR triggers a worker Pod and reaches Succeeded phase with a populated score.
  - 03-node-heartbeat: VmafxNode CR backed by a stub /healthz server receives a lastHeartbeat timestamp from the operator's probe loop.
  - 04-rclone-score: End-to-end rclone-fetch → vmaf-score path via an in-cluster MinIO stand-in.
  - 05-sidecar-trainer: VmafxModelTraining CR reaches Running, ingests one feedback sample, and pushes a checkpoint to an in-cluster OCI registry stub.
- .github/workflows/e2e-k8s.yml: nightly schedule (03:47 UTC) plus opt-in on PRs via the run-e2e-k8s label; skipped by default on PRs to keep CI fast.
- test/e2e/fixtures/gen-tiny-yuv.sh: generates committed 64×64 8-frame YUV420p clips for deterministic scoring without network fetches.
- docs/k8s/integration-tests.md: operator guide for running the suite locally and interpreting results.
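The idempotent bootstrap described for kind-cluster.sh can be illustrated with a guard around cluster creation. This is a minimal sketch, not the script itself: the cluster name `vmafx-e2e`, the function name, and the 120-second wait are assumptions, and the GPU plugin, cert-manager, and Helm steps are left out.

```shell
# Sketch only: cluster name, function name and wait time are hypothetical.

CLUSTER_NAME="${CLUSTER_NAME:-vmafx-e2e}"

# ensure_cluster <name>
# Creates the kind cluster only when no cluster with that exact name exists,
# so running the bootstrap a second time skips creation.
ensure_cluster() {
  local name="$1"
  if kind get clusters 2>/dev/null | grep -qx -- "$name"; then
    echo "kind cluster '$name' already exists, reusing it"
  else
    kind create cluster --name "$name" --wait 120s
  fi
}
```

Calling `ensure_cluster "$CLUSTER_NAME"` twice in a row should create the cluster on the first call and reuse it on the second. The same check-then-act pattern would apply to the later install steps.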
kuttl is chosen over raw shell scripts because it provides declarative YAML assert semantics, built-in retry/timeout, per-step artifact collection, and XML test reporting consumable by the existing publish-unit-test-result-action.
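To make the declarative assert semantics concrete: a kuttl test case is a directory of numbered step files, and an assert file holds a partial object that kuttl keeps comparing against the cluster until it matches or the timeout expires. The sketch below writes such an assert for the Succeeded phase named in test case 02 and wraps a suite run that emits an XML report. The API group/version `vmafx.example.com/v1alpha1`, the object name `sample-job`, the `status.phase` field path, and both timeout values are assumptions. The score check is omitted because the field name is not given here, and the flag spellings should be checked against the installed kuttl version.

```shell
# Sketch only: API group/version, object name, paths and timeouts are
# hypothetical; only the Succeeded phase comes from the decision text.

# write_job_assert <test-case-dir>
# Writes the assert file for step 00. kuttl polls until the live object
# contains these fields or the TestAssert timeout (in seconds) expires.
write_job_assert() {
  local dir="$1"
  mkdir -p "$dir"
  cat > "$dir/00-assert.yaml" <<'EOF'
apiVersion: kuttl.dev/v1beta1
kind: TestAssert
timeout: 120
---
apiVersion: vmafx.example.com/v1alpha1
kind: VmafxJob
metadata:
  name: sample-job
status:
  phase: Succeeded
EOF
}

# run_suite <tests-dir> <artifacts-dir>
# Runs every test case under <tests-dir> and writes an XML report into
# <artifacts-dir> for the CI result publisher to pick up.
run_suite() {
  local tests_dir="$1" artifacts_dir="$2"
  kubectl kuttl test "$tests_dir" \
    --timeout 300 \
    --report xml \
    --artifacts-dir "$artifacts_dir"
}
```

Because the assert is a plain partial manifest, adding a further expectation means adding a field to the YAML rather than writing polling code.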
Alternatives considered
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| kuttl (chosen) | Declarative YAML assertions; maintained by the KUDO community; no custom Go code | Requires kind + kubectl already present; sequential-only test ordering | Best balance of simplicity and real-k8s coverage |
| Chainsaw (Kyverno's e2e tool, positioned as a kuttl successor) | More expressive; better error messages; richer assertion DSL; supports parallel steps | Newer and less mature; smaller ecosystem; different YAML schema; adds a dependency on Kyverno-project tooling for a non-Kyverno project | Ecosystem risk and overkill for five sequential test cases; could migrate later once Chainsaw stabilises |
| envtest (controller-runtime) | Pure Go; runs in-process; fast | Does not exercise Kubernetes networking, DNS, or admission controllers | In-process simulation misses the integration surface we need to test |
| Permanent cloud cluster (EKS / GKE) | Closest to production; tests real GPU scheduling | Cost; secret management; slow teardown; cluster drift | Cost prohibitive for nightly runs; kind achieves the same CRD/reconciliation coverage |
Consequences
- Positive: Full controller → node → trainer loop is now automatically tested; regressions in CRD reconciliation are caught by the nightly run, or before merge on PRs labelled run-e2e-k8s; local developers can reproduce the CI environment with bash test/e2e/kind-cluster.sh.
- Positive: Nightly run on GitHub-hosted runners with no GPU hardware; the fake-device-plugin makes nvidia.com/gpu resource requests schedulable in CI.
- Positive: PR opt-in via label keeps normal PR CI latency unchanged.
- Negative: Nightly job adds ~15–30 min to CI wall time; fake-GPU path does not exercise CUDA kernels (GPU scoring in test case 04 uses CPU fallback).
- Negative: kind cluster bootstrap adds a new failure mode (cert-manager pull, device plugin DaemonSet scheduling) unrelated to VMAFx code itself; flakiness must be tracked and suppressed with retries if needed.
- Neutral / follow-ups: Test case 05 (sidecar-trainer) requires the operator currentSamples increment logic to be implemented; the required-aggregator.yml should mark E2E — Kubernetes Integration as non-blocking until all five cases pass.
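For the bootstrap flakiness noted above, one common mitigation is a small retry wrapper that logs every failed attempt, so retries stay visible in the CI log instead of hiding the problem. The helper below is a sketch under assumptions: its name, the attempt count, and the doubling delay are illustrative and not taken from kind-cluster.sh.

```shell
# Sketch only: helper name, attempt counts and delays are hypothetical.

# retry <max-attempts> <initial-delay-seconds> <command> [args...]
# Re-runs the command until it succeeds or the attempts run out, doubling
# the delay after each failure. Each failure is logged to stderr.
# Returns 0 on success, otherwise the exit status of the last attempt.
retry() {
  local max="$1" delay="$2" attempt=1 status=0
  shift 2
  while true; do
    "$@" && return 0
    status=$?
    if [ "$attempt" -ge "$max" ]; then
      echo "retry: '$*' failed after $attempt attempts" >&2
      return "$status"
    fi
    echo "retry: attempt $attempt failed (exit $status), sleeping ${delay}s" >&2
    sleep "$delay"
    attempt=$(( attempt + 1 ))
    delay=$(( delay * 2 ))
  done
}
```

A bootstrap step could then be wrapped as, for instance, `retry 5 10 kubectl wait --for=condition=Available deployment --all -n cert-manager --timeout=120s`, where the namespace and timings are again assumptions. Counting the logged retry lines per nightly run gives a rough flakiness signal.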
References
- Open DRAFT PR: #152 (feat(ci): k8s e2e integration test harness — kind + kuttl).
- ADR-0709: Phase 4b distributed platform.
- ADR-0711: vmafx-controller implementation.
- ADR-0713: vmafx-node implementation.
- ADR-0714: vmafx-operator kubebuilder skeleton + CRDs.
- ADR-0781: sidecar SGD-EMA online trainer.
- ADR-0698: VMAFX production Dockerfile (Dockerfile.operator/Dockerfile.node TBD).
- kuttl documentation: https://kuttl.dev/docs/
- kind documentation: https://kind.sigs.k8s.io/
- fake-device-plugin: https://github.com/squat/k8s-fakedeviceplugin
- req: "Build a k8s integration test harness for the VMAFx Phase 4b platform."