GitOps on On-Premises Kubernetes: What Actually Works
Lessons from a GitOps rollout that broke the first time—registry mirrors, runner placement, and promotion paths that work in private clusters.
A platform team we worked with already ran Kubernetes in production and had Argo CD working in dev. The picture changed on the first push toward an isolated production zone: CI-built images could not be pulled where the cluster lived, and sync jobs failed in ways the dashboard did not make obvious.
Agree on zones before picking tools
Before writing Application manifests, we traced the artifact path: where CI pushes images, which mirror staging reads, and what must never leave the production zone. Three boxes on a whiteboard—build, staging, production—saved a week of rework. If a hop needs outbound internet, the problem is topology, not your GitOps controller.
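The whiteboard exercise can also be written down as a small check. What follows is a minimal sketch, not the team's actual tooling: the Hop model, the ALLOWED_FLOWS set, and the hop names are all hypothetical, and the allowed zone-to-zone flows are an assumption based on the three boxes described above.

```python
from dataclasses import dataclass

# Zone names follow the whiteboard boxes; the allowed flows are an assumption.
ALLOWED_FLOWS = {("build", "staging"), ("staging", "production")}


@dataclass(frozen=True)
class Hop:
    """One step in the artifact path (hypothetical model)."""
    name: str
    source_zone: str
    target_zone: str
    needs_outbound_internet: bool = False


def topology_violations(hops):
    """Return human-readable problems found in the artifact path."""
    problems = []
    for hop in hops:
        crosses_zone = hop.source_zone != hop.target_zone
        if crosses_zone and (hop.source_zone, hop.target_zone) not in ALLOWED_FLOWS:
            problems.append(
                f"{hop.name}: flow {hop.source_zone} -> {hop.target_zone} is not allowed"
            )
        if hop.needs_outbound_internet:
            problems.append(f"{hop.name}: needs outbound internet")
    return problems


# Illustrative path; the last hop is the kind of surprise the exercise surfaces.
artifact_path = [
    Hop("ci-push", "build", "build"),
    Hop("mirror-to-staging", "build", "staging"),
    Hop("promote-to-production", "staging", "production"),
    Hop("prod-pull-from-public-registry", "production", "production",
        needs_outbound_internet=True),
]

for problem in topology_violations(artifact_path):
    print(problem)
```

Running a check like this in review keeps the topology discussion ahead of any tool choice: a flagged hop is fixed by changing where artifacts live, not by reconfiguring the controller.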
Registry mirrors that stay in sync (even air-gapped)
Harbor replication failed quietly when upstream tags were overwritten. We switched to immutable tags per build, added a nightly check that compares digest counts between source and mirror, and alert through Prometheus when replication lags by more than 15 minutes. Before that, “it worked yesterday” was no guarantee that prod had the same image staging tested.
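The core of a nightly check like this is a comparison of tag-to-digest maps. The sketch below assumes those maps have already been fetched from the source registry and the mirror (how they are fetched is out of scope here), and the function names and sample digests are hypothetical. It goes one step past a plain count by also reporting which tags differ; the 15-minute limit is the one mentioned above.

```python
from datetime import datetime, timedelta, timezone

MAX_REPLICATION_LAG = timedelta(minutes=15)  # threshold from the text


def compare_digests(source, mirror):
    """Compare {tag: digest} maps; return (missing_tags, mismatched_tags)."""
    missing = sorted(tag for tag in source if tag not in mirror)
    mismatched = sorted(
        tag for tag in source if tag in mirror and mirror[tag] != source[tag]
    )
    return missing, mismatched


def replication_is_lagging(last_success, now):
    """True when the last successful replication is older than the limit."""
    return now - last_success > MAX_REPLICATION_LAG


# Placeholder data; real digests are full sha256 strings.
source_tags = {"build-101": "sha256:aaa", "build-102": "sha256:bbb", "build-103": "sha256:ccc"}
mirror_tags = {"build-101": "sha256:aaa", "build-102": "sha256:zzz"}

missing, mismatched = compare_digests(source_tags, mirror_tags)
print("missing on mirror:", missing)
print("digest mismatch:", mismatched)

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print("lagging:", replication_is_lagging(now - timedelta(minutes=20), now))
```

With immutable tags, a digest mismatch on an existing tag should never happen, so any entry in the mismatch list is worth an alert on its own.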
Runner placement that passes security review
Shared GitLab runners across dev and staging VLANs violated the customer’s change policy. We put runners inside each zone and scoped promotion tokens so that a staging job cannot deploy to production. Per-zone build caches are slower but auditable, and shared cross-zone runners are a common audit failure.
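The token rule is easy to state as code, which also makes it easy to show an auditor. This is a minimal sketch under stated assumptions: PromotionToken and may_deploy are hypothetical names and do not correspond to any GitLab API; in practice the same rule would be enforced through the CI system's own token and runner scoping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromotionToken:
    """Hypothetical token: bound to one runner zone, valid for listed targets."""
    issued_to_zone: str
    allowed_targets: frozenset


def may_deploy(runner_zone, token, target_zone):
    """Allow a deploy only if the token belongs to this runner's zone
    and explicitly lists the target zone."""
    return token.issued_to_zone == runner_zone and target_zone in token.allowed_targets


staging_token = PromotionToken("staging", frozenset({"staging"}))
production_token = PromotionToken("production", frozenset({"production"}))

print(may_deploy("staging", staging_token, "staging"))        # True
print(may_deploy("staging", staging_token, "production"))     # False
print(may_deploy("staging", production_token, "production"))  # False
```

The third case is the one that matters for the audit: even a leaked production token is useless from a runner in the wrong zone.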
Signals we need before calling GitOps production-ready
“Sync healthy” is not enough. We track four signals: merge-to-staging sync time, failed-hook rate, drift count, and rollback duration. Argo CD exposes most of these; alerts go to the same on-call rotation that owns the cluster. GitOps is ready when SREs trust the signals, not when the UI is green once.
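One way to make the readiness call less subjective is to compare the four signals against agreed limits. The sketch below is illustrative only: the Signals type, the breached function, and every threshold value are assumptions, not numbers from the rollout, and the observed values would in practice come from whatever metrics pipeline feeds the on-call alerts.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Signals:
    """The four signals named above (units are an assumption)."""
    merge_to_staging_sync_s: float
    failed_hook_rate: float      # fraction of hooks that failed, 0.0 to 1.0
    drift_count: int
    rollback_duration_s: float


# Hypothetical limits; each team should set its own.
LIMITS = Signals(
    merge_to_staging_sync_s=300,
    failed_hook_rate=0.02,
    drift_count=0,
    rollback_duration_s=600,
)


def breached(observed, limits=LIMITS):
    """Return the names of signals whose observed value exceeds its limit."""
    return [
        f.name
        for f in fields(observed)
        if getattr(observed, f.name) > getattr(limits, f.name)
    ]


observed = Signals(
    merge_to_staging_sync_s=120,
    failed_hook_rate=0.05,
    drift_count=0,
    rollback_duration_s=900,
)
print(breached(observed))
```

An empty list over a sustained window is a stronger readiness signal than a single green dashboard.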