Drift Detection

Overview

Kubegrade drift detection identifies mismatches between your live cluster state and intended state (Git/IaC/policy baselines) so teams can correct drift before it becomes an outage or audit problem.

Drift Sources (Cluster vs IaC vs Git)

- Cluster vs IaC drift: live cluster resources differ from Terraform/Helm/Kustomize-defined state.
- Cluster vs Git drift: Git-tracked manifests/configs no longer match what is running in the cluster.
- Policy baseline drift: cluster/workload configuration diverges from internal standards or approved baselines.
- Cross-environment drift: staging/prod environments diverge unintentionally over time.

Drift Policies (What to Flag / Severity)

What to flag (examples)

- Manual changes outside GitOps
- Resource limits/requests changed in cluster only
- NetworkPolicy differences
- Ingress/Service selector mismatches
- Deprecated/unsupported API usage
- Missing labels/annotations required by policy
- Image tag/version deviations
- RBAC drift

Severity model (recommended)

- Critical: security/compliance risk, production impact likely
- High: likely operational risk or audit issue
- Medium: inconsistency or future risk
- Low: informational / cosmetic drift

Drift Remediation via PR

Kubegrade should support turning drift findings into PRs that restore desired state.

Remediation modes

- Reconcile cluster to Git/IaC (preferred in GitOps environments)
- Update Git/IaC to reflect approved runtime changes (controlled exceptions)
- Open review-only recommendation without PR

PR contents

- Exact diff to restore alignment
- Drift classification and severity
- Evidence of where mismatch was detected
- Impact notes (if applicable)

Exclusions and Suppression Rules

Not all drift should trigger alerts.

Common exclusions

- Auto-generated labels/annotations
- Runtime status fields
- Ephemeral resources
- Known provider-managed mutations
- Approved temporary overrides (time-bound)

Suppression best practices

- Scope suppressions narrowly (resource/namespace/path)
- Add expiration date
- Require reason/comment
- Track suppressions in audit logs
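
The suppression practices above (narrow scope, expiration date, required reason, audit trail) can be sketched as a small filter over drift findings. This is a minimal illustration only, not Kubegrade's actual rule format or API: the `Finding` and `Suppression` types, their field names, and `apply_suppressions` are all hypothetical, and scope matching is assumed to use simple shell-style patterns.

```python
from dataclasses import dataclass
from datetime import date
from fnmatch import fnmatchcase


# Hypothetical shape of a drift finding; field names are assumptions.
@dataclass(frozen=True)
class Finding:
    namespace: str
    resource: str  # e.g. "Deployment/api"
    path: str      # e.g. "spec.replicas"
    severity: str  # "critical", "high", "medium", or "low"


# Hypothetical suppression rule: scoped, time-bound, and justified.
@dataclass(frozen=True)
class Suppression:
    namespace: str  # shell-style pattern; avoid "[" in patterns
    resource: str
    path: str
    reason: str
    expires: date

    def __post_init__(self):
        # Require a reason/comment on every suppression.
        if not self.reason.strip():
            raise ValueError("a suppression requires a reason")

    def matches(self, finding, today):
        # Expired rules never match, so the drift resurfaces on its own.
        if today > self.expires:
            return False
        return (
            fnmatchcase(finding.namespace, self.namespace)
            and fnmatchcase(finding.resource, self.resource)
            and fnmatchcase(finding.path, self.path)
        )


def apply_suppressions(findings, suppressions, today):
    """Split findings into active ones and audit entries for suppressed ones."""
    active, audit_log = [], []
    for finding in findings:
        rule = next((s for s in suppressions if s.matches(finding, today)), None)
        if rule is None:
            active.append(finding)
        else:
            # Record which rule hid which finding, and why.
            audit_log.append({
                "namespace": finding.namespace,
                "resource": finding.resource,
                "path": finding.path,
                "reason": rule.reason,
                "expires": rule.expires.isoformat(),
            })
    return active, audit_log


if __name__ == "__main__":
    findings = [
        Finding("staging", "Deployment/api", "spec.replicas", "medium"),
        Finding("prod", "Deployment/api", "spec.replicas", "high"),
    ]
    rules = [
        Suppression("staging", "Deployment/api", "spec.replicas",
                    "temporary scale-up for load test", date(2025, 6, 30)),
    ]
    active, audit = apply_suppressions(findings, rules, today=date(2025, 6, 1))
    print([f"{f.namespace}/{f.resource}" for f in active])
    print(audit)
```

In this sketch a rule scoped to one namespace, resource, and path hides only that finding, and once `today` passes `expires` the same drift is reported again without anyone having to remember to remove the rule.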