gke/WARN/2021_004
GKE system workloads are running stable.
Product: Google Kubernetes Engine
Rule class: WARN - Something that is possibly wrong
Description
GKE includes some system workloads running in the user-managed nodes which are essential for the correct operation of the cluster. We verify that restart count of containers in one of the system namespaces (kube-system, istio-system, custom-metrics) stayed stable in the last 24 hours.
Remediation
You can use this Cloud Monitoring query to find what system workloads are restarting:
fetch k8s_container
| metric 'kubernetes.io/container/restart_count'
| filter (resource.namespace_name == 'kube-system' ||
resource.namespace_name == 'istio-system')
| align delta(1h)
| every 1h
| group_by [resource.pod_name], .sum
| filter val() > 0