gke/WARN/2021_004

GKE system workloads are running stable.

Product: Google Kubernetes Engine
Rule class: WARN - Something that is possibly wrong

Description

GKE includes some system workloads running in the user-managed nodes which are essential for the correct operation of the cluster. We verify that restart count of containers in one of the system namespaces (kube-system, istio-system, custom-metrics) stayed stable in the last 24 hours.

Remediation

You can use this Cloud Monitoring query to find what system workloads are restarting:

fetch k8s_container
| metric 'kubernetes.io/container/restart_count'
| filter (resource.namespace_name == 'kube-system' ||
          resource.namespace_name == 'istio-system')
| align delta(1h)
| every 1h
| group_by [resource.pod_name], .sum
| filter val() > 0

Further information