gke/WARN/2023_004
maxPodsPerNode
numberProduct: Google Kubernetes Engine
Rule class: WARN - Something that is possibly wrong
Description
Modern GKE clusters could run multiple system DaemonSets, and enabling a GKE feature could add another DaemonSet or two. 7+ DaemonSets is the norm for an average GKE cluster. Keeping a small reserve (1-2 slots) for non-DaemonSet pods requires close attention, as enabling a new GKE feature or adding a new DeamonSet could prevent normal workload scheduling as all the available slots could be occupied by system or custom DaemonSet pods. This could also cause the Cluster Autoscaler to misbehave: it could add new nodes and remove them promptly as empty ones, because no pods except DaemonSet ones can be scheduled on the nodes.
Note that with new versions of GKE, some features could be enabled by default, bringing a DeamonSet along.
Here is an incomplete list of system GKE DaemonSets that are added by a corresponding GKE-feature:
# | DeamonSet name | GKE Feature |
---|---|---|
1 | kube-proxy | non-DPv2 clusters |
2 | konnectivity-agent | some GKE <v1.27 clusters, all GKE v1.27+ clusters |
3 | fluentbit-gke | GKE Logging |
4 | gke-metrics-agent | GKE Monitoring |
5 | pdcsi-node | CSI-enabled clusters == (almost) all v1.18+ clusters |
6 | collector | GMP-enabled clusters, enabled by default on v1.25 (Autopilot) and v1.27 (Standard)clusters |
7 | netd | Workload Identity, Infra-node visibility, other |
8 | anetd | DPv2 clusters |
9 | calico-node | Calico Network Policies |
10 | container-watcher | Container Threat Detection |
11 | gke-metadata-server | Workload Identity |
12 | nvidia-gpu-device-plugin | GKE TPUs/GPUs |
13 | nccl-fastsocket-installer | nVidia NCCL Fast Socket |
maxPodsPerNode
>= 16 should be a safer option.
Remediation
Recreate the Node Pool with higher (>=16) maxPodsPerNode
. Alternatively, you
can create a new Node Pool and migrate your workloads to the new one.