gke/WARN/2023_004

A Node Pool doesn’t have too low maxPodsPerNode number

Product: Google Kubernetes Engine
Rule class: WARN - Something that is possibly wrong

Description

Modern GKE clusters could run multiple system DaemonSets, and enabling a GKE feature could add another DaemonSet or two. 7+ DaemonSets is the norm for an average GKE cluster. Keeping a small reserve (1-2 slots) for non-DaemonSet pods requires close attention, as enabling a new GKE feature or adding a new DeamonSet could prevent normal workload scheduling as all the available slots could be occupied by system or custom DaemonSet pods. This could also cause the Cluster Autoscaler to misbehave: it could add new nodes and remove them promptly as empty ones, because no pods except DaemonSet ones can be scheduled on the nodes.

Note that with new versions of GKE, some features could be enabled by default, bringing a DeamonSet along.

Here is an incomplete list of system GKE DaemonSets that are added by a corresponding GKE-feature:

# DeamonSet name GKE Feature
1 kube-proxy non-DPv2 clusters
2 konnectivity-agent some GKE <v1.27 clusters, all GKE v1.27+ clusters
3 fluentbit-gke GKE Logging
4 gke-metrics-agent GKE Monitoring
5 pdcsi-node CSI-enabled clusters == (almost) all v1.18+ clusters
6 collector GMP-enabled clusters, enabled by default on v1.25 (Autopilot) and v1.27 (Standard)clusters
7 netd Workload Identity, Infra-node visibility, other
8 anetd DPv2 clusters
9 calico-node Calico Network Policies
10 container-watcher Container Threat Detection
11 gke-metadata-server Workload Identity
12 nvidia-gpu-device-plugin GKE TPUs/GPUs
13 nccl-fastsocket-installer nVidia NCCL Fast Socket

maxPodsPerNode >= 16 should be a safer option.

Remediation

Recreate the Node Pool with higher (>=16) maxPodsPerNode. Alternatively, you can create a new Node Pool and migrate your workloads to the new one.

Further information