gke/WARN/2023_004

A Node Pool doesn't have a maxPodsPerNode value that is too low

Product: Google Kubernetes Engine
Rule class: WARN - Something that is possibly wrong

Description

Modern GKE clusters can run multiple system DaemonSets, and enabling a GKE feature can add another DaemonSet or two. Seven or more DaemonSets is the norm for an average GKE cluster. Keeping only a small reserve (1-2 slots) for non-DaemonSet pods requires close attention: enabling a new GKE feature or adding a new DaemonSet can prevent normal workloads from being scheduled, because all available slots may be occupied by system or custom DaemonSet pods. This can also cause the Cluster Autoscaler to misbehave: it may add new nodes and then promptly remove them as empty, because no pods other than DaemonSet pods can be scheduled on them.
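The slot arithmetic behind this can be sketched in a few lines of Python. This is only an illustration: the function name `free_slots` and the example numbers are hypothetical, and it assumes every DaemonSet places exactly one pod on each node of the pool.

```python
def free_slots(max_pods_per_node: int, daemonset_count: int) -> int:
    """Pod slots left on a node once each DaemonSet has placed one pod there.

    Assumption: every DaemonSet targets every node in the pool.
    """
    return max(max_pods_per_node - daemonset_count, 0)


# Hypothetical pool with maxPodsPerNode = 8 on a cluster running 7 DaemonSets.
print(free_slots(8, 7))   # one slot left for regular workloads
# Enabling one more feature that ships a DaemonSet uses up the last slot.
print(free_slots(8, 8))
# The same cluster with maxPodsPerNode = 16 keeps a comfortable reserve.
print(free_slots(16, 8))
```

With zero free slots, a newly added node can host nothing but DaemonSet pods, which is the situation in which the Cluster Autoscaler may add and remove nodes repeatedly.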

Note that in new versions of GKE, some features may be enabled by default, each bringing a DaemonSet along.

Here is an incomplete list of system GKE DaemonSets and the GKE feature that adds each of them:

#	DaemonSet name	GKE Feature
1	kube-proxy	non-DPv2 clusters
2	konnectivity-agent	some GKE <v1.27 clusters, all GKE v1.27+ clusters
3	fluentbit-gke	GKE Logging
4	gke-metrics-agent	GKE Monitoring
5	pdcsi-node	CSI-enabled clusters == (almost) all v1.18+ clusters
6	collector	GMP-enabled clusters, enabled by default on v1.25 (Autopilot) and v1.27 (Standard) clusters
7	netd	Workload Identity, Infra-node visibility, other
8	anetd	DPv2 clusters
9	calico-node	Calico Network Policies
10	container-watcher	Container Threat Detection
11	gke-metadata-server	Workload Identity
12	nvidia-gpu-device-plugin	GKE TPUs/GPUs
13	nccl-fastsocket-installer	NVIDIA NCCL Fast Socket

A maxPodsPerNode value of 16 or higher is a safer option.
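As a rough sketch of how this threshold could be applied to a set of node pools, the snippet below flags pools whose value is under 16. It is not the rule's actual implementation; the function name, the constant name, and the pool names are hypothetical, and the input is assumed to be a plain mapping of pool name to maxPodsPerNode.

```python
MIN_SAFE_MAX_PODS = 16  # threshold recommended in this document


def pools_below_threshold(pools: dict, threshold: int = MIN_SAFE_MAX_PODS) -> list:
    """Return the names of node pools whose maxPodsPerNode is under the threshold."""
    return sorted(name for name, max_pods in pools.items() if max_pods < threshold)


# Hypothetical node pools and their maxPodsPerNode values.
example_pools = {"default-pool": 110, "small-pool": 8, "batch-pool": 16}
print(pools_below_threshold(example_pools))
```

In this example only `small-pool` is reported, because 16 itself meets the recommendation.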

Remediation

Recreate the Node Pool with higher (>=16) maxPodsPerNode. Alternatively, you can create a new Node Pool and migrate your workloads to the new one.
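A minimal sketch of the second option is shown below: it only assembles a `gcloud container node-pools create` command line with the `--max-pods-per-node` flag and prints it for review, without running anything. The cluster, pool, and region names are placeholders, the helper `create_pool_command` is hypothetical, and a regional cluster is assumed (a zonal cluster would take `--zone` instead of `--region`).

```python
import shlex


def create_pool_command(cluster: str, pool: str, region: str, max_pods: int = 16) -> list:
    """Build the argument list for creating a node pool with a given maxPodsPerNode.

    Assumption: the cluster is regional; all names are supplied by the caller.
    """
    return [
        "gcloud", "container", "node-pools", "create", pool,
        f"--cluster={cluster}",
        f"--region={region}",
        f"--max-pods-per-node={max_pods}",
    ]


# Placeholder names; replace them with your own before running the command.
cmd = create_pool_command("my-cluster", "pool-16", "us-central1")
print(shlex.join(cmd))
```

Once the new pool is ready, workloads can be moved to it by cordoning and draining the nodes of the old pool before deleting it.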

Further information

Migrate your workloads to another Node Pool