gke/WARN/2024_001

GKE Node Auto Provisioning scales nodes to match workload demands.

Product: Google Kubernetes Engine
Rule class: WARN - Something that is possibly wrong

Description

If a GKE cluster has Node Auto Provisioning (NAP) enabled, resource limits are configured to support workload scaling. Increased demand triggers successful node creation, ensuring application continuity.

If NAP resource limits (CPU, memory) are configured too low, the autoscaler may be unable to add new nodes during high demand. This could potentially cause application disruptions. To prevent this, ensure NAP resource limits are set appropriately or consider manually scaling node pools as needed.

Remediation

Use the following search filter to identify the issue in Cloud Logging.

Project: Your Project Name (<project_id>)
Cluster: Your Cluster’s Name (<cluster_name>)
Filter:

resource.type="k8s_cluster"
resource.labels.cluster_name="<cluster_name>"
logName="projects/<project_id>/logs/container.googleapis.com%2Fcluster-autoscaler-visibility"
jsonPayload.noDecisionStatus.noScaleUp.unhandledPodGroups.napFailureReasons.messageId="no.scale.up.nap.pod.zonal.resources.exceeded"

To fix the issue, either increase your NAP resource limits or manually scale your node pools.

Increase NAP resource limits:

Review your cluster’s typical workload patterns to estimate the appropriate new resource limits for NAP.
Use the Google Cloud Console or gcloud commands to modify the NAP settings, ensuring the new limits provide sufficient headroom for your application’s workload spikes.

Manually scale your node pools:

If you cannot confidently increase NAP resource limits or need an immediate workaround, manually add nodes to the affected node pools. This is a temporary solution, and you should still investigate why the cluster needs additional nodes.

Further information

Read more about configuring Node Auto Provisioning in the documentation.