gke/Node Auto Repair

Provides the reason why a Node was auto-repaired

Product: Google Kubernetes Engine
Kind: Debugging Tree

Description

This runbook checks if:

  • Node auto-repair is enabled on the cluster
  • A node was repaired because it was in NotReady status for more than 10 minutes
  • A node was repaired because it had disk pressure
  • A node was repaired because of unallocatable GPUs
  • A node was repaired because of unallocatable TPUs

Executing this runbook

gcpdiag runbook gke/node-auto-repair \
  -p project_id=value \
  -p name=value \
  -p node=value \
  -p location=value
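
Since `name` and `location` are optional, the invocation can be assembled so that those flags are passed only when a value is available. The following is a minimal sketch, not part of gcpdiag itself: the helper name `build_node_auto_repair_cmd` and all values (`my-project`, `gke-node-1`, `my-cluster`, `us-central1-a`) are hypothetical placeholders. It only prints the command line; it does not run gcpdiag.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a gcpdiag command line for this runbook.
# Arguments: project_id node [name] [location]
# Optional parameters are appended only when they are non-empty.
build_node_auto_repair_cmd() {
  local project_id="$1" node="$2" name="${3:-}" location="${4:-}"
  local cmd=(gcpdiag runbook gke/node-auto-repair
             -p "project_id=${project_id}"
             -p "node=${node}")
  if [[ -n "${name}" ]]; then
    cmd+=(-p "name=${name}")
  fi
  if [[ -n "${location}" ]]; then
    cmd+=(-p "location=${location}")
  fi
  # Join the array elements with single spaces and print them on one line.
  local IFS=' '
  printf '%s\n' "${cmd[*]}"
}

# Example with placeholder values (assumptions, not real resources).
build_node_auto_repair_cmd my-project gke-node-1 my-cluster us-central1-a
```

Review the printed command before running it; omitting the third and fourth arguments yields a command with only the required `project_id` and `node` parameters.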

Parameters

Name        Required  Default  Type  Help
project_id  True      None     str   The ID of the project hosting the GKE Cluster
name        False     None     str   The name of the GKE cluster, to limit search only for this cluster
node        True      None     str   The node name with issues.
location    False     None     str   The zone of the GKE node

Get help on available commands

gcpdiag runbook --help

Potential Steps