gke/Ca Failed To Evict Pods
Check for “scale.down.error.failed.to.evict.pods” log entries
All steps available in gke
Check for “scale.down.error.failed.to.evict.pods” log entries
Check for “scale.up.error.waiting.for.instances.timeout” log entries
Check for “scale.up.error.ip.space.exhausted” log entries
Check for “no.scale.down.node.node.group.min.size.reached” log entries
Check for “scale.up.error.out.of.resources” log entries
Check for “scale.up.error.quota.exceeded” log entries
Check for “scale.up.error.service.account.deleted” log entries
This confirms that the config map is present, as it allows the user to make changes to the ip-masq-agent.
On GKE, the ip-masq-agent can be deployed manually or automatically in the cluster.
GKE is expected not to IP masquerade. If needed, the destination should be added to nonMasqueradeCIDRs.
When the node IP is in the non-masquerade list, the node IP is not NATed.
GKE preserves the Pod IP addresses sent to destinations in the nonMasqueradeCIDRs list.
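The nonMasqueradeCIDRs behaviour described above can be illustrated with a minimal sketch. It assumes a simplified model in which a destination is left un-masqueraded only when it falls inside one of the listed CIDRs. The function name is_masqueraded and the example ranges are hypothetical, and the sketch ignores any defaults or extra options the agent may apply.

```python
import ipaddress
from typing import List


def is_masqueraded(dest_ip: str, non_masquerade_cidrs: List[str]) -> bool:
    """Return True if traffic to dest_ip would be source-NATed.

    Simplified model (an assumption, not the agent's actual code):
    destinations inside any nonMasqueradeCIDRs range keep the Pod IP,
    everything else is masqueraded.
    """
    addr = ipaddress.ip_address(dest_ip)
    for cidr in non_masquerade_cidrs:
        # strict=False tolerates CIDRs written with host bits set.
        if addr in ipaddress.ip_network(cidr, strict=False):
            return False
    return True


# Hypothetical nonMasqueradeCIDRs list, for illustration only.
example_cidrs = ["10.0.0.0/8", "172.16.0.0/12"]
print(is_masqueraded("10.4.2.7", example_cidrs))
print(is_masqueraded("8.8.8.8", example_cidrs))
```

A destination for which this returns True is a candidate for adding to nonMasqueradeCIDRs if the Pod IP has to be preserved.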
Finalizes the diagnostics process for Cluster Autoscaler.
Initiates diagnostics for Cluster Autoscaler.
Verifies that logging is enabled at the GKE cluster level.
Check for cluster version
Concludes the diagnostics process.
Check if the project ID, GKE cluster and its location are valid.
The connection to Google APIs is timing out
The connection to restricted.googleapis.com or private.googleapis.com is timing out
Node DNS server cannot resolve the IP of the repository
Image cannot be pulled, insufficient permissions
Check for Image not found log entries
Finalizes the diagnostics process for GKE Image Pull runbook.
Initiates diagnostics for Image pull runbook.
Concludes the Cluster IP Exhaustion diagnostics process.
Check for IP Exhaustion issue and the cluster configuration type.
Start IP Exhaustion Checks
Checks if the node was unavailable due to a live migration event.
Verifies if the Cloud Logging API is enabled for the project hosting the GKE cluster.
Verifies that Cloud Logging API write quotas have not been exceeded.
Finalizes the ‘GKE logs’ diagnostic process.
Initiates diagnostics for GKE Clusters.
Finalizes the diagnostics process for Node AutoRepair.
Check inputs and verify if there actually was a repair event
Finalizes the diagnostics process for GKE Node Bootstrapping.
Initiates diagnostics for Node Bootstrapping.
Checks if node disks are full.
Check for any errors during instances.insert method
Check Node IP Range Exhaustion and offer remediation.
Checks if nodes have been in NotReady status for an extended period (e.g., 10 minutes).
Verifies that GKE node pools have the required Cloud Logging access scopes.
Checks if the node was removed by Cluster Upgrade Operation.
Verify Node Registration Checker output
Checks if the node was removed by Cluster Autoscaler.
Finalizes the diagnostics process for Node Unavailability.
Check inputs and verify if the node was unavailable
This confirms whether there are any VPC flow logs to the destination IP.
Check Pod IP Range Exhaustion and offer remediation.
Checks if the node was preempted.
Verifies whether Kubernetes resource quotas have been exceeded.
Finalizes the diagnostics process for Resource Quotas.
Initiates diagnostics for Resource Quotas.
Verifies the service accounts associated with node pools have ‘logging.logWriter’ permissions.
Concludes the diagnostics process.
Checks GPU allocation
Checks TPU allocation
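The Cluster Autoscaler steps listed above each look for a specific message ID in the logs. As a rough sketch of what such a check might filter on, the helper below builds a Cloud Logging filter string for one message ID. The function name build_ca_filter is hypothetical, and the resource type and the default field path are assumptions about how autoscaler events are structured. Compare them against a real log entry before relying on them.

```python
# Message IDs taken from the step descriptions in this list.
CA_MESSAGE_IDS = [
    "scale.down.error.failed.to.evict.pods",
    "scale.up.error.waiting.for.instances.timeout",
    "scale.up.error.ip.space.exhausted",
    "no.scale.down.node.node.group.min.size.reached",
    "scale.up.error.out.of.resources",
    "scale.up.error.quota.exceeded",
    "scale.up.error.service.account.deleted",
]

# Assumed location of the message ID inside the log entry payload.
DEFAULT_FIELD = "jsonPayload.resultInfo.results.errorMsg.messageId"


def build_ca_filter(message_id: str, cluster_name: str,
                    field: str = DEFAULT_FIELD) -> str:
    """Build a log filter string matching one autoscaler message ID."""
    clauses = [
        'resource.type="k8s_cluster"',  # assumed resource type
        f'resource.labels.cluster_name="{cluster_name}"',
        f'{field}="{message_id}"',
    ]
    return " AND ".join(clauses)


# "my-cluster" is a placeholder cluster name.
for mid in CA_MESSAGE_IDS:
    print(build_ca_filter(mid, "my-cluster"))
```

The field argument is left overridable because some message IDs may sit under a different payload path than the assumed default.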