GKE

All steps available in gke


gke/Ca Failed To Evict Pods

Checks Cluster Autoscaler logs for “scale.down.error.failed.to.evict.pods” entries.

gke/Ca Instance Timeout

Checks Cluster Autoscaler logs for “scale.up.error.waiting.for.instances.timeout” entries.

gke/Ca Ip Space Exhausted

Checks Cluster Autoscaler logs for “scale.up.error.ip.space.exhausted” entries.

gke/Ca Min Size Reached

Checks Cluster Autoscaler logs for “no.scale.down.node.node.group.min.size.reached” entries.

gke/Ca Out Of Resources

Checks Cluster Autoscaler logs for “scale.up.error.out.of.resources” entries.

gke/Ca Quota Exceeded

Checks Cluster Autoscaler logs for “scale.up.error.quota.exceeded” entries.

gke/Ca Service Account Deleted

Checks Cluster Autoscaler logs for “scale.up.error.service.account.deleted” entries.
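
Each Cluster Autoscaler check above looks for one message ID in the autoscaler's logs. The Python sketch below builds a Cloud Logging query for a given message ID. It is not the runbook's implementation: the helper `build_ca_filter` and the `CA_MESSAGE_IDS` mapping keys are hypothetical, and it assumes autoscaler events are written to the `container.googleapis.com%2Fcluster-autoscaler-visibility` log under the `k8s_cluster` resource type.

```python
# Message IDs taken from the Cluster Autoscaler steps above; the dict keys
# are hypothetical labels chosen for this sketch.
CA_MESSAGE_IDS = {
    "failed_to_evict_pods": "scale.down.error.failed.to.evict.pods",
    "instance_timeout": "scale.up.error.waiting.for.instances.timeout",
    "ip_space_exhausted": "scale.up.error.ip.space.exhausted",
    "min_size_reached": "no.scale.down.node.node.group.min.size.reached",
    "out_of_resources": "scale.up.error.out.of.resources",
    "quota_exceeded": "scale.up.error.quota.exceeded",
    "service_account_deleted": "scale.up.error.service.account.deleted",
}


def build_ca_filter(project_id: str, cluster_name: str, message_id: str) -> str:
    """Return a Cloud Logging query matching autoscaler events for message_id.

    Assumption: visibility events live in the cluster-autoscaler-visibility
    log and carry the k8s_cluster resource type.
    """
    log_name = (
        f"projects/{project_id}/logs/"
        "container.googleapis.com%2Fcluster-autoscaler-visibility"
    )
    # Separate lines in a Logging query are implicitly ANDed together.
    return "\n".join([
        f'logName="{log_name}"',
        'resource.type="k8s_cluster"',
        f'resource.labels.cluster_name="{cluster_name}"',
        f'"{message_id}"',  # bare quoted term: text search across all fields
    ])
```

The resulting string could be passed to `gcloud logging read`, for example `gcloud logging read "$FILTER" --project=my-project`, where `my-project` is a placeholder.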

gke/Check Config Map

Confirms that the ip-masq-agent ConfigMap is present, since the ConfigMap is what allows users to change the ip-masq-agent configuration.

gke/Check Daemon Set

Confirms that the ip-masq-agent DaemonSet is present. On GKE, ip-masq-agent can be deployed manually or automatically in the cluster.

gke/Check Destination Ip

If GKE is expected not to IP masquerade traffic to the destination IP, the destination range must be added to nonMasqueradeCIDRs.
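
The destination check reduces to a CIDR membership test. The sketch below uses a simplified model in which traffic is masqueraded unless its destination falls inside one of the nonMasqueradeCIDRs ranges; it ignores other ip-masq-agent settings such as link-local handling, and the function name `is_masqueraded` and the example ranges are hypothetical.

```python
import ipaddress


def is_masqueraded(dest_ip: str, non_masquerade_cidrs: list[str]) -> bool:
    """Return True if traffic to dest_ip would be SNATed to the node IP.

    Simplified model: destinations inside any nonMasqueradeCIDRs range keep
    the Pod source IP; every other destination is masqueraded.
    """
    addr = ipaddress.ip_address(dest_ip)
    # Membership across IP versions (v4 vs v6) simply evaluates to False.
    return not any(addr in ipaddress.ip_network(cidr) for cidr in non_masquerade_cidrs)
```

For example, with hypothetical nonMasqueradeCIDRs of `10.0.0.0/8` and `192.168.0.0/16`, a destination of `8.8.8.8` is masqueraded while `10.1.2.3` keeps the Pod IP.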

gke/Check Node Ip

When the node IP is present in the non-masquerade list, the node IP is not NATed.

gke/Check Pod Ip

GKE preserves the source Pod IP address for traffic sent to destinations in the nonMasqueradeCIDRs list.

gke/Cluster Autoscaler End

Finalizes the diagnostics process for Cluster Autoscaler.

gke/Cluster Autoscaler Start

Initiates diagnostics for Cluster Autoscaler.

gke/Cluster Level Logging Enabled

Verifies that logging is enabled at the GKE cluster level.

gke/Cluster Version

Checks the cluster version.

gke/Gke Ip Masq Standard End

Concludes the diagnostics process.

gke/Gke Ip Masq Standard Start

Checks that the project ID, the GKE cluster, and the cluster location are valid.

gke/Image Connection Timeout

Checks whether the connection to Google APIs is timing out.

gke/Image Connection Timeout Restricted Private

Checks whether the connection to restricted.googleapis.com or private.googleapis.com is timing out.
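
A connection timeout can be reproduced with a plain TCP connect attempt bounded by a timeout. The sketch below is one way to do that from a node or a Pod; the helper name `can_connect`, the default port 443, and the 5-second timeout are assumptions for illustration.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and resolution failures.
        return False
```

For example, `can_connect("private.googleapis.com")` returning False while DNS resolution succeeds points at a network path (routes or firewall) problem rather than a name-resolution problem.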

gke/Image Dns Issue

Checks whether the node DNS server cannot resolve the IP address of the image repository.
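
The DNS check amounts to asking the node's resolver for the repository host's addresses. The sketch below does that with the standard library resolver; the helper name `resolves` is hypothetical, and an empty result stands in for "cannot resolve".

```python
import socket


def resolves(hostname: str) -> list[str]:
    """Return the sorted, de-duplicated IPs hostname resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

Run on the affected node, `resolves("us-docker.pkg.dev")` returning `[]` would be consistent with this step's failure condition.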

gke/Image Forbidden

Checks whether the image cannot be pulled because of insufficient permissions.

gke/Image Not Found

Checks for “image not found” log entries.

gke/Image Pull End

Finalizes the diagnostics process for the GKE Image Pull runbook.

gke/Image Pull Start

Initiates diagnostics for the GKE Image Pull runbook.

gke/Ip Exhaustion End

Concludes the Cluster IP Exhaustion diagnostics process.

gke/Ip Exhaustion Gateway

Check for IP Exhaustion issue and the cluster configuration type.

gke/Ip Exhaustion Start

Starts the IP exhaustion checks.

gke/Live Migration

Checks if the node was unavailable due to a live migration event.

gke/Logging Api Enabled

Verifies if the Cloud Logging API is enabled for the project hosting the GKE cluster.

gke/Logging Write Api Quota Exceeded

Verifies that Cloud Logging API write quotas have not been exceeded.

gke/Logs End

Finalizes the ‘GKE logs’ diagnostic process.

gke/Logs Start

Initiates diagnostics for GKE Clusters.

gke/Node Auto Repair End

Finalizes the diagnostics process for Node AutoRepair.

gke/Node Auto Repair Start

Checks the inputs and verifies whether a repair event actually occurred.

gke/Node Bootstrapping End

Finalizes the diagnostics process for GKE Node Bootstrapping.

gke/Node Bootstrapping Start

Initiates diagnostics for Node Bootstrapping.

gke/Node Disk Full

Checks if node disks are full.

gke/Node Insert Check

Checks for any errors returned by the Compute Engine instances.insert method.

gke/Node Ip Range Exhaustion

Check Node IP Range Exhaustion and offer remediation.
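
Node IPs come from the subnet's primary IPv4 range, and Google Cloud reserves four addresses in every primary subnet range. The sketch below computes an upper bound on node IPs from that range size; the function names are hypothetical, and the bound ignores other resources (such as internal load balancers) that may also draw from the same range.

```python
import ipaddress

RESERVED_PER_PRIMARY_RANGE = 4  # addresses Google Cloud reserves per primary range


def max_nodes_in_subnet(primary_cidr: str) -> int:
    """Upper bound on node IPs available in a subnet's primary range."""
    return ipaddress.ip_network(primary_cidr).num_addresses - RESERVED_PER_PRIMARY_RANGE


def free_node_ips(primary_cidr: str, ips_in_use: int) -> int:
    """Remaining node IPs; 0 or less indicates the node range is exhausted."""
    return max_nodes_in_subnet(primary_cidr) - ips_in_use
```

For example, a `/24` primary range yields at most 252 node IPs, so a cluster already using 250 of them has room for only two more nodes.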

gke/Node Not Ready

Checks if nodes have been in NotReady status for an extended period (e.g., 10 minutes).
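
A node is NotReady when its `Ready` condition has a status other than `True` (that is, `False` or `Unknown`), and the condition's `lastTransitionTime` records when that state began. The sketch below applies the 10-minute example threshold from the step description to conditions shaped like `Node.status.conditions`; the function name `not_ready_too_long` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Example threshold taken from the step description; adjust as needed.
NOT_READY_THRESHOLD = timedelta(minutes=10)


def not_ready_too_long(conditions: list[dict], now=None,
                       threshold: timedelta = NOT_READY_THRESHOLD) -> bool:
    """Return True if the Ready condition has been non-True for >= threshold."""
    now = now or datetime.now(timezone.utc)
    for cond in conditions:
        if cond.get("type") == "Ready" and cond.get("status") != "True":
            # Kubernetes timestamps are RFC 3339, e.g. "2024-05-01T12:00:00Z".
            since = datetime.fromisoformat(cond["lastTransitionTime"].replace("Z", "+00:00"))
            return now - since >= threshold
    return False
```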

gke/Node Pool Cloud Logging Access Scope

Verifies that GKE node pools have the required Cloud Logging access scopes.
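
The scopes granted to a node pool's VMs appear in the node pool's `config.oauthScopes` list. The sketch below checks that list against the two scopes it treats as sufficient for writing logs, `logging.write` and the broad `cloud-platform` scope; that set is an assumption of this sketch, and the function name is hypothetical.

```python
# Scopes this sketch accepts as allowing log writes (assumed set).
LOGGING_SCOPES = {
    "https://www.googleapis.com/auth/logging.write",
    "https://www.googleapis.com/auth/cloud-platform",
}


def has_logging_scope(oauth_scopes: list[str]) -> bool:
    """Return True if any node pool scope allows writing to Cloud Logging."""
    return any(scope in LOGGING_SCOPES for scope in oauth_scopes)
```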

gke/Node Pool Upgrade

Checks if the node was removed by Cluster Upgrade Operation.

gke/Node Registration Success

Verifies the Node Registration Checker output.

gke/Node Removed By Autoscaler

Checks if the node was removed by Cluster Autoscaler.

gke/Node Unavailability End

Finalizes the diagnostics process for Node Unavailability.

gke/Node Unavailability Start

Checks the inputs and verifies whether the node was unavailable.

gke/Nodeproblem

Confirms whether there are any VPC flow logs for the destination IP.

gke/Pod Ip Range Exhaustion

Check Pod IP Range Exhaustion and offer remediation.
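
GKE gives each node a slice of the Pod range sized to the smallest power-of-two block holding at least twice the node's maximum Pods (a `/24` for the default of 110). The number of nodes a Pod range can serve therefore follows from two prefix lengths. The sketch below does that arithmetic; the function names are hypothetical.

```python
import ipaddress


def pod_block_prefix(max_pods_per_node: int) -> int:
    """Prefix length of each node's Pod block.

    The block is the smallest power of two with >= 2 * max_pods_per_node
    addresses; (2n - 1).bit_length() is log2 of that block size.
    """
    return 32 - (2 * max_pods_per_node - 1).bit_length()


def max_nodes_for_pod_range(pod_cidr: str, max_pods_per_node: int = 110) -> int:
    """How many per-node Pod blocks fit in the cluster's Pod range."""
    net = ipaddress.ip_network(pod_cidr)
    block = pod_block_prefix(max_pods_per_node)
    if block < net.prefixlen:  # a single node's block does not fit
        return 0
    return 2 ** (block - net.prefixlen)
```

For example, a `/14` Pod range with 110 Pods per node supports 2^(24-14) = 1024 nodes; lowering the maximum to 8 Pods per node shrinks each block to a `/28`.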

gke/Preemption Condition

Checks if the node was preempted.

gke/Resource Quota Exceeded

Verifies whether Kubernetes resource quotas have been exceeded.
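
A Kubernetes ResourceQuota reports its limits in `status.hard` and current consumption in `status.used`. The sketch below flags resources whose usage has reached the hard limit; to stay short, it compares only plain integer quantities (such as `pods: "10"`) and skips suffixed values such as `500m` or `4Gi`. The function name is hypothetical.

```python
def exhausted_quotas(quota_status: dict) -> list[str]:
    """Return resource names whose 'used' value has reached 'hard'.

    quota_status is shaped like ResourceQuota.status: {"hard": {...}, "used": {...}}.
    Only plain integer quantities are compared in this sketch.
    """
    hard = quota_status.get("hard", {})
    used = quota_status.get("used", {})
    result = []
    for name, limit in hard.items():
        value = used.get(name)
        # Skip missing usage and quantities with unit suffixes.
        if value is None or not limit.isdigit() or not value.isdigit():
            continue
        if int(value) >= int(limit):
            result.append(name)
    return sorted(result)
```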

gke/Resource Quotas End

Finalizes the diagnostics process for Resource Quotas.

gke/Resource Quotas Start

Initiates diagnostics for Resource Quotas.

gke/Service Account Logging Permission

Verifies that the service accounts associated with node pools have the ‘logging.logWriter’ role.
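
A project IAM policy (for example, the JSON from `gcloud projects get-iam-policy PROJECT --format=json`) lists `bindings`, each with a `role` and its `members`. The sketch below looks for a direct `roles/logging.logWriter` binding for the node pool's service account. It is deliberately narrow: it ignores IAM conditions and any other role that also grants `logging.logEntries.create`, and the function name is hypothetical.

```python
LOG_WRITER_ROLE = "roles/logging.logWriter"


def has_log_writer(policy: dict, service_account_email: str) -> bool:
    """Return True if the policy binds roles/logging.logWriter to the service account."""
    member = f"serviceAccount:{service_account_email}"
    return any(
        binding.get("role") == LOG_WRITER_ROLE and member in binding.get("members", [])
        for binding in policy.get("bindings", [])
    )
```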

gke/Standard Ip Masq End

Concludes the diagnostics process.

gke/Unallocatable Gpu

Checks GPU allocation.

gke/Unallocatable Tpu

Checks TPU allocation.
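
GPUs and TPUs are exposed to Kubernetes as extended resources (`nvidia.com/gpu` and `google.com/tpu`) that are requested in whole units. A Pod's accelerator request is unallocatable on a node when it exceeds the node's allocatable count minus what other Pods already requested. The sketch below models only that comparison; the function names are hypothetical, and real scheduling also weighs CPU, memory, taints, and node selectors.

```python
GPU_RESOURCE = "nvidia.com/gpu"
TPU_RESOURCE = "google.com/tpu"


def free_accelerators(node_allocatable: dict, requested_on_node: int, resource: str) -> int:
    """Allocatable units of resource minus units already requested on the node."""
    return int(node_allocatable.get(resource, "0")) - requested_on_node


def can_schedule(pod_request: int, node_allocatable: dict, requested_on_node: int,
                 resource: str = GPU_RESOURCE) -> bool:
    """True if the node still has enough free units for the Pod's request."""
    return pod_request <= free_accelerators(node_allocatable, requested_on_node, resource)
```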