gke/WARN/2021_005

GKE nodes have good disk performance.

Product: Google Kubernetes Engine
Rule class: WARN - Something that is possibly wrong

Description

Disk performance is essential for the proper operation of GKE nodes. If too much IO is done and the disk latency gets too high, system components can start to misbehave. Often the boot disk is a bottleneck because it is used for multiple things: the operating system, docker images, container filesystems (usually including /tmp, etc.), and EmptyDir volumes.

Remediation

You can use the following Cloud Monitoring query to determine the average disk latency for your GKE nodes:

fetch gce_instance
  | {{ metric 'compute.googleapis.com/guest/disk/operation_time' ;
      metric 'compute.googleapis.com/guest/disk/operation_count' }}
  | {within_str}
  | filter metric.device_name = 'sda'
  | group_by [resource.instance_id], .sum()
  | every 1m
  | ratio
  | value(val() > cast_units({SLO_LATENCY_MS}, "ms"))
  | group_by 1d, [ .count_true, .count ]

Further information