gce/ERR/2024_002
Product: Compute Engine
Rule class: ERR - Something that is very likely to be wrong
Description
Checks the performance of the GCE instances in a project: CPU usage, memory usage, disk usage, and errors in the serial port logs. The threshold for CPU usage, memory usage, and disk usage is 95%.
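The threshold comparison described above can be sketched as follows. This is a minimal illustration, not the rule's actual implementation: the `InstanceUsage` type, the `find_breaches` helper, and the choice of a strict "greater than" comparison are all assumptions made for this example. Only the 95% threshold comes from the description.

```python
from dataclasses import dataclass

# Threshold stated in the rule description.
THRESHOLD_PERCENT = 95.0


@dataclass
class InstanceUsage:
    """Hypothetical container for one VM's utilization, in percent."""
    name: str
    cpu_percent: float
    memory_percent: float
    disk_percent: float


def find_breaches(usage: InstanceUsage,
                  threshold: float = THRESHOLD_PERCENT) -> list[str]:
    """Return the names of the metrics whose value exceeds the threshold.

    Assumption: a value must be strictly greater than the threshold
    to count as a breach.
    """
    metrics = {
        "cpu": usage.cpu_percent,
        "memory": usage.memory_percent,
        "disk": usage.disk_percent,
    }
    return [name for name, value in metrics.items() if value > threshold]


if __name__ == "__main__":
    vm = InstanceUsage("vm-1", cpu_percent=97.0, memory_percent=50.0,
                       disk_percent=95.5)
    print(vm.name, find_breaches(vm))
```

An empty list means none of the three metrics crossed the threshold for that VM.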
Remediation
To understand the performance of a GCE VM, review the instance monitoring metrics for CPU, memory, network, and disks.
- CPU and Memory metrics: Consistently high CPU or memory utilization indicates the need to scale up a VM. If the VM consistently uses greater than 90% of its CPU or memory, change the VM’s machine type to a machine type with more vCPUs or memory.
- Network metrics: Consistently high outgoing network traffic might indicate the need to change the VM’s machine type to a machine type that has a higher egress bandwidth limit. If you notice high numbers of incoming packets denied by firewalls, visit the Network Intelligence Firewall Insights page in the Google Cloud console to learn more about the origins of denied packets.
- Disk metrics: I/O latency is dependent on queue length and I/O size. If the queue length or I/O size for a disk is high, the latency will also be high. If any storage performance metrics indicate disk performance issues, do one or more of the following:
  - Review Optimizing persistent disk performance and implement the best practices suggested to improve performance.
  - Resize the persistent disks to increase the per-disk IOPS and throughput limits. Persistent disks do not have any reserved, unusable capacity, so you can use the full disk without performance degradation.
  - Change the disk type to a disk type that offers higher performance. For more information, see Configure disks to meet performance requirements.
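To see why queue length and I/O size both drive disk latency, a rough steady-state estimate based on Little's law (average latency is approximately queue length divided by completed I/O operations per second) can help. The sketch below is an approximation only, not how Compute Engine measures or reports latency. The function names and the limits in the example (15,000 IOPS, 240 MB/s) are hypothetical values chosen for illustration, not the limits of any particular disk type.

```python
def effective_iops(iops_limit: float,
                   throughput_limit_mb_s: float,
                   io_size_kb: float) -> float:
    """IOPS achievable when both an IOPS cap and a throughput cap apply.

    Larger I/Os hit the throughput cap sooner, which lowers the
    number of operations that can complete per second.
    Assumes 1 MB = 1024 KB.
    """
    throughput_bound_iops = throughput_limit_mb_s * 1024 / io_size_kb
    return min(iops_limit, throughput_bound_iops)


def estimated_latency_ms(queue_length: float,
                         iops_limit: float,
                         throughput_limit_mb_s: float,
                         io_size_kb: float) -> float:
    """Estimate average I/O latency in milliseconds via Little's law.

    latency = queue_length / completed I/Os per second
    """
    iops = effective_iops(iops_limit, throughput_limit_mb_s, io_size_kb)
    return queue_length / iops * 1000


if __name__ == "__main__":
    # Hypothetical disk limits, for illustration only.
    for io_size_kb in (4, 256):
        latency = estimated_latency_ms(queue_length=32,
                                       iops_limit=15000,
                                       throughput_limit_mb_s=240,
                                       io_size_kb=io_size_kb)
        print(f"{io_size_kb} KB I/O, queue 32: {latency:.1f} ms")
```

Under these assumed limits, a queue of 32 outstanding 4 KB I/Os works out to roughly 2.1 ms of average latency, while the same queue of 256 KB I/Os works out to roughly 33 ms because the throughput cap becomes the bottleneck. Raising the per-disk limits, by resizing the disk or choosing a faster disk type, lowers both estimates.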