gke/WARN/2022_005

NVIDIA GPU device drivers are installed on GKE nodes with GPU

Product: Google Kubernetes Engine
Rule class: WARN - Something that is possibly wrong

Description

After adding GPU nodes to the GKE cluster, the NVIDIA’s device drivers should be installed in the nodes. Google provides a DaemonSet that will install for the drivers. This rule detects that if the nvidia-driver-installer DaemonSet is missing using the kubernetes.io/container/uptime metric.

Remediation

Installing NVIDIA GPU device drivers based on the node systems. Refer to the section in Running GPUs for installation instructions for Container-Optimized OS (COS) and Ubuntu nodes.

Further information

Running GPUs