gke/WARN/2022_005
NVIDIA GPU device drivers are installed on GKE nodes with GPU
Product: Google Kubernetes Engine
Rule class: WARN - Something that is possibly wrong
Description
After adding GPU nodes to the GKE cluster, the NVIDIA’s device drivers should be installed in the nodes. Google provides a DaemonSet that will install for the drivers. This rule detects that if the nvidia-driver-installer
DaemonSet is missing using the kubernetes.io/container/uptime
metric.
Remediation
Installing NVIDIA GPU device drivers based on the node systems. Refer to the section in Running GPUs for installation instructions for Container-Optimized OS (COS) and Ubuntu nodes.