gce/ERR/2024_004

Verify Ops Agent is installed on GCE VMs and is sending logs and metrics.

Product: Compute Engine
Rule class: ERR - Something that is very likely to be wrong

Description

This Rule verifies that VMs in the project have the Ops Agent installed and that the agent is successfully sending both logs and metrics to Google Cloud. The rule fails if the agent isn’t transmitting these data streams.

Seeing logs and metrics from your GCE VMs in Logs Explorer and Metrics Explorer, respectively, confirms successful Ops Agent telemetry transmission. Conversely, missing logs or metrics on either dashboard indicates a transmission failure with the Ops Agent. This check programatically analyzes both log and metric transmission and reports the results individually in the final rule report.

This Rule relies on the OS Config API to check Ops Agent installation on GCE VMs. It’s recommended to enable VM Manager which enables OS Config API automatically to guarantee the rule’s ability to detect the ops agent installation. Without verifying the installation, the rule can’t proceed to check for log and metric transmission.

Top Reasons Why Ops Agent Fails to Transmit Logs and Metrics:

  1. Missing VM Access Scopes: The VM should enable both “logging.write” and “monitoring.write” scopes. Follow this instruction to update your VM access scopes.
  2. Missing Service Account IAM Roles: The Service Account associated with the VM requires both “roles/monitoring.metricWriter” and “roles/logging.logWriter”. Read here for more information.
  3. GCP API Not Enabled: Ops Agent requires both Cloud Monitoring API and Cloud Logging API enabled on the project.

Remediation

Why isn’t the Ops Agent transmitting logs and metrics? Please run the Agent health check: https://cloud.google.com/stackdriver/docs/solutions/agents/ops-agent/troubleshoot-find-info#start-checks to find out, and look up the error code table: https://cloud.google.com/stackdriver/docs/solutions/agents/ops-agent/troubleshoot-find-info#health-checks to locate the corresponding fix.

To install the latest version of Ops Agent, please follow: https://cloud.google.com/stackdriver/docs/solutions/agents/ops-agent/installation#install-latest-version.

To troubleshoot Ops Agent installation failure, please follow: https://cloud.google.com/stackdriver/docs/solutions/agents/ops-agent/troubleshoot-install-startup#install-failed.

Further information