composer/WARN/2023_006

Airflow schedulers are healthy for the last hour

Product: Cloud Composer
Rule class: WARN - Something that is possibly wrong

Description

Airflow schedulers report heartbeat signals every predefined interval called scheduler_heartbeat_sec (default: 5 seconds). If any heartbeats are received within the threshold time (default: 30 seconds), the Scheduler heartbeat from the monitoring dashboard is marked as Green, which means healthy. Otherwise the status is unhealthy. Note that if your environment has more than one scheduler, then the status is healthy as long as at least one of schedulers is responding.

Remediation

Identify if the issue happens at DAG parse time or while processing tasks at execution time. For more information about symptoms, see Troubleshooting Airflow scheduler issues.

For issues at DAG parse time, inspect DAG Processor logs and increase parameters related to DAG parsing (dagbag-import-timeout, dag-file-processor-timeout) if there are DAGs not parsed properly. Otherwise, fix or remove DAGs that cause problems to the DAG processor.

For issues at execution time, make sure that airflow-scheduler pods of the GKE cluster are not overloaded. If you can find singns of being overloaded like CPU usage hitting limit, restarting due to OOMKilled or ephemeral storage usage is reaching its limit, adjust scheduler scale and performance parameters properly.

Further information