dataproc/Spark Job Failures
Provides a comprehensive analysis of common issues that cause Dataproc Spark job failures.
Product: Cloud Dataproc
Kind: Debugging Tree
Description
This runbook investigates a range of potential problems affecting Dataproc Spark jobs on Google Cloud Platform. By running a series of checks, it aims to pinpoint the root cause of Spark job failures.
The following areas are examined:
- Cluster version supportability: Evaluates whether the job ran on a supported cluster image version (see the gcloud example after this list).
- Permissions: Checks for permission-related issues at the cluster and GCS bucket level.
- OOM: Checks for Out-Of-Memory issues affecting the Spark job on the master or worker nodes.
- Logs: Checks other logs related to shuffle failures, broken pipes, YARN runtime exceptions, and import failures.
- Throttling: Checks whether the job was throttled and reports the exact reason.
- GCS Connector: Evaluates possible issues with the GCS connector.
- BigQuery Connector: Evaluates possible issues with the BigQuery connector, such as dependency version conflicts.
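As a quick manual counterpart to the cluster-version check, you can read a cluster's image version directly with gcloud; the cluster name and region below are placeholders:
gcloud dataproc clusters describe my-cluster \
-p region aside, use: --region=us-central1 \
--format="value(config.softwareConfig.imageVersion)"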
Executing this runbook
gcpdiag runbook dataproc/spark-job-failures \
-p project_id=value \
-p job_id=value \
-p region=value \
-p zone=value \
-p service_account=value \
-p cross_project=value \
-p stackdriver=value
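For example, a minimal run that supplies only the required parameters might look like this (the project, job, and region values are placeholders):
gcpdiag runbook dataproc/spark-job-failures \
-p project_id=my-project \
-p job_id=1a2b3c4d5e6f \
-p region=us-central1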
Parameters
Name | Required | Default | Type | Help |
---|---|---|---|---|
project_id | True | None | str | The Project ID of the resource under investigation |
job_id | True | None | str | The Job ID of the resource under investigation |
region | True | None | str | Dataproc job/cluster Region |
zone | False | None | str | Dataproc cluster Zone |
service_account | False | None | str | Dataproc cluster Service Account used to create the resource |
cross_project | False | None | str | Cross-project ID, where the service account is located if it is not in the same project as the Dataproc cluster |
stackdriver | False | False | str | Checks whether Stackdriver logging is enabled for further troubleshooting |
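If you do not know the job ID, one way to find it is to list recent Dataproc jobs in the region with gcloud (the region value below is a placeholder):
gcloud dataproc jobs list --region=us-central1 --limit=10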
Get help on available commands
gcpdiag runbook --help