dataproc/Spark Job Failures

Provides a comprehensive analysis of common issues that cause Dataproc Spark job failures.

Product: Cloud Dataproc
Kind: Debugging Tree

Description

This runbook investigates a range of potential problems with Dataproc Spark jobs on Google Cloud Platform. By conducting a series of checks, it aims to pinpoint the root cause of a Spark job failure.

The following areas are examined:

  • Cluster version supportability: Evaluates whether the job ran on a supported cluster image version.
  • Permissions: Checks for permission-related issues at the cluster and GCS bucket level.
  • OOM: Checks for Out-Of-Memory issues affecting the Spark job on master or worker nodes (a manual log-search sketch follows this list).
  • Logs: Checks other logs for shuffle failures, broken pipes, YARN runtime exceptions, and import failures.
  • Throttling: Checks whether the job was throttled and provides the exact reason.
  • GCS Connector: Evaluates possible issues with the GCS Connector.
  • BigQuery Connector: Evaluates possible issues with the BigQuery Connector, such as dependency version conflicts.

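As a manual complement to the OOM and Logs checks above, you can search the cluster's logs for common failure signatures yourself. The sketch below is one way to do this with gcloud logging read; the cluster name, project ID, and the two error strings (typical Spark OOM signatures) are assumptions, so adjust them to your job.

# A minimal sketch, assuming hypothetical names: search Dataproc cluster
# logs for common Spark OOM signatures. Replace "example-cluster" and
# "example-project" with your own cluster name and project ID.
gcloud logging read \
  'resource.type="cloud_dataproc_cluster"
   resource.labels.cluster_name="example-cluster"
   ("java.lang.OutOfMemoryError" OR "Container killed by YARN for exceeding memory limits")' \
  --project=example-project \
  --limit=20 \
  --freshness=7d
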
Executing this runbook

gcpdiag runbook dataproc/spark-job-failures \
  -p project_id=value \
  -p job_id=value \
  -p region=value \
  -p zone=value \
  -p service_account=value \
  -p cross_project=value \
  -p stackdriver=value
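
For example, a concrete invocation might look like this; all parameter values below are hypothetical placeholders (only project_id, job_id, and region are required):

gcpdiag runbook dataproc/spark-job-failures \
  -p project_id=example-project \
  -p job_id=job-1234567890 \
  -p region=us-central1 \
  -p zone=us-central1-a \
  -p stackdriver=True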

Parameters

Name            Required Default Type Help
project_id      True     None    str  The Project ID of the resource under investigation
job_id          True     None    str  The Job ID of the resource under investigation
region          True     None    str  Dataproc job/cluster Region
zone            False    None    str  Dataproc cluster Zone
service_account False    None    str  Dataproc cluster Service Account used to create the resource
cross_project   False    None    str  Cross Project ID, where the service account is located if it is not in the same project as the Dataproc cluster
stackdriver     False    False   str  Checks whether Stackdriver logging is enabled for further troubleshooting
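
If the cluster's service account lives in a different project (the cross_project parameter), you can inspect its role bindings there directly. The sketch below uses gcloud projects get-iam-policy; the project ID and service account email are hypothetical placeholders.

# Sketch with hypothetical IDs: list the roles granted to the cluster's
# service account in the cross project.
gcloud projects get-iam-policy example-cross-project \
  --flatten="bindings[].members" \
  --filter="bindings.members:serviceAccount:dataproc-sa@example-cross-project.iam.gserviceaccount.com" \
  --format="table(bindings.role)"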

Get help on available commands

gcpdiag runbook --help

Potential Steps