dataproc/ERR/2023_002
Product: Cloud Dataproc
Rule class: ERR - Something that is very likely to be wrong
Description
Dataproc considers a yarn app to be orphaned if the job driver that submitted the yarn app has exited. By default, dataproc:dataproc.yarn.orphaned-app-termination.enable is set to True, which means Dataproc agent is enabled to kill the orphaned yarn app.
Remediation
-
You can see the log from dataproc agent if the yarn app became orphaned and killed by dataproc agent.
Log query example :
resource.type=“cloud_dataproc_cluster”
resource.labels.cluster_uuid="<Datproc_cluster_uuid>"
“<YARN_app_id>”
logName: “projects//logs/google.dataproc.agent” -
If you use Spark cluster mode (spark.submit.deployMode=cluster) with spark.yarn.submit.waitAppCompletion=false, then you should also set dataproc:dataproc.yarn.orphaned-app-termination.enable=false