dataproc/ERR/2023_002

Checking if any orphaned YARN application has been killed by dataproc agent in the cluster.

Product: Cloud Dataproc
Rule class: ERR - Something that is very likely to be wrong

Description

Dataproc considers a yarn app to be orphaned if the job driver that submitted the yarn app has exited. By default, dataproc:dataproc.yarn.orphaned-app-termination.enable is set to True, which means Dataproc agent is enabled to kill the orphaned yarn app.

Remediation

  • You can see the log from dataproc agent if the yarn app became orphaned and killed by dataproc agent.

    Log query example :

    resource.type=“cloud_dataproc_cluster”
    resource.labels.cluster_uuid="<Datproc_cluster_uuid>"
    “<YARN_app_id>”
    logName: “projects//logs/google.dataproc.agent”

  • If you use Spark cluster mode (spark.submit.deployMode=cluster) with spark.yarn.submit.waitAppCompletion=false, then you should also set dataproc:dataproc.yarn.orphaned-app-termination.enable=false