datafusion/ERR/2022_008

Cloud Data Fusion SA has Service Account User permissions on the Dataproc SA

Product: Cloud Data Fusion
Rule class: ERR - Something that is very likely to be wrong

Description

Running pipelines on your Dataproc clusters that reside in customer project need access to your resources so that they can act on your behalf. Whether you use a user-managed service account, or the default Compute Engine service account on the virtual machines in a cluster, you must grant the Service Account User role to Cloud Data Fusion. Otherwise, Cloud Data Fusion cannot provision virtual machines in a Dataproc cluster.

PROVISION task failed in REQUESTING_CREATE state for program run [pipeline-name] due to Dataproc operation failure: INVALID_ARGUMENT: User not authorized to act as service account ‘[service-account-name]’

Remediation

In order to solve Dataproc operation failure error you can follow one of the following approaches:

  1. Service Account level: Grant the Service Account User role for Cloud Data Fusion Service Account on the Data Proc Service Account

  2. Project level: Grant the Service Account User role for Cloud Data Fusion Service Agent on the Project. This would lead to an inheritance at the Service Account Level.

For example, this can be done using the GCP Console or by running the following gcloud tool commands :

gcloud iam service-accounts add-iam-policy-binding SERVICE_ACCOUNT --member='serviceAccount:service-PROJECT_ID@gcp-sa-datafusion.iam.gserviceaccount.com' --role='roles/iam.serviceAccountUser'

where SERVICE_ACCOUNT could be either Compute Engine default service account or a custom Service Account

gcloud projects add-iam-policy-binding PROJECT_ID --member='serviceAccount:service-PROJECT_ID@gcp-sa-datafusion.iam.gserviceaccount.com' --role='roles/iam.serviceAccountUser'

Further information