datafusion/ERR/2024_001

Datafusion delete operation not failing.

Product: Cloud Data Fusion
Rule class: ERR - Something that is very likely to be wrong

Description

During instance deletion, a networking resource (for example, a route) in the tenant project might not get deleted, which leaves the instance stalled in the Deleting state. Deletion can also fail when the Google-managed Cloud Data Fusion service account is missing required IAM roles.
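
The missing-role case can be checked against an exported IAM policy. The following is a minimal sketch, not the rule's actual implementation: it assumes the policy was exported as JSON (for example with `gcloud projects get-iam-policy PROJECT_ID --format=json`) and that the service agent follows the usual `service-PROJECT_NUMBER@gcp-sa-datafusion.iam.gserviceaccount.com` naming. The function names are hypothetical.

```python
SERVICE_AGENT_ROLE = "roles/datafusion.serviceAgent"


def service_agent_member(project_number: str) -> str:
    """Return the IAM member string for the Data Fusion service agent.

    Assumption: the agent uses the usual gcp-sa-datafusion naming scheme.
    """
    return (
        f"serviceAccount:service-{project_number}"
        "@gcp-sa-datafusion.iam.gserviceaccount.com"
    )


def has_service_agent_role(policy: dict, project_number: str) -> bool:
    """Check whether the policy binds the service agent role to the agent."""
    member = service_agent_member(project_number)
    for binding in policy.get("bindings", []):
        if (
            binding.get("role") == SERVICE_AGENT_ROLE
            and member in binding.get("members", [])
        ):
            return True
    return False
```

A result of `False` suggests that the role binding is worth restoring before the deletion is retried.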

Remediation

  • Instance stuck in Deleting: deletion can fail when the Cloud Data Fusion P4 service account, which is a Google-managed service account, is missing IAM roles such as the Cloud Data Fusion API Service Agent role (roles/datafusion.serviceAgent). Confirm that the role is granted before retrying the deletion.

  • Other common problems that cause instance deletion to fail:

    • Failure to find expected resources: clean up the resources manually from the GKE cluster, then retry the deletion with the REST API.
    • PVCs stuck in the Terminating state: this is caused by leftover pods in GKE that are still using the PVC. Manually delete the cluster and the PVCs, then retry the deletion with the REST API.
    • Pods in a crash loop because of a missing secret: manually delete the cluster.
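
Retrying the deletion with the REST API can be sketched as follows. This is a hedged example: it assumes the v1 endpoint `projects.locations.instances.delete` and an OAuth access token obtained separately (for example with `gcloud auth print-access-token`). The helper name and its parameters are hypothetical, and the function only builds the request without sending it.

```python
import urllib.request

# Assumption: the v1 Cloud Data Fusion API root.
API_ROOT = "https://datafusion.googleapis.com/v1"


def build_delete_request(
    project_id: str, location: str, instance_id: str, access_token: str
) -> urllib.request.Request:
    """Build (but do not send) a DELETE request for a Data Fusion instance."""
    url = (
        f"{API_ROOT}/projects/{project_id}"
        f"/locations/{location}/instances/{instance_id}"
    )
    return urllib.request.Request(
        url,
        method="DELETE",
        headers={"Authorization": f"Bearer {access_token}"},
    )
```

Passing the returned object to `urllib.request.urlopen` sends the request. Building it separately makes it easy to inspect the URL and method first, which matters for a destructive call.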

Further information