dataflow/ERR/2024_003

Streaming Dataflow jobs are not using WRITE_TRUNCATE when working with unbounded PCollections.

Product: Dataflow
Rule class: ERR - Something that is very likely to be wrong

Description

Dataflow jobs when using WRITE_TRUNCATE with unbounded PCollections sources would return the below warning:

WriteDisposition.WRITE_TRUNCATE is not supported for an unbounded PCollection

Remediation

BigQueryIO doesn’t support streaming query (or an unbounded source) to BigQuery with the “Overwrite table” (WriteDisposition.WRITE_TRUNCATE) option. If the user selects a Pub/Sub topic as the source, in the Create Dataflow job panel under the path Destination > BigQuery > Write disposition, it will not show Overwrite table option but only the options Write if empty and Append to table.

This is an intended behavior to prevent the exception mentioned above. If you are manually setting this parameter in your code base, the above warning is thrown, which will lead to stuckness on the job.

Further information