dataflow/ERR/2024_002
Product: Dataflow
Rule class: ERR - Something that is very likely to be wrong
Description
Dataflow streaming jobs may fail with the following error message:
Error message from worker: org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$KeyCommitTooLargeException: Commit request for stage P59 and key 7460 has size 1422763350 which is more than the limit of 1073741824. This may be caused by grouping a very large amount of data in a single window without using Combine, or by producing a large amount of data from a single input element.
You can search for this error in Logs Explorer for the affected job with the following logging query:
"""
resource.type="dataflow_step"
resource.labels.job_id="%dataflowJobID%"
"KeyCommitTooLargeException" OR "This may be caused by grouping a very large amount of data in a single window without using Combine, or by producing a large amount of data from a single input element"
severity>="WARNING"
"""
Remediation
This error occurs in streaming scenarios if a very large amount of data is grouped without using a Combine transform, or if a large amount of data is produced from a single input element.
To reduce the possibility of encountering this error, use the following strategies:
- Ensure that processing a single element cannot result in outputs or state modifications exceeding the limit.
- If multiple elements are grouped by a key, consider increasing the key space to reduce the number of elements grouped per key (see the key-sharding sketch after this list).
- If elements for a key are emitted at a high frequency over a short time, that can result in many gigabytes of events for that key within a window. Rewrite the pipeline to detect such hot keys and emit only an output indicating that the key was frequently present in that window (see the hot-key sketch after this list).
- Use sublinear-space Combine transforms for commutative and associative operations (see the combiner sketch after this list). Don't use a combiner if it doesn't reduce space; for example, a combiner for strings that simply concatenates them is worse than not using a combiner at all.
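One way to increase the key space is to salt each key with a random shard number, combine per shard, and then merge the shard partials. The sketch below is illustrative only; `NUM_SHARDS`, the transform labels, and the sample data are assumptions, not Dataflow settings:
"""
import random

import apache_beam as beam

NUM_SHARDS = 10  # illustrative tuning knob, not a Beam or Dataflow constant

def salt_key(kv):
    key, value = kv
    # Spread a single hot key across NUM_SHARDS distinct sub-keys.
    return ((key, random.randrange(NUM_SHARDS)), value)

def unsalt_key(kv):
    (key, _shard), partial = kv
    return (key, partial)

with beam.Pipeline() as p:
    _ = (
        p
        | "Create" >> beam.Create([("hot", 1)] * 1000 + [("cold", 1)] * 3)
        | "SaltKeys" >> beam.Map(salt_key)
        | "PartialSum" >> beam.CombinePerKey(sum)  # per-shard partial sums
        | "UnsaltKeys" >> beam.Map(unsalt_key)
        | "FinalSum" >> beam.CombinePerKey(sum)    # merge the shard partials
        | "Print" >> beam.Map(print)               # totals: ('hot', 1000), ('cold', 3)
    )
"""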
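To detect hot keys, one possible approach (not a built-in Dataflow feature) is to count elements per key per window and emit only a small (key, count) summary for keys above a threshold, instead of grouping the raw events. `HOT_KEY_THRESHOLD`, the window size, and the sample data below are assumptions:
"""
import apache_beam as beam
from apache_beam.transforms.window import FixedWindows, TimestampedValue

HOT_KEY_THRESHOLD = 5  # illustrative threshold, not a Dataflow setting

with beam.Pipeline() as p:
    _ = (
        p
        | "Create" >> beam.Create([("hot", i) for i in range(10)] + [("cold", 0)])
        | "Stamp" >> beam.Map(lambda kv: TimestampedValue(kv, 0))
        | "Window" >> beam.WindowInto(FixedWindows(60))
        | "Keys" >> beam.Map(lambda kv: kv[0])  # drop payloads, keep keys only
        | "CountPerKey" >> beam.combiners.Count.PerElement()
        | "HotOnly" >> beam.Filter(lambda kc: kc[1] >= HOT_KEY_THRESHOLD)
        | "Print" >> beam.Map(print)  # ('hot', 10)
    )
"""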
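A sublinear-space combiner keeps a fixed-size accumulator no matter how many elements are combined per key. The sketch below follows the standard Beam CombineFn pattern for computing a mean; `MeanFn` is an illustrative name:
"""
import apache_beam as beam

class MeanFn(beam.CombineFn):
    # The accumulator is a fixed-size (sum, count) pair, so space stays
    # constant regardless of how many elements are combined per key.
    def create_accumulator(self):
        return (0.0, 0)

    def add_input(self, accumulator, element):
        total, count = accumulator
        return (total + element, count + 1)

    def merge_accumulators(self, accumulators):
        totals, counts = zip(*accumulators)
        return (sum(totals), sum(counts))

    def extract_output(self, accumulator):
        total, count = accumulator
        return total / count if count else float("nan")

with beam.Pipeline() as p:
    _ = (
        p
        | beam.Create([("k", 1.0), ("k", 2.0), ("k", 3.0)])
        | beam.CombinePerKey(MeanFn())
        | beam.Map(print)  # ('k', 2.0)
    )
"""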