dataflow/ERR/2024_002

Dataflow job is not returning KeyCommitTooLargeException errors.

Product: Dataflow
Rule class: ERR - Something that is very likely to be wrong

Description

Dataflow streaming jobs may fail due to the following error message:

Error message from worker: org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$KeyCommitTooLargeException: Commit request for stage P59 and key 7460 has size 1422763350 which is more than the limit of 1073741824. This may be caused by grouping a very large amount of data in a single window without using Combine, or by producing a large amount of data from a single input element.

You can search this in Logs Explorer for such jobs with the below mentioned Logging Query:

  """
  resource.type="dataflow_step"
  resource.labels.job_id="%dataflowJobID%"
  "KeyCommitTooLargeException" OR "This may be caused by grouping a very large amount of data in a single window without using Combine, or by producing a large amount of data from a single input element"
  severity>="WARNING"
  """

Remediation

This error occurs in streaming scenarios if a very large amount of data is grouped without using a Combine transform, or if a large amount of data is produced from a single input element.

To reduce the possibility of encountering this error, use the following strategies:

  1. Ensure that processing a single element cannot result in outputs or state modifications exceeding the limit.

  2. If multiple elements were grouped by a key, consider increasing the key space to reduce the elements grouped per key.

  3. If elements for a key are emitted at a high frequency over a short time, that might result in many GB of events for that key in windows.

  4. Rewrite the pipeline to detect keys like this and only emit an output indicating the key was frequently present in that window.

  5. Use sublinear space Combine transforms for commutative and associate operations. Don’t use a combiner if it doesn’t reduce space. For example, combiner for strings that just appends strings together is worse than not using combiner.

Further information