dataflow/ERR/2023_012

Dataflow job writing to Spanner did not fail due to OOM.

Product: Dataflow
Rule class: ERR - Something that is very likely to be wrong

Description

A Dataflow job writing to Spanner can fail with out-of-memory errors.

To write data to Spanner, the SpannerIO package in Beam's Java SDK provides a write transform: SpannerIO.write() executes a collection of input row mutations. For efficiency, the Dataflow connector groups mutations into batches. Given a PCollection of Mutations, SpannerIO.Write groups, sorts and batches the mutations by primary key and then applies them to Spanner.

SpannerIO uses local worker memory to create these batches of sorted mutations, which can lead to high memory consumption.
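
For reference, a minimal sketch of such a write using the Beam Java SDK is shown below; the instance, database, table and column names are placeholders, not values taken from this rule.

    // Minimal sketch: write a PCollection<Mutation> to Spanner with SpannerIO.write().
    // All resource names below are placeholders.
    import com.google.cloud.spanner.Mutation;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.gcp.spanner.SpannerIO;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.Create;
    import org.apache.beam.sdk.values.PCollection;

    public class SpannerWriteExample {
      public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        // Build the mutations to write (a single hard-coded row for brevity).
        PCollection<Mutation> mutations = p.apply(Create.of(
            Mutation.newInsertOrUpdateBuilder("Singers")
                .set("SingerId").to(1L)
                .set("FirstName").to("Alice")
                .build()));

        // SpannerIO.write() groups, sorts and batches the mutations by primary key
        // before applying them to the database.
        mutations.apply("WriteToSpanner",
            SpannerIO.write()
                .withInstanceId("my-instance")
                .withDatabaseId("my-database"));

        p.run().waitUntilFinish();
      }
    }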

Remediation

There are several options to address high memory consumption by SpannerIO.write():

  • increase the worker machine type to one with more memory (see the pipeline options sketch after this list)
  • reduce the number of worker harness threads (number_of_worker_harness_threads)
  • alter the SpannerIO.write() parameters to reduce the amount of data that needs to be stored in memory
  • disable sorting by setting .withGroupingFactor(1) (negatively affects performance of SpannerIO.write())
  • disable sorting and batching by setting .withMaxNumMutations(0) (negatively affects performance of SpannerIO.write())
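
For the first two options, a hedged sketch of the corresponding Dataflow runner pipeline options in the Java SDK is shown below; the machine type and thread count are illustrative values, not recommendations.

    // Assumes the Dataflow runner for the Beam Java SDK; values are illustrative only.
    import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class MemoryTunedOptions {
      // Builds Dataflow options with more worker memory and fewer harness threads.
      public static DataflowPipelineOptions create(String[] args) {
        DataflowPipelineOptions options =
            PipelineOptionsFactory.fromArgs(args).as(DataflowPipelineOptions.class);

        // A machine type with more memory per worker.
        options.setWorkerMachineType("n1-highmem-8");

        // Fewer harness threads means fewer bundles processed concurrently, so
        // fewer mutation batches are held in memory at the same time.
        options.setNumberOfWorkerHarnessThreads(10);
        return options;
      }
    }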

These parameters can be set directly on SpannerIO.write(). For example, SpannerIO.write().withBatchSizeBytes(100_000).withMaxNumMutations(500).withGroupingFactor(100) will use approximately one tenth of the memory of the default settings.
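
Applied to the write sketched earlier, the tuned transform could look like the following; the instance and database IDs remain placeholders.

    // Tuned write: smaller batches and a lower grouping factor reduce how much
    // mutation data is buffered in worker memory before it is written.
    mutations.apply("WriteToSpanner",
        SpannerIO.write()
            .withInstanceId("my-instance")    // placeholder
            .withDatabaseId("my-database")    // placeholder
            .withBatchSizeBytes(100_000)      // cap the byte size of each batch
            .withMaxNumMutations(500)         // cap the number of mutations per batch
            .withGroupingFactor(100));        // how many batches are grouped and sorted together

    // To disable sorting, use .withGroupingFactor(1); to disable sorting and
    // batching, use .withMaxNumMutations(0). Both reduce write performance.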

Further information

SpannerIO Javadoc. Dataflow connector: [Beam I/O connectors](https://beam.apache.org/documentation/io/connectors/)