AWS recently announced a distributed map for Step Functions, a solution for large-scale parallel data processing. Optimized for S3, the new feature of the AWS orchestration service targets interactive and highly parallel serverless data processing workflows. By Renato Losio.
The new distributed map state lets Step Functions workflows coordinate large-scale workloads by iterating over millions of objects on S3, such as logs, images, or CSV files. AWS previously supported the Step Functions map state for executing the same processing steps on multiple entries in a dataset, but it was limited to 40 parallel iterations.
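As a rough illustration of such a workflow, the sketch below builds an Amazon States Language definition with a single Map state in distributed mode that lists objects under an S3 prefix and invokes a Lambda function for each one. The bucket, prefix, and function names are hypothetical placeholders, and the concurrency value is an arbitrary choice for the example, not a recommendation.

```python
import json


def build_distributed_map_definition(bucket, prefix, function_name, max_concurrency=1000):
    """Return a state machine definition (as a dict) with one distributed Map state.

    All argument values are illustrative; substitute real resource names.
    """
    return {
        "Comment": "Sketch: process every S3 object under a prefix in parallel",
        "StartAt": "ProcessObjects",
        "States": {
            "ProcessObjects": {
                "Type": "Map",
                # ItemReader tells the Map state where its items come from;
                # here, a listing of objects in the bucket.
                "ItemReader": {
                    "Resource": "arn:aws:states:::s3:listObjectsV2",
                    "Parameters": {"Bucket": bucket, "Prefix": prefix},
                },
                # Mode DISTRIBUTED runs each iteration as a child workflow
                # instead of inline in the parent execution.
                "ItemProcessor": {
                    "ProcessorConfig": {
                        "Mode": "DISTRIBUTED",
                        "ExecutionType": "STANDARD",
                    },
                    "StartAt": "HandleObject",
                    "States": {
                        "HandleObject": {
                            "Type": "Task",
                            "Resource": "arn:aws:states:::lambda:invoke",
                            "Parameters": {
                                "FunctionName": function_name,
                                # Pass the current item (S3 object metadata) to the function.
                                "Payload.$": "$",
                            },
                            "End": True,
                        }
                    },
                },
                "MaxConcurrency": max_concurrency,
                "End": True,
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical names, used only to render the definition as JSON.
    definition = build_distributed_map_definition(
        bucket="example-log-bucket",
        prefix="logs/2022/",
        function_name="example-process-object",
    )
    print(json.dumps(definition, indent=2))
```

The printed JSON could be used as the definition when creating a state machine; the function itself only assembles a dictionary and makes no AWS calls.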
The distributed map stops reading after 100 million items and supports JSON or CSV files of up to 10 GB. Rafal Wilinski, founder of Dynobase, shares a CDK-based PoC of a migrations framework that takes advantage of the new feature and comments: "Step Functions Distributed Maps are awesome. Combined with DynamoDB Parallel scans, they enable blazingly fast, whole-table data migrations and transformations." Interesting read!