Beam Capability Matrix

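Apache Beam provides a portable API layer for building sophisticated data-parallel processing pipelines that can be executed across a variety of execution engines, or runners. The core concepts of this layer are based on the Beam Model (formerly known as the Dataflow Model), and each Beam runner implements them to a varying degree. To help clarify the capabilities of individual runners, we have created the capability matrix below.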

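Individual capabilities are grouped by the What / Where / When / How question they answer: what results are being computed, where in event time, when in processing time, and how refinements of results relate.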

For more details on the What / Where / When / How concepts, we recommend reading the Streaming 102 post on O’Reilly Radar.

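Note that in the future we intend to add further tables beyond the current set, covering things like runtime characteristics (e.g. at-least-once vs. exactly-once), performance, etc.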

What is being computed? | Beam Model | Google Cloud Dataflow | Apache Flink | Apache Spark | Apache Apex | Apache Gearpump | Apache Hadoop MapReduce | JStorm
ParDo
GroupByKey ~
Flatten
Combine
Composite Transforms ~ ~ ~ ~ ~
Side Inputs
Source API ~
Splittable DoFn ~ ~
Metrics ~ ~ ~ ~ ~ ~
Stateful Processing ~ ~ ~ ~
Where in event time? | Beam Model | Google Cloud Dataflow | Apache Flink | Apache Spark | Apache Apex | Apache Gearpump | Apache Hadoop MapReduce | JStorm
Global windows
Fixed windows
Sliding windows
Session windows
Custom windows
Custom merging windows
Timestamp control
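
To make the windowing rows above more concrete, here is a minimal pure-Python sketch of how fixed, sliding, and session windows could assign event timestamps to windows. It does not use the Beam SDK; the function names, the half-open `[start, end)` convention, and the rule that sessions merge only when they strictly overlap are assumptions made for illustration, not a description of any runner's implementation.

```python
from typing import Iterable, List, Tuple

# A window is a half-open event-time interval [start, end), in seconds.
Window = Tuple[int, int]


def fixed_window(ts: int, size: int) -> Window:
    """Return the single fixed window of length `size` that contains ts."""
    start = ts - (ts % size)
    return (start, start + size)


def sliding_windows(ts: int, size: int, period: int) -> List[Window]:
    """Return every window of length `size`, started every `period`, containing ts."""
    windows = []
    start = ts - (ts % period)  # latest window start at or before ts
    while start > ts - size:    # a window still contains ts if start + size > ts
        windows.append((start, start + size))
        start -= period
    return sorted(windows)


def session_windows(timestamps: Iterable[int], gap: int) -> List[Window]:
    """Give each element a proto-window [ts, ts + gap) and merge overlapping ones."""
    merged: List[Window] = []
    for ts in sorted(timestamps):
        if merged and ts < merged[-1][1]:
            # Element falls inside the current session: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], ts + gap))
        else:
            merged.append((ts, ts + gap))
    return merged
```

For example, a timestamp of 125 falls into one 60-second fixed window but into two 60-second sliding windows that start every 30 seconds, which is why sliding windows can duplicate an element across windows.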
When in processing time? | Beam Model | Google Cloud Dataflow | Apache Flink | Apache Spark | Apache Apex | Apache Gearpump | Apache Hadoop MapReduce | JStorm
Configurable triggering
Event-time triggers
Processing-time triggers
Count triggers
[Meta]data driven triggers ✕ (BEAM-101)
Composite triggers
Allowed lateness
Timers ~ ~ ~
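
As a rough illustration of the triggering rows above, the following pure-Python sketch shows two ideas: a count trigger that emits a pane every `n` elements, and the way a watermark and an allowed-lateness bound could classify an arriving element. It is a simplified model under stated assumptions, not Beam SDK code; `count_trigger_panes` and `classify_arrival` are hypothetical helper names.

```python
from typing import List, Sequence


def count_trigger_panes(elements: Sequence[int], n: int) -> List[List[int]]:
    """Split a window's elements into panes of n elements each.

    Assumption: a final, smaller pane is emitted for the leftover elements.
    """
    return [list(elements[i:i + n]) for i in range(0, len(elements), n)]


def classify_arrival(window_end: int, watermark: int, allowed_lateness: int) -> str:
    """Classify an element for a window by where the watermark is when it arrives."""
    if watermark < window_end:
        return "on_time"   # window is not yet considered complete
    if watermark < window_end + allowed_lateness:
        return "late"      # still accepted, refines an earlier result
    return "dropped"       # past allowed lateness, window state is gone
```

In this model, event-time triggers correspond to the watermark crossing `window_end`, while count triggers fire independently of time.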
How do refinements relate? | Beam Model | Google Cloud Dataflow | Apache Flink | Apache Spark | Apache Apex | Apache Gearpump | Apache Hadoop MapReduce | JStorm
Discarding
Accumulating
Accumulating & Retracting ✕ (BEAM-91)
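
The three accumulation modes listed above differ in what each successive pane of a window contains. The sketch below models them in plain Python for a sum over integer panes; it is a conceptual illustration only, with hypothetical function names, and does not use or describe the Beam SDK API.

```python
from typing import List, Optional, Sequence, Tuple


def discarding(panes: Sequence[Sequence[int]]) -> List[int]:
    """Each output covers only the elements that arrived since the last firing."""
    return [sum(pane) for pane in panes]


def accumulating(panes: Sequence[Sequence[int]]) -> List[int]:
    """Each output covers every element seen so far in the window."""
    outputs, total = [], 0
    for pane in panes:
        total += sum(pane)
        outputs.append(total)
    return outputs


def accumulating_and_retracting(
    panes: Sequence[Sequence[int]],
) -> List[Tuple[int, Optional[int]]]:
    """Each output pairs the new running total with the previous total to retract."""
    outputs, total, previous = [], 0, None
    for pane in panes:
        total += sum(pane)
        outputs.append((total, previous))  # (new value, value being retracted)
        previous = total
    return outputs
```

With panes `[[3, 4], [5], [1, 2]]`, discarding yields 7, 5, 3; accumulating yields 7, 12, 15; and the retracting variant additionally reports the prior total (7, then 12) so that a downstream consumer can undo it.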