image
image
image

Apache Samza

image

This processing framework is very close in nature to Apache Kafka messaging system. Kafka can be used for most stream systems, and Samza is made so that it specifically takes advantage of Kafka’s guarantees and architecture. Kafka is there to provide Samza with fault tolerance, state storage, and buffering.

For resource negotiation, Samza uses YARN. This means that there will have to be a Hadoop cluster, but this also means that there will be the features of YARN present.

Samza uses the semantics of Kafka to define how their streams will be handled. Kafka uses these types of concepts when they are dealing with data:

Consumers – these are the parts that read from a Kafka topic.

Producer – these are the components that write to a Kafka topic.

Brokers – these are the nodes that make of the clusters.

Partitions – the topics are distributed evenly among nodes by dividing the incoming messages into partitions.

Topics – all of the data that enters into the Kafka system is known as topics.