More and more developers want to build cloud-native distributed applications or microservices using high-performing, cloud-agnostic messaging technology for maximum decoupling. What we do not want is the hassle of managing the complex messaging infrastructure needed for the job, or the risk of getting into a vendor lock-in. Most developers know Apache Kafka, but for event sourcing or the CQRS pattern Kafka is not really suitable. In this talk I will give you at least ten reasons to choose Pulsar over Kafka for event sourcing and data consensus.
1. Ten reasons to choose Apache Pulsar over
Apache Kafka for Event Sourcing (CQRS)
2. Who am I?
Ten reasons to choose Apache Pulsar 2
Robert van Mölken
Solution Architect / Developer
Blockchain / IoT / Cloud Apps
Groundbreaker Ambassador
Author of two books, including:
Blockchain across Oracle
LinkedIn: linkedin.com/in/rvmolken
Blog: technology.vanmolken.nl
Twitter: @robertvanmolken
3. Ten reasons to choose Apache Pulsar over Apache Kafka
For Event Sourcing (CQRS)
Apache Pulsar for Event Sourcing 3
TOPICS TO COVER
What is Event Sourcing and CQRS
Common problems with Kafka-based projects
Introducing our ‘friend’, Pulsar
Reasons why Pulsar should be your #1 pick
5. • Event sourcing uses an event-centric approach to persisting business object state.
• A business object is persisted by storing a sequence of state-changing events.
• Whenever an object’s state changes, a new event is appended to the sequence of events.
• An app stores an object by persisting its state-changing events. Each event contains
sufficient data for many different applications to reconstruct the object’s state
(a minimal sketch follows this slide).
What is Event Sourcing and CQRS
Apache Pulsar for Event Sourcing 5
Event Sourcing
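To make the idea concrete, here is a minimal, self-contained sketch (class and event names are hypothetical, not from the talk) of a business object whose state is never stored directly but rebuilt by replaying its sequence of state-changing events:

// Minimal event-sourcing sketch: the balance is never stored directly,
// it is reconstructed by replaying the append-only sequence of events.
import java.util.ArrayList;
import java.util.List;

class AccountEvent {
    final String type;   // e.g. "Deposited" or "Withdrawn"
    final long amount;
    AccountEvent(String type, long amount) { this.type = type; this.amount = amount; }
}

class Account {
    private final List<AccountEvent> events = new ArrayList<>();  // append-only event log

    void deposit(long amount)  { events.add(new AccountEvent("Deposited", amount)); }
    void withdraw(long amount) { events.add(new AccountEvent("Withdrawn", amount)); }

    // Any application that can read the events can rebuild the current state.
    long balance() {
        long balance = 0;
        for (AccountEvent e : events) {
            balance += "Deposited".equals(e.type) ? e.amount : -e.amount;
        }
        return balance;
    }
}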
6. • An architectural design pattern for developing software in which writing
and reading data are separate responsibilities
• The parts of the system responsible for changing the system’s state are
physically separated from the parts that manage views
• In contrast, a traditional CRUD architecture implements both in the
same component
• That single component is responsible for both reads (the views) and
writes (the state changes); a sketch of the separation follows this slide
What is Event Sourcing and CQRS (Cont.)
Apache Pulsar for Event Sourcing 6
CQRS
Command and Query Responsibility Segregation
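As a rough illustration of that separation (class names are hypothetical, not from the talk), the write side below only validates commands and appends events, while the read side maintains its own view projected from those events:

// Write side: a command is validated and recorded as an event.
class PlaceOrder {
    final String orderId;
    PlaceOrder(String orderId) { this.orderId = orderId; }
}

class OrderCommandHandler {
    private final java.util.List<String> eventLog = new java.util.ArrayList<>();
    void handle(PlaceOrder cmd) {
        // business validation would happen here, then the state change is stored as an event
        eventLog.add("OrderPlaced:" + cmd.orderId);
    }
}

// Read side: a separate component projects the events into a query-friendly view.
class OrderQueryView {
    private final java.util.Map<String, String> statusById = new java.util.HashMap<>();
    void apply(String event) {
        statusById.put(event.substring(event.indexOf(':') + 1), "PLACED");
    }
    String statusOf(String orderId) { return statusById.get(orderId); }
}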
7. Why use Event Sourcing and CQRS?
Apache Pulsar for Event Sourcing 7
9. Common problems with Kafka-based projects
Apache Pulsar for Event Sourcing 9
Source: https://jack-vanlightly.com/sketches/2018/10/2/kafka-vs-pulsar-rebalancing-sketch
10. Common problems with Kafka-based projects (Cont.)
• Scaling Kafka is difficult because of the way Kafka stores data within the broker: as
distributed logs that also act as the message persistence store
• Changing the number of partitions to gain more storage can break message ordering
because message indexes conflict, but for CQRS ordering is most important
• You must plan and calculate the number of brokers, topics, partitions and replicas up
front to avoid scaling problems
• Cluster rebalancing can impact the performance of connected producers and consumers.
Brokers must synchronize state from other brokers that contain replicas of their partitions.
• No native authorization on events/commands and no native event analyzer tool (you
would, for example, need Apache Spark or Heron)
• Kafka’s geo-replication mechanism is notoriously problematic, even with just two
data centers, resulting in delayed data delivery or complete loss of data
Apache Pulsar for Event Sourcing 10
11. Apache Pulsar for Event Sourcing 11
Let’s talk about our “friend”, Pulsar
12. What is Apache Pulsar?
Apache Pulsar for Event Sourcing 12
Apache Pulsar is an open source pub-sub messaging platform backed by durable
storage (Apache BookKeeper) with the following cool features:
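As a first impression of the developer experience, here is a minimal sketch of publishing an event with the Pulsar Java client; the service URL and topic name are assumptions for a local standalone broker, not part of the original slides:

// Minimal Pulsar producer sketch using the official Java client.
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

public class PulsarHello {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")   // assumed local standalone broker
                .build();

        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://public/default/order-events")  // hypothetical topic
                .create();

        producer.send("OrderPlaced:1234".getBytes());

        producer.close();
        client.close();
    }
}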
14. Apache Pulsar for Event Sourcing 14
#1 Streaming and Queuing come together
• Kafka is an event streaming platform (only)
• It implements the concept of topics
• Pub/sub system, permanent storage,
processing event streams, Avro message
validation schema
• Based on a distributed commit log
• Capable of handling trillions of events a day
Kafka
• Pulsar started as a message queuing platform, but
can also handle high-rate,
real-time (event sourcing) use cases
• It implements the concept of topics
• Pub/sub, distributed + cold storage, event
processing, and multiple message validation
schemas (incl. Avro, JSON, raw)
• And standard message queuing patterns
• Competing consumers, fail-over
subscriptions, and message fan-out
(sketched in code after this slide)
• Based on distributed ledger / log segments
Pulsar
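A short sketch of the queuing-style subscription modes mentioned above, using the Pulsar Java client; the topic, subscription names and service URL are hypothetical:

// Competing consumers (Shared) and fail-over subscriptions in Pulsar.
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.SubscriptionType;

public class SubscriptionModes {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650").build();

        // Shared: messages are load-balanced across all consumers on the same subscription.
        Consumer<byte[]> worker = client.newConsumer()
                .topic("persistent://public/default/order-events")
                .subscriptionName("order-workers")
                .subscriptionType(SubscriptionType.Shared)
                .subscribe();

        // Failover: one consumer is active at a time; another takes over if it disconnects.
        Consumer<byte[]> standby = client.newConsumer()
                .topic("persistent://public/default/order-events")
                .subscriptionName("order-failover")
                .subscriptionType(SubscriptionType.Failover)
                .subscribe();

        Message<byte[]> msg = worker.receive();
        worker.acknowledge(msg);
        client.close();
    }
}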
15. Apache Pulsar for Event Sourcing 15
#2 Logs versus distributed ledgers
Kafka logs are append-only and sequential, so data can be written to and extracted from them quickly
But simple logs can get you into trouble when they get large – storage, scale up / out, replication …
16. Pulsar avoids the problem of copying large
logs by breaking the log into segments. It
distributes those segments across multiple
servers while the data is being written by
using BookKeeper as its storage layer
Apache Pulsar for Event Sourcing 16
#2 Logs versus distributed ledgers
For event sourcing and CQRS it is important to distribute events quickly to all consumers even when loads get high!
17. Apache Pulsar for Event Sourcing 17
#2 Logs versus distributed ledgers
18. With Pulsar tiered storage you can
automatically push old messages into
practically infinite, cheap cloud storage and
retrieve them just like you do those newer,
fresh messages
Apache Pulsar for Event Sourcing 18
#3 Tiered Storage (plus for Event Sourcing)
For event sourcing it is important to be able to replay messages that have already been consumed, for example when the
application hit an unexpected exception or when you build a new application. So keep them forever! Sounds great, right?
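A minimal sketch of such a replay with the Pulsar Java client: a Reader starts at the earliest retained (or offloaded) message and re-applies every event, for example to rebuild state after a crash or to bootstrap a new application. The topic name and service URL are assumptions:

// Replay all events of a topic from the beginning using a Pulsar Reader.
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Reader;

public class ReplayEvents {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650").build();

        Reader<byte[]> reader = client.newReader()
                .topic("persistent://public/default/order-events")  // hypothetical topic
                .startMessageId(MessageId.earliest)                  // oldest message, including tiered storage
                .create();

        while (reader.hasMessageAvailable()) {
            Message<byte[]> event = reader.readNext();
            // re-apply the event here to rebuild the application's state
        }

        reader.close();
        client.close();
    }
}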
19. #3 Tiered Storage (plus for Event Sourcing)
Apache Pulsar for Event Sourcing 19
Plus for ES/CQRS:
New storage nodes (BookKeeper bookies)
can easily sync data and consumers
can easily get all events via the same
API without any difference
20. Apache Pulsar for Event Sourcing 20
#4 Stateful vs Stateless brokers
• Kafka uses stateful brokers
• Every broker contains the complete log for each
of its partitions.
• If a broker fails, the work can’t be taken over by
just any broker; the same applies when load gets too high
• Brokers must synchronize state from other
brokers that contain replicas of their partitions.
• Syncing large logs takes time to distribute
Kafka
• Pulsar uses stateless brokers
• Pulsar holds state, just not in the broker
• Brokering of data is separated from storing it
• Brokers accept data (from producers)
• Brokers send data (to consumers)
• Data is stored separately in BookKeeper
• If load gets high, you can easily add a new broker
• No data to load, so it starts immediately
• Plus for ES/CQRS: always guarantees data is
available to consumers in real time (even under high load)
Pulsar
21. Apache Pulsar for Event Sourcing 21
Source: https://www.xenonstack.com/
#4 Stateful vs Stateless brokers
Plus for ES/CQRS:
Always guarantees data is available
to consumers in real time (e.g. under
high load)
22. In Pulsar, type safety is paramount in
communication between the producer and
the consumer. For safety in messaging,
Pulsar adopted both client-side and server-side
schema/message validation.
Apache Pulsar for Event Sourcing 22
#5 Schema registry and message validation
For CQRS it is important that clients send only (known) commands that can be validated. Kafka only supports
message validation using Avro schemas. Additionally, messages need to be encoded and decoded in code
(a typed-producer sketch follows the next slide).
23. #5 Schema registry and message validation - how it works?
• A Pulsar schema is applied and enforced at the topic level. Producers and
consumers upload schemas to Pulsar and are asked to conform to them.
• The Pulsar schema message consists of:
• Name: the topic to which the schema is applied.
• Type: the schema format
• Schema: binary representation of the schema.
• User-defined properties as a string/string map
• It supports the following schema payload formats:
• JSON
• Protobuf
• Avro
• String (used for UTF-8-encoded lines)
• Raw bytes (if no schema is defined)
Apache Pulsar for Event Sourcing 23
{
"name": "test-string-schema",
"type": "STRING",
"schema": "",
"properties": {}
}
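To show how that looks from a client, here is a sketch of a producer using a JSON schema with the Pulsar Java client; the schema is derived from the class and registered on the topic, so incompatible clients are rejected. The OrderPlaced class, topic and service URL are hypothetical:

// Producer with a topic-level JSON schema: Pulsar validates messages against it.
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;

public class TypedProducer {
    public static class OrderPlaced {   // hypothetical command/event payload
        public String orderId;
        public long amount;
    }

    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650").build();

        Producer<OrderPlaced> producer = client.newProducer(Schema.JSON(OrderPlaced.class))
                .topic("persistent://public/default/order-commands")
                .create();

        OrderPlaced event = new OrderPlaced();
        event.orderId = "1234";
        event.amount = 42;
        producer.send(event);   // encoded and validated by the client library

        producer.close();
        client.close();
    }
}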
24. 5 more reasons to choose Pulsar
Apache Pulsar for Event Sourcing 24