1. Pulsar uses bookies to persist messages and brokers to serve clients and select bookies. ZooKeeper stores metadata.
2. When a message is produced, it is sent to a broker and written to multiple bookies. Consumers connect to brokers and receive messages either from the broker's memory cache or, on a cache miss, from bookies via the broker.
3. Pulsar retains messages according to time- and size-based retention policies. Messages are deleted one segment at a time, and only after all subscriptions have consumed past that segment, so messages that are still needed are never deleted.
TGIPulsar - EP #006: Lifecycle of a Pulsar message
5. Brokers + Bookies
[Diagram: Bookie 0, Bookie 1 and Bookie 2; Broker 0, Broker 1 and Broker 2]
The processes that store data are called bookies. They persist data for Pulsar.
Brokers are “stateless”. They serve clients that produce and consume events.
6. ZooKeeper
[Diagram: the same brokers and bookies, plus three ZooKeeper nodes]
ZooKeeper stores the metadata for Pulsar and BookKeeper. It is also used to discover brokers and bookies.
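Broker and bookie discovery can be pictured with a toy metadata store. This is a minimal sketch, assuming ZooKeeper-style ephemeral nodes: each process registers a node tied to its session, and the node disappears when the session expires, so listing a path yields only live processes. The `MetadataStore` class and the `/brokers/` and `/bookies/` paths are hypothetical; this is not a ZooKeeper client API and not Pulsar's real znode layout.

```python
class MetadataStore:
    """Toy in-memory stand-in for ZooKeeper (hypothetical; not a real ZooKeeper client)."""

    def __init__(self):
        self._nodes = {}  # path -> (data, session id, or None for a persistent node)

    def create(self, path, data, session=None):
        # A node tied to a session is "ephemeral": it vanishes when the session ends.
        self._nodes[path] = (data, session)

    def expire_session(self, session):
        # Emulates ZooKeeper removing the ephemeral nodes of a process that died.
        self._nodes = {p: v for p, v in self._nodes.items() if v[1] != session}

    def children(self, prefix):
        # Names registered directly under `prefix`, e.g. the live brokers.
        return sorted(p[len(prefix):] for p in self._nodes if p.startswith(prefix))


store = MetadataStore()
# Hypothetical paths chosen for illustration only.
store.create("/brokers/broker-0", b"", session=1)
store.create("/brokers/broker-1", b"", session=2)
store.create("/bookies/bookie-0", b"", session=3)
store.create("/bookies/bookie-1", b"", session=4)
store.expire_session(2)  # broker-1 crashed
print(store.children("/brokers/"))  # ['broker-0']
print(store.children("/bookies/"))  # ['bookie-0', 'bookie-1']
```

Tying registration to a session means a crashed process drops out of discovery without having to clean up after itself.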
7. Pulsar
[Diagram: Producer 0 and Producer 1 publish to a topic with Partition 0, Partition 1 and Partition 2, served by Broker X, Broker Y and Broker Z; Consumer (P012) reads through Subscription A]
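A partitioned topic needs a rule for picking a partition and a mapping from each partition to its owner broker. The following is a minimal sketch under stated assumptions: messages without a key go round-robin, keyed messages are hashed so the same key always lands on the same partition, and `zlib.crc32` stands in for whatever hash a real Pulsar client uses. `PartitionRouter` and the partition-to-broker assignment are hypothetical illustrations, not the Pulsar client's router.

```python
import zlib
from itertools import count


class PartitionRouter:
    """Picks a partition for a message (hypothetical sketch, not the Pulsar client's router)."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._next = count()  # round-robin counter for messages without a key

    def choose(self, key=None):
        if key is None:
            # No key: spread messages evenly across partitions.
            return next(self._next) % self.num_partitions
        # Keyed: the same key always maps to the same partition.
        # crc32 is only a stand-in for the real client's hash function.
        return zlib.crc32(key.encode()) % self.num_partitions


# Assumed assignment: each partition is served by exactly one owner broker.
owners = {0: "Broker X", 1: "Broker Y", 2: "Broker Z"}
router = PartitionRouter(num_partitions=3)
p = router.choose(key="user-42")
print(f"partition {p} -> {owners[p]}")
```

Hashing by key keeps per-key ordering within one partition, while round-robin balances load when ordering does not matter.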
8. Produce
[Diagram: Producer 0 and Producer 1; a topic with Partition 0, Partition 1 and Partition 2; Broker 0, Broker 1 and Broker 2; Bookie 0, Bookie 1 and Bookie 2]
1. A message is created and a partition is selected.
2. The message is sent to the owner broker that serves the selected partition.
3. The owner broker writes the message to N bookies in parallel. The message is written once to each bookie, and each bookie stores it in its entirety.
4. Once the message has been written by 2 bookies (the ack quorum in this example), the broker acknowledges the message.
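Steps 3 and 4 amount to a quorum write: send to all bookies at once, acknowledge as soon as enough of them confirm. This is a minimal sketch, assuming each bookie is modeled as a plain Python callable that stores the entry or raises; `replicate` and `make_bookie` are hypothetical names, not BookKeeper client APIs.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def replicate(entry, bookies, ack_quorum=2):
    """Send `entry` to every bookie in parallel; return True once `ack_quorum` writes succeed."""
    pool = ThreadPoolExecutor(max_workers=len(bookies))
    futures = [pool.submit(write, entry) for write in bookies]  # one write per bookie
    acks = 0
    try:
        for fut in as_completed(futures):  # handle writes in completion order
            if fut.exception() is None:
                acks += 1
                if acks >= ack_quorum:
                    return True  # broker may now acknowledge the producer
        return False  # fewer than ack_quorum bookies stored the entry
    finally:
        pool.shutdown(wait=False)  # don't block on slower bookies


def make_bookie(name, storage, fail=False):
    """Hypothetical bookie stand-in: stores a full copy of each entry, or fails."""
    def write(entry):
        if fail:
            raise IOError(f"{name} is down")
        storage.setdefault(name, []).append(entry)
    return write


storage = {}
ok = replicate("msg-1", [make_bookie("bookie-0", storage),
                         make_bookie("bookie-1", storage),
                         make_bookie("bookie-2", storage, fail=True)])
print(ok)  # True: two of three bookies stored the entry
```

Waiting only for the ack quorum, rather than all N bookies, keeps one slow or failed bookie from stalling the producer.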
9. Consume (Cached)
[Diagram: a topic with Partition 0, Partition 1 and Partition 2; Broker 0, Broker 1 and Broker 2; Bookie 0, Bookie 1 and Bookie 2; Consumer (P012)]
1. The consumer subscribes to a topic. It connects to the owner brokers serving the topic's partitions.
2. Each broker sends the partition's messages from its memory cache.
3. The consumer acknowledges a message after processing it. The broker updates the subscription cursor once it receives the acknowledgment.
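Step 3 implies the cursor only moves forward over messages that are actually acknowledged. This is a minimal sketch, assuming a simplified cursor that keeps a "mark-delete" position (everything up to it is acknowledged) and parks out-of-order acknowledgments until the gap closes. `Cursor` and the cache dictionary are hypothetical and not Pulsar's actual cursor implementation.

```python
class Cursor:
    """Simplified subscription cursor (a sketch, not Pulsar's real cursor).

    mark_delete is the last position up to which *every* message is acknowledged;
    acknowledgments that arrive out of order wait until the gap is filled.
    """

    def __init__(self):
        self.mark_delete = -1       # nothing acknowledged yet
        self._pending_acks = set()  # acknowledged positions beyond mark_delete

    def ack(self, position):
        if position <= self.mark_delete:
            return  # duplicate acknowledgment, already covered
        self._pending_acks.add(position)
        # Advance while the next position has been acknowledged.
        while self.mark_delete + 1 in self._pending_acks:
            self.mark_delete += 1
            self._pending_acks.remove(self.mark_delete)


cache = {0: "m0", 1: "m1", 2: "m2"}  # broker's in-memory cache for one partition
cursor = Cursor()
for pos in (1, 0, 2):                # consumer acknowledges out of order
    print("delivered", cache[pos])
    cursor.ack(pos)
print(cursor.mark_delete)  # 2
```

Keeping a single contiguous position plus a small set of stragglers lets the broker know exactly which messages no subscriber still needs.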
10. Consume (BK)
[Diagram: a topic with Partition 0, Partition 1 and Partition 2; Broker 0, Broker 1 and Broker 2; Bookie 0, Bookie 1 and Bookie 2; Consumer (P012)]
1. The consumer subscribes to a topic. It connects to the owner brokers serving the topic's partitions.
2. The broker does not have the data in memory, so it reads it from one of the bookies that store it.
3. The consumer acknowledges a message after processing it. The broker updates the subscription cursor once it receives the acknowledgment.
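The read path differs from the cached case only on a cache miss. This is a minimal sketch, assuming the broker knows which bookies hold a segment (its ensemble) and that any one replica can serve the read because each bookie stores the message in its entirety. `read_entry` and the dictionaries standing in for the cache and the bookies are hypothetical.

```python
def read_entry(position, cache, ensemble, bookie_data):
    """Serve from the broker cache, else from the first reachable bookie that has the entry.

    ensemble: bookies storing this segment.
    bookie_data: bookie -> {position: message}; a bookie missing here is treated as down.
    """
    if position in cache:
        return cache[position], "cache"
    for bookie in ensemble:  # any replica will do; try them in order
        store = bookie_data.get(bookie)
        if store is not None and position in store:
            return store[position], bookie
    raise LookupError(f"entry {position} unavailable on all bookies")


cache = {0: "m0"}
bookie_data = {"bookie-0": {5: "m5"}, "bookie-2": {5: "m5"}}  # bookie-1 is down
print(read_entry(0, cache, ["bookie-0", "bookie-1", "bookie-2"], bookie_data))  # ('m0', 'cache')
print(read_entry(5, cache, ["bookie-1", "bookie-2"], bookie_data))             # ('m5', 'bookie-2')
```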
11. Failures
[Diagram: Producer 0 and Producer 1; a topic with Partition 0, Partition 1 and Partition 2; Broker 0, Broker 1 and Broker 2; Bookie 0, Bookie 1 and Bookie 2; Consumer (P012)]
In-flight messages are automatically retried by Pulsar clients.
Brokers are stateless: if a broker process dies, data storage is not affected.
When a bookie dies, all of its data is still accessible from the other replicas, and it is re-replicated from those replicas.
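Re-replication after a bookie failure can be sketched as a bookkeeping pass over segment placements. This is a minimal sketch, assuming every segment is stored whole on a set of bookies, that any surviving replica can act as the copy source, and that the live bookies are known (for example from the metadata store). In BookKeeper this work is done by its autorecovery process; `re_replicate` and the data layout here are hypothetical.

```python
def re_replicate(segments, dead, live_bookies):
    """After `dead` fails, copy each of its segments from a surviving replica to a new bookie.

    segments: segment id -> set of bookies holding a full copy (hypothetical layout).
    Returns the copies made as (segment, source, target) tuples.
    """
    copies = []
    for seg, holders in sorted(segments.items()):
        if dead not in holders:
            continue  # this segment lost no replica
        holders.discard(dead)
        if not holders:
            continue  # no surviving replica to copy from
        source = min(holders)  # any surviving replica has the whole segment
        candidates = sorted(set(live_bookies) - holders - {dead})
        if candidates:
            target = candidates[0]
            holders.add(target)
            copies.append((seg, source, target))
    return copies


segments = {1: {"bookie-0", "bookie-1"}, 2: {"bookie-1", "bookie-2"}, 3: {"bookie-0", "bookie-2"}}
print(re_replicate(segments, "bookie-1", ["bookie-0", "bookie-2", "bookie-3"]))
# [(1, 'bookie-0', 'bookie-2'), (2, 'bookie-2', 'bookie-0')]
```

Because each replica is a complete copy, restoring the replica count never requires reassembling data from several bookies.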
34. Storage size
✓ The total storage occupied by all “undeleted” segments, in bytes
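The storage-size definition is a simple sum. This is a minimal sketch, assuming each segment records its size in bytes and a flag for whether it has been deleted; the `Segment` dataclass and `storage_size` function are hypothetical illustrations.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    segment_id: int
    size_bytes: int
    deleted: bool = False  # set once the segment has been deleted


def storage_size(segments):
    """Storage size: bytes occupied by all segments that have not been deleted."""
    return sum(s.size_bytes for s in segments if not s.deleted)


segments = [Segment(0, 1_048_576, deleted=True), Segment(1, 1_048_576), Segment(2, 524_288)]
print(storage_size(segments))  # 1572864
```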
35. Message deletion
✓ Messages are deleted segment by segment
✓ The disk space of a segment is reclaimed by a
garbage collector thread after it is deleted
✓ The garbage collector runs periodically
○ Interval set by gcWaitTime
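Segment-by-segment deletion and periodic reclamation can be modeled as two separate steps. This is a minimal sketch, assuming a segment becomes deletable only when the slowest subscription's cursor has passed its last entry, with retention limits left out. A background loop then frees deleted segments every `gc_wait_time` seconds, a hypothetical stand-in for the interval that `gcWaitTime` controls. All function names and data layouts here are hypothetical.

```python
import threading


def deletable_segments(segments, cursors):
    """Ids of segments that every subscription has fully acknowledged.

    segments: list of (segment_id, last_entry_id) in write order.
    cursors: subscription -> mark-delete entry id (assumes at least one subscription).
    Retention limits (time/size) are not modeled here.
    """
    slowest = min(cursors.values())  # the slowest subscription gates deletion
    return [seg for seg, last_entry in segments if last_entry <= slowest]


def gc_pass(deleted, reclaimed):
    """One garbage-collection pass: reclaim the disk space of deleted segments."""
    while deleted:
        reclaimed.append(deleted.pop(0))


def gc_loop(deleted, reclaimed, gc_wait_time, stop):
    """Background thread body: run a GC pass every gc_wait_time seconds until `stop` is set."""
    while not stop.wait(gc_wait_time):  # wait() returns True only once stop is set
        gc_pass(deleted, reclaimed)


segments = [(0, 99), (1, 199), (2, 299)]
print(deletable_segments(segments, {"sub-a": 250, "sub-b": 199}))  # [0, 1]
```

Separating deletion from reclamation means deleting a segment is cheap bookkeeping, while the slower disk cleanup happens in batches on the collector's schedule.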