Sijie Guo presented an overview of new features in Apache Pulsar 2.4.0. Highlights include Key_Shared subscriptions, which preserve per-key ordering while scaling consumption beyond the number of partitions; delayed and scheduled message delivery, backed by an in-memory priority queue; replicated subscriptions, which maintain subscription state across regions; and the Pulsar Spark connector, which integrates Spark SQL with Pulsar schemas. The talk also covered various other improvements and integrations.
7. Key_Shared Subscription (2)
❏ Existing Subscription Modes
❏ Streaming modes: Exclusive / Failover
❏ Pro: partition based ordered consumption
❏ Con: consumption parallelism is limited by the number of partitions
❏ Queuing mode: Shared
❏ Pro: scales beyond the limit imposed by the number of partitions
❏ Con: unordered consumption
9. Key_Shared Subscription (4)
❏ Key based ordering
❏ Key can be the message key or a separate *ordering* key
❏ HashRing based routing
❏ Key based batcher
❏ Policies for messages without *keys*
https://github.com/apache/pulsar/wiki/PIP-34:-Add-new-subscribe-type-Key_shared
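The hash-ring routing listed above can be sketched in a few lines. This is a minimal illustration of consistent hashing, not the broker's actual consumer selector: the `HashRing` class, the MD5-based hash, and the 100 ring points per consumer are assumptions made for the example, and it does not cover batching or messages without keys.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Stable 32-bit hash for the sketch; the broker's real hash function may differ.
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:4], "big")


class HashRing:
    """Toy consistent-hash ring that maps message keys to consumers (hypothetical)."""

    def __init__(self, points_per_consumer: int = 100):
        self.points_per_consumer = points_per_consumer
        self._points = []   # sorted hash points on the ring
        self._owners = {}   # hash point -> consumer name

    def add_consumer(self, name: str) -> None:
        # Each consumer claims several points so keys spread more evenly.
        for i in range(self.points_per_consumer):
            point = _hash(f"{name}#{i}")
            if point not in self._owners:  # never steal a point that is already owned
                bisect.insort(self._points, point)
                self._owners[point] = name

    def remove_consumer(self, name: str) -> None:
        self._points = [p for p in self._points if self._owners[p] != name]
        self._owners = {p: o for p, o in self._owners.items() if o != name}

    def select(self, key: str) -> str:
        """Return the consumer owning the first ring point at or after hash(key)."""
        if not self._points:
            raise LookupError("no consumers attached")
        idx = bisect.bisect_left(self._points, _hash(key))
        point = self._points[idx % len(self._points)]  # wrap around the ring
        return self._owners[point]
```

While the set of consumers is stable, every message with the same key is routed to the same consumer, which is what gives per-key ordering. When a consumer joins, only the keys that now fall on its ring points move to it; the rest stay where they were.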
10. Key_Shared Subscription (5)
❏ Future Tasks
❏ Sticky Key-Range Consumer
❏ Use case: exactly-once (EO) and auto scaling for Flink
❏ Consumer selector policy
❏ Consumer priority
❏ Auto scale up-and-down partitions (*)
https://github.com/apache/pulsar/issues/4077
12. Delayed / Scheduled Messages (2)
❏ DelayedDeliveryTracker Abstraction
❏ In-memory priority queue implementation
❏ Plan for memory and resource usage
❏ A persistent hashed-wheel implementation in 2.5.0+
https://github.com/apache/pulsar/wiki/PIP-26:-Delayed-Message-Delivery
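To make the in-memory priority queue idea concrete, here is a minimal sketch of a delayed-delivery tracker built on a binary heap. The class and method names are only loosely modeled on the abstraction named above and are assumptions for illustration; the clock is passed in explicitly (`now_ms`) so the behavior is easy to follow.

```python
import heapq


class InMemoryDelayedDeliveryTracker:
    """Toy tracker: holds message positions until their delivery time (hypothetical)."""

    def __init__(self):
        # Min-heap of (deliver_at_ms, ledger_id, entry_id); the earliest deadline is at index 0.
        self._heap = []

    def add_message(self, ledger_id: int, entry_id: int, deliver_at_ms: int, now_ms: int) -> bool:
        """Track the message if it is still in the future; return False if it is already due."""
        if deliver_at_ms <= now_ms:
            return False  # caller should dispatch it immediately
        heapq.heappush(self._heap, (deliver_at_ms, ledger_id, entry_id))
        return True

    def has_message_available(self, now_ms: int) -> bool:
        return bool(self._heap) and self._heap[0][0] <= now_ms

    def get_scheduled_messages(self, max_messages: int, now_ms: int) -> list:
        """Pop up to max_messages positions whose delivery time has passed."""
        due = []
        while self._heap and len(due) < max_messages and self._heap[0][0] <= now_ms:
            _, ledger_id, entry_id = heapq.heappop(self._heap)
            due.append((ledger_id, entry_id))
        return due

    def __len__(self) -> int:
        return len(self._heap)
```

Only the position and the deadline are kept per message, not the payload, but memory still grows linearly with the number of pending delayed messages. That is the reason the slide calls for planning memory and resource usage.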
13. Replicated Subscription (1)
❏ Problem
❏ Data are replicated asynchronously between regions
❏ Subscriptions are local to regions
❏ Subscription state is not replicated across regions
❏ Moving a subscription from one region to another requires seeking back by time
14. Replicated Subscription (2)
❏ Solution: replicate subscription state
❏ Establish an association between message IDs from different regions
❏ Distributed Snapshot
❏ Snapshots are stored as “marker” messages
❏ Marker messages are inline with other messages and replicated across regions
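One way to picture how snapshots tie message IDs together: each snapshot records a local position along with the equivalent positions in the other regions, and when the local subscription acknowledges past a snapshot, the remote subscriptions can be advanced to the positions it recorded. The sketch below assumes that model; `MessageId`, `SnapshotIndex`, and the region name in the test data are hypothetical and do not reflect the actual marker format.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class MessageId:
    # Ordered by (ledger_id, entry_id), which is enough for this sketch.
    ledger_id: int
    entry_id: int


class SnapshotIndex:
    """Toy index of snapshots: local position -> equivalent positions in other regions."""

    def __init__(self):
        self._local = []    # sorted local snapshot positions
        self._remote = []   # parallel list of {region: MessageId}

    def add_snapshot(self, local_pos: MessageId, remote_positions: dict) -> None:
        i = bisect.bisect_right(self._local, local_pos)
        self._local.insert(i, local_pos)
        self._remote.insert(i, dict(remote_positions))

    def remote_positions_for(self, acked_up_to: MessageId):
        """Return the remote positions of the latest snapshot at or before the ack position."""
        i = bisect.bisect_right(self._local, acked_up_to)
        if i == 0:
            return None  # no snapshot has been fully acknowledged yet
        return self._remote[i - 1]
```

Because only snapshot points can be translated, a remote subscription in this model lags the local one by up to one snapshot interval. A consumer that fails over to the other region may therefore see a small number of duplicates, but it does not skip messages.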
17. Configurable MaxMessageSize
❏ MaxMessageSize was previously limited to 5 MB
❏ Introduce a new broker setting `maxMessageSize`
❏ Introduce a new field in the client protocol: `max_message_size`
❏ The client discovers the broker's `max_message_size` during the connection phase and sizes its batch buffer accordingly
https://github.com/apache/pulsar/wiki/PIP-36%3A-Max-Message-Size
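The discovery step can be sketched as follows. The connection response is modeled as a plain dictionary, and the fallback to 5 MB when the field is absent is an assumption of this example; `negotiate_max_message_size` and `BatchBuffer` are hypothetical names, not client API.

```python
DEFAULT_MAX_MESSAGE_SIZE = 5 * 1024 * 1024  # the former fixed limit, 5 MB


def negotiate_max_message_size(connected_response: dict) -> int:
    """Use the broker-advertised limit, else fall back to the old default (assumed behavior)."""
    size = connected_response.get("max_message_size")
    return size if size and size > 0 else DEFAULT_MAX_MESSAGE_SIZE


class BatchBuffer:
    """Toy batch buffer that refuses payloads once the negotiated limit would be exceeded."""

    def __init__(self, max_bytes: int):
        self.max_bytes = max_bytes
        self._payloads = []
        self._size = 0

    def try_add(self, payload: bytes) -> bool:
        if self._size + len(payload) > self.max_bytes:
            return False  # caller should flush the batch first
        self._payloads.append(payload)
        self._size += len(payload)
        return True

    def flush(self) -> list:
        """Return the buffered payloads and reset the buffer."""
        batch, self._payloads, self._size = self._payloads, [], 0
        return batch
```

Sizing the buffer from the negotiated value keeps a batch from growing past what the broker accepts, so oversized batches are caught on the client rather than rejected after being sent.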
29. Pulsar-Spark (4) - Schema of Pulsar Source
https://github.com/streamnative/pulsar-spark
❏ Topics without schema or with primitive schemas
❏ `value` field for message payload
❏ Topics with struct schemas (AVRO, JSON)
❏ Field names and types are kept in the row
❏ Metadata Fields
❏ __key: Binary
❏ __topic: String
❏ __messageId: Binary
❏ __publishTime: Timestamp
❏ __eventTime: Timestamp
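The mapping from a Pulsar message to a row can be illustrated without Spark. The sketch below is plain Python rather than connector code: the `to_row` helper and the shape of the input `message` dictionary are hypothetical, and only the output field names come from the slide.

```python
from datetime import datetime, timezone


def _to_timestamp(epoch_ms: int) -> datetime:
    # Convert epoch milliseconds to a timezone-aware UTC datetime.
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)


def to_row(message: dict, struct_fields=None) -> dict:
    """Build a row dict: payload columns first, then the metadata fields.

    struct_fields is None for topics without a schema or with a primitive
    schema; otherwise it lists the field names of the struct schema.
    """
    if struct_fields is None:
        row = {"value": message["payload"]}          # single `value` column
    else:
        row = {name: message["payload"][name] for name in struct_fields}
    row["__key"] = message["key"]                     # Binary
    row["__topic"] = message["topic"]                 # String
    row["__messageId"] = message["message_id"]        # Binary
    row["__publishTime"] = _to_timestamp(message["publish_time_ms"])
    row["__eventTime"] = _to_timestamp(message["event_time_ms"])
    return row
```

The double-underscore prefix keeps the metadata columns from colliding with field names that come from a struct schema.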