Sijie Guo (@sijieg)
What’s new in Pulsar 2.4.0
Who am I
❏ Pulsar PMC Member
❏ BookKeeper PMC Chair
❏ Ex-Twitter, Ex-Yahoo
❏ Interested in event streaming
technologies
Pulsar Releases
❏ 2.2.0 - 10/27/2018
❏ 2.2.1 - 12/14/2018
❏ 2.3.0 - 02/22/2019
❏ 2.3.1 - 04/12/2019
❏ 2.3.2 - 05/02/2019
❏ 2.4.0 - 06/29/2019
Event Streaming Platform
Highlights of Pulsar 2.4.0
❏ Key_Shared Subscription
❏ Delayed/Scheduled Messages
❏ Replicated Subscription
❏ Kerberos Authentication
❏ Configurable MaxMessageSize
❏ Go Functions
❏ Schema Enhancements
Key_Shared Subscription (1)
Key_Shared Subscription (2)
❏ Existing Subscription Modes
❏ Streaming modes: Exclusive / Failover
❏ Pro: partition based ordered consumption
❏ Con: the consumption parallelism is
limited by the number of partitions
❏ Queuing mode: Shared
❏ Pro: scale beyond the limitation of the
number of partitions
❏ Con: unordered consumption
Key_Shared Subscription (3)
Key_Shared Subscription (4)
❏ Key based ordering
❏ Key can be the message key or a separate *ordering* key
❏ HashRing based routing
❏ Key based batcher
❏ Policies for messages without *keys*
https://github.com/apache/pulsar/wiki/PIP-34:-Add-new-subscribe-type-Key_shared
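The hash-ring routing idea can be illustrated with a small, self-contained sketch. This is a generic consistent-hash ring, not Pulsar's broker code: the class name `KeyHashRing`, the CRC32 hash, and the `pointsPerConsumer` parameter are all assumptions made for illustration. It shows how every message with the same key lands on the same consumer, which is what gives per-key ordering.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

/** Toy hash ring mapping message keys to consumer names (hypothetical, not Pulsar's implementation). */
final class KeyHashRing {
    // Ring position -> consumer name. Each consumer owns several points on the ring.
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int pointsPerConsumer;

    KeyHashRing(int pointsPerConsumer) {
        this.pointsPerConsumer = pointsPerConsumer;
    }

    // CRC32 is an arbitrary choice for this sketch; any stable hash would do.
    private static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    void addConsumer(String name) {
        for (int i = 0; i < pointsPerConsumer; i++) {
            ring.put(hash(name + "#" + i), name);
        }
    }

    void removeConsumer(String name) {
        ring.values().removeIf(name::equals);
    }

    /** Returns the consumer that owns the first ring point at or after the key's hash. */
    String select(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no consumers attached");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        // Wrap around to the start of the ring when the hash is past the last point.
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }
}
```

With this layout, removing one consumer only moves the keys that consumer owned; keys owned by the remaining consumers keep their assignment.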
Key_Shared Subscription (5)
❏ Future Tasks
❏ Sticky Key-Range Consumer
❏ Use case: EO & Auto scaling for Flink
❏ Consumer selector policy
❏ Consumer priority
❏ Auto scale up-and-down partitions (*)
https://github.com/apache/pulsar/issues/4077
Delayed / Scheduled Messages (1)
deliverAfter
producer.newMessage()
.deliverAfter(3L, TimeUnit.MINUTES)
.value("Hello Pulsar after 3 minutes!")
.send();
deliverAt
producer.newMessage()
.deliverAt(new Date(...).getTime())
.value("Hello Pulsar at ...")
.send();
https://github.com/apache/pulsar/wiki/PIP-26:-Delayed-Message-Delivery
Delayed / Scheduled Messages (2)
❏ DelayedDeliveryTracker Abstraction
❏ In-memory priority queue implementation
❏ Plan for memory and resource usage
❏ A persistent hashed-wheel implementation in 2.5.0+
https://github.com/apache/pulsar/wiki/PIP-26:-Delayed-Message-Delivery
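A minimal sketch of what an in-memory, priority-queue-based tracker might look like is given here. It is not the broker's `DelayedDeliveryTracker`: the class name `InMemoryDelayTracker`, its methods, and the use of a plain `long` as a message id are assumptions for illustration. It also makes the memory concern visible, since every delayed message holds an entry on the heap until it becomes due.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Toy delayed-delivery tracker backed by a min-heap on delivery time (hypothetical). */
final class InMemoryDelayTracker {
    private static final class Entry {
        final long deliverAtMillis;
        final long messageId;

        Entry(long deliverAtMillis, long messageId) {
            this.deliverAtMillis = deliverAtMillis;
            this.messageId = messageId;
        }
    }

    // Earliest delivery time first; ties are broken by message id for a stable order.
    private final PriorityQueue<Entry> queue = new PriorityQueue<Entry>(
            (Entry a, Entry b) -> a.deliverAtMillis != b.deliverAtMillis
                    ? Long.compare(a.deliverAtMillis, b.deliverAtMillis)
                    : Long.compare(a.messageId, b.messageId));

    void add(long messageId, long deliverAtMillis) {
        queue.add(new Entry(deliverAtMillis, messageId));
    }

    /** Removes and returns the ids of all messages whose delivery time has passed. */
    List<Long> pollDue(long nowMillis) {
        List<Long> due = new ArrayList<>();
        while (!queue.isEmpty() && queue.peek().deliverAtMillis <= nowMillis) {
            due.add(queue.poll().messageId);
        }
        return due;
    }

    /** Number of messages still waiting; each one costs heap memory. */
    int size() {
        return queue.size();
    }
}
```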
Replicated Subscription (1)
❏ Problem
❏ Data are replicated asynchronously between regions
❏ Subscriptions are local to regions
❏ Subscription state is not replicated across regions
❏ Consumers must seek back by time when moving a subscription
from one region to another
Replicated Subscription (2)
❏ Solution: replicate subscription state
❏ Establish an association between message IDs from
different regions
❏ Distributed Snapshot
❏ Snapshots are stored as “marker” messages
❏ Marker messages are inline with other messages and
replicated across regions
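One way to picture the association between message ids is a lookup table built from snapshots. The sketch below is a simplification under stated assumptions: positions are reduced to a single `long` (real Pulsar positions are ledger/entry pairs), only one remote region is tracked, and the class `SnapshotIndex` and its methods are hypothetical names, not Pulsar APIs.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy index of snapshots: local position -> equivalent position in one remote region (hypothetical). */
final class SnapshotIndex {
    private final TreeMap<Long, Long> snapshots = new TreeMap<>();

    /** Records that everything up to localPos here corresponds to remotePos in the remote region. */
    void addSnapshot(long localPos, long remotePos) {
        snapshots.put(localPos, remotePos);
    }

    /**
     * Given how far the local subscription has acknowledged, returns the remote
     * position that is safe to advance to, or -1 if no snapshot is old enough.
     */
    long remotePositionFor(long localAckedPos) {
        Map.Entry<Long, Long> entry = snapshots.floorEntry(localAckedPos);
        return entry == null ? -1L : entry.getValue();
    }
}
```

Because only the latest snapshot at or before the acknowledged position is used, the remote subscription can lag slightly behind the local one, which means some messages may be redelivered after a failover but none are skipped.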
Replicated Subscription (3)
❏ Enable replicated subscription
https://github.com/apache/pulsar/wiki/PIP-33:-Replicated-subscriptions
Kerberos Authentication
❏ PIP-30: Mutual Authentication
Configurable MaxMessageSize
❏ MaxMessageSize was previously fixed at 5 MB
❏ Introduce a new broker setting `maxMessageSize`
❏ Introduce a new field in client protocol
`max_message_size`
❏ Client discovers the max_message_size of a broker during the
connection phase and sets the batch buffer accordingly
https://github.com/apache/pulsar/wiki/PIP-36%3A-Max-Message-Size
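To make the last bullet concrete, here is a rough sketch of a client-side batcher that is sized from a limit learned at connect time. It is not the Pulsar client's batching code; `SizeBoundedBatcher`, its constructor argument, and its methods are hypothetical, and the sketch counts only payload bytes while ignoring metadata overhead.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy batcher that never lets a batch exceed the size limit advertised by the broker (hypothetical). */
final class SizeBoundedBatcher {
    private final int maxMessageSize; // assumed to come from the broker's connect response
    private final List<List<byte[]>> flushed = new ArrayList<>();
    private List<byte[]> current = new ArrayList<>();
    private int currentBytes = 0;

    SizeBoundedBatcher(int maxMessageSize) {
        this.maxMessageSize = maxMessageSize;
    }

    void add(byte[] payload) {
        if (payload.length > maxMessageSize) {
            throw new IllegalArgumentException("payload exceeds max message size");
        }
        // Close the current batch first if this payload would push it over the limit.
        if (currentBytes + payload.length > maxMessageSize) {
            flush();
        }
        current.add(payload);
        currentBytes += payload.length;
    }

    void flush() {
        if (!current.isEmpty()) {
            flushed.add(current);
            current = new ArrayList<>();
            currentBytes = 0;
        }
    }

    List<List<byte[]>> flushedBatches() {
        return flushed;
    }
}
```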
Go Functions
Schema Enhancements
❏ Schema versioning support
❏ Transitive schema compatibility check strategies
❏ BACKWARD_TRANSITIVE, FORWARD_TRANSITIVE,
FULL_TRANSITIVE
❏ SchemaBuilder, RecordBuilder
❏ AUTO_CONSUME
❏ KeyValue Schema
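The difference between a plain and a transitive compatibility check can be shown with a toy model. The sketch below is an assumption-laden simplification, not Pulsar's or Avro's checker: a schema is reduced to field names plus a flag saying whether each field has a default, and `ToySchema` and `CompatibilityCheck` are hypothetical names. A plain backward check compares the candidate with the latest version only, while the transitive variant compares it with every earlier version.

```java
import java.util.List;
import java.util.Map;

/** Toy schema: field name -> true if the field declares a default value (hypothetical model). */
final class ToySchema {
    private final Map<String, Boolean> fields;

    ToySchema(Map<String, Boolean> fields) {
        this.fields = fields;
    }

    /** A reader can decode a writer's data if every field the writer lacks has a default. */
    boolean canRead(ToySchema writer) {
        for (Map.Entry<String, Boolean> field : fields.entrySet()) {
            boolean missingInWriter = !writer.fields.containsKey(field.getKey());
            if (missingInWriter && !field.getValue()) {
                return false;
            }
        }
        return true;
    }
}

final class CompatibilityCheck {
    /** BACKWARD-style check: candidate must read data written with the latest schema only. */
    static boolean backward(List<ToySchema> history, ToySchema candidate) {
        return history.isEmpty() || candidate.canRead(history.get(history.size() - 1));
    }

    /** BACKWARD_TRANSITIVE-style check: candidate must read data written with every earlier schema. */
    static boolean backwardTransitive(List<ToySchema> history, ToySchema candidate) {
        for (ToySchema old : history) {
            if (!candidate.canRead(old)) {
                return false;
            }
        }
        return true;
    }
}
```

For example, if v1 has field `a`, v2 adds `b` with a default, and a candidate v3 keeps `b` but drops its default, v3 passes the plain check against v2 yet fails the transitive check because it cannot read v1 data.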
Pulsar Ecosystem
❏ Clients
❏ Connectors
❏ Pulsar UI & management tools
❏ Tracing integration
❏ Camel integration
❏ Big data ecosystem integration
Clients
❏ Golang: https://github.com/apache/pulsar-client-go
❏ @merlimat @wolfstudy
❏ Nodejs: https://github.com/apache/pulsar-client-node
❏ Yahoo Japan
❏ Ruby: https://github.com/apache/pulsar-client-ruby
❏ Rust: https://github.com/wyyerd/pulsar-rs
Connectors
❏ Flume Source and Sink #3597
❏ Redis Sink #3700
❏ Solr Sink #3885
❏ RabbitMQ Sink #3967
❏ InfluxDB Sink #4017
❏ Flume Ng Pulsar Sink
❏ Elastic beat output to Pulsar
Pulsar UI & management tools
❏ Pulsar Express
❏ Pulsar Web UI from Yahoo Japan
❏ Pulsar Manager
Tracing Integration
❏ Zipkin Pulsar Transport:
https://github.com/openzipkin/zipkin/issues/2297
❏ Skywalking Pulsar Integration (by Zhaopin.com)
Camel Integration
❏ Contributed by The Hut Group
❏ Available in Camel 3 and Camel 2.24
❏ A feather in their caps
Pulsar-Spark (1)
❏ Spark Structured Streaming Connector
❏ Spark SQL Support
❏ Integrating with Pulsar schema
https://github.com/streamnative/pulsar-spark
Pulsar-Spark (2) - Streaming Queries
https://github.com/streamnative/pulsar-spark
Pulsar-Spark (3) - Batch Queries
https://github.com/streamnative/pulsar-spark
Pulsar-Spark (4) - Schema of Pulsar Source
https://github.com/streamnative/pulsar-spark
❏ Topics without schema or with primitive schemas
❏ `value` field for message payload
❏ Topics with struct schemas (AVRO, JSON)
❏ Field names and types are kept in the row
❏ Metadata Fields
❏ __key: Binary
❏ __topic: String
❏ __messageId: Binary
❏ __publishTime: Timestamp
❏ __eventTime: Timestamp
Pulsar-Spark (5) - Schema Examples
Primitive Schema | Avro Schema
https://github.com/streamnative/pulsar-spark
Pulsar-Spark (6) - Write query results to Pulsar
https://github.com/streamnative/pulsar-spark
BigData Ecosystem Integration
❏ pulsar-flink
❏ Schema integration, State integration
❏ pulsar-hive
❏ External tables, managed tables
❏ Partitioned tables
Community
❏ Recurring Pulsar Meetups
❏ Conferences
❏ KubeCon Pulsar Booth from Yahoo Japan
❏ ApacheCon 2019: 5 Pulsar talks accepted
❏ Flink Forward SF & Berlin 2019
❏ QCon, ArchSummit, …
2.5.0, Ecosystem & More
❏ Transaction Support
❏ Batch receive interface
❏ HDFS Offloader
❏ Columnar Offloader
❏ Auto partition scaling up-and-down
❏ Schema & EO support in pulsar-flink integration
❏ Go Functions (metrics, secrets, …)
❏ Kafka Gateway
Resources
❏ Twitter: @apache_pulsar / @sijieg / @streamnativeio
❏ Slack: https://apache-pulsar.herokuapp.com
❏ 2.4.0 Release Notes
❏ What’s New in Apache Pulsar 2.4.0
❏ Pulsar Spark SQL & Structured Streaming Connector
❏ Curated list of Pulsar resources
Thanks!
