This document summarizes the first ever Pulsar Summit hosted by StreamNative. It thanks sponsors, speakers, and the program committee. It provides an overview of the Apache Pulsar project in 2020 including major releases, the growing community and ecosystem. Details are given about the virtual Pulsar Summit in June 2020 including the conference format and logistics. The keynote discusses adoption of Pulsar by companies, what is driving its use, and the future of the Apache Pulsar project.
4. A Big Thanks to the Program Committee
Sijie Guo, Matteo Merli, Jia Zhai, Jesse Anderson, Nozomi Kurihara,
Jerry Peng, Ben Lorica, Dave Fisher, Yuvaraj Loganathan
7. Pulsar Community & Ecosystem
● Major Product Releases & Updates
● Monthly Webinars
● Weekly Trainings
○ TGIP Every Friday at 1pm PT
● Case Studies, White Papers & Use Cases
10. Get Involved!
#1. Join the Pulsar Slack channel - Apache-Pulsar.slack.com
○ #PulsarSummit - connect with fellow attendees in real-time
○ #Job-Board - post and search Pulsar-related jobs
#2. Join Pulsar Summit Newsletter List
○ Learn about upcoming webinars, product releases, case studies and more
#3. Follow @apache_pulsar on Twitter
#4. Take the Post-Summit Survey!
13. Conference Overview & Logistics
In-Person Summit
San Francisco, CA
March 2020
Virtual Summit
June 2020
14. Conference Overview & Logistics
2 Days / 3 Tracks / 3 Zoom Links
TIME / TRACK 1 / TRACK 2 / TRACK 3
8:30-9:10
9:20-10
10:20-10:50
11-11:40
Keynote: Adoption, Use Cases & The Future of Pulsar (TRACK 1)
Keynote: Why Splunk Chose Pulsar, by Karthik Ramasamy (TRACK 1)
Scaling Customer Engagement...
Kafka on Pulsar...
Pulsar Storage on BookKeeper...
Getting Pulsar Spinning...
Event Propagation across…
Pulsar Functions Deep Dive...
15. Pulsar Summit Track Moderators
Track 1: Carolyn King
Marketing, StreamNative
Track 2: Jun Wang
Marketing & Events
Track 3: Rosalie Bartlett
Sr Community Mgr, Verizon Media
Contact us at [email protected] with any questions!
16. KEYNOTE SESSION:
Messaging and Event Streaming
Adoption, Use Cases and the Future of Pulsar
Matteo Merli / Splunk
Sijie Guo / StreamNative
17. Who are we?
- Sijie Guo (@sijieg)
- Co-Founder, StreamNative
- PMC Member of Pulsar/BookKeeper
- Ex Co-Founder, Streamlio
- Matteo Merli (@merlimat)
- Sr. Principal Engineer, Splunk
- Co-creator and PMC chair of Pulsar
- Ex Co-Founder, Streamlio
18. Splunk
Splunk provides operational intelligence software that monitors,
reports, and analyzes real-time machine data.
Splunk acquired Streamlio in Nov 2019 as part of an expanded
investment in the data streaming space.
Splunk uses Pulsar in multiple product lines and is deeply
committed to further developing Pulsar and fostering its community.
19. StreamNative
Founded by the developers of Apache Pulsar and Apache BookKeeper,
StreamNative enables companies to access enterprise data as real-time
event streams.
- Contributing and helping the Pulsar/BK community grow
- Helping people resolve business problems using Pulsar
- Providing managed Pulsar services and enterprise support
20. Agenda
- How Organizations are Using Pulsar Today / Sijie Guo
- What is Driving Pulsar Adoption / Sijie Guo
- The Future of Apache Pulsar / Matteo Merli
24. Splunk
The Data-to-Everything™ Platform
- Industry: IT
- Adoption: 6+ months
- Market Cap: 29B
Case Study Highlights
#1 Splunk Data Stream Processor based on Apache Pulsar
#2 Streaming and Batch connectors
#3 Pulsar as a Service for Splunk cloud products
25. Narvar
Intelligent Customer Experience Platform
- Industry: Retail
- Adoption: 1.5 years
- Scale: 50k txns/second
- Mission Critical Applications
Case Study Highlights
#1 Real time transactional messaging
#2 Data integration with Data Lake
#3 Complex event processing
#4 Heavy Pulsar Functions user
26. Instructure
Educational Technology Company
- Industry: Education
- Adoption: 1+ years
- Scale: 8 AWS regions, 50k msgs/sec in the busiest region
Case Study Highlights
#1 Low-cost
#2 Easy to manage long term retention
#3 Unified messaging model
27. Clever Cloud
PaaS Company
- Industry: Cloud Computing
- Adoption: 1+ years
- Use Cases: log ingestion pipeline and Function as a Service
Case Study Highlights
#1 Multi-tenant queue system
#2 Proxy Architecture
#3 Presto and S3 integration
28. Tencent
The WeChat Company
- Industry: Internet
- Use case: Financial & Log pipeline
- Adoption: 2+ years
- Scale: tens of billions of financial txns every day
Case Study Highlights
#1 Powering Tencent Billing Platform
#2 Data transfer layer for federated machine learning platform
#3 Replace Kafka for its logging pipeline in Tencent Games
29. Huya Live
Live Streaming Service
- Market Cap: 4B
- Use case: Log collection
- Adoption: 1+ year
- Scale: 15 million msgs/sec
Case Study Highlights
#1 Replace Kafka in its log pipeline
#2 Instant scalability
#3 Multi Tenancy
30. Yum China
American Fortune 500 fast-food company
- Revenue: 8B
- Use case: Notification & Order Processing
- Adoption: 6+ months
Case Study Highlights
#1 Replace RabbitMQ
#2 Instant scalability
#3 Multi Tenancy
32. What is driving Pulsar Adoption?
Insights from the Pulsar User Survey 2020
33. What value does Pulsar bring to your organization?
#1 Increased Agility
#2 Unlocks New Use Cases for the Business
#3 Reduced Costs
#4 Improved Customer Experience
34. What are the top 3 highlights for Pulsar?
#1 Architecture Design
#2 Scalability
#3 Resiliency
39. Apache Pulsar today
The Apache Pulsar community has shaped the current Pulsar as
a cloud-native messaging and event streaming platform:
- Pub/Sub
- Store
- Process
41. History
✓ 2012: Pulsar inception at Yahoo
  ✓ Mandate: scalable, multi-tenant messaging service
✓ 2016: Open-sourced by Yahoo
✓ 2017: Project migrated to Apache Software Foundation
✓ 2018
  ✓ Promoted to Top-Level Apache project
  ✓ Apache Pulsar 2.0 released
✓ 2019: Adoption increase
42. Pulsar today
● Huge growth for the project
○ Scope and features
○ Community
● We have been able to build a lot without changing the core of the system
● The path has been linear:
○ Focus on providing the best infra for messaging and event streaming
○ Build on the architectural strength of Pulsar
○ Listen to the users, make their lives easier
46. New Features
- Transaction Support: a lot of progress, coming in 2.7
- REST API to produce / consume
- Read-only brokers:
  - High fanout
  - Scale brokers on demand without affecting ownership
- Exclusive Producer:
  - Single writer to provide fencing and leader election for applications
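The fencing idea behind an exclusive producer can be pictured with a toy in-memory model. This is only a sketch under stated assumptions: `ToyTopic`, `ProducerFencedError`, and the `attach` / `detach` / `send` methods are hypothetical names made up for illustration and are not the Pulsar client API.

```python
class ProducerFencedError(Exception):
    """Raised when a writer that does not own the topic tries to use it."""


class ToyTopic:
    """In-memory stand-in for a topic that admits a single writer (not the Pulsar API)."""

    def __init__(self, name):
        self.name = name
        self.owner = None      # name of the producer currently holding the topic
        self.messages = []

    def attach(self, producer_name):
        # Exclusive access: reject any producer while another one holds the topic.
        if self.owner is not None and self.owner != producer_name:
            raise ProducerFencedError(f"{self.name} is owned by {self.owner}")
        self.owner = producer_name

    def detach(self, producer_name):
        # Releasing ownership lets a standby take over (the leader-election part).
        if self.owner == producer_name:
            self.owner = None

    def send(self, producer_name, payload):
        # Only the current owner may append, so the log has one writer at a time.
        if self.owner != producer_name:
            raise ProducerFencedError(f"{producer_name} does not own {self.name}")
        self.messages.append((producer_name, payload))


topic = ToyTopic("leader-log")
topic.attach("node-a")              # node-a becomes the leader
topic.send("node-a", "state-1")
try:
    topic.attach("node-b")          # node-b is fenced while node-a is attached
    node_b_attached = True
except ProducerFencedError:
    node_b_attached = False
topic.detach("node-a")              # node-a steps down or crashes
topic.attach("node-b")              # node-b can now take over
topic.send("node-b", "state-2")
```

Whoever manages to attach is the leader, and everyone else is rejected until the owner goes away, which is why a single-writer guarantee doubles as a leader-election primitive.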
47. Partitions auto-scaling
- Partitions are an artifact of scaling
- System complexity should be hidden
- Pulsar should be able to automatically manage partitions:
  - Increase / Decrease based on load
  - Retain ordering
  - Remove duality of partitioned/non-partitioned topics
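Why "retain ordering" is the hard part of changing the partition count can be seen with a small sketch of key-based routing. It assumes the common hash-modulo scheme; `partition_for` is a hypothetical helper, and `zlib.crc32` is a stand-in hash chosen for determinism, not the hash Pulsar uses.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in hash (crc32), not Pulsar's; any stable hash shows the same effect.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


keys = [f"user-{i}" for i in range(1000)]

# With a fixed partition count, each key always maps to the same partition,
# so per-key ordering follows from per-partition ordering.
before = {k: partition_for(k, 4) for k in keys}

# After growing from 4 to 5 partitions, many keys map somewhere else.
after = {k: partition_for(k, 5) for k in keys}
moved = [k for k in keys if before[k] != after[k]]

print(f"{len(moved)} of {len(keys)} keys change partition when going from 4 to 5")
```

Any key that moves has older messages in one partition and newer ones in another, so an automatic scaler needs some extra coordination to keep per-key order across the change.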
48. Pluggable metadata store
- Pulsar uses ZooKeeper as a metadata store and coordination service
- Work is ongoing to abstract the metadata access layer
- Soon it will be possible to choose from different backend implementations
- In the future, we would provide out-of-the-box metadata support in Pulsar
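One way to picture an abstracted metadata access layer is a small interface with interchangeable backends. The sketch below is purely illustrative: `MetadataStore`, `InMemoryMetadataStore`, `register_topic`, and the path layout are hypothetical names and do not reflect the interface Pulsar defines.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class MetadataStore(ABC):
    """Hypothetical minimal metadata interface; callers depend only on this."""

    @abstractmethod
    def get(self, path: str) -> Optional[bytes]: ...

    @abstractmethod
    def put(self, path: str, value: bytes) -> None: ...

    @abstractmethod
    def delete(self, path: str) -> None: ...

    @abstractmethod
    def children(self, path: str) -> List[str]: ...


class InMemoryMetadataStore(MetadataStore):
    """One possible backend; a ZooKeeper-backed class could implement the same methods."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def get(self, path: str) -> Optional[bytes]:
        return self._data.get(path)

    def put(self, path: str, value: bytes) -> None:
        self._data[path] = value

    def delete(self, path: str) -> None:
        self._data.pop(path, None)

    def children(self, path: str) -> List[str]:
        # Return the distinct next path segment under `path`, sorted for stable output.
        prefix = path.rstrip("/") + "/"
        names = {p[len(prefix):].split("/", 1)[0] for p in self._data if p.startswith(prefix)}
        return sorted(names)


def register_topic(store: MetadataStore, tenant: str, namespace: str, topic: str) -> None:
    # Caller code is written against the interface, not a specific backend.
    store.put(f"/topics/{tenant}/{namespace}/{topic}", b"{}")


store = InMemoryMetadataStore()
register_topic(store, "acme", "billing", "invoices")
register_topic(store, "acme", "billing", "refunds")
print(store.children("/topics/acme/billing"))
```

Because `register_topic` only sees the abstract type, swapping the backend would not require touching it, which is the point of making the store pluggable.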
49. Storage
Continue to push the boundaries for truly scalable stream storage
Performance - Operability - Cost
50. Storage
- A lot of behind-the-scenes work has been happening on performance
- Continuously ensure that Pulsar + BookKeeper are the most effective platform to store data in every environment:
  - Cloud / Multi-Cloud
  - On-Prem
- Efficiently support wildly different requirements:
  - Strong consistency and durability
  - Low cost and huge throughput
53. State Store
Key-Value store used by Pulsar Functions
- Maturing Global State
- Hot-replicas for super-fast failovers
- Monitoring
- Extensive testing
- Change data capture of state updates
- Local read access to cached values
- Efficiently support read-intensive data accesses
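As a rough illustration of how a function might use a key-value state store, here is a word counter written against a stub context. `StubContext` and `count_words` are hypothetical; the `incr_counter` / `get_counter` method names are an assumption modelled on counter-style state calls and should be checked against the Functions SDK in use.

```python
class StubContext:
    """Hypothetical in-memory stand-in for the context a function runtime passes in."""

    def __init__(self):
        self._counters = {}

    def incr_counter(self, key, amount):
        # Add `amount` to the counter stored under `key`, starting from zero.
        self._counters[key] = self._counters.get(key, 0) + amount

    def get_counter(self, key):
        # Unknown keys read as zero in this sketch.
        return self._counters.get(key, 0)


def count_words(sentence, context):
    # The function itself keeps no state; every count lives in the state store.
    for word in sentence.lower().split():
        context.incr_counter(word, 1)


ctx = StubContext()
for s in ["Pulsar stores state", "state survives restarts"]:
    count_words(s, ctx)

print(ctx.get_counter("state"))
```

Keeping the counts in the store rather than in process memory is what lets a function instance be restarted or moved without losing them.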
55. Processing
Process event streams in real-time at scale
- Pulsar Functions ⟶ lightweight / serverless compute
- Pulsar-Flink / Pulsar-Spark ⟶ batch and stream processing
- Pulsar SQL ⟶ interactive queries with Presto
56. Pulsar-Flink
- FLIP-72: Introduce Pulsar Connector
- Batch reader: for batch processing
  - Segment reader
  - Bypassing brokers
  - Read segments from Apache BookKeeper and Tiered Storage
- Sub-stream reader: for scale-out stream processing
  - Key_Shared subscription & readers
  - Read from brokers
  - Scale the processing parallelism beyond the number of partitions
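How key-based sub-streams let parallelism exceed the partition count can be sketched by splitting a hash range across consumers. This is a simplified model, not the broker's dispatch logic: `consumer_for`, the `HASH_RANGE` value, the even range split, and the crc32 hash are all assumptions made for illustration.

```python
import zlib
from collections import defaultdict

HASH_RANGE = 65536  # size of the slot space in this sketch (an assumption)


def consumer_for(key, consumers):
    # Map the key to a slot, then to the consumer owning that slice of the range.
    slot = zlib.crc32(key.encode("utf-8")) % HASH_RANGE
    return consumers[slot * len(consumers) // HASH_RANGE]


# Events as they appear, in order, on a single partition.
events = [
    ("order-1", "created"), ("order-2", "created"), ("order-1", "paid"),
    ("order-3", "created"), ("order-2", "paid"), ("order-1", "shipped"),
]

consumers = ["c0", "c1", "c2", "c3"]  # four consumers sharing one partition
delivered = defaultdict(list)
for key, value in events:
    delivered[consumer_for(key, consumers)].append((key, value))

for name in consumers:
    print(name, delivered[name])
```

Each key is pinned to one consumer, so events for the same key stay in order even though several consumers read the same partition in parallel.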
57. Event storage API
- Provide multiple access layers to the data
  - Stream Reader: read events in a partition-based order
  - Sub-stream Reader: read events in key-based order
  - Segment Readers: read segments from Apache BookKeeper and Tiered Storage
- Integrations
  - Pulsar-Flink
  - Pulsar-Spark
  - Pulsar-Presto