Open keynote_carolyn&matteo&sijie
Welcome to the first-ever Pulsar Summit,
hosted by:
&
Platinum
Gold
Community
Media
A Big Thanks to Our Sponsors
A Big Thanks to the Program Committee
Sijie Guo Matteo Merli Jia Zhai Jesse Anderson Nozomi Kurihara
Jerry Peng Ben Lorica Dave Fisher Yuvaraj Loganathan
A Big Thanks to the Speakers
Apache Pulsar in 2020
Pulsar Community & Ecosystem
● Major Product
Releases & Updates
● Monthly Webinars
● Weekly Trainings
○ TGIP Every Friday
at 1pm PT
● Case Studies, White
Papers & Use Cases
Get Involved!
#1. Join the Pulsar Slack channel - Apache-Pulsar.slack.com
○ #PulsarSummit - connect with fellow attendees in real-time
○ #Job-Board - post and search Pulsar-related jobs
#2. Join Pulsar Summit Newsletter List
○ Learn about upcoming webinars, product releases, case studies and
more
#3. Follow @apache_pulsar on Twitter
#4. Take the Post-Summit Survey!
Apache Pulsar in 2020
First-Ever Apache Pulsar Summit
● 36 speakers
● 35+ sessions
● 550+ attendee sign-ups
● 300+ companies represented
Conference Overview & Logistics
In-Person Summit
San Francisco, CA
March 2020
Virtual Summit
June 2020
Conference Overview & Logistics
2 Days / 3 Tracks / 3 Zoom Links
TIME / TRACK 1 / TRACK 2 / TRACK 3
8:30-9:10
9:20-10
10:20-10:50
11-11:40
Keynote: Adoption, Use Cases & The Future of Pulsar (TRACK 1)
Keynote: Why Splunk Chose Pulsar, by Karthik Ramasamy (TRACK 1)
Scaling Customer Engagement...
Kafka on Pulsar...
Pulsar Storage on BookKeeper...
Getting Pulsar Spinning...
Event Propagation across…
Pulsar Functions Deep Dive...
Pulsar Summit Track Moderators
Track 1: Carolyn King
Marketing, StreamNative
Track 2: Jun Wang
Marketing & Events
Track 3: Rosalie Bartlett
Sr Community Mgr, Verizon Media
Contact us at events@streamnative.io with any questions!
KEYNOTE SESSION:
Messaging and Event Streaming
Adoption, Use Cases and the Future of Pulsar
Matteo Merli / Splunk
Sijie Guo / StreamNative
Who are we?
- Sijie Guo (@sijieg)
- Co-Founder, StreamNative
- PMC Member of Pulsar/BookKeeper
- Ex Co-Founder, Streamlio
- Matteo Merli (@merlimat)
- Sr. Principal Engineer, Splunk
- Co-creator and PMC chair of Pulsar
- Ex Co-Founder, Streamlio
Splunk
Splunk provides operational intelligence software that monitors,
reports, and analyzes real-time machine data.
Splunk acquired Streamlio in Nov 2019 as part of an expanded
investment in the data streaming space.
Splunk is using Pulsar in multiple product lines and is deeply
committed to further developing Pulsar and fostering its community.
StreamNative
Founded by the developers of Apache Pulsar and Apache BookKeeper,
StreamNative enables companies to access enterprise data as real-time
event streams.
- Contributing and helping the Pulsar/BK community grow
- Helping people resolve business problems using Pulsar
- Providing managed Pulsar services and enterprise support
Agenda
- How Organizations are Using Pulsar Today / Sijie Guo
- What is Driving Pulsar Adoption / Sijie Guo
- The Future of Apache Pulsar / Matteo Merli
What is Apache Pulsar?
Pulsar is a cloud-native messaging
and event streaming platform
Pulsar’s Global Adoption
Splunk
The Data-to-Everything™
Platform
- Industry: IT
- Adoption: 6+ months
- Market Cap: 29B
#1 Splunk Data Stream Processor
based on Apache Pulsar
#2 Streaming and Batch connectors
#3 Pulsar as a Service for Splunk cloud
products
Case Study Highlights
Narvar
Intelligent Customer
Experience Platform
- Industry: Retail
- Adoption: 1.5 years
- Scale: 50k txns/second
- Mission Critical Applications
#1 Real time transactional messaging
#2 Data integration with Data Lake
#3 Complex event processing
#4 Heavy Pulsar Functions user
Case Study Highlights
Instructure
Educational Technology
Company
- Industry: Education
- Adoption: 1+ years
- Scale: 8 AWS regions, 50k
msgs/sec in the busiest region
#1 Low-cost
#2 Easy to manage long term retention
#3 Unified messaging model
Case Study Highlights
Clever Cloud
PaaS Company
- Industry: Cloud Computing
- Adoption: 1+ years
- Use Cases: log ingestion pipeline
and Function as a Service
#1 Multi-tenant queue system
#2 Proxy Architecture
#3 Presto and S3 integration
Case Study Highlights
Tencent
The WeChat Company
- Industry: Internet
- Use case: Financial & Log pipeline
- Adoption: 2+ years
- Scale: tens of billions of financial txns
every day
#1 Powering Tencent Billing Platform
#2 Data transfer layer for federated
machine learning platform
#3 Replace Kafka for its logging pipeline
in Tencent Games
Case Study Highlights
Huya Live
Live Streaming Service
- Market Cap: 4B
- Use case: Log collection
- Adoption: 1+ year
- Scale: 15 million msgs/sec
#1 Replace Kafka in its log pipeline
#2 Instant scalability
#3 Multi Tenancy
Case Study Highlights
Yum China
American Fortune 500
fast-food company
- Revenue: 8B
- Use case: Notification & Order
Processing
- Adoption: 6+ months
#1 Replace RabbitMQ
#2 Instant scalability
#3 Multi Tenancy
Case Study Highlights
Global Adoption
What is driving Pulsar Adoption?
-
Insights from the Pulsar User Survey 2020
What value does Pulsar bring to your organization?
#1 Increased Agility
#2 Unlocks New Use
Cases for the Business
#3 Reduced Costs
#4 Improved Customer
Experience
What are the top 3 highlights for Pulsar?
#1 Architecture Design
#2 Scalability
#3 Resiliency
Top Use-Cases
#1 Asynchronous
Applications
#2 Building Core Business
Applications
#3 ETL / Data Pipelines
Most-Used Features
#1 Pub/Sub
#2 Multi-Tenancy
#3 Functions
#4 Tiered Storage
#5 Connectors
42%
consider Pulsar as a replacement for two or more messaging systems,
because Pulsar is a unified messaging and event streaming platform.
Apache Pulsar: 18-mo Growth Rate
7x
growth
Apache Pulsar today
The Apache Pulsar community has shaped the current Pulsar as -
A cloud-native messaging and event streaming platform
-
Pub/Sub
Store
Process
The future of Apache Pulsar
History
✓ 2012 — Pulsar inception at Yahoo
✓ Mandate: scalable, multi-tenant messaging service
✓ 2016 — Open-Sourced by Yahoo
✓ 2017 — Project migrated to Apache Software Foundation
✓ 2018
✓ Promoted to Top-Level Apache project
✓ Apache Pulsar 2.0 released
✓ 2019 — Adoption increase
Pulsar today
● Huge growth for the project
○ Scope and features
○ Community
● We have been able to build a lot without changing the core of the
system
● The path has been linear:
● Focus on providing the best infra for messaging and event streaming
● Build on the architectural strength of Pulsar
● Listen to the users, make their life easier
1. Pub-Sub implemented over distributed log-storage
2. Schema
3. Pulsar Functions
4. Pulsar IO
5. Pulsar SQL
6. Tiered Storage
Pulsar evolution
Messaging
-
Publish and consume events at scale from anywhere using any
protocols and languages
Protocol Handler (*oP)
- KoP: Kafka-on-Pulsar
- AoP: AMQP-on-Pulsar
- MoP: MQTT-on-Pulsar
New Features
- Transaction Support: a lot of progress, coming in 2.7
- REST API to produce / consume
- Readonly brokers:
- High fanout
- Scale brokers on demand without affecting ownership
- Exclusive Producer
- Single writer to provide fencing and leader election for applications
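A toy model of the exclusive-producer idea, assuming a topic that admits a single writer at a time. The names below (`ExclusiveTopic`, `try_acquire`, `release`, `ProducerFencedError`) are hypothetical and are not the Pulsar client API. Whoever holds the producer slot acts as leader; other candidates are fenced off until the slot is released.

```python
class ProducerFencedError(Exception):
    """Raised when a producer that does not own the topic tries to publish."""


class ExclusiveTopic:
    """In-memory stand-in for a topic that allows one producer at a time."""

    def __init__(self):
        self.owner = None  # name of the current exclusive producer, if any
        self.log = []      # messages accepted so far

    def try_acquire(self, producer):
        # The first candidate wins the slot; later candidates are rejected.
        if self.owner is None:
            self.owner = producer
            return True
        return self.owner == producer

    def release(self, producer):
        # Only the current owner can give up the slot.
        if self.owner == producer:
            self.owner = None

    def publish(self, producer, message):
        if self.owner != producer:
            raise ProducerFencedError(f"{producer} is not the exclusive producer")
        self.log.append((producer, message))


topic = ExclusiveTopic()
leader_a = topic.try_acquire("app-a")        # app-a becomes the leader
leader_b = topic.try_acquire("app-b")        # app-b is fenced off
topic.publish("app-a", "state-update-1")
topic.release("app-a")
leader_b_retry = topic.try_acquire("app-b")  # app-b takes over after release
```

In this sketch a former leader that keeps publishing after losing the slot gets an error instead of silently interleaving writes, which is the fencing property the slide refers to.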
Partitions auto-scaling
- Partitions are an artifact of scaling
- System complexity should be hidden
- Pulsar should be able to automatically manage partitions:
- Increase / Decrease based on load
- Retain ordering
- Remove duality of partitioned/non-partitioned topics
- Pulsar uses ZooKeeper as a metadata store and coordination
service
- Work is ongoing to abstract the metadata access layer
- Soon it will be possible to choose from different backend
implementations
- In the future, we will provide out-of-the-box metadata support in
Pulsar
Pluggable metadata store
Storage
-
Continue to push the boundaries for truly scalable stream storage
Performance - Operability - Cost
- A lot of behind-the-scenes work has been happening on performance
- Continuously ensure that Pulsar + BookKeeper are the most effective
platform to store data in every environment:
- Cloud / Multi-Cloud
- On-Prem
- Efficiently support wildly different requirements:
- Strong consistency and durability
- Low cost and huge throughput
Storage
1. Distributed log-storage
2. Schema — Structured storage
3. Tiered Storage — Infinite stream capacity
4. Topic Compaction — Table & Stream duality
5. Key-Value — Functions state access
Storage evolution
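The "Table & Stream duality" of topic compaction can be illustrated with a minimal sketch: replaying a keyed stream and keeping only the latest value per key yields a table. This is a conceptual model only, not how the broker implements compaction; the `compact` helper is hypothetical, and treating `None` as a delete marker is an assumption made for the example.

```python
def compact(log):
    """Return the latest value per key, ordered by each key's last update."""
    latest = {}
    for key, value in log:
        # Drop the older entry first so dict order follows the newest write.
        latest.pop(key, None)
        if value is not None:  # None models a delete marker in this sketch
            latest[key] = value
    return latest


stream = [
    ("user-1", "a"),
    ("user-2", "b"),
    ("user-1", "c"),   # overwrites user-1
    ("user-2", None),  # deletes user-2
    ("user-3", "d"),
]
table = compact(stream)
```

The full `stream` is still what a regular reader would see; `table` is the view a reader of the compacted topic would start from.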
Storage - Columnar Offloader
State Store
Key-Value store used by Pulsar Functions
- Maturing Global State
- Hot-replicas for super-fast failovers
- Monitoring
- Extensive testing
- Change data capture of state updates
- Local read access to cached values
- Efficiently support read-intensive data accesses
Processing
-
Built-In processing + Integrate with existing platforms
Processing
Process event streams in real-time at scale
- Pulsar Functions ⟶ lightweight / serverless compute
- Pulsar-Flink / Pulsar-Spark ⟶ batch and stream processing
- Pulsar SQL ⟶ interactive queries with Presto
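As a rough illustration of the "lightweight / serverless compute" point: to my understanding, the Pulsar Functions Python runtime can run a plain module-level function named `process`, called once per incoming message, with the return value published to the output topic. The body below is only an example, and the local loop is a stand-in for the runtime, not part of any Pulsar API.

```python
def process(input):
    """Handle one message; the returned value becomes the output message."""
    return "{}!".format(input)


# Local check without a cluster: feed a few messages through the function.
outputs = [process(message) for message in ["hello", "pulsar"]]
```

Because the function has no dependency on a client library, it can be unit-tested like any other Python function before it is deployed.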
Pulsar-Flink
- FLIP-72: Introduce Pulsar Connector
- Batch reader: for batch processing
- Segment reader
- Bypassing brokers
- Read segments from Apache BookKeeper and Tiered Storage
- Sub-stream reader: for scale-out stream processing
- Key_Shared subscription & readers
- Read from brokers
- Scale the processing parallelism beyond the number of partitions
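A minimal sketch of how key-based dispatch lets processing parallelism exceed the partition count while keeping per-key order. It assumes a simple hash-modulo routing rule; the real Key_Shared implementation in the broker is more involved, and `route` and `dispatch` are hypothetical names.

```python
import zlib
from collections import defaultdict


def route(key, num_consumers):
    """Pick a consumer for a key with a stable hash (CRC32 is an arbitrary choice)."""
    return zlib.crc32(key.encode("utf-8")) % num_consumers


def dispatch(messages, num_consumers):
    """Deliver (key, payload) pairs from one partition to several consumers."""
    received = defaultdict(list)
    for key, payload in messages:
        # Messages are handed out in log order, so per-key order is preserved.
        received[route(key, num_consumers)].append((key, payload))
    return received


# One partition, five keys, three consumers reading it in parallel.
partition = [(f"k{i % 5}", i) for i in range(20)]
received = dispatch(partition, num_consumers=3)
```

Every key lands on exactly one consumer, so a single partition can feed several parallel readers without reordering events that share a key.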
Event storage API
- Provide multiple access layers to the data
- Stream Reader: read events in a partition-based order
- Sub-stream Reader: read events in key-based order
- Segment Readers: read segments from Apache BookKeeper and Tiered Storage
- Integrations
- Pulsar-Flink
- Pulsar-Spark
- Pulsar-Presto
- Pluggable Language Runtime
- Function registry
- Share and reuse functions
- Operability
- Functions Versioning
- Upgrade / Rollback
- A/B testing
- Connectors (Batch / Stream connectors)
Pulsar Functions
Function Mesh
- PIP-66: Function Mesh
- Compose multiple
sources, sinks and
functions together
- YAML Config or DSL
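The composition idea can be sketched in a few lines, assuming a mesh is just a source feeding a chain of functions that ends in a sink. This is not the PIP-66 design or its config format; `run_mesh`, `parse_int` and `double` are hypothetical names used only for illustration.

```python
def run_mesh(source, functions, sink):
    """Pipe every event from source through functions, in order, into sink."""
    for event in source:
        for fn in functions:
            event = fn(event)
            if event is None:  # a function returning None drops the event
                break
        else:
            sink(event)


def parse_int(text):
    """Turn a numeric string into an int; drop anything else."""
    return int(text) if text.isdigit() else None


def double(number):
    return number * 2


collected = []
run_mesh(
    source=["1", "2", "x", "4"],
    functions=[parse_int, double],
    sink=collected.append,
)
```

A declarative YAML or DSL description would name the same three kinds of parts (sources, functions, sinks) and the order in which they are connected.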
Function Mesh
Management Tools
- Pulsar Manager
- Support Schema, Functions & Connectors
- Integrate with BookKeeper Visual Manager
- PulsarCtl - Go-based CLI admin tool
- Pulsar Helm Chart
- Kubernetes Operator
- Tenant & Topic level configuration policies
- Broker interceptors
- Pluggable provider for tracing and extending broker capabilities
- System Events/Topics - Change Event Streams