Serverless Event Streaming
with Pulsar Functions
Sijie Guo (@sijieg)
2019.06.22
Who am I
● Apache Pulsar PMC Member
● Apache BookKeeper PMC Chair
● Twitter and Yahoo alumnus
● Founder of StreamNative
● Interested in technologies around event streaming
Agenda
● What is Apache Pulsar?
● Event Streams: Pulsar’s view on data
● When event streaming meets serverless
○ Programming model
○ Architecture
○ Use cases
What is Apache Pulsar?
“Flexible pub/sub messaging
backed by a durable stream storage”
Pulsar - Pub/Sub
Pulsar - Multi Tenancy
Pulsar - Flexible Messaging
● One copy of the data, multiple ways to consume it
● Queuing (aka stateless messaging)
○ Shared (similar to RabbitMQ)
● Streaming (aka stateful messaging)
○ Exclusive
○ Failover (similar to Kafka)
○ Key_Shared
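The dispatch behavior of the four subscription types can be sketched as a toy model. This is a simplified illustration under stated assumptions, not Pulsar's broker logic: the function `dispatch`, the choice of the first consumer as the Failover active consumer, and `zlib.crc32` as the Key_Shared hash are all hypothetical, and acknowledgements, redelivery, and consumer priority are ignored.

```python
import zlib
from itertools import cycle
from typing import Dict, List, Tuple


def dispatch(messages: List[Tuple[str, int]], consumers: List[str], mode: str) -> Dict[str, List[int]]:
    """Toy model: which consumer receives each message on one subscription.

    messages  -- (key, value) pairs published to one topic partition
    consumers -- consumer names attached to the subscription
    mode      -- "Exclusive", "Failover", "Shared" or "Key_Shared"
    """
    out: Dict[str, List[int]] = {c: [] for c in consumers}
    if mode in ("Exclusive", "Failover"):
        # Exclusive admits a single consumer; Failover admits standbys but
        # delivers everything to one active consumer (here: the first).
        if mode == "Exclusive" and len(consumers) > 1:
            raise ValueError("Exclusive subscription allows only one consumer")
        out[consumers[0]] = [v for _, v in messages]
    elif mode == "Shared":
        # Queuing: round-robin across consumers, no ordering guarantee.
        rr = cycle(consumers)
        for _, v in messages:
            out[next(rr)].append(v)
    elif mode == "Key_Shared":
        # Same key -> same consumer, so per-key order is preserved.
        for k, v in messages:
            out[consumers[zlib.crc32(k.encode()) % len(consumers)]].append(v)
    else:
        raise ValueError(f"unknown subscription mode: {mode}")
    return out


msgs = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
print(dispatch(msgs, ["c1", "c2"], "Shared"))  # {'c1': [1, 3], 'c2': [2, 4]}
```

Shared spreads consecutive messages across consumers (queuing), while Key_Shared keeps every message with the same key on one consumer, which preserves per-key ordering.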
Pulsar - Cloud Native Architecture
Layered Architecture
❏ Independent scalability of the serving and storage layers
❏ Instance failure recovery
❏ No rebalancing required on cluster expansion
A Pulsar view on Data
Pulsar View - Topic
Pulsar View - Partition
Pulsar View - Segment
Pulsar View - Event Stream
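The Topic, Partition, Segment hierarchy can be modeled as nested data structures. In this minimal sketch (the class names and the three-entry rollover threshold are assumptions for illustration), a partition is an append-only log split into segments, and reading a partition's event stream means reading its segments in order. In Pulsar the segments correspond to BookKeeper ledgers, and real rollover policies are configurable rather than a fixed count.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    """A bounded chunk of one partition's log (a BookKeeper ledger in Pulsar)."""
    entries: List[str] = field(default_factory=list)


@dataclass
class Partition:
    """An append-only log built from an ordered sequence of segments."""
    max_entries_per_segment: int = 3  # hypothetical rollover threshold
    segments: List[Segment] = field(default_factory=list)

    def append(self, event: str) -> None:
        # Roll over to a new segment when the current one is full.
        if not self.segments or len(self.segments[-1].entries) >= self.max_entries_per_segment:
            self.segments.append(Segment())
        self.segments[-1].entries.append(event)

    def read_all(self) -> List[str]:
        # The event stream of a partition = its segments read in order.
        return [e for seg in self.segments for e in seg.entries]


@dataclass
class Topic:
    """A topic is a set of partitions; each partition is an ordered stream."""
    partitions: List[Partition]

    @classmethod
    def create(cls, num_partitions: int) -> "Topic":
        return cls([Partition() for _ in range(num_partitions)])
```

Appending seven events to one partition yields segments of sizes 3, 3, and 1, while `read_all` still returns the seven events in their original order.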
Event Stream is the right foundation for your data
M(essaging), S(torage), P(rocessing)
MSP - Interactive Queries
MSP - Stream & Batch Processing
MSP - What is next?
What is next?
When Event Streaming
meets Serverless
Introducing Pulsar Functions
Pulsar Functions
● A serverless event streaming framework
● Lightweight computation
● Event-first, Stream-first
● Multiple languages (Java, Python, Go)
● Multiple runtimes
● SDK-less and SDK-based APIs
Function Elements
● Input Topics
● Output Topics
● Function
● State
● Log Topics
API - Native Java / Python / Go Function
Golang
Python
Java
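A native Python function needs no SDK: it is a plain function that receives each input message and returns the message to publish. The `process` function below follows the exclamation example commonly used in Pulsar documentation. `run_locally` is a hypothetical harness (not part of Pulsar) that stands in for the input and output topics, and it treats a `None` return as "publish nothing", which is our understanding of Pulsar's behavior.

```python
def process(input):
    # Native Python function: no Pulsar import required. The runtime calls
    # it once per input message and publishes the return value.
    return "{}!".format(input)


def run_locally(fn, messages):
    """Hypothetical harness, not part of Pulsar: feed messages through fn and
    collect what would be published to the output topic."""
    outputs = []
    for msg in messages:
        result = fn(msg)
        if result is not None:  # assumed: None means "publish nothing"
            outputs.append(result)
    return outputs


print(run_locally(process, ["hello", "pulsar"]))  # ['hello!', 'pulsar!']
```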
API - Function Context
● Logger
● State
● Metrics
● Security / Secrets
● ...
Context - Logger
Context - State
● Globally managed state *
● Mutable by functions and by the admin CLI
● Queryable by functions and by the admin CLI
● State is stored in the storage layer (Apache BookKeeper)
● State is implemented as streams + snapshots
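The idea of state implemented as streams plus snapshots can be illustrated with a toy store: every update is appended to a log, a snapshot of the materialized view is taken periodically, and recovery loads the latest snapshot and replays only the log tail. `StreamBackedState` and its `snapshot_every` parameter are hypothetical; this sketches the general technique, not Pulsar's storage-layer implementation.

```python
from typing import Dict, List, Optional, Tuple


class StreamBackedState:
    """Toy key/value state stored as an update log plus periodic snapshots.

    Hypothetical illustration of "streams + snapshot"; not Pulsar's code.
    """

    def __init__(self, snapshot_every: int = 3):
        self.log: List[Tuple[str, str]] = []  # durable, append-only update stream
        self.snapshot: Dict[str, str] = {}    # last persisted materialized view
        self.snapshot_offset = 0              # number of log entries the snapshot covers
        self.snapshot_every = snapshot_every
        self.view: Dict[str, str] = {}        # in-memory current state

    def put(self, key: str, value: str) -> None:
        self.log.append((key, value))         # 1. append the update to the stream
        self.view[key] = value                # 2. apply it to the live view
        if len(self.log) - self.snapshot_offset >= self.snapshot_every:
            # 3. snapshot periodically so recovery replays less of the log
            self.snapshot = dict(self.view)
            self.snapshot_offset = len(self.log)

    def get(self, key: str) -> Optional[str]:
        return self.view.get(key)

    def recover(self) -> Dict[str, str]:
        """After losing in-memory state: load the snapshot, replay the log tail."""
        state = dict(self.snapshot)
        for key, value in self.log[self.snapshot_offset:]:
            state[key] = value
        self.view = state
        return state
```

The snapshot frequency trades write overhead against recovery time: frequent snapshots shorten the tail to replay, infrequent ones keep writes cheap.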
Context - State API
● Key/Value State API
○ putState
○ getState
● Counter State API
○ getCounter
○ incrCounter
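As a sketch of the State API in Python, here is a word-count function. The method names `incr_counter`, `get_counter`, `put_state`, and `get_state` follow the Python SDK's snake_case counterparts of the Java methods listed above, to the best of our knowledge. `InMemoryContext` is a hypothetical dictionary-backed stand-in so the example runs without a Pulsar cluster; in a real deployment the class would subclass `pulsar.Function` and receive the managed context from the runtime.

```python
from collections import defaultdict
from typing import Dict, Optional


class InMemoryContext:
    """Hypothetical local stand-in for the Pulsar Python Context.

    Storage is a plain dict here, not Pulsar's managed state.
    """

    def __init__(self):
        self._counters: Dict[str, int] = defaultdict(int)
        self._state: Dict[str, bytes] = {}

    def incr_counter(self, key: str, amount: int) -> None:
        self._counters[key] += amount

    def get_counter(self, key: str) -> int:
        return self._counters[key]

    def put_state(self, key: str, value: bytes) -> None:
        self._state[key] = value

    def get_state(self, key: str) -> Optional[bytes]:
        return self._state.get(key)


class WordCountFunction:
    # In a real deployment this would subclass pulsar.Function.
    def process(self, input, context):
        for word in input.split():
            context.incr_counter(word, 1)                    # Counter State API
        context.put_state("last-sentence", input.encode())  # Key/Value State API
        return None  # results live in state; no output message
```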
Context - Metrics
● API - recordMetric(String metricName, double value)
● Exposed in Prometheus format
● Scraped by Prometheus
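Recording a custom metric from a function can be sketched the same way. `MetricsContext` is a hypothetical stand-in for the runtime context; its `record_metric(metric_name, metric_value)` mirrors the Python counterpart of `recordMetric`, and `to_prometheus_text` renders `_count` and `_sum` lines in the style of the Prometheus text format. The `user_metric_` prefix is an assumption; Pulsar's actual exported metric names and labels are not reproduced here.

```python
from collections import defaultdict
from typing import Dict, List


class MetricsContext:
    """Hypothetical local stand-in for the runtime context's metrics API."""

    def __init__(self):
        self.values: Dict[str, List[float]] = defaultdict(list)

    def record_metric(self, metric_name: str, metric_value: float) -> None:
        self.values[metric_name].append(metric_value)

    def to_prometheus_text(self, prefix: str = "user_metric_") -> str:
        # Emit count and sum per metric, Prometheus text-format style.
        # The prefix is illustrative, not Pulsar's exported metric name.
        lines = []
        for name in sorted(self.values):
            vals = self.values[name]
            lines.append(f"{prefix}{name}_count {len(vals)}")
            lines.append(f"{prefix}{name}_sum {sum(vals)}")
        return "\n".join(lines)


class PayloadSizeFunction:
    # In a real deployment this would subclass pulsar.Function.
    def process(self, input, context):
        # Record each message's payload size as a custom metric, pass it through.
        context.record_metric("payload_bytes", float(len(input)))
        return input
```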
Flexible Runtime
● Colocated with Brokers - Thread & Process
● Managed Function Workers - Thread & Process
● External Schedulers - Container
○ Kubernetes
Colocated with Brokers
Managed Function Workers
External Schedulers - Kubernetes
Event Routing
● Events are routed to different partitions
● Leverages Pulsar’s MessageRouter
● Existing MessageRouters
○ Round-Robin
○ SinglePartition
○ Hash (Murmur3, 32-bit)
● Custom MessageRouters
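Partition routing can be sketched as follows. `Router` is a hypothetical, simplified model: keyed messages are hashed so that the same key always lands on the same partition, while keyless messages are spread round-robin or pinned to one randomly chosen partition, which matches how Pulsar documents its RoundRobinPartition and SinglePartition modes as far as we know. `zlib.crc32` stands in for Murmur3, batching (which makes Pulsar's round-robin switch partitions per batch) is ignored, and a `custom` callable plays the role of a custom MessageRouter.

```python
import random
import zlib
from typing import Callable, Optional


class Router:
    """Simplified partition routing for a partitioned topic (hypothetical model).

    Keyed messages are hashed to a fixed partition; keyless messages follow
    the routing mode. zlib.crc32 stands in for Pulsar's Murmur3 hashing.
    """

    def __init__(self, num_partitions: int, mode: str = "RoundRobin", seed: int = 0,
                 custom: Optional[Callable[[Optional[str], int], int]] = None):
        self.n = num_partitions
        self.mode = mode
        self.custom = custom
        self._next = 0
        # SinglePartition: one partition is picked up front for keyless messages.
        self._single = random.Random(seed).randrange(num_partitions)

    def choose_partition(self, key: Optional[str]) -> int:
        if self.mode == "Custom":
            return self.custom(key, self.n)           # user-supplied router
        if key is not None:
            return zlib.crc32(key.encode()) % self.n  # same key -> same partition
        if self.mode == "SinglePartition":
            return self._single
        if self.mode == "RoundRobin":
            p = self._next
            self._next = (self._next + 1) % self.n
            return p
        raise ValueError(f"unsupported routing mode: {self.mode}")
```

Hashing keys keeps per-key ordering within one partition, which is what lets downstream function instances process each key's events in order.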
Auto load balancing
● Pulsar Functions reuse Pulsar’s built-in load balancing across consumers
● Shared Subscription
○ Load is distributed evenly among function instances (consumers)
○ More function instances provide more processing capacity
● Failover Subscription
○ Load is distributed among function instances (consumers) by partition
○ The number of active function instances is limited by the number of partitions
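The partition limit under a Failover subscription can be sketched like this. `failover_assignment` is a hypothetical simplification that gives each partition exactly one active instance by walking a sorted instance list (real Pulsar also takes consumer priority into account); `idle_instances` shows that instances beyond the partition count receive no work.

```python
from typing import Dict, List


def failover_assignment(num_partitions: int, instances: List[str]) -> Dict[str, List[int]]:
    """Simplified Failover subscription on a partitioned topic.

    Each partition has exactly one active consumer; partitions are spread
    over the sorted instance list. Consumer priority is ignored.
    """
    ordered = sorted(instances)
    assignment: Dict[str, List[int]] = {name: [] for name in ordered}
    for p in range(num_partitions):
        assignment[ordered[p % len(ordered)]].append(p)
    return assignment


def idle_instances(num_partitions: int, instances: List[str]) -> List[str]:
    """Instances that own no partition: extra parallelism is wasted."""
    return [n for n, parts in failover_assignment(num_partitions, instances).items() if not parts]
```

With 4 partitions and 6 instances, two instances stay idle; under a Shared subscription all six would receive messages.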
Integrated with pulsar-admin CLI
Pulsar Functions Architecture
● Stream-First Design
● Reuses existing Pulsar infrastructure; no external dependencies
● Core components
○ Function metadata management
○ Function Worker membership and coordination
○ Function Assignment
Function metadata management
● Stores function metadata
● Key/value store backed by a Pulsar topic
○ The function's FQFN (fully qualified function name) is the store key
● Topic compaction keeps the function metadata store compact
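The metadata store can be sketched as a log of `(FQFN, metadata)` records. `materialize` replays the log into a key/value view, and `compact` keeps only the latest record per key and drops deleted keys, in the spirit of Pulsar topic compaction. The function names, the record shape, and the use of `None` as a delete marker are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[str, Optional[dict]]  # (FQFN key, metadata or None for delete)


def fqfn(tenant: str, namespace: str, name: str) -> str:
    # Fully qualified function name, used as the store key.
    return f"{tenant}/{namespace}/{name}"


def materialize(log: List[Record]) -> Dict[str, dict]:
    """Replay the metadata topic to build the current key/value view."""
    view: Dict[str, dict] = {}
    for key, value in log:
        if value is None:
            view.pop(key, None)  # tombstone: the function was deleted
        else:
            view[key] = value    # later updates overwrite earlier ones
    return view


def compact(log: List[Record]) -> List[Record]:
    """Keep only the latest record per key, as topic compaction does."""
    latest: Dict[str, Record] = {}
    for key, value in log:
        latest[key] = (key, value)
    # Deleted keys need not survive compaction.
    return [rec for rec in latest.values() if rec[1] is not None]
```

Compaction bounds the size of the metadata topic, so a restarting worker replays one record per live function instead of the full update history.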
Function Worker Membership
● Manages the membership of function workers
● Every function worker subscribes to a coordination topic
● The Pulsar broker tracks the alive consumers of each subscription
● The list of function workers is obtained by querying the alive consumers of the subscription to the coordination topic
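Because the broker already tracks connected consumers, membership reduces to asking which consumers are alive on the coordination subscription. `CoordinationSubscription` is a hypothetical model of that broker-side view; it illustrates the idea and is not a Pulsar API.

```python
from typing import Set


class CoordinationSubscription:
    """Hypothetical model of the broker's view of one subscription.

    Function workers reuse this tracking instead of running a separate
    membership service.
    """

    def __init__(self):
        self._alive: Set[str] = set()

    def on_connect(self, worker_id: str) -> None:
        self._alive.add(worker_id)

    def on_disconnect(self, worker_id: str) -> None:
        # Covers both clean shutdown and a connection the broker deems dead.
        self._alive.discard(worker_id)

    def members(self) -> Set[str]:
        """Membership query = the alive consumers of the subscription."""
        return set(self._alive)
```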
Function Assignment
● Function workers elect a leader, the scheduler manager, by using a “Failover” subscription
● The scheduler manager computes the assignment from the function metadata and the membership
● Assignments are published to the Assignment Topic
● Each function worker receives its assignments by subscribing to the Assignment Topic
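Leader election and assignment can be sketched together. In this hypothetical model, `elect_leader` picks the lexicographically smallest worker as the single active Failover consumer (a simplification of how the broker chooses the active consumer), and `schedule` spreads every function instance over the live workers round-robin. The round-robin policy is an assumption here, though to our knowledge it is also Pulsar's default scheduler.

```python
from typing import Dict, List, Tuple

Instance = Tuple[str, int]  # (FQFN, instance id)


def elect_leader(workers: List[str]) -> str:
    # Failover subscription: exactly one consumer is active. This sketch
    # simply takes the lexicographically smallest worker id.
    return min(workers)


def schedule(functions: Dict[str, int], workers: List[str]) -> Dict[Instance, str]:
    """Round-robin assignment of every function instance to a worker.

    functions -- FQFN -> parallelism (from the metadata store)
    workers   -- alive worker ids (from the membership view)
    """
    ordered = sorted(workers)
    instances = [(f, i) for f in sorted(functions) for i in range(functions[f])]
    return {inst: ordered[n % len(ordered)] for n, inst in enumerate(instances)}
```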
Function Runtime Manager
● The Function Runtime Manager manages the running function instances
● It receives assignments from the Assignment Topic
● It compares its current running state with the assignments and reacts to them
○ Starts function instances to invoke functions
○ Stops function instances
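The runtime manager's reaction to new assignments is a reconciliation step: compute the desired set of instances for this worker, then start what is missing and stop what is no longer assigned. `reconcile` is a hypothetical sketch of that diff.

```python
from typing import Dict, Set, Tuple

Instance = Tuple[str, int]  # (FQFN, instance id)


def reconcile(worker_id: str,
              assignments: Dict[Instance, str],
              running: Set[Instance]) -> Tuple[Set[Instance], Set[Instance]]:
    """Return (to_start, to_stop) for one worker.

    assignments -- latest view of the Assignment Topic: instance -> worker
    running     -- instances this worker is currently running
    """
    desired = {inst for inst, owner in assignments.items() if owner == worker_id}
    to_start = desired - running  # assigned here but not yet running
    to_stop = running - desired   # running here but no longer assigned
    return to_start, to_stop
```

Because the step is a pure diff of desired versus actual state, replaying the same assignment twice is harmless: the second pass yields two empty sets.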
Use Cases
Content Routing
Message Filtering
Transformation
Alerts and thresholds
Complex Event Processing Pipelines
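Several of these use cases can be combined in one function. In this sketch, `TemperatureRouter` filters malformed events, transforms each reading into `sensor=value`, and routes it to an alert topic or a normal topic depending on a threshold. `PublishingContext` is a hypothetical stand-in for the runtime context; its `publish(topic_name, message)` mirrors the shape of the Python SDK's `context.publish`, and the topic names and threshold are invented for illustration.

```python
from typing import Dict, List


class PublishingContext:
    """Hypothetical local stand-in for the Pulsar Python Context.publish."""

    def __init__(self):
        self.published: Dict[str, List[str]] = {}

    def publish(self, topic_name: str, message: str) -> None:
        self.published.setdefault(topic_name, []).append(message)


class TemperatureRouter:
    # Hypothetical topic names and threshold, for illustration only.
    ALERT_TOPIC = "persistent://public/default/alerts"
    NORMAL_TOPIC = "persistent://public/default/readings"
    THRESHOLD = 75.0

    def process(self, input, context):
        sensor, _, raw = input.partition(",")  # e.g. "s1,81.5"
        try:
            value = float(raw)
        except ValueError:
            return None                        # filtering: drop malformed events
        event = f"{sensor.strip()}={value}"    # transformation
        # Content routing plus threshold alerting.
        topic = self.ALERT_TOPIC if value > self.THRESHOLD else self.NORMAL_TOPIC
        context.publish(topic, event)
        return None
```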
Pulsar Functions Summary
● A serverless approach to event streaming
● Flexible, lightweight, easy to understand and use
● Event-first, Stream-first
● Stateless + Stateful (*)
● Flexible runtime and data locality
● Functions can be orchestrated for complex processing
○ Workflow, DAG, Iterations, Graph, and ...
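The simplest orchestration is a linear chain in which the output topic of one function is the input topic of the next. `run_pipeline` is a hypothetical local model that represents each topic as an in-memory queue and treats a `None` return as no output; DAGs generalize this by letting one topic feed several functions.

```python
from collections import deque
from typing import Callable, List, Optional

Fn = Callable[[str], Optional[str]]


def run_pipeline(stages: List[Fn], events: List[str]) -> List[str]:
    """Chain functions: each stage consumes the previous stage's output topic.
    Topics are modeled as in-memory queues."""
    topic = deque(events)
    for stage in stages:
        next_topic = deque()
        while topic:
            out = stage(topic.popleft())
            if out is not None:  # None means no message is published
                next_topic.append(out)
        topic = next_topic
    return list(topic)


# Clean, filter, then transform.
stages = [str.strip, lambda s: s if s else None, lambda s: s.upper()]
print(run_pipeline(stages, ["  a ", "   ", "b"]))  # ['A', 'B']
```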
Pulsar Functions Roadmap
● Support for more languages
● Function Orchestration
● Managed state vs Local state
● Large state
● Transactional Processing
● RL on Functions
● ...
Community
● Twitter: @apache_pulsar , @streamnativeio
● WeChat: ApachePulsar, StreamNative
● Mailing Lists: dev@pulsar.apache.org users@pulsar.apache.org
● Slack: https://apache-pulsar.slack.com/
● GitHub:
○ https://github.com/apache/pulsar
○ https://github.com/apache/bookkeeper
● Documentation: https://pulsar.apache.org
Thanks!
