The last few years have seen the emergence of serverless as a paradigm for event streaming. Its simple programming model has attracted developers in droves, and its ability to scale elastically has simplified operations significantly. Combined with its availability across all cloud providers, serverless has become the leading choice for event processing at scale at many companies.
In this talk, Sijie Guo from StreamNative explores how the serverless paradigm is applied to event streaming in Apache Pulsar, a next-generation event streaming system. Pulsar provides native support for serverless functions that process events in a streaming manner as soon as they arrive, with flexible deployment options (thread, process, container). He describes how these serverless functions make data engineering easier and shares real-world usage of Pulsar Functions.
2. Who am I
● Apache Pulsar PMC Member
● Apache BookKeeper PMC Chair
● Twitter, Yahoo Alumni
● Founder of StreamNative
● Interested in technologies around Event Streaming
3. Agenda
● What is Apache Pulsar?
● Event Stream - Pulsar view on Data
● When Event Streaming meets serverless
○ Programming Model
○ Architecture
○ Use cases
29. Context - State
● Global Managed State *
● Mutable by functions and admin-cli
● Queryable by functions and admin-cli
● State is stored in the storage layer
● State is implemented using streams + snapshots
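The "streams + snapshot" idea can be illustrated with a minimal sketch. It assumes state updates are appended to a changelog stream and periodically folded into a snapshot; the class and field names are hypothetical and do not reflect Pulsar's actual storage implementation.

```python
from dataclasses import dataclass, field


@dataclass
class StreamBackedState:
    """Toy key/value state rebuilt from a snapshot plus a changelog stream."""
    log: list = field(default_factory=list)       # append-only (key, value) updates
    snapshot: dict = field(default_factory=dict)  # state as of snapshot_offset
    snapshot_offset: int = 0                      # log entries covered by the snapshot

    def put(self, key, value):
        self.log.append((key, value))

    def take_snapshot(self):
        # Fold every logged update into the snapshot and remember how far we got.
        self.snapshot = self.restore()
        self.snapshot_offset = len(self.log)

    def restore(self):
        # Recovery: start from the snapshot, then replay only the newer updates.
        state = dict(self.snapshot)
        for key, value in self.log[self.snapshot_offset:]:
            state[key] = value
        return state


state = StreamBackedState()
state.put("clicks", 1)
state.put("clicks", 2)
state.take_snapshot()
state.put("views", 10)
print(state.restore())
```

Recovery only replays the entries written after the last snapshot, which is why the combination keeps restore time bounded as the stream grows.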
30. Context - State API
● Key/Value State API
○ putState
○ getState
● Counter State API
○ getCounter
○ incrCounter
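A rough sketch of how a function might use the two state APIs follows. The `InMemoryContext` class is a hypothetical in-memory stand-in, not the Pulsar context object; its snake_case methods mirror the slide's putState, getState, incrCounter and getCounter.

```python
class InMemoryContext:
    """Hypothetical stand-in for the function context; not the Pulsar class."""

    def __init__(self):
        self._values = {}
        self._counters = {}

    # Key/value state
    def put_state(self, key, value):
        self._values[key] = value

    def get_state(self, key):
        return self._values.get(key)

    # Counter state
    def incr_counter(self, key, amount):
        self._counters[key] = self._counters.get(key, 0) + amount

    def get_counter(self, key):
        return self._counters.get(key, 0)


def word_count(sentence, context):
    """Function body: count each word and remember the last sentence seen."""
    for word in sentence.split():
        context.incr_counter(word, 1)
    context.put_state("last_sentence", sentence)


ctx = InMemoryContext()
for event in ["hello pulsar", "hello functions"]:
    word_count(event, ctx)
print(ctx.get_counter("hello"), ctx.get_state("last_sentence"))
```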
31. Context - Metrics
● API - recordMetric(String metricName, double value)
● Exposed in Prometheus format
● Collected by Prometheus
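To make the metrics flow concrete, here is a small sketch of recording user metrics and rendering them as Prometheus-style text samples. The `MetricsRecorder` class, the count/sum aggregation and the metric name are assumptions for illustration; the names Pulsar actually exports will differ.

```python
from collections import defaultdict


class MetricsRecorder:
    """Hypothetical in-memory recorder; aggregates count and sum per metric."""

    def __init__(self):
        self._count = defaultdict(int)
        self._sum = defaultdict(float)

    def record_metric(self, metric_name, value):
        self._count[metric_name] += 1
        self._sum[metric_name] += value

    def render(self):
        # Prometheus text exposition: one "name value" sample per line.
        lines = []
        for name in sorted(self._count):
            lines.append(f"{name}_count {self._count[name]}")
            lines.append(f"{name}_sum {self._sum[name]}")
        return "\n".join(lines) + "\n"


metrics = MetricsRecorder()
for latency_ms in (12.0, 8.0):
    metrics.record_metric("process_latency_ms", latency_ms)
print(metrics.render(), end="")
```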
32. Flexible Runtime
● Colocate with Broker - Thread & Process
● Managed Function Workers - Thread & Process
● External Schedulers - Container
○ Kubernetes
36. Event Routing
● Events are routed to different partitions
● Leverage Pulsar’s MessageRouter
● Existing MessageRouters
○ Round-Robin
○ SinglePartition
○ Hash (Murmur3, 32-bit)
● Customize MessageRouter
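The three routing strategies can be sketched as plain functions that map a message key to a partition index. This is a simplification under stated assumptions: the function names are hypothetical, the round-robin router here ignores keys entirely, and CRC32 from the standard library is swapped in for Murmur3.

```python
import itertools
import zlib


def round_robin_router(num_partitions):
    """Cycle through partitions regardless of the message key."""
    counter = itertools.count()
    return lambda key: next(counter) % num_partitions


def single_partition_router(partition):
    """Send every message to one fixed partition."""
    return lambda key: partition


def hash_router(num_partitions):
    """Map a key to a partition; CRC32 is a stand-in for Murmur3 here."""
    return lambda key: zlib.crc32(key.encode("utf-8")) % num_partitions


keys = ["user-1", "user-2", "user-1", "user-3"]
rr = round_robin_router(3)
by_hash = hash_router(3)
print([rr(k) for k in keys])       # partitions assigned in rotation
print([by_hash(k) for k in keys])  # equal keys land on the same partition
```

A custom router in this model is simply another callable with the same key-to-partition shape.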
37. Auto load balancing
● Pulsar Functions use Pulsar’s auto-balancing mechanism on consumers
● Shared Subscription
○ Load is distributed among function instances (consumers) evenly
○ More function instances provide more processing capability
● Failover Subscription
○ Load is distributed among function instances (consumers) by partitions
○ The number of function instances is limited by the number of partitions
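The difference between the two subscription types can be sketched as follows. Both helpers are hypothetical, and the rotation-based dispatch is an assumption made to keep the example small; a real broker's dispatch depends on consumer availability.

```python
def shared_dispatch(messages, instances):
    """Shared subscription: messages are spread across all instances."""
    assignment = {name: [] for name in instances}
    for i, message in enumerate(messages):
        assignment[instances[i % len(instances)]].append(message)
    return assignment


def failover_dispatch(partitions, instances):
    """Failover subscription: each partition has exactly one active instance."""
    return {p: instances[i % len(instances)] for i, p in enumerate(partitions)}


instances = ["fn-0", "fn-1", "fn-2"]
print(shared_dispatch(list(range(6)), instances))
print(failover_dispatch(["p0", "p1"], instances))  # fn-2 stays idle
```

With two partitions and three instances, the failover case leaves one instance without work, which is the limit the slide describes.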
40. Pulsar Functions Architecture
● Stream-First Design
● Leverage existing infrastructure, no more external dependencies
● Core components
○ Function metadata management
○ Function Worker membership and coordination
○ Function Assignment
41. Function metadata management
● Store function metadata
● Key/value store backed by a Pulsar topic
○ FQFN (fully qualified function name) as the store key
● Use compaction to compact the function metadata store
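A key/value store backed by a compacted topic can be sketched in a few lines. The topic is modelled as a list of (key, value) entries; the key format, the metadata fields and the use of `None` as a delete marker are illustrative assumptions.

```python
def compact(topic):
    """Keep only the latest value per key; a None value deletes the key."""
    latest = {}
    for key, value in topic:
        if value is None:
            latest.pop(key, None)
        else:
            latest[key] = value
    return latest


# Each entry is (FQFN, metadata); the FQFN format and fields are illustrative.
metadata_topic = [
    ("public/default/enrich", {"parallelism": 1}),
    ("public/default/filter", {"parallelism": 2}),
    ("public/default/enrich", {"parallelism": 4}),  # update
    ("public/default/filter", None),                # delete
]
print(compact(metadata_topic))
```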
42. Function Worker Membership
● Manage the memberships of function workers
● Every function worker subscribes to a coordination topic
● The Pulsar broker tracks the alive consumers for a subscription
● The list of function workers is obtained by querying the alive consumers of the subscription on the coordination topic
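As a rough model of this membership scheme, the sketch below treats the coordination topic as a set of connected consumers. The `CoordinationTopic` class and worker names are hypothetical; in Pulsar the broker itself maintains this view.

```python
class CoordinationTopic:
    """Toy broker-side view of the consumers on one subscription."""

    def __init__(self):
        self._alive = set()

    def subscribe(self, worker_id):
        self._alive.add(worker_id)

    def disconnect(self, worker_id):
        self._alive.discard(worker_id)

    def alive_consumers(self):
        return sorted(self._alive)


topic = CoordinationTopic()
for worker in ("worker-a", "worker-b", "worker-c"):
    topic.subscribe(worker)
topic.disconnect("worker-b")    # e.g. the worker crashed
print(topic.alive_consumers())  # current membership
```

No separate membership service is needed in this model: being a live consumer is the membership record.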
43. Function Assignment
● Function workers elect a leader as the scheduler manager by using a “Failover” subscription
● The scheduler manager computes the assignment using function metadata and membership
● The assignments are published to the Assignment Topic
● Each function worker receives its assignment by subscribing to the Assignment Topic
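The assignment flow can be sketched as below. The helper names are hypothetical, picking the lexicographically smallest worker is only a stand-in for the Failover election, and the rotation-based placement is an assumed scheduling policy.

```python
def elect_leader(workers):
    """Stand-in for Failover election: pick one deterministic winner."""
    return min(workers)


def compute_assignments(functions, workers):
    """Spread every function instance across workers in rotation."""
    instances = [
        (name, i) for name, parallelism in sorted(functions.items())
        for i in range(parallelism)
    ]
    return {inst: workers[n % len(workers)] for n, inst in enumerate(instances)}


workers = ["worker-a", "worker-b"]
functions = {"enrich": 2, "filter": 1}  # function name -> parallelism
assignment_topic = []                    # stand-in for the Assignment Topic

leader = elect_leader(workers)
assignment_topic.append(compute_assignments(functions, workers))
print(leader, assignment_topic[-1])
```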
44. Function Runtime Manager
● Function Runtime Manager manages the running function instances
● It receives assignments from the Assignment Topic
● It compares its current running state with the assignments and reacts to them
○ Start function instances to invoke functions
○ Stop function instances
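The compare-and-react step amounts to a set difference between what is running and what is assigned. A minimal sketch, with hypothetical instance names:

```python
def reconcile(running, assigned):
    """Return (to_start, to_stop) so that running converges on assigned."""
    to_start = sorted(assigned - running)
    to_stop = sorted(running - assigned)
    return to_start, to_stop


running = {"enrich-0", "filter-0"}
assigned = {"enrich-0", "enrich-1"}  # latest view from the Assignment Topic
to_start, to_stop = reconcile(running, assigned)
running = (running | set(to_start)) - set(to_stop)
print(to_start, to_stop, sorted(running))
```

Because the manager reacts to the difference rather than to individual commands, replaying the same assignment is harmless.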
51. Pulsar Functions Summary
● A serverless approach to do event streaming
● Flexible, lightweight, easy to understand and use
● Event-first, Stream-first
● Stateless + Stateful (*)
● Flexible runtime and data locality
● Functions can be orchestrated to do complex processing
○ Workflow, DAG, Iterations, Graph, and ...
52. Pulsar Functions Roadmap
● More language support
● Function Orchestration
● Managed state vs Local state
● Large state
● Transactional Processing
● RL on Functions
● ...