As data volumes grow, so do the demands for processing that data. From servers to VMs to containers to serverless, computing frameworks keep evolving with users' needs, and how to give users a fast, easy-to-deploy computing framework has become a common question. In this talk, we introduce how Pulsar provides computing capability built on a serverless architecture. Apache Pulsar is a next-generation cloud messaging system and real-time processing platform. Messaging systems and real-time computing platforms are closely related, yet they are usually deployed and managed separately. Pulsar Functions, the computing component of Pulsar, brings the messaging and computing platforms together in a serverless direction. Pulsar Functions supports multiple languages, such as Go, Python, and Java, and multiple runtimes, such as Thread, Process, and Kubernetes, which gives users great flexibility in writing, running, and deploying functions. Users only need to care about the actual computation logic, without complicated configuration or management, which makes it easier to build a message-triggered streaming platform.
2. Who am I
● Go programmer
● PingCAP > Bitmain > StreamNative
● Pulsar committer
○ pulsar-client-go
○ Go Functions
○ …
● https://github.com/wolfstudy
3. Agenda
● What is Apache Pulsar?
● Event Stream - Pulsar view on Data
● When Event Streaming meets serverless
○ Programming Model
○ Architecture
○ Use cases
18. Pulsar Functions
● A serverless framework for data processing
● Lightweight computation
● Event-first, Stream-first
● Support both stateless and stateful computation
● Multiple languages
● Multiple runtimes
● SDK-less & SDK
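The "SDK-less" bullet can be illustrated with a minimal Python sketch: a function with no Pulsar imports that takes a message payload and returns a result. The name `process` and the single `input` parameter follow the convention for plain Python functions as best I recall it; the exclamation logic is purely illustrative.

```python
# SDK-less style: no Pulsar imports, just a plain function.
# Each incoming message payload is passed in as `input`; the return value
# is what would be published to the function's output topic.
def process(input):
    return "{}!".format(input)
```

Because the function has no framework dependency, it can be unit-tested like any other Python function, for example by calling `process("hello")` directly.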
24. Context - State
● Global Managed State *
● Mutable by functions and admin-cli
● Queryable by functions and admin-cli
● State is stored at the storage layer
● State is implemented using streams + snapshots
25. Context - State API
● Key/Value State API
○ putState
○ getState
● Counter State API
○ getCounter
○ incrCounter
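The two state APIs above can be sketched with an in-memory stand-in. `InMemoryContext` is a hypothetical class written for this example, not the Pulsar SDK; its method names are snake_case renderings of the names on the slide, and the real state would live in Pulsar's storage layer rather than in Python dictionaries.

```python
class InMemoryContext:
    """Hypothetical stand-in for a function context (not the Pulsar SDK)."""

    def __init__(self):
        self._kv = {}        # key/value state
        self._counters = {}  # counter state

    # Key/Value State API
    def put_state(self, key, value):
        self._kv[key] = value

    def get_state(self, key):
        return self._kv.get(key)

    # Counter State API
    def incr_counter(self, key, amount):
        self._counters[key] = self._counters.get(key, 0) + amount

    def get_counter(self, key):
        return self._counters.get(key, 0)


def count_words(sentence, context):
    """Stateful word count: one counter per word, plus the last sentence seen."""
    for word in sentence.split():
        context.incr_counter(word, 1)
    context.put_state("last-sentence", sentence)
```

Calling `count_words` on several messages with the same context accumulates the per-word counters across calls, which is the behaviour that distinguishes stateful from stateless functions.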
26. Context - Metrics
● API - recordMetric(String metricName, double value)
● Exposed in Prometheus format
● Collected by Prometheus
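As a rough illustration of what "recorded by the function, exposed in Prometheus format" means, the sketch below accumulates user metrics and renders them in the Prometheus text exposition style (`name{label="value"} number`). `MetricsRecorder` and the metric names `user_metric_sum` and `user_metric_count` are hypothetical; the names Pulsar actually exports are not given in the slides.

```python
class MetricsRecorder:
    """Hypothetical recorder; mimics recordMetric(metricName, value)."""

    def __init__(self):
        self._sums = {}
        self._counts = {}

    def record_metric(self, metric_name, value):
        # Keep a running sum and count per metric name.
        self._sums[metric_name] = self._sums.get(metric_name, 0.0) + float(value)
        self._counts[metric_name] = self._counts.get(metric_name, 0) + 1

    def render(self):
        # Render in Prometheus text exposition style, one sample per line.
        lines = []
        for name in sorted(self._sums):
            lines.append('user_metric_sum{metric="%s"} %s' % (name, self._sums[name]))
            lines.append('user_metric_count{metric="%s"} %d' % (name, self._counts[name]))
        return "\n".join(lines) + "\n"
```

A Prometheus server would scrape text like this over HTTP; the scrape endpoint itself is outside the scope of the sketch.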
27. Flexible Runtime
● Colocate with Broker - Thread & Process
● Managed Function Workers - Thread & Process
● External Schedulers - Container
○ Kubernetes
31. Event Routing
● Events are routed to different partitions
● Leverage Pulsar’s MessageRouter
● Existing MessageRouters
○ Round-Robin
○ SinglePartition
○ Hash (Murmur3, 32-bit)
● Customize MessageRouter
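The three routing modes can be sketched as small classes that share one method. The class names and the `choose_partition(key, num_partitions)` signature are assumptions made for this example rather than Pulsar's MessageRouter interface. CRC32 from the standard library stands in for Murmur3, and the single-partition router takes a fixed index as a simplification.

```python
import itertools
import zlib


class RoundRobinRouter:
    """Cycle through partitions, ignoring the key."""

    def __init__(self):
        self._counter = itertools.count()

    def choose_partition(self, key, num_partitions):
        return next(self._counter) % num_partitions


class SinglePartitionRouter:
    """Send every message to one fixed partition."""

    def __init__(self, partition):
        self._partition = partition

    def choose_partition(self, key, num_partitions):
        return self._partition % num_partitions


class HashRouter:
    """Route by key hash so the same key always lands on the same partition."""

    def choose_partition(self, key, num_partitions):
        # CRC32 is a stand-in here; it is NOT the Murmur3 hash named on the slide.
        return zlib.crc32(key.encode("utf-8")) % num_partitions
```

A custom router in this sketch would be any other class with the same `choose_partition` method, for example one that routes by a tenant prefix in the key.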
32. Auto load balancing
● Pulsar Functions use Pulsar’s auto-balancing mechanism on consumers
● Shared Subscription
○ Load is distributed among function instances (consumers) evenly
○ More function instances provide more processing capability
● Failover Subscription
○ Load is distributed among function instances (consumers) by partitions
○ The number of function instances is limited by the number of partitions
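The difference between the two subscription types can be shown with a simplified model. Both helper functions are hypothetical, and the broker's real dispatch and owner-selection logic is more involved. The sketch only shows why shared subscriptions scale with instance count while failover subscriptions are capped by partition count.

```python
def shared_dispatch(messages, num_instances):
    """Shared subscription: hand messages to instances in turn, regardless of partition."""
    assigned = {i: [] for i in range(num_instances)}
    for n, msg in enumerate(messages):
        assigned[n % num_instances].append(msg)
    return assigned


def failover_assignment(num_partitions, num_instances):
    """Failover subscription: each partition has exactly one active instance."""
    owners = {i: [] for i in range(num_instances)}
    for partition in range(num_partitions):
        owners[partition % num_instances].append(partition)
    return owners
```

With 2 partitions and 4 instances, `failover_assignment(2, 4)` leaves two instances with no partition to consume from, which is the limit described in the last bullet.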
47. Pulsar Functions Summary
● A serverless approach to event streaming
● Flexible, lightweight, easy to understand and use
● Event-first, Stream-first
● Stateless + Stateful (*)
● Flexible runtime and data locality
● Functions can be orchestrated to do complex processing
○ Workflow, DAG, Iterations, Graph, and ...
48. Pulsar Functions Roadmap
● Support for more languages
● Function Orchestration
● Managed state vs Local state
● Large state
● Transactional Processing
● ...