For organizations with boundless data sources, it is important to analyze, learn, predict and even respond in real time – directly from streaming data. This is important when:
• Data volumes are large, or moving raw data is expensive,
• Data is generated by widely distributed assets (e.g., mobile devices),
• Data is of ephemeral value and analysis can’t wait, or
• It is critical to always have the latest insight and extrapolation won’t do.
Use cases include predicting failures on assembly lines, traffic flows in cities, and demand on power grids; detecting hackers; and understanding connection quality in mobile networks. They are characterized by a need to know – now – and require real-time processing of streaming data. Our goal is to enable real-time stream processing for Apache Pulsar in which analysis, learning and prediction are done on the fly, with continuous insights streamed back to the broker.
Easily Build a Smart Pulsar Stream Processor (Simon Crosby)
2. SwimOS is an Apache 2.0 licensed platform that makes
it easy to build applications that deliver continuous
intelligence from streaming data, at scale
swimos.org
4. Streaming Platforms
• Apache Pulsar
• Apache Kafka
• Apache Beam
• CNCF NATS
• Amazon Kinesis
• Google pub/sub
• Azure Enterprise Data Bus
• Salesforce Kafka
• Confluent Cloud
• …
These platforms:
• Support pub/sub at scale
• Buffer data between pubs & subs
• Event-time ordered delivery
• Events stored in arrival order
• Don’t run applications
➢ SwimOS is a stream processor that delivers continuous intelligence from streaming data
5. Stream Processors
• Stream processors subscribe to a broker to analyze streaming event data
• Their insights can be asynchronously consumed by publishing back to the broker
• The broker offers a low-latency API that gives the stream processor events in real-time
• Pulsar does not control execution of the stream processor
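The subscribe-analyze-publish loop described on this slide can be sketched roughly as follows. This is a minimal illustration, not the Pulsar client API or the SwimOS API: `ToyBroker`, `CountingProcessor`, `BrokerDemo` and the topic names `events` and `insights` are all hypothetical, and delivery here is synchronous and in-memory purely to keep the sketch self-contained.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical in-memory stand-in for a broker; NOT the Pulsar client API.
class ToyBroker {
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    void subscribe(String topic, Consumer<String> handler) {
        subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(handler);
    }

    void publish(String topic, String message) {
        // Iterate over a copy so a handler may subscribe while we deliver.
        List<Consumer<String>> handlers =
            new ArrayList<>(subscribers.getOrDefault(topic, List.of()));
        for (Consumer<String> handler : handlers) {
            handler.accept(message);
        }
    }
}

// Hypothetical stream processor: keeps a running count per device and
// publishes each updated count back to the broker as an "insight".
class CountingProcessor {
    private final Map<String, Integer> counts = new HashMap<>();

    CountingProcessor(ToyBroker broker) {
        broker.subscribe("events", deviceId -> {
            int n = counts.merge(deviceId, 1, Integer::sum);
            broker.publish("insights", deviceId + "=" + n);
        });
    }
}

class BrokerDemo {
    // Feeds three events through the processor and collects the insights.
    static List<String> run() {
        ToyBroker broker = new ToyBroker();
        List<String> insights = new ArrayList<>();
        broker.subscribe("insights", insights::add);
        new CountingProcessor(broker);
        broker.publish("events", "device-1");
        broker.publish("events", "device-2");
        broker.publish("events", "device-1");
        return insights;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Note that the broker only delivers messages; the processor owns its state (the counts) and decides when to publish, which mirrors the point that the broker does not control execution of the stream processor.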
6. SwimOS is a Stateful, Real-time Stream Processor
• Builds and auto-scales apps from real-world event data, creating a
stateful graph that continuously computes – driven by data
• Automates infrastructure operation
• Load balances, secures, persists and auto-scales the application
• Apps are easy to develop
• Delivers unimaginable performance
Application: Distributed, stateful, concurrent graph of
Web Agents & real-time UIs
Infra: Distributed, p2p mesh of instances on k8s using
WebSockets
7. Major Mobile Provider
• > 150M devices
• > 10Gb/s of streaming data from Pulsar
• Continuous analysis, aggregation & reduction
• Millisecond latency
• Pervasively real-time UI
• Distributed across AZ
8. Pulsar’s Many Pros
• Event Processing
– Filtering
– Transformation
– Counts / Windows
– Alerts
• Serverless is a great abstraction
• SQL-style API
• Storage tiering
• Delivery guarantees
• Multi-tenancy
• Replication
• Scaling
9. Challenges…
• How many topics do you need?
15. Users Want Stateful, Continuous, Contextual Analysis
Streams are a sequence of state changes
They never stop… (so “store-then-analyze” is silly)
“Meaning” depends on granular contextual
relationships
Applications always have to have an answer
16. Introducing Swim Web Agents
• SwimOS subscribes to event streams from real-world sources
• It creates a stateful, concurrent web agent for each data source
• Each web agent cleans, labels, and analyzes data from its real-world twin
• Agents dynamically link to related agents, creating a stateful in-memory graph
• Containment, proximity… logical relationships, e.g., pod/cluster …
• Computed relationships: correlated…
• Linked web agents share their states in real-time
• Web Agents are vertices in the graph
• Each continually computes on its own state & state of its links, as data flows
over the graph – and streams its results in real-time over its links
• This is data-driven, stateful, continuous computation
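As a rough illustration of the first few bullets (one stateful agent per data source, created as data arrives, and linked to related agents), consider the sketch below. It is plain Java and does not use the SwimOS API; `ToyAgent`, `ToyAgentRegistry`, and the idea of passing a related source's id along with each event are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical agent: holds the latest state of one real-world source.
// Illustrative only; this is NOT the SwimOS Web Agent API.
class ToyAgent {
    final String id;
    double latest = Double.NaN;                       // most recent cleaned reading
    final Set<ToyAgent> links = new LinkedHashSet<>(); // related agents

    ToyAgent(String id) {
        this.id = id;
    }

    void onEvent(double reading) {
        // "Clean" step: ignore readings that are not finite numbers.
        if (Double.isFinite(reading)) {
            latest = reading;
        }
    }
}

// Hypothetical registry that creates agents on demand, driven by the data.
class ToyAgentRegistry {
    private final Map<String, ToyAgent> agents = new HashMap<>();

    // Returns the agent for a source, creating it the first time the source is seen.
    ToyAgent agentFor(String sourceId) {
        return agents.computeIfAbsent(sourceId, ToyAgent::new);
    }

    // Routes an event to its source's agent and links it to a related agent
    // (for example, a phone and the cell tower it is attached to).
    void onEvent(String sourceId, String relatedId, double reading) {
        ToyAgent agent = agentFor(sourceId);
        agent.onEvent(reading);
        ToyAgent related = agentFor(relatedId);
        agent.links.add(related);
        related.links.add(agent);
    }

    int size() {
        return agents.size();
    }
}
```

For instance, feeding events for `phone-1` and `phone-2`, both related to `tower-A`, yields three agents, with the tower agent linked to both phones. No agent is declared up front; the graph grows as sources appear in the stream.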
17. Web Agents Continuously Compute - Driven by Data
[Diagram labels: Real-world Stateful Web Agent; Analyze data to determine state; MapReduce; Graph Analytics; Relational Analysis; Learning & Prediction]
20. • Noisy / redundant updates are discarded
I’m still red
21. I’m still red
I’m green
No push
No push
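One simple way to discard redundant updates is to compare each incoming value with the current state and push only when it differs. The sketch below assumes that approach; `ChangeOnlyState` is a hypothetical name, and the `pushed` list merely stands in for streaming a change to linked agents or UIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical state holder that pushes only genuine state changes.
class ChangeOnlyState<T> {
    private T current;                                 // null until the first update
    private final List<T> pushed = new ArrayList<>();

    // Returns true if the update changed the state and was pushed.
    boolean update(T next) {
        if (Objects.equals(current, next)) {
            return false;                              // redundant update: no push
        }
        current = next;
        pushed.add(next);                              // stand-in for streaming downstream
        return true;
    }

    // Values that were actually pushed, in order.
    List<T> pushed() {
        return pushed;
    }
}
```

With a traffic light reporting red, red, red, green, only the first red and the green are pushed; the two repeated "still red" reports result in no push.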
Streaming data auto-scales the application, which is composed of concurrent web agents, at low cost, in real time, as data arrives
24. Web Agents Link to Form a Computational Graph
Web agents continuously compute on their own state and the state of linked web agents, enabling granular contextual analysis on the fly
① SwimOS creates a web agent for each source in streaming data
② Agents interlink to reflect real-world relationships
③ Powerful operators for analysis, learning & prediction continuously compute on state & stream results
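To make "compute on their own state and the state of linked web agents" concrete, here is a small sketch in which a tower node recomputes an aggregate whenever a linked device's state changes. The classes `DeviceNode` and `TowerNode`, the `signal` field, and the choice of a mean as the aggregate are assumptions for illustration; this is not the SwimOS API, and a real deployment would run agents concurrently rather than in one thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical leaf agent: holds its own state and notifies linked towers.
class DeviceNode {
    double signal;                                     // own state
    private final List<TowerNode> towers = new ArrayList<>();

    void linkTo(TowerNode tower) {
        towers.add(tower);
        tower.devices.add(this);
        tower.recompute();
    }

    void onReading(double value) {
        signal = value;
        // The state change flows over the links, so neighbours recompute.
        for (TowerNode tower : towers) {
            tower.recompute();
        }
    }
}

// Hypothetical aggregating agent: its state is derived from linked agents.
class TowerNode {
    final List<DeviceNode> devices = new ArrayList<>();
    double meanSignal;                                 // derived from linked devices

    void recompute() {
        meanSignal = devices.stream()
            .mapToDouble(d -> d.signal)
            .average()
            .orElse(0.0);                              // no linked devices yet
    }
}
```

The tower's answer is always current: it is recomputed as each reading arrives, rather than by querying stored data later.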
26. • A scaled application is a graph
dynamically built from data
• Objects are stateful and
concurrent
27. SwimOS Eliminates “the Stack”
• Developer defines entities & their relationships – as Java objects
• Swim builds a stateful, distributed graph of concurrent web agents that statefully represent real-world sources, from streaming data
• Web agents collaborate to analyze, learn, predict and respond on the fly
• They continuously stream real-time insights to UIs & applications
28. Pulsar and Swim: Better Together
• Builds and auto-scales apps from real-world event data, creating a
stateful graph that continuously computes – driven by data
• Automates infrastructure operation
• Load balances, secures, persists and auto-scales the application
• Apps are easy to develop
• Delivers unimaginable performance
Application: Distributed, stateful, concurrent graph of
Web Agents & real-time UIs
Infra: Distributed, p2p mesh of instances on k8s using
WebSockets
30. Clustered Stream Processor Operation
[Diagram labels: Pulsar Broker; Events; Insights; Fabric; Mesh of SwimOS Instances; Distributed graph of web agents; Web agent address space; Compute continuously as data flows over the graph]