We examine real-world architectural patterns that use Apache Pulsar to automate the creation of function and pub/sub flows, improving operational scalability and ease of management. We cover CI/CD automation patterns and show how we use streaming data to build a self-service platform that automates the provisioning of new users. We also show how function flows can be created through patterns and configuration, so that non-developer users can create entire function flows simply by changing configurations. Together, these patterns let us automate much more of the work of managing Pulsar. We also cover CI/CD for on-prem, GCP, and AWS users.
This is Part 2 of this presentation: https://www.youtube.com/watch?v=pmaCG...
In summary, we will cover:
CI/CD for on-prem, GCP, and AWS users
Automated creation of function flows by configuration
Automated provisioning of pub/sub users and topics
Architectural patterns and best practices that enable automation
Overstock has leveraged Pulsar as the backbone of a self-service data fabric: a unified data platform that enables users to publish and consume data across the company and integrate with other services. We used Pulsar to solve a data governance problem, and Pulsar has performed marvelously. To support our real-world production use cases, we have developed message flows, integrations, and architectural patterns that solve common use cases, maximize value, improve ease of use, automate management, and unify company data and services around this new platform.
Pulsar Architectural Patterns for CI/CD Automation and Self-Service_Devin Bost
1. Pulsar Architectural Patterns for CI/CD
Every pattern shown here has been developed and implemented with my
team at Overstock
Email: [email protected]
Twitter: DevinBost
LinkedIn: https://www.linkedin.com/in/devinbost/
By Devin Bost, Senior Data Engineer at Overstock
Data-Driven CI/CD Automation for Pulsar Function Flows and Pub/Sub
Includes on-prem, AWS, and GCP architectures
2. Legend & Referenced Technologies
Pulsar Beam
Pulsar Topic
AWS CodePipeline
Pulsar Brokers
Kubernetes
Golang
Amazon S3
CouchDB
ReactJS
Docker
AWS IAM
GCP Cloud Build
GCP IAM
GCP Cloud Storage
Google Cloud Functions
Pulsar Function
Flink Job
Sonatype Nexus
24. Might need to manually satisfy the contract at first, until you can get to where the data originates
25. Two sources of artifact events: (1) build data from the build tool, (2) storage data from artifact storage
In both cases: filter to artifact data, store, push to gatekeeping system, push to deployment pipeline for desired environment
33. Router → fast-deploy-go → Test Pulsar REST Admin API (deploy to test)
Router → fast-deploy-go → Prod Pulsar REST Admin API (deploy to prod)
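The slides do not show the internals of fast-deploy-go, so the following is only a sketch of how a deployment config might be mapped to a Pulsar admin REST request. The admin base URLs and the helper name are hypothetical. The endpoint path and the `functionConfig` / `url` form fields reflect my understanding of the Pulsar Functions admin API and should be checked against your Pulsar version. The request is only described here, not sent.

```python
import json
from urllib.parse import quote

# Hypothetical admin endpoints, one per environment (not from the slides).
ADMIN_URLS = {
    "test": "http://pulsar-test.example.com:8080",
    "prod": "http://pulsar-prod.example.com:8080",
}


def build_create_function_request(environment: str, config: dict) -> dict:
    """Describe the HTTP request that would register one function."""
    base = ADMIN_URLS[environment]
    # Assumed endpoint shape: /admin/v3/functions/{tenant}/{namespace}/{name}
    path = "/admin/v3/functions/{}/{}/{}".format(
        quote(config["tenant"], safe=""),
        quote(config["namespace"], safe=""),
        quote(config["name"], safe=""),
    )
    # Fields used only by the deployment tooling stay out of the function config.
    function_config = {
        k: v for k, v in config.items() if k not in ("type", "artifactPathOrUrl")
    }
    return {
        "method": "POST",
        "url": base + path,
        # Assumed multipart form fields: the config as JSON plus the package URL.
        "form": {
            "functionConfig": json.dumps(function_config),
            "url": config["artifactPathOrUrl"],
        },
    }


example_config = {
    "type": "function",
    "artifactPathOrUrl": "http://repo-name/project-name/example-ignite-function-1.0.1-3-jar-with-dependencies.jar",
    "tenant": "exampleTenant",
    "namespace": "exampleNamespace",
    "name": "exampleIgniteFunction",
    "className": "com.yourcompany.pulsar.functions.ExampleIgniteFunction",
    "inputs": ["persistent://exampleTenant/exampleNamespace/data-to-dump-into-ignite"],
}

request = build_create_function_request("test", example_config)
print(request["method"], request["url"])
```

Keeping one admin URL per environment mirrors the diagram, where the same deployer sits in front of the test and prod admin APIs.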
34. The Router Function
Router’s Function Config specifies a key in the message, such as “environment”, along with a tenant and namespace name.
The router then gets the value of this key in the message and creates a destination topic name from the value.
{
"type": "function",
"artifactPathOrUrl": "http://pulsar/reusable-functions/generic-router-function-1.0.1-8-jar-with-dependencies.jar",
"tenant": "ops",
"namespace": "deployment",
"name": "pubSubConfigDeploymentRouter",
"className": "com.yourcompany.pulsar.functions.GenericRouterFunction",
"userConfig": {
"key": "environment",
"tenant": "ops",
"namespace" : "deployment-automation"
},
"inputs": [
"persistent://ops/deployment/pre-deployment-configs-output"
],
"logTopic": "persistent://ops/deployment/pubSubConfigDeploymentRouter-log"
}
Creates /ops/deployment-automation/[environment]
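The routing rule described above can be sketched in a few lines. This is not the GenericRouterFunction source, which the slides do not show. It is a plain Python illustration of the same idea, and the `persistent://` prefix on the destination topic is an assumption, since the slide writes the topic only as /ops/deployment-automation/[environment].

```python
import json


def destination_topic(message: dict, user_config: dict) -> str:
    """Build the destination topic from the value of the configured key."""
    key = user_config["key"]
    if key not in message:
        raise KeyError(f"routing key {key!r} is missing from the message")
    value = str(message[key])
    # Assumption: routed topics are persistent topics.
    return "persistent://{}/{}/{}".format(
        user_config["tenant"], user_config["namespace"], value
    )


# Same userConfig values as the function config above.
user_config = {
    "key": "environment",
    "tenant": "ops",
    "namespace": "deployment-automation",
}

message = json.loads('{"environment": "test", "configs": []}')
print(destination_topic(message, user_config))
```

Because the key, tenant, and namespace all come from `userConfig`, the same function can be redeployed with a different key to route on another field without code changes.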
36. The Router Function
With the key set to “generator-type”, the router creates /ops/deployment-automation/[generator-type]
{
"type": "function",
"artifactPathOrUrl": "http://pulsar/reusable-functions/generic-router-function-1.0.1-8-jar-with-dependencies.jar",
"tenant": "ops",
"namespace": "deployment",
"name": "pubSubConfigDeploymentRouter",
"className": "com.yourcompany.pulsar.functions.GenericRouterFunction",
"userConfig": {
"key": "generator-type",
"tenant": "ops",
"namespace" : "deployment-automation"
},
"inputs": [
"persistent://ops/deployment/pre-deployment-configs-output"
],
"logTopic": "persistent://ops/deployment/pubSubConfigDeploymentRouter-log"
}
37. The Router Function
{
"environment": "test",
"configs": [{
"type": "function",
"artifactPathOrUrl": "http://repo-name/project-name/example-ignite-function-1.0.1-3-jar-with-dependencies.jar",
"tenant": "exampleTenant",
"namespace": "exampleNamespace",
"name": "exampleIgniteFunction",
"className": "com.yourcompany.pulsar.functions.ExampleIgniteFunction",
"inputs": [
"persistent://exampleTenant/exampleNamespace/data-to-dump-into-ignite"
],
"output": "persistent://exampleTenant/exampleNamespace/data-enriched-from-ignite",
"logTopic": "persistent://public/default/function-log-topic"
}]
}
From the message above, the router creates /ops/deployment-automation/test and routes the message there
41. Option 2 - more advanced function CI/CD flow for reusable functions
UI Tool: Server-Sent Events (SSEs) push for real-time updates; pull to get all data
Synchronous Artifact Download/Upload
Query to get all places where the artifact has been used. Enrich the JSON with this data.
Update configs to use new artifact
Update configs in CouchDB by writing as staged: synchronously stage changes in DB (add to stage set)
Once staged configs are approved, push into test or prod environments
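A minimal sketch of the "find every config that uses the artifact, then write the updates as staged" step follows. It works on an in-memory list rather than CouchDB. The matching rule (file-name prefix), the `status` field, and the new artifact version in the usage lines are all hypothetical, chosen only to make the idea concrete.

```python
import copy


def stage_artifact_update(stored_configs: list, artifact_prefix: str, new_url: str) -> list:
    """Return staged copies of every config whose artifact matches the prefix."""
    staged = []
    for config in stored_configs:
        current_url = config.get("artifactPathOrUrl", "")
        file_name = current_url.rsplit("/", 1)[-1]
        if file_name.startswith(artifact_prefix):
            updated = copy.deepcopy(config)  # leave the deployed config untouched
            updated["artifactPathOrUrl"] = new_url
            updated["status"] = "staged"  # hypothetical marker for the stage set
            staged.append(updated)
    return staged


stored = [
    {
        "name": "pubSubConfigDeploymentRouter",
        "artifactPathOrUrl": "http://pulsar/reusable-functions/generic-router-function-1.0.1-8-jar-with-dependencies.jar",
    },
    {
        "name": "exampleIgniteFunction",
        "artifactPathOrUrl": "http://repo-name/project-name/example-ignite-function-1.0.1-3-jar-with-dependencies.jar",
    },
]

# The 1.0.2-1 version below is invented for the example.
stage_set = stage_artifact_update(
    stored,
    "generic-router-function-",
    "http://pulsar/reusable-functions/generic-router-function-1.0.2-1-jar-with-dependencies.jar",
)
print([c["name"] for c in stage_set])
```

Staging copies instead of editing in place keeps the currently deployed configs intact until the staged set is approved.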
42. Option 3 - more advanced function CI/CD flow for reusable functions with more decoupling from DB
UI Tool: Server-Sent Events (SSEs) push for real-time updates
Synchronous Artifact Download/Upload
Query to get all places where the artifact has been used. Enrich the JSON with this data.
Update configs to use new artifact
Update configs in CouchDB by writing as staged: synchronously stage changes in DB (add to stage set)
Pass command (in a JSON object with any additional parameters), e.g. “merge-stage-sets”, “commit-staged-to-test”, “commit-staged-to-prod”, “un-stage”, “rollback”, “get-all-data”, etc.
(1) Synchronously execute CouchDB command
(2) Return result
Be careful to avoid creating security risks with how you implement this
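One way to reduce the security risk mentioned above is to accept only an explicit allowlist of command names and never execute anything taken directly from the request. The sketch below assumes a JSON shape of `{"command": ..., "params": {...}}`, which is hypothetical since the slides only say "a JSON object with any additional parameters". It uses an in-memory dict in place of CouchDB and implements just two of the listed commands.

```python
import json


def make_handlers(store: dict) -> dict:
    """Map allowed command names to functions; anything else is rejected."""

    def un_stage(params: dict) -> dict:
        removed = store["staged"].pop(params["id"], None)
        return {"removed": removed is not None}

    def get_all_data(params: dict) -> dict:
        return {"staged": dict(store["staged"])}

    return {"un-stage": un_stage, "get-all-data": get_all_data}


def handle_command(raw: str, handlers: dict) -> dict:
    """Parse a command message, validate it, run it, and return a result object."""
    try:
        request = json.loads(raw)
    except json.JSONDecodeError:
        return {"ok": False, "error": "invalid JSON"}
    if not isinstance(request, dict):
        return {"ok": False, "error": "expected a JSON object"}
    name = request.get("command")
    if not isinstance(name, str) or name not in handlers:
        return {"ok": False, "error": "unknown command"}
    params = request.get("params", {})
    if not isinstance(params, dict):
        return {"ok": False, "error": "params must be an object"}
    try:
        return {"ok": True, "result": handlers[name](params)}
    except KeyError as missing:
        return {"ok": False, "error": f"missing parameter {missing}"}


store = {"staged": {"cfg-1": {"name": "exampleIgniteFunction"}}}
handlers = make_handlers(store)
print(handle_command('{"command": "get-all-data"}', handlers))
print(handle_command('{"command": "drop-database"}', handlers))
```

The command name selects a handler from a fixed table, so a message can never name a raw database operation.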
43. Build System, Storage → WebHook → Filter/Transform
Get our artifact URL (and any necessary metadata, if applicable)
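The Filter/Transform step can be pictured as a function that drops events without an artifact and keeps only the artifact URL plus whatever metadata is needed. Everything about the payload below is hypothetical: real webhook payloads differ by build system, and the field names (`artifact`, `url`, `version`, `project`) are invented for illustration.

```python
from typing import Optional


def to_artifact_data(build_event: dict) -> Optional[dict]:
    """Return the artifact URL and metadata, or None if the event has no artifact."""
    artifact = build_event.get("artifact")
    if not isinstance(artifact, dict) or not artifact.get("url"):
        return None  # filtered out: nothing to deploy from this event
    return {
        "artifactUrl": artifact["url"],
        "metadata": {
            "project": build_event.get("project"),
            "version": artifact.get("version"),
        },
    }


# Hypothetical build events, as they might arrive from a webhook.
events = [
    {"project": "project-name", "status": "started"},
    {
        "project": "project-name",
        "status": "succeeded",
        "artifact": {
            "url": "http://repo-name/project-name/example-ignite-function-1.0.1-3-jar-with-dependencies.jar",
            "version": "1.0.1-3",
        },
    },
]

artifact_events = [data for data in map(to_artifact_data, events) if data is not None]
print(artifact_events)
```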
44. Build System (AWS CodePipeline) and Storage (S3): build/storage data
Get our artifact URL (and any necessary metadata, if applicable)
(1) GitHub Web Hook
(2) Passes metadata and reference to S3 artifact
Pulsar Beam (or equivalent HTTP endpoint for Pulsar): write JSON to Pulsar
Pulsar Brokers
Granting access to download artifacts in S3
. . .
45. Build System (GCP Cloud Build) and Storage: build/storage data
Get our artifact URL (and any necessary metadata, if applicable)
(1) GitHub Web Hook
(2) Passes metadata and reference to S3 artifact
Pulsar Beam (or equivalent HTTP endpoint for Pulsar): write JSON to Pulsar
Pulsar Brokers
GCP IAM
Granting access to download artifacts in S3
. . .
46. Option 1 - Basic function CI/CD flow
WebHook
Filter/Transform (this was best done in Scala)
Security checking logic, such as package vulnerability checks
Synchronous Artifact Download/Upload: download artifact to store in CouchDB
You could do the download asynchronously at a different point in the flow, but then you will need to ensure it’s fully downloaded before pushing the deployment from the UI
UI Tool: Server-Sent Events (SSEs) push for real-time updates; pull to get all data
Router → fast-deploy-go → Test Pulsar REST Admin API (deploy to test)
Router → fast-deploy-go → Prod Pulsar REST Admin API (deploy to prod)
51. User
Request new topic for SNOW Request feed
Request datasource
SNOW Request (SNOW = ServiceNow; could be modified to use custom UI instead)
Approval Gate: ACL approver, DataEng. Saves back to SNOW table (workflow is triggered on write)
Populates template for configs for request ID: generate function configs, role configs, token configs, tap function configs, validation function configs, passthrough function configs. Add into single JSON array of function configs.
Note: One request ID represents all configs produced by this template. Be sure to pass the request ID with each JSON object to allow all configs to be joined to the user request after deployment!
Router: removes the routing envelope since it won’t be needed downstream
Fast-Deploy: report functions deployed for topic
Role Generator: report roles created for topic
Token Generator: report tokens created for topic
Note: We created the token generator as a producer/consumer due to a lack of available API to generate tokens. So, we needed to use the Pulsar CLI, which meant that we needed a disk location to save the token.
Flink keyBy request ID, window with 60 second timeout
Save configs of what was created
Check if all required objects were created or if anything is missing. Report any problems to DataEng. Else, notify user that their topic is ready and provide them with the tokens and connection details.
Notification function that sends Email, UI, and/or Slack notification.
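The role of the request ID can be illustrated with a small sketch: every generated config carries the ID, and the reports coming back from the deployer and generators are grouped by that ID to see whether anything is missing. This is plain Python standing in for the Flink keyBy step; it is not Flink code, and the 60 second window timeout is not modeled. The field names (`requestId`, `kind`) and the set of required report kinds are assumptions made for the example.

```python
from collections import defaultdict

# Assumed report kinds, one per generator/deployer in the flow above.
REQUIRED_KINDS = frozenset({"functions", "roles", "tokens"})


def populate_template(request_id: str, tenant: str, namespace: str, topic: str) -> list:
    """Produce one config per function flavor, each tagged with the request ID."""
    return [
        {
            "requestId": request_id,  # lets results be joined back to the request
            "type": "function",
            "tenant": tenant,
            "namespace": namespace,
            "name": f"{topic}-{flavor}",
        }
        for flavor in ("tap", "validation", "passthrough")
    ]


def missing_by_request(reports: list, required=REQUIRED_KINDS) -> dict:
    """Group reports by request ID and list the report kinds still missing."""
    seen = defaultdict(set)
    for report in reports:
        seen[report["requestId"]].add(report["kind"])
    return {rid: sorted(required - kinds) for rid, kinds in seen.items()}


configs = populate_template("REQ-1", "exampleTenant", "exampleNamespace", "orders")
print([c["name"] for c in configs])

reports = [
    {"requestId": "REQ-1", "kind": "functions"},
    {"requestId": "REQ-1", "kind": "roles"},
    {"requestId": "REQ-1", "kind": "tokens"},
    {"requestId": "REQ-2", "kind": "functions"},
    {"requestId": "REQ-2", "kind": "roles"},
]
print(missing_by_request(reports))
```

An empty list for a request means the user can be notified; a non-empty list is what would be reported to DataEng.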
59. Why Streaming and Pulsar – Ammunition for the Business Case: https://www.youtube.com/watch?v=qsz-FruOGoo&feature=youtu.be
Performance Architecture Deep Dive: https://streamnative.io/whitepaper/taking-a-deep-dive-into-apache-pulsar-architecture-for-performance-tuning/
How Pulsar works: https://jack-vanlightly.com/blog/2018/10/2/understanding-how-apache-pulsar-works
2020 Apache Pulsar User Survey: https://streamnative.io/whitepaper/sn-apache-pulsar-user-survey-report-2020/
Basics of Pulsar architecture: https://www.youtube.com/watch?v=vlU9UegYab8&feature=youtu.be
Common Pulsar Architectural Patterns: https://www.youtube.com/watch?v=pmaCG1SHAW8&feature=youtu.be
(my most popular video yet!)
You can learn more about Pulsar Beam here: https://kafkaesque.io/introducing-pulsar-beam-http-for-apache-pulsar/