Nozomi from Yahoo! Japan gave a presentation on how Yahoo! Japan uses Apache Pulsar to build its internal messaging platform, which processes tens of billions of messages every day. He explains why Yahoo! Japan chose Pulsar, what its use cases for Apache Pulsar are, and its best practices.
#PulsarBeijingMeetup
Apache Pulsar at Yahoo! Japan
1. Apache Pulsar at Yahoo! JAPAN
Yahoo Japan Corporation
Nozomi Kurihara
Aug. 17th, 2019
2. Copyright (C) 2018 Yahoo Japan Corporation. All Rights Reserved. 2
Who am I?
Nozomi Kurihara
• Software engineer at Yahoo! JAPAN (April 2012 ~)
• Working on internal messaging platform using Apache Pulsar
• Committer of Apache Pulsar
• (Hobby: Board / video games!)
3. Agenda
1. What is Apache Pulsar?
2. Why did Yahoo! JAPAN choose Apache Pulsar?
3. How does Yahoo! JAPAN use Apache Pulsar?
4. Future plans
4. What is Apache Pulsar?
5. Apache Pulsar
Flexible pub-sub system backed by durable log storage
▪ History:
› 2014 Development started at Yahoo! Inc.
› 2015 Available in production in Yahoo! Inc.
› Sep. 2016 Open-sourced (Apache License 2.0)
› June 2017 Moved to Apache Incubator Project
› Sep. 2018 Graduated as Top Level Project!
▪ Users:
› Verizon Media (Yahoo! Inc.)
› Comcast
› The Weather Channel
› Mercado Libre
› Streamlio
› Yahoo! JAPAN
etc.
▪ Competitors:
› Apache Kafka
› RabbitMQ
› Apache ActiveMQ
› Apache RocketMQ
etc.
6. Pub-Sub messaging
Message transmission from one system to another via Topic
▪ Producers publish messages to Topics
▪ Consumers receive only messages from Topics to which they subscribe
▪ Decoupled (no need to know each other) → asynchronous, scalable, resilient
[Diagram: a Producer publishes messages (log, notification, etc.) to a Topic in the Pub-Sub system; Consumers 1, 2, and 3 subscribe to the Topic.]
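The decoupling described above can be illustrated with a small in-memory sketch. This is not Pulsar code: the `PubSub` class, its method names, and the topic names are hypothetical, and delivery here is synchronous for simplicity, whereas a real pub-sub system delivers asynchronously.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class PubSub:
    """Toy in-memory pub-sub: producers and consumers only share topic names."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        # A consumer registers interest in one topic.
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: str) -> int:
        # Deliver the message to every subscriber of this topic only,
        # and return the number of deliveries.
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


bus = PubSub()
log_inbox: List[str] = []
notification_inbox: List[str] = []
bus.subscribe("logs", log_inbox.append)
bus.subscribe("notifications", notification_inbox.append)

bus.publish("logs", "app started")
bus.publish("notifications", "content updated")
```

Each consumer sees only the messages for the topic it subscribed to, and the producer never holds a reference to a consumer.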
7. Architecture
[Diagram: a Producer and a Consumer connect to Brokers 1-3 in a Pulsar Cluster; the cluster also contains Bookies 1-3 and a Local ZK, and uses a Configuration Store (Global ZK).]
■3 components:
‣ Broker
‣ Bookie
‣ ZooKeeper
8. Architecture - Broker
■Broker
‣ Serving node for clients’ requests
‣ No data locality (stateless)
[Diagram: Pulsar Cluster layout with a Producer and a Consumer, Brokers 1-3, Bookies 1-3, Local ZK, and the Configuration Store (Global ZK).]
11. Why did Yahoo! JAPAN choose Apache Pulsar?
12. Yahoo! JAPAN
https://www.yahoo.co.jp/
13. Yahoo! JAPAN – 3 numbers
▪ 100+ services
▪ 150,000+ servers (real)
▪ 93,000,000+ unique browsers (avg in 2018/7-9)
14. Why did Yahoo! JAPAN choose Pulsar?
▪ Large number of customers → High performance & scalability
▪ Large number of services → Multi-tenancy
▪ Sensitive/mission-critical messages → Durability
▪ Multiple data centers → Geo-replication
Pulsar meets all these requirements!
15. Scalability
Just adding Brokers/Bookies increases serving/storage capacity!
(no special operation, such as data rebalancing, is required)
[Diagram: Broker X is added to the Pulsar Cluster for more serving capacity, and Bookie Y is added for more storage capacity.]
16. Multi-tenancy
Multiple services can share one Pulsar system
▪ Each service just uses Pulsar as a “Tenant” → no need to maintain its own messaging system
▪ Authentication/Authorization mechanism protects messages from interception
[Diagram: Services A, B, C, and D each have their own Producer and Consumer on their own topic (Topic A to Topic D) inside one Pulsar System; Authentication/Authorization blocks unauthorized access from one service to another service's topic.]
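As a rough illustration of how tenants keep topics apart, the sketch below parses Pulsar's full topic name form, `persistent://tenant/namespace/topic`, and checks a caller's role against a grant table. The grant table, role names, and `is_authorized` helper are hypothetical and much simpler than Pulsar's real authorization, which manages permissions through tenant admin roles and per-namespace grants.

```python
from typing import Dict, NamedTuple, Set


class TopicName(NamedTuple):
    domain: str
    tenant: str
    namespace: str
    topic: str


def parse_topic(name: str) -> TopicName:
    """Split 'persistent://tenant/namespace/topic' into its parts."""
    domain, sep, rest = name.partition("://")
    parts = rest.split("/")
    valid_domain = domain in ("persistent", "non-persistent")
    if sep == "" or not valid_domain or len(parts) != 3 or not all(parts):
        raise ValueError(f"not a full topic name: {name!r}")
    return TopicName(domain, parts[0], parts[1], parts[2])


# Hypothetical grants: which tenants each authenticated role may access.
GRANTS: Dict[str, Set[str]] = {
    "role-service-a": {"service-a"},
    "role-service-b": {"service-b"},
}


def is_authorized(role: str, topic_name: str) -> bool:
    # A role may only touch topics that belong to a tenant it was granted.
    return parse_topic(topic_name).tenant in GRANTS.get(role, set())


topic_a = "persistent://service-a/mail/index-jobs"
allowed = is_authorized("role-service-a", topic_a)
blocked = is_authorized("role-service-b", topic_a)
```

Because the tenant is part of every topic name, a shared cluster can decide access from the name and the caller's role alone.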
17. Geo-replication
Pulsar can replicate messages to another cluster
1. Producers only have to publish messages to Pulsar in the same data center
2. Pulsar asynchronously replicates messages to another cluster
3. Consumers can receive messages from the same data center
[Diagram: in Data center A, a Producer publishes to a Topic on Pulsar Cluster A and local Consumers read from it; geo-replication copies the messages to the Topic on Pulsar Cluster B in Data center B, where local Consumers read them.]
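The three steps above can be mimicked with a toy model. This is a sketch under simplifying assumptions, not how Pulsar implements replication: the `Cluster` class, its backlog queue, and the explicit `replicate()` call (standing in for a background replicator) are all hypothetical.

```python
from collections import deque
from typing import Deque, List


class Cluster:
    """Toy cluster holding one topic's messages."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.log: List[str] = []              # what local consumers can read
        self.peers: List["Cluster"] = []      # remote clusters to replicate to
        self._backlog: Deque[str] = deque()   # published locally, not yet replicated

    def publish(self, message: str) -> None:
        # Step 1: the producer only talks to its local cluster.
        self.log.append(message)
        self._backlog.append(message)

    def replicate(self) -> int:
        # Step 2: replication happens later, decoupled from publish().
        # Replicated messages go straight to the peer's log and are not
        # re-forwarded, so they do not bounce back.
        sent = 0
        while self._backlog:
            message = self._backlog.popleft()
            for peer in self.peers:
                peer.log.append(message)
            sent += 1
        return sent


east, west = Cluster("east"), Cluster("west")
east.peers.append(west)
west.peers.append(east)

east.publish("m1")
visible_before = list(west.log)   # replication has not run yet
east.replicate()
visible_after = list(west.log)    # step 3: west consumers read locally
```

The gap between `visible_before` and `visible_after` is the point of "asynchronously": a publish succeeds locally before the remote data center has the message.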
18. How does Yahoo! JAPAN use Apache Pulsar?
19. System architecture in Yahoo! JAPAN
[Diagram: two clusters, East and West, each with WebSocket Proxy, Broker, Bookie, and ZK, connected by geo-replication. Service A (Node.js), Service B (Java), and Service C (C++) connect as clients. Prometheus + Grafana collect and visualize metrics.]
For each cluster:
・20 WSs
・15 Brokers
・10 Bookies
・5 ZKs
20. Users
More and more services are starting to use Pulsar!
• 210+ tenants
• 4000+ topics
• ~100K publishes/s
• ~180K subscribes/s
Typical use cases:
• Notification
• Job queueing
• Log pipeline
21. Case 1 – Notification of content updates
▪ Various content files are pushed from partner companies to Yahoo! JAPAN
▪ A notification is sent to a topic when content is updated
▪ Once services receive the notification, they fetch the content from the file server
[Diagram: partner companies (weather, map, news, etc.) push files to an FTP server (ftpd). ① A Producer sends a notification to a Topic in Pulsar. ② Consumers in Service A, Service B, and Service C receive the notification. ③ The services fetch the content files.]
22. Case 2 – Job queuing in mail service
▪ Indexing of mail can be heavy → you can execute it asynchronously
▪ Producers register jobs to Pulsar
▪ Consumers take jobs from Pulsar at their own pace
[Diagram: on a request, the Mail BE servers (Producers) register a job to a Topic in Pulsar; the Handler for indexing (Consumer) takes and processes a job, and re-registers it if it fails.]
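The register / take / re-register loop can be sketched with a plain in-memory queue. This is an illustration only, not the mail service's code: the `run_worker` helper, the job names, and in particular the `max_attempts` cap are assumptions added here (the slide does not say whether retries are limited).

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

Job = Tuple[str, int]  # (job id, number of attempts so far)


def run_worker(queue: Deque[Job], handler: Callable[[str], None],
               max_attempts: int = 3) -> Tuple[List[str], List[str]]:
    """Take jobs one at a time; re-register a job when its handler fails."""
    done: List[str] = []
    dropped: List[str] = []
    while queue:
        job, attempt = queue.popleft()           # take a job at the worker's own pace
        try:
            handler(job)
            done.append(job)
        except Exception:
            if attempt + 1 < max_attempts:
                queue.append((job, attempt + 1))  # re-register if it fails
            else:
                dropped.append(job)               # assumed cap, to avoid endless retries
    return done, dropped


# Hypothetical handler: indexing of "mail-2" fails once, then succeeds.
failures_left: Dict[str, int] = {"mail-2": 1}


def index_mail(job: str) -> None:
    if failures_left.get(job, 0) > 0:
        failures_left[job] -= 1
        raise RuntimeError("indexing failed")


jobs: Deque[Job] = deque([("mail-1", 0), ("mail-2", 0), ("mail-3", 0)])
done, dropped = run_worker(jobs, index_mail)
```

The failed job goes to the back of the queue, so it is retried after the jobs that were already waiting rather than blocking them.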
23. Case 3 – Log pipeline
▪ Publisher: computing platforms on which YJ applications are running
▪ Subscriber: data platforms (monitoring, analyzing, storing etc.)
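[Diagram: service developers deploy apps to the computing platforms (PaaS running app1, app2, app3, …; CaaS running container1, container2, …), which publish logs to the PaaS_logs and CaaS_logs topics in Pulsar; the data platforms (Monitoring, Analyzing, Storing, …) subscribe to them, and service developers check the logs there.]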
24. Case 3 – Log pipeline
▪ Logs can have destinations → Consumers need to filter them
[Diagram: computing platforms (PaaS, CaaS) publish to PaaS_logs and CaaS_logs in Pulsar; logs are tagged with a destination (To: Monitoring, To: Analyzing), and each data platform (Monitoring, Analyzing, Storing) filters the logs it receives and discards those meant for others.]
25. Case 3' – Log pipeline + filtering (Future plan)
▪ Filtering on Pulsar side
▪ Pulsar Functions are helpful for filtering!
[Diagram: computing platforms (PaaS, CaaS) publish to PaaS_logs and CaaS_logs; Pulsar Functions filter the logs inside Pulsar into separate topics (For Monitoring, For Analyzing, For Storing), each consumed by the matching data platform.]
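The routing logic such a filter would apply can be sketched in plain Python. This is not written against the Pulsar Functions SDK: the record layout (`to`, `body`), the output topic names, and the `filter_logs` helper are hypothetical and only show the idea of routing by destination on the platform side.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional

# Hypothetical mapping from a log's destination tag to an output topic.
ROUTES: Dict[str, str] = {
    "monitoring": "for-monitoring",
    "analyzing": "for-analyzing",
    "storing": "for-storing",
}


def route(record: dict) -> Optional[str]:
    """Return the output topic for a log record, or None to discard it."""
    return ROUTES.get(record.get("to", ""))


def filter_logs(records: Iterable[dict]) -> Dict[str, List[str]]:
    # Group log bodies by output topic; records with no known destination
    # are dropped here instead of in every consumer.
    outputs: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        topic = route(record)
        if topic is not None:
            outputs[topic].append(record["body"])
    return dict(outputs)


incoming = [
    {"to": "monitoring", "body": "cpu high"},
    {"to": "analyzing", "body": "page view"},
    {"to": "unknown", "body": "noise"},
]
routed = filter_logs(incoming)
```

With the filter on the messaging side, each data platform subscribes to one pre-filtered topic and no longer receives logs it would only discard.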
26. Migration from Kafka
▪ We have an internal FaaS system using Apache OpenWhisk
▪ Problem: FaaS team had to maintain Apache Kafka
▪ Solution: migrate from Kafka to our internal Pulsar
▪ Pulsar Kafka Wrapper needs only a few configuration changes (pom.xml, topic name, etc.)
  <dependency>
-   <groupId>org.apache.kafka</groupId>
-   <artifactId>kafka-clients</artifactId>
-   <version>0.10.2.1</version>
+   <groupId>org.apache.pulsar</groupId>
+   <artifactId>pulsar-client-kafka</artifactId>
+   <version>2.4.0</version>
  </dependency>
27. Future plans
28. Node.js Client
Node.js users can easily use Pulsar!
Implementation:
• https://github.com/apache/pulsar-client-node
• Based on C++ Client
Done:
✅ basic functionality (producer, consumer, reader)
✅ test code
✅ performance scripts
Todo:
• publish to npm registry
• fix release flow
• support more features (multi-topic consume, unack, etc.)
29. Admin WebUI (under development)
Administrators can easily and intuitively manage Pulsar topics!
Implementation
• https://gist.github.com/massakam/8e9bd3ca62874f18cf3ce3ecb6db1473
• Vue.js + Express
Done:
✅ basic pages (tenants, namespaces, topics etc.)
Todo:
• open repository
• advanced commands (unload, skip-messages etc.)
• authentication to Broker
34. Looking for the best name!
pulsar-ui?
pulsar-console?
pulsar-manager?
35. Conclusion
36. Conclusion
▪ Apache Pulsar is a fast, durable, scalable, multi-tenant messaging platform with useful
built-in features such as geo-replication and Pulsar Functions
▪ Yahoo! JAPAN uses it as a centralized platform for various services
▪ Node.js Client is already open-sourced and Admin UI will come soon
We welcome any contributions,
because it’s OPEN SOURCE!