1) Yahoo Japan uses Apache Pulsar as a centralized messaging platform connecting various internal services.
2) Pulsar is now being used to build a large scale log pipeline where computing platforms publish logs/metrics and monitoring platforms consume them.
3) This architecture leverages Pulsar to decouple producers and consumers, enabling scalability and resiliency across the platform.
Large scale log pipeline using Apache Pulsar_Nozomi
1. Large scale log pipeline using Apache Pulsar
Yahoo Japan Corporation
Nozomi Kurihara
June 18th, 2020
Copyright (C) 2020 Yahoo Japan Corporation. All Rights Reserved.
2. Who am I?
Nozomi Kurihara
• Software engineer at Yahoo! JAPAN (since April 2012)
• Working on internal messaging platform using Apache Pulsar
• Committer of Apache Pulsar
• (Hobby: Board / video games!)
3. Agenda
1. Apache Pulsar at Yahoo! JAPAN
- About Yahoo! JAPAN
- Why Pulsar was chosen
- Architecture and performance
- Use cases
2. Large scale log pipeline
4. Apache Pulsar at Yahoo! JAPAN
5. Yahoo! JAPAN
https://www.yahoo.co.jp/
6. Yahoo! JAPAN – 3 numbers
• 100+ services
• 150,000+ servers (real)
• 49,010,000+ login users per month (2019/06)
7. Pulsar at Yahoo! JAPAN
• We have used Apache Pulsar as a centralized messaging platform for 3.5 years
• One team maintains Pulsar, and many teams (services) use it as “tenants”
[Diagram: producers and consumers of Services A, B, and C connected through Topics A, B, and C on a single Pulsar run by the Pulsar team]
8. Pulsar at Yahoo! JAPAN - Users
More and more services are starting to use Pulsar!
• 270+ tenants
• 4400+ topics
• ~50K publishes/s
• ~150K consumes/s
Typical use cases:
• Notification
• Job queueing
• Log pipeline
9. Pulsar community in Japan
TechBlog
- https://techblog.yahoo.co.jp/entry/20200312818173/
- https://techblog.yahoo.co.jp/entry/20200413827977/
- https://techblog.yahoo.co.jp/entry/2020060330002394/
Apache Pulsar Meetup Japan (in Tokyo)
- https://japan-pulsar-user-group.connpass.com/
10. Why Pulsar was chosen
11. Why did Yahoo! JAPAN choose Pulsar?
• Large number of customers → High performance & scalability
• Large number of services → Multi-tenancy
• Sensitive/mission-critical messages → Security & durability
• Multiple data centers → Geo-replication
Pulsar meets all requirements!
12. Multi-tenancy
Share one Pulsar among all Yahoo! JAPAN (YJ) services → low hardware and labor costs
[Diagram: before, Services A, B, and C each run their own MQ between producer and consumer; after, each service uses its own topic on one shared Pulsar operated by the Pulsar team]
13. Multi-tenancy – self-service
Users can create, configure, and delete their topics by themselves
→ Management of topics is delegated to users
Internal Web UI tool to manage topics (will be replaced with pulsar-manager):
[Screenshots: Create tenant, Create namespace, See topic stats]
14. Architecture and performance
15. Clusters in Yahoo! JAPAN
[Diagram: two clusters, East and West, each made up of WebSocket proxies, Brokers, Bookies, and ZK, connected by geo-replication]
For each cluster:
• 20 WS proxies
• 15 Brokers
• 10 Bookies
• 5 ZKs
16. Performance – experimental settings
• Pulsar version: 2.3.2 (Broker) / 2.4.1 (Client)
• Tool: openmessaging-benchmark
• Message size: 1 KB
• Partitions: 1, 16, 32
• Rate (attempted): 100,000 and 500,000 msg/s
• Server spec:
          CPU             Memory  Disk                                  NIC
  Broker  2.00GHz / 2CPU  768GB   SATA SSD 240GB x2 (RAID1)             10GBaseT
  Bookie  2.00GHz / 2CPU  768GB   Journal: SATA SSD 240GB x2 (RAID1)    10GBaseT
                                  Ledger: SATA HDD 10TB x12 (RAID1+0)
17. Performance – experimental results
- With 16 and 32 partitions, 500,000 msg/s is achieved, whereas with 1 partition it is not
- The maximum publish rate with 1 partition appears to be about 200,000 msg/s
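To put these message rates next to the 10GBaseT NICs listed in the server spec, the payload bandwidth can be estimated with simple arithmetic. The sketch below is only an illustration: it assumes "1 KB" means 1024 bytes and counts payload only (no protocol overhead, no replication traffic to bookies). The function name is made up for this example.

```python
def throughput_gbps(msgs_per_sec: int, msg_bytes: int) -> float:
    """Payload throughput in gigabits per second (1 Gb = 10**9 bits)."""
    return msgs_per_sec * msg_bytes * 8 / 1e9


MSG_BYTES = 1024  # assumption: "1 KB" read as 1024 bytes

# Rates taken from the slides: both attempted rates and the observed 1-partition ceiling.
for rate in (100_000, 200_000, 500_000):
    print(f"{rate:>7} msg/s -> {throughput_gbps(rate, MSG_BYTES):.2f} Gbps")
```

Under these assumptions, 500,000 msg/s is roughly 4.1 Gbps of payload per direction, which is below the nominal 10 Gbps line rate.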
18. Tuning example (Bookie)
Problem:
• As the number of users grows, writes to the SSDs increase
• This shortens SSD lifespan (we actually saw frequent SSD failures)
Solution:
Increase journalMaxGroupWaitMSec from 1 to 2
→ Writes decreased by 30% at the cost of a minimal increase in latency
Server spec:
          CPU             Memory  Disk                                  NIC
  Broker  2.00GHz / 2CPU  768GB   SATA SSD 240GB x2 (RAID1)             10GBaseT
  Bookie  2.00GHz / 2CPU  768GB   Journal: SATA SSD 240GB x2 (RAID1)    10GBaseT
                                  Ledger: SATA HDD 10TB x12 (RAID1+0)
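As a configuration fragment, the tuning above would look roughly like this. It is a sketch that assumes the setting lives in the bookie's conf/bookkeeper.conf, as in a stock Pulsar distribution; the slides do not show the actual file.

```properties
# conf/bookkeeper.conf (bookie side); file location is an assumption
# Maximum time in milliseconds a journal write may wait so it can be grouped with other writes.
# Raised from 1 to 2 to reduce the number of journal writes hitting the SSD.
journalMaxGroupWaitMSec=2
```

A longer wait lets more entries share one journal flush, which is why the write count drops while latency rises slightly.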
19. Use cases
20. Case 1 – Notification of contents update
Various content files are pushed from partner companies to Yahoo! JAPAN
A notification is sent to a topic when the contents are updated
Once services receive the notification, they fetch the contents from the file server
[Diagram: partner companies (weather, map, news, etc.) push files to an FTP server (ftpd); ① a producer sends a notification to a topic on Pulsar; ② the consumers of Services A, B, and C receive the notification; ③ they fetch the content files from the FTP server]
21. Case 2 – Job queuing in mail service
Heavy jobs such as mail indexing are executed asynchronously
Producers register jobs to Pulsar
Consumers take jobs from Pulsar at their own pace
[Diagram: on a request, a producer on a mail BE server registers a job to a topic on Pulsar; a consumer (handler for indexing) on a mail BE server takes and processes the job, and re-registers it if it fails]
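The take, process, and re-register loop can be sketched as follows. This is a minimal illustration, not the mail service's actual code: the job payload shape, the subscription name "indexing-workers", and the helpers handle_job and run_worker are all hypothetical. The Pulsar wiring uses the pulsar-client Python package and is imported lazily so the core logic can be read and exercised without a broker.

```python
import json
from typing import Callable


def handle_job(raw: bytes,
               index_mail: Callable[[dict], None],
               reregister: Callable[[bytes], None]) -> bool:
    """Process one job; hand it back to the queue if processing fails."""
    job = json.loads(raw)
    try:
        index_mail(job)
        return True
    except Exception:
        reregister(raw)  # put the same job back on the topic
        return False


def run_worker(service_url: str, topic: str) -> None:
    """Wire handle_job to Pulsar. Needs the pulsar-client package and a running broker."""
    import pulsar  # imported lazily so handle_job is usable without it

    client = pulsar.Client(service_url)
    producer = client.create_producer(topic)
    # Shared subscription: several workers pull from one topic at their own pace.
    consumer = client.subscribe(topic, "indexing-workers",
                                consumer_type=pulsar.ConsumerType.Shared)
    try:
        while True:
            msg = consumer.receive()
            # print stands in for the real indexing routine, which the slides do not show.
            handle_job(msg.data(), index_mail=print, reregister=producer.send)
            # A failed job has already been re-published, so the original is acknowledged.
            consumer.acknowledge(msg)
    finally:
        client.close()
```

A real worker would probably cap the number of retries so that a job that always fails does not circulate forever; the slides do not say how this is handled.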
22. Case 3 – Kafka alternative
We have an internal FaaS system using Apache OpenWhisk
Problem: the FaaS team had to maintain Apache Kafka
Solution: migrate from Kafka to our internal Pulsar
The Pulsar Kafka wrapper needs only a few configuration changes (pom.xml, topic name, etc.)
<dependency>
- <groupId>org.apache.kafka</groupId>
- <artifactId>kafka-clients</artifactId>
- <version>0.10.2.1</version>
+ <groupId>org.apache.pulsar</groupId>
+ <artifactId>pulsar-client-kafka</artifactId>
+ <version>2.4.0</version>
</dependency>
23. Large scale log pipeline
24. Situation
[Diagram: service developers deploy to the computing platforms (PaaS, CaaS, FaaS, …) and monitor their logs/metrics]
25. Yamas
• Metrics monitoring / alerting platform (SaaS)
• Originally developed at Verizon Media
• Will be open-sourced soon!
26. Scale
• Total log volume: 1.4~3.8 TB/h
• Peak traffic: 10+ Gbps
• The number of PFs (platforms) will keep increasing
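The hourly volume and the peak bit rate use different units, so a quick conversion helps compare them. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 Gb = 10^9 bits) and gives only the average rate implied by each hourly volume; the function name is made up for this example.

```python
def tb_per_hour_to_gbps(tb_per_hour: float) -> float:
    """Average bit rate for a given hourly volume (1 TB = 10**12 bytes)."""
    bits_per_hour = tb_per_hour * 1e12 * 8
    return bits_per_hour / 3600 / 1e9


# Lower and upper bound of the hourly log volume quoted on the slide.
for volume in (1.4, 3.8):
    print(f"{volume} TB/h ~ {tb_per_hour_to_gbps(volume):.1f} Gbps on average")
```

Under these assumptions the averages come to about 3.1 and 8.4 Gbps, the same order of magnitude as the quoted 10+ Gbps peak.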
27. Legacy architecture
[Diagram: the computing PFs (PaaS, CaaS, …) host apps alongside a Yamas agent and a Splunk agent, which send directly to the monitoring PFs (Yamas, Splunk)]
☹ A dedicated “agent” has to be installed for each monitoring PF
☹ Difficult to scale out
☹ Traffic spikes directly affect the monitoring PFs
28. Motivation
Remove the dedicated agent for each monitoring PF:
- No need for PF-specific knowledge or extra components
- Easier troubleshooting
Decouple sender/receiver PFs by introducing a message queueing layer:
- Scalability
- Resiliency
29. New architecture
[Diagram: apps on the computing PFs (PaaS, CaaS, …) publish through a Pulsar producer to a Splunk topic and a Yamas topic on Pulsar; the monitoring PFs (Splunk, Yamas) read them through Pulsar consumers]
☺ Single library
☺ Easy to scale out
☺ Traffic spikes are mitigated by the queueing layer
30. Topic design – 3 patterns
① Producer-centric
[Diagram: PaaS and CaaS publish to per-producer topics (PaaS, CaaS) on Pulsar; Splunk and Yamas consume from them]
Messages are filtered/transformed on the consumer side:
☺ Producers don’t need to care about consumers
☹ Consumers need to care about producers
② Consumer-centric
[Diagram: PaaS and CaaS publish to per-consumer topics (Splunk, Yamas) on Pulsar; Splunk and Yamas consume from them]
Messages are filtered/transformed on the producer side:
☺ Consumers don’t need to care about producers
☹ Producers need to care about consumers
③ Function
[Diagram: PaaS and CaaS publish to PaaS and CaaS topics; a function (func) on Pulsar moves messages to Splunk and Yamas topics, from which Splunk and Yamas consume]
Messages are filtered/transformed in the function:
☺ Neither producers nor consumers need to care about each other
☹ Extra load: traffic, computing, storage, etc.
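For the third pattern, the filter/transform step could be written as a Pulsar Function. The sketch below uses the "native" Python function style, where a module exposes a plain process(input) callable. The filter rule (dropping records with "level": "debug") and the output fields "source" and "event" are hypothetical; the input fields "domain" and "body" follow the message format shown on the next slide. It also assumes that returning None causes nothing to be published to the output topic.

```python
import json
from typing import Optional


def process(input: str) -> Optional[str]:
    """Reshape one producer-side record into a (hypothetical) consumer-side record."""
    record = json.loads(input)
    if record.get("level") == "debug":  # hypothetical filter rule
        return None                     # assumption: None means "publish nothing"
    # Hypothetical target shape for the consumer-side topic.
    return json.dumps({"source": record["domain"], "event": record["body"]})
```

Input and output topics would be chosen when the function is deployed, so neither the producing nor the consuming platform has to know about this mapping.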
31. Topic format and message format
Topic format: {consumer_pf}/{region}/{message_type}-{num}
• consumer_pf: splunk, yamas, …
• region: west, east
• message_type: log, metric, …
Pulsar (west): splunk/west/log-0, splunk/west/log-1, splunk/west/metric-0, yamas/west/metric-0, …
Pulsar (east): splunk/east/log-0, splunk/east/log-1, splunk/east/metric-0, yamas/east/metric-0, …
Message format:
{
  "time": "2018-10-25T08:36:47.000Z",
  "producer": "paas-producer.example.com",
  "origin": "app.space.org.cluster.dc.nwseg",
  "domain": "paas",
  "body": {
    "message": "hello splunk",
    …
  }
}
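The topic naming scheme and the message envelope on this slide can be expressed as two small helpers. This is a sketch: the function names are made up for illustration, and how the three-part name maps onto a full Pulsar topic URL is not stated in the slides.

```python
import json
from datetime import datetime, timezone


def topic_name(consumer_pf: str, region: str, message_type: str, num: int) -> str:
    """Build {consumer_pf}/{region}/{message_type}-{num}."""
    return f"{consumer_pf}/{region}/{message_type}-{num}"


def envelope(producer: str, origin: str, domain: str, body: dict, when: datetime) -> str:
    """Wrap a payload in the JSON envelope from the slide.

    `when` must be timezone-aware; it is rendered in UTC with millisecond precision.
    """
    stamp = when.astimezone(timezone.utc).isoformat(timespec="milliseconds")
    return json.dumps({
        "time": stamp.replace("+00:00", "Z"),
        "producer": producer,
        "origin": origin,
        "domain": domain,
        "body": body,
    })
```

For example, topic_name("splunk", "west", "log", 0) gives "splunk/west/log-0", matching the first topic listed for the west cluster.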
32. Use case: Pulsar stats on Yamas
[Diagram: a Pulsar producer reads broker stats from /admin/v2/broker-stats/topics and publishes them to the Yamas topic on Pulsar, from which Yamas consumes them]
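Collecting the stats amounts to an HTTP GET against the broker's admin endpoint. The sketch below covers only that step, using the standard library. The base URL, the helper names, and the idea of polling from a separate script are assumptions; the slides show only the endpoint path and a producer feeding the Yamas topic.

```python
import json
from urllib.request import urlopen


def stats_url(admin_base: str) -> str:
    """Full URL of the per-topic broker stats endpoint."""
    return admin_base.rstrip("/") + "/admin/v2/broker-stats/topics"


def fetch_topic_stats(admin_base: str) -> dict:
    """GET the stats as parsed JSON. Needs network access to a broker."""
    with urlopen(stats_url(admin_base)) as resp:
        return json.load(resp)
```

The returned JSON could then be wrapped in the common message envelope and published to the Yamas topic like any other metric.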
33. Conclusion
34. Conclusion
Conclusion:
• Yahoo! JAPAN uses Pulsar as a centralized platform for various services
• Recently we started using Pulsar as a large scale log pipeline in which computing PFs publish their logs/metrics and monitoring PFs consume them
• Pulsar plays an important role in connecting various PFs and making the whole system scalable and resilient
Future plan:
• More Producer PFs and Consumer PFs
• Visualize SLIs (message delivery rate, latency, etc.)