This document summarizes 5 years of operating a large-scale globally replicated Pulsar installation at Verizon Media. It discusses how the installation scaled up from 1 tenant and 2 data centers in 2015 to over 100 tenants and 6 data centers in 2020. It also covers the evolution of hardware used for brokers and bookkeepers, metrics and monitoring, deployment processes, broker isolation policies, storage utilization, and rack awareness in BookKeeper.
Five years of operating a large scale globally replicated Pulsar installation — Francis&Ludwig Pummer
3. Agenda
1. Focus & Use Cases
2. Scaling up
3. Provisioning and Capacity
4. Hardware Evolution
5. JVM GC Experiences
6. Metrics and Monitoring
7. Deployment
8. Broker Isolation Policies
9. BookKeeper Storage Utilization
10. BookKeeper Rack Awareness
4. Our Focus
● Operate a hosted pub-sub service within VMG
○ open-sourced as Pulsar
● Global presence
○ 6 DC (Asia, Europe, US)
○ full mesh replication
● Business critical use cases
○ Serving use cases
○ Low-latency bus for other low-latency services
○ Write availability
5. Use Cases
● Application integration
○ Server-to-server control, status, notification messages
● Persistent queue
○ Buffering, feed ingestion, task distribution
● Message bus for large scale data stores
○ Durable log
○ Replication within and across geo-locations
7. Scaling up a Cluster
● More deliveries (increased fanout) → Add Brokers
● More publishes → Add Bookies, Brokers
● More storage → Add Bookies
● Massively more topics → Add Clusters (SuperCluster)
○ See PIP 8 for SuperCluster (peer clusters)
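The scaling rules on this slide can be read as a lookup from the kind of growth to the component to add. The following is a minimal sketch of that reading, not tooling from the talk; the key names and the `components_to_add` helper are hypothetical.

```python
# Hypothetical lookup table mirroring the slide's scaling rules.
# Keys name the kind of growth; values name the components to add.
SCALING_ACTIONS = {
    "deliveries": ["brokers"],            # more deliveries (increased fanout)
    "publishes": ["bookies", "brokers"],  # more publishes
    "storage": ["bookies"],               # more storage
    "topics": ["clusters"],               # massively more topics (SuperCluster)
}


def components_to_add(growth_kinds):
    """Return the components to add, without duplicates, in first-seen order."""
    result = []
    for kind in growth_kinds:
        for component in SCALING_ACTIONS[kind]:
            if component not in result:
                result.append(component)
    return result


if __name__ == "__main__":
    print(components_to_add(["publishes", "storage"]))
```

A tenant that grows in both publish rate and retained data would, under this reading, need more bookies and more brokers, but no new cluster.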
8. Storage & I/O auto-balancing
[Figure: wps by namespace and wps by bookkeeper, illustrating automatic distribution. Chart annotations: "Out for Reimage", "In after Reimage".]
9. Provisioning Model
New tenant provides:
● Average message size
● Peak publishes per second
● Steady-state deliveries per second (fan out)
● Per cluster/DC
● Tenants: x509 principal: Athenz
○ https://www.athenz.io/
○ Open source platform for X.509 cert based service authentication and authorization
Calculate:
● Broker messages/sec
● Broker bandwidth
● Bookie MB/sec
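The slide lists the inputs and the calculated outputs but not the formulas. The sketch below shows one plausible way to derive the three outputs from the three tenant inputs; the formulas, the `TenantRequest` and `estimate_load` names, and the write quorum value of 2 are assumptions for illustration, not the calculation used at Verizon Media.

```python
from dataclasses import dataclass


@dataclass
class TenantRequest:
    """Per cluster/DC inputs a new tenant provides (field names are hypothetical)."""
    avg_message_bytes: int
    peak_publishes_per_sec: int
    steady_deliveries_per_sec: int


def estimate_load(req, write_quorum=2):
    """Rough load estimate under assumed formulas.

    Assumption: brokers handle every publish and every delivery, and each
    published message is written to `write_quorum` bookies.
    """
    broker_msgs_per_sec = req.peak_publishes_per_sec + req.steady_deliveries_per_sec
    broker_mb_per_sec = broker_msgs_per_sec * req.avg_message_bytes / 1e6
    bookie_mb_per_sec = (
        req.peak_publishes_per_sec * req.avg_message_bytes * write_quorum / 1e6
    )
    return {
        "broker_msgs_per_sec": broker_msgs_per_sec,
        "broker_mb_per_sec": broker_mb_per_sec,
        "bookie_mb_per_sec": bookie_mb_per_sec,
    }


if __name__ == "__main__":
    # Example tenant: 1 KB messages, 10k publishes/s, 30k deliveries/s (fanout of 3).
    print(estimate_load(TenantRequest(1000, 10_000, 30_000)))
```

The estimate ignores replication traffic between data centers, protocol overhead, and read traffic served from bookies rather than broker cache, all of which a real capacity plan would have to account for.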
11. Hardware Evolution: BookKeepers
Pre-2015
● 12-core
● 32GB RAM
● 1G NIC
● 12 x 300GB 15K RPM SAS drives (2 x HW RAID-10 of 6 drives)
2015
● 12-core
● 64GB RAM
● 10G NIC
● 10 x 4TB 7.2k RPM SATA (1 x HW RAID-10 of 10 drives)
● 2 x 120GB SSD (1 x HW RAID-1 of 2 SSDs)
2020
● 36-core
● 192GB RAM
● 25G NIC
● 4 x 4TB NVMe
● 2 x 128GB Optane Persistent Memory
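To put the drive configurations in perspective, the usable capacity of the RAID layouts can be worked out with simple arithmetic. The sketch below assumes standard RAID semantics (RAID-10 yields half the raw capacity, RAID-1 yields the capacity of one drive) and ignores filesystem overhead; the helper names are hypothetical. The slide states no RAID layout for the 2020 NVMe drives, so they are left out.

```python
def raid10_usable_gb(drive_count, drive_size_gb):
    """Usable capacity of a RAID-10 array: half of the raw capacity."""
    if drive_count < 4 or drive_count % 2 != 0:
        raise ValueError("RAID-10 needs an even number of drives, at least 4")
    return drive_count * drive_size_gb // 2


def raid1_usable_gb(drive_size_gb):
    """Usable capacity of a two-drive RAID-1 mirror: one drive's worth."""
    return drive_size_gb


if __name__ == "__main__":
    # Pre-2015: 2 x HW RAID-10 arrays of 6 x 300GB drives each.
    print("pre-2015 RAID-10 total GB:", 2 * raid10_usable_gb(6, 300))
    # 2015: 1 x HW RAID-10 of 10 x 4TB drives, plus 1 x HW RAID-1 of 2 x 120GB SSDs.
    print("2015 RAID-10 GB:", raid10_usable_gb(10, 4000))
    print("2015 SSD RAID-1 GB:", raid1_usable_gb(120))
```

Under these assumptions the per-host usable spinning-disk capacity grew from about 1.8TB before 2015 to about 20TB in 2015.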