As the largest provider of Internet products and services in China, Tencent serves billions of users and over a million merchants—and these numbers are growing fast! Tencent’s enterprises generate a huge volume of financial transactions, placing a tremendous load on their billing service, which processes hundreds of millions of dollars in revenue each day.
Because Tencent had been unable to scale its existing billing service to keep pace with its rapidly growing business, the possibility of data loss had become an escalating concern. To ensure data consistency, the company decided to redesign its transaction processing pipeline. After evaluating the pros and cons of several messaging systems, Tencent chose to implement its billing service using Apache Pulsar. As a result, Tencent can now run its billing service at very large scale with virtually no data loss.
In this talk, Ningguo Chen, the Chief Architect of Tencent Billing, will share the team's journey of adopting Pulsar in their core transaction processing engine to process tens of billions of events every day. He will also discuss the problems they encountered in using Pulsar and the improvements they made to meet their scale.
How Apache Pulsar Helps Tencent Process Tens of Billions of Transactions Efficiently_Ningguo Chen
1. How Apache Pulsar Helps Tencent Process Tens of Billions of Transactions Efficiently
2. Self-Introduction
• Ningguo Chen, Lead Architect.
• Joined Tencent in 2008. Lead Architect of the Tencent Midas Billing Platform. Has led and participated in building Tencent QB E-shop, the Tencent Midas Standard, Enterprise, and Overseas Editions, and more, helping Tencent Midas become an all-round, one-stop global billing platform.
• Specialist in billing and transaction systems. Holder of 20+ patents. Mainly focused on providing highly stable, efficient, and secure billing services for Tencent’s online and offline businesses.
5. Pain Points of Billing platform
• High transaction consistency: payments and goods delivery must stay consistent.
- Failures and timeouts: a single Midas payment often involves many internal and external systems. This leads to longer call chains and more exceptions, especially network timeouts (e.g. overseas payment services), DB operations, and goods delivery failures.
• High data reliability across regions
• Huge volumes of timed processing
- Subscription and recurring billing, timely account reconciliation
• High performance
• High scalability
- Scales elastically and automatically according to business scale
6. Our Solution
• TDXA + TDMQ
TDXA: a framework for transaction control that ensures high transaction consistency, high fault tolerance, and high scalability.
TDMQ: a distributed message queue based on Pulsar. It enables TDXA to handle messaging with high consistency and availability, and uses the state table to converge the various exceptions.
[Diagram: TDXA and TDMQ]
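The idea of converging exceptions through a state table and a message queue can be pictured with a small in-memory model. This is only a sketch under assumptions, not TDXA's actual design: the states, `StateTable`, `process`, `drain`, and the flaky `deliver_goods` function are all hypothetical names. A timed-out delivery is put back on a retry queue, and the consumer keeps retrying until the recorded state reaches its terminal value.

```python
import queue
from enum import Enum


class TxState(Enum):
    PAID = "paid"            # payment confirmed, goods not yet delivered
    DELIVERED = "delivered"  # terminal state: payment and delivery agree


class StateTable:
    """Hypothetical in-memory stand-in for a durable transaction state table."""

    def __init__(self):
        self._rows = {}

    def set(self, tx_id, state):
        self._rows[tx_id] = state

    def get(self, tx_id):
        return self._rows[tx_id]


def process(tx_id, table, retry_queue, deliver_goods):
    """Try to deliver; on a timeout, enqueue the transaction for retry."""
    if table.get(tx_id) is TxState.DELIVERED:
        return  # idempotent: this transaction has already converged
    try:
        deliver_goods(tx_id)
        table.set(tx_id, TxState.DELIVERED)
    except TimeoutError:
        retry_queue.put(tx_id)  # the queue carries the unfinished work


def drain(table, retry_queue, deliver_goods, max_attempts=10):
    """Consume retry messages until the queue is empty or attempts run out."""
    attempts = 0
    while not retry_queue.empty() and attempts < max_attempts:
        process(retry_queue.get(), table, retry_queue, deliver_goods)
        attempts += 1


def make_flaky_delivery(failures):
    """Return a delivery function that times out `failures` times, then succeeds."""
    remaining = {"n": failures}

    def deliver(tx_id):
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise TimeoutError(f"delivery timed out for {tx_id}")

    return deliver


def run_demo():
    table, retries = StateTable(), queue.Queue()
    table.set("tx-1", TxState.PAID)
    deliver = make_flaky_delivery(failures=2)
    process("tx-1", table, retries, deliver)  # first attempt times out
    drain(table, retries, deliver)            # retried until delivered
    return table.get("tx-1"), retries.empty()
```

In this model the state check at the top of `process` is what makes redelivered messages harmless; a real system would need the table and the queue to be durable.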
7. Deployment of Tencent billing system
[Diagram: deployment across Shenzhen, Shanghai, and Hong Kong. Components shown: App, TDXA, MQ, TDSQL, WeChat Pay, bank, with callouts 1, 2, and 3.]
MQ’s achievements:
1. Asynchronous data transmission between systems
2. System exception handling
3. High data consistency
8. Our Requirements for MQ
• High consistency and high availability across regions
• Massive storage, able to manage massive numbers of delayed messages and topics
• Highly concurrent consumption (the number of consumers is much larger than the number of queues)
• High scalability
• Support for message modification and more protocols, such as RESTful and AMQP
Pulsar perfectly supports the first 4 points. That’s why we chose Pulsar as our infrastructure.
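The delayed-message requirement (for example, recurring billing or timed reconciliation) can be pictured with a small in-memory model. This is a conceptual sketch only, not how Pulsar or TDMQ implements delayed delivery; `DelayedQueue`, `publish`, and `poll` are hypothetical names, and time is passed in explicitly so the behavior is deterministic.

```python
import heapq
import itertools


class DelayedQueue:
    """Hypothetical delay queue: a message becomes visible at its deliver-at time."""

    def __init__(self):
        self._heap = []                # entries: (deliver_at, seq, payload)
        self._seq = itertools.count()  # tie-breaker keeps FIFO order for equal times

    def publish(self, payload, now, delay):
        heapq.heappush(self._heap, (now + delay, next(self._seq), payload))

    def poll(self, now):
        """Return every payload whose deliver-at time has passed, earliest first."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[2])
        return due


def run_demo():
    q = DelayedQueue()
    q.publish("renew subscription A", now=0, delay=30)
    q.publish("reconcile account B", now=0, delay=10)
    q.publish("renew subscription C", now=5, delay=25)  # also due at t=30
    return q.poll(now=9), q.poll(now=10), q.poll(now=30)
```

A heap keeps the next due message at the front, so each poll only touches messages that are actually ready, which matters when the backlog of delayed messages is very large.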
9. Geo Replication
• Cross City: Shanghai, Shenzhen
• Cross Region/State: Shanghai (China), Toronto (Canada)
[Diagram: brokers and bookies in each location]
2 replicas: up to 560k QPS
3 replicas: up to 360k QPS
3 geo replicas: up to 280k QPS
Compared to Kafka:
High throughput (async): up to 1 million QPS
2 replicas (sync): only 60k QPS
Benchmark hardware: CPU 24 cores, memory 48 GB, network 10G
10. Practice of DB replication across cities
• Based on database binlog, hash method
• Batch compression
• Conflict detection to ensure correctness
Result: data replication at 100k/s (10w/s), latency ~30 ms, from Shenzhen to Shanghai
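The three techniques on this slide can be sketched together. The slide gives no implementation details, so everything here is an assumption for illustration: the "hash method" is read as hash-partitioning change events by row key, compression uses `zlib` over JSON, and conflict detection compares a hypothetical `base_version` field against the replica's current row version.

```python
import json
import zlib


def partition_for(key, num_partitions):
    """Stable hash so all changes to one row land in the same partition, in order."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def pack_batch(events):
    """Serialize a batch of change events and compress it for the cross-city link."""
    return zlib.compress(json.dumps(events).encode("utf-8"))


def unpack_batch(blob):
    return json.loads(zlib.decompress(blob).decode("utf-8"))


def apply_batch(replica, events):
    """Apply events to the replica; return the events rejected as conflicts.

    Assumed rule: an event carries the row version it was based on
    (`base_version`) and applies only if the replica is at that version.
    """
    conflicts = []
    for ev in events:
        current = replica.get(ev["key"], {"version": 0})
        if current["version"] != ev["base_version"]:
            conflicts.append(ev)
            continue
        replica[ev["key"]] = {"value": ev["value"], "version": ev["base_version"] + 1}
    return conflicts


def run_demo():
    binlog = [
        {"key": "acct:1", "value": 100, "base_version": 0},
        {"key": "acct:2", "value": 50, "base_version": 0},
        {"key": "acct:1", "value": 80, "base_version": 1},
        {"key": "acct:2", "value": 70, "base_version": 5},  # conflicting write
    ]
    # Group events by partition, then ship each partition as one compressed batch.
    partitions = {}
    for ev in binlog:
        partitions.setdefault(partition_for(ev["key"], 4), []).append(ev)
    replica, conflicts = {}, []
    for events in partitions.values():
        conflicts += apply_batch(replica, unpack_batch(pack_batch(events)))
    return replica, conflicts
```

Partitioning by key keeps per-row ordering while allowing partitions to be shipped in parallel; the rejected events are returned rather than dropped so that a separate step can resolve them.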
12. Real-time Interaction
• Question: how to use MQ as an RPC?
• Answer: add a Request-Reply mode
[Diagram: request-reply flow. Labels: Produce, Consume, Msg, Rsp(+data), Produce(data), ACK, Caller, Callee, Broker (Leader, Read-only), Session A, Session B, transmit.]
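A request-reply mode on top of a message queue can be sketched with in-memory queues. This is an illustrative model, not TDMQ's protocol: the `Broker` class, the per-session reply queues, and the correlation-ID matching are assumptions chosen to show the general pattern.

```python
import queue
import threading
import uuid


class Broker:
    """Hypothetical in-memory broker: one request queue plus one reply queue per session."""

    def __init__(self):
        self.requests = queue.Queue()
        self.replies = {}  # session id -> queue.Queue
        self._lock = threading.Lock()

    def open_session(self):
        session_id = uuid.uuid4().hex
        with self._lock:
            self.replies[session_id] = queue.Queue()
        return session_id


def call(broker, session_id, data, timeout=2.0):
    """Caller side: produce a request, then block until the matching reply arrives."""
    correlation_id = uuid.uuid4().hex
    broker.requests.put({"session": session_id, "id": correlation_id, "data": data})
    while True:
        # Raises queue.Empty if no reply arrives within `timeout` seconds.
        reply = broker.replies[session_id].get(timeout=timeout)
        if reply["id"] == correlation_id:
            return reply["data"]
        # Otherwise it is a reply to an older, abandoned request: ignore it.


def serve(broker, handler, stop):
    """Callee side: consume requests and route each response to the caller's session."""
    while not stop.is_set():
        try:
            req = broker.requests.get(timeout=0.05)
        except queue.Empty:
            continue
        broker.replies[req["session"]].put({"id": req["id"], "data": handler(req["data"])})


def run_demo():
    broker, stop = Broker(), threading.Event()
    worker = threading.Thread(target=serve, args=(broker, str.upper, stop))
    worker.start()
    try:
        session = broker.open_session()
        return [call(broker, session, "pay"), call(broker, session, "refund")]
    finally:
        stop.set()
        worker.join()
```

The correlation ID is what lets a caller tell its own response apart from stale ones, and the timeout gives the caller a bounded wait, which a plain one-way queue does not provide.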
13. Batch-Process
• Question: how to fix data after data errors have occurred?
• Answer: Offset Shadow + Priority Queue
[Diagram: produce and consume flow. Labels: Offset Shadow, "Data error, how to modify?", re-produce, Correct Data, Priority Queue.]
Offset Shadow: prevents incorrect data from being consumed
Priority Queue: prevents revised data from being consumed too late
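One way to read these two mechanisms is sketched below. The slide does not define them precisely, so this is an interpretation: the "offset shadow" is modeled as a set of offsets flagged as incorrect that the consumer skips, and the priority queue holds corrected messages that are drained before any further normal message. The `consume` function and the sample data are hypothetical.

```python
from collections import deque


def consume(log, start, blocked, priority_queue):
    """Yield messages in consumption order.

    log            -- list of messages; the list index plays the role of the offset
    blocked        -- assumed "offset shadow": offsets flagged as incorrect, skipped
    priority_queue -- corrected messages, drained before any further normal message
    """
    offset = start
    while offset < len(log) or priority_queue:
        if priority_queue:
            yield priority_queue.popleft()
            continue
        if offset not in blocked:
            yield log[offset]
        offset += 1


def run_demo():
    log = ["order-1 $10", "order-2 $-5", "order-3 $7", "order-4 $3"]
    blocked = {1}                            # order-2 was produced with a wrong amount
    priority_queue = deque(["order-2 $5"])   # corrected copy, re-produced
    return list(consume(log, 0, blocked, priority_queue))
```

In this model the incorrect message at offset 1 never reaches the consumer, and the corrected copy is delivered ahead of the remaining backlog instead of waiting at the end of the log.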