This document discusses Apache Pulsar usage in Zhaopin and some key features:
1. It provides an overview of how Pulsar is used in Zhaopin and the increasing message throughput over time.
2. It describes several Pulsar features in detail, including key-shared subscriptions, schema versioning, HDFS offloading, and upcoming topics like policies and sticky consumers.
3. It discusses the Pulsar community contributions from the Zhaopin team, including details on key-shared subscriptions, schema version handling, HDFS offloader storage, and other improvements.
How Zhaopin contributes to Pulsar community
2. Zhaopin in Pulsar community
Penghui Li 李鹏辉
Messaging platform lead at zhaopin.com
Apache Pulsar Committer
4. Apache Pulsar in zhaopin.com
2018/08
First service for online applications
2018/10
1 billion / day
2019/02
6 billion / day
2019/08
20 billion / day
50+ Namespaces
5000+ Topics
5. 1. Features of zhaopin contributing to the community
2. Details of Key_shared subscription
3. Release Pulsar
4. Details of Pulsar multiple schema version
5. Details of HDFS Offloader
23. 2.4.0 Release
1. New branch and tag
2. Stage release (check -> sign -> stage)
3. Move master to new version and write release notes
4. Start vote
5. Promote release and publish
6. Update site and announce the release
24. Schema versioning & HDFS offloader
Bo Cong 丛搏
Messaging platform engineer at zhaopin.com
Apache Pulsar contributor
25. The meaning of multi-version schema
A message's schema is not immutable; it can change from one message to the next:
message: message 1 | message 2 | message 3 | message 4 | message 5
schema: version 0 | version 1 | version 2 | version 3 | version 4
26. Problems caused by version changes
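import org.apache.avro.reflect.AvroDefault;

// Only an id field
class Person {
    int id;
}
// Adds a name field with a default (@AvroDefault takes a JSON-encoded value)
class Person {
    int id;
    @AvroDefault("\"Zhang San\"")
    String name;
}
// Adds a name field without a default
class Person {
    int id;
    String name;
}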
Version 0
Version 2
Version 1
Can read
Can read
Can't read
27. Change in compatibility policy
BACKWARD
version 2 can read version 1
version 1 can read version 0
BACKWARD_TRANSITIVE
version 2 can read version 1
version 1 can read version 0
version 2 can read version 0
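The difference between the two policies can be illustrated with a small standalone sketch. It does not use the Pulsar or Avro APIs: a schema is modeled as a map from field name to "has a default", and the `canRead` rule is a simplified assumption (a reader field missing from the writer must have a default). The class name, method names, and the three-version history are all hypothetical.

```java
import java.util.List;
import java.util.Map;

public class CompatibilitySketch {
    // Simplified rule (assumption): a reader schema can decode data written with
    // a writer schema if every reader field absent from the writer has a default.
    static boolean canRead(Map<String, Boolean> reader, Map<String, Boolean> writer) {
        for (Map.Entry<String, Boolean> field : reader.entrySet()) {
            boolean missingInWriter = !writer.containsKey(field.getKey());
            boolean hasDefault = field.getValue();
            if (missingInWriter && !hasDefault) {
                return false;
            }
        }
        return true;
    }

    // BACKWARD: the candidate only has to read the latest existing version.
    static boolean backward(List<Map<String, Boolean>> history, Map<String, Boolean> candidate) {
        if (history.isEmpty()) {
            return true;
        }
        return canRead(candidate, history.get(history.size() - 1));
    }

    // BACKWARD_TRANSITIVE: the candidate has to read every existing version.
    static boolean backwardTransitive(List<Map<String, Boolean>> history, Map<String, Boolean> candidate) {
        for (Map<String, Boolean> older : history) {
            if (!canRead(candidate, older)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical history: field name -> has a default value
        Map<String, Boolean> v0 = Map.of("id", false);
        Map<String, Boolean> v1 = Map.of("id", false, "name", true);
        Map<String, Boolean> v2 = Map.of("id", false, "name", false);
        List<Map<String, Boolean>> history = List.of(v0, v1);

        System.out.println("BACKWARD accepts v2: " + backward(history, v2));
        System.out.println("BACKWARD_TRANSITIVE accepts v2: " + backwardTransitive(history, v2));
    }
}
```

In this hypothetical history the non-transitive check accepts the third schema because it can read the second, while the transitive check rejects it because it cannot read data written with the first.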
28. Schema creation process
admin client api
admin rest api
producer create
consumer subscribe
schema data
SchemaRegistryService
new schema
old schema
version
compatibility check
Incompatible
version
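One plausible reading of the flow above, written as a standalone sketch rather than Pulsar's actual SchemaRegistryService: a schema that is already registered returns its existing version, a new schema goes through a compatibility check, and the result is either a new version or an "incompatible" error. The class, the `putIfAbsent` method, and the pluggable compatibility predicate are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

public class SchemaRegistrySketch {
    // The list index doubles as the schema version (0, 1, 2, ...).
    private final List<String> versions = new ArrayList<>();
    // Hypothetical check: (latest existing schema, candidate schema) -> compatible?
    private final BiPredicate<String, String> compatible;

    public SchemaRegistrySketch(BiPredicate<String, String> compatible) {
        this.compatible = compatible;
    }

    // Returns the version for the given schema data, registering it if needed.
    public long putIfAbsent(String schemaData) {
        int existing = versions.indexOf(schemaData);
        if (existing >= 0) {
            return existing; // old schema: reuse its version
        }
        if (!versions.isEmpty()
                && !compatible.test(versions.get(versions.size() - 1), schemaData)) {
            throw new IllegalStateException("Incompatible schema");
        }
        versions.add(schemaData); // new schema: assign the next version
        return versions.size() - 1;
    }
}
```

The sketch checks only against the latest version; a transitive policy would loop over every stored version instead.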
29. Multi-version use in pulsar Avro schema
message 1
version 0
message 2
version 1
message 3
version 2
version 3
consumer
30. SchemaInfoProvider
Message
exist
new AvroReader()
Multi-version use in pulsar Avro schema
new ReflectDatumReader<>(writerSchema, readerSchema)
ReaderCache
Version0
read
not exist
find schema by version 0
from broker
read
If the reader schema and the writer schema differ, the Avro reader must be created with both the writer schema and the reader schema.
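The cache lookup described on this slide can be sketched without the Pulsar or Avro libraries. In the sketch below, `VersionedReader` stands in for a reader built from a writer schema and a reader schema, and `schemaLookup` stands in for fetching a schema by version from the broker. Both are hypothetical placeholders, not the client's real types.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongFunction;

public class ReaderCacheSketch {
    // Hypothetical stand-in for a decoder built from (writerSchema, readerSchema).
    public static final class VersionedReader {
        final String writerSchema;
        final String readerSchema;

        VersionedReader(String writerSchema, String readerSchema) {
            this.writerSchema = writerSchema;
            this.readerSchema = readerSchema;
        }

        public String describe() {
            return writerSchema + " -> " + readerSchema;
        }
    }

    private final Map<Long, VersionedReader> cache = new ConcurrentHashMap<>();
    private final LongFunction<String> schemaLookup; // stand-in for "find schema by version from broker"
    private final String readerSchema;               // the consumer's own schema
    private final AtomicInteger lookups = new AtomicInteger();

    public ReaderCacheSketch(LongFunction<String> schemaLookup, String readerSchema) {
        this.schemaLookup = schemaLookup;
        this.readerSchema = readerSchema;
    }

    // Exists in cache: reuse the reader. Not cached: fetch the writer schema once, then cache.
    public VersionedReader readerFor(long schemaVersion) {
        return cache.computeIfAbsent(schemaVersion, version -> {
            lookups.incrementAndGet();
            return new VersionedReader(schemaLookup.apply(version), readerSchema);
        });
    }

    public int brokerLookups() {
        return lookups.get();
    }
}
```

Caching by schema version means the writer schema is fetched at most once per version, no matter how many messages carry that version.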
31. Multi-version use in pulsar Auto consume schema
only support AvroSchema and JsonSchema
GenericAvroRecord GenericJsonRecord
getField
Unlike JsonSchema or AvroSchema, the AUTO_CONSUME reader needs only the writer schema.
// client is an existing PulsarClient
Consumer<GenericRecord> consumer = client
    .newConsumer(Schema.AUTO_CONSUME())
    .topic("test")
    .subscriptionName("test")
    .subscribe();
32. The use of schema definition
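import org.apache.avro.reflect.Nullable;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.Schema;
import org.apache.pulsar.client.api.schema.SchemaDefinition;

class Person {
    @Nullable
    String name;
}

// With alwaysAllowNull disabled, only fields marked @Nullable accept null.
SchemaDefinition<Person> schemaDefinition =
    SchemaDefinition.<Person>builder()
        .withAlwaysAllowNull(false)
        .withPojo(Person.class)
        .build();

// client is an existing PulsarClient
Producer<Person> producer = client
    .newProducer(Schema.AVRO(schemaDefinition))
    .topic("persistent://public/default/test")
    .create();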
33. Why do we need HDFS offloader
ManagedLedger (Broker)
• Hot and cold data separation
BookKeeper (SSD): high throughput, low latency
HDFS (HDD): massive data storage
34. Offload topic ledgers to HDFS
stored relative path
tenant/namespace/topic/ledgerId + "-" + uuid.toString()
topic
ledger 1: index, data
ledger 2: index, data
ledger 3: index, data
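The relative path format quoted on this slide can be written as a small helper. This is a minimal sketch of the string layout only; the class and method names are hypothetical and it is not the offloader's actual code.

```java
import java.util.UUID;

public class OffloadPathSketch {
    // Builds tenant/namespace/topic/ledgerId + "-" + uuid.toString()
    static String storagePath(String tenant, String namespace, String topic,
                              long ledgerId, UUID uuid) {
        return tenant + "/" + namespace + "/" + topic + "/" + ledgerId + "-" + uuid.toString();
    }

    public static void main(String[] args) {
        // A random UUID keeps repeated offloads of the same ledger from colliding.
        System.out.println(storagePath("public", "default", "test", 42L, UUID.randomUUID()));
    }
}
```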
35. HDFS Offloader storage structure
• Storage uses org.apache.hadoop.io.MapFile
Index: entryID | entryID | entryID | entryID | entryID | entryID | entryID
Data: (entryID, entryData) | (entryID, entryData) | (entryID, entryData)
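The index-plus-data layout can be illustrated with an in-memory analogy. The sketch below does not call the Hadoop MapFile API: it keeps entries sorted by entry ID in a "data" list and records every Nth entry ID in a sparse "index", so a lookup jumps to the nearest indexed position and scans forward from there. The class name, method names, and index interval are assumptions chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MapFileLayoutSketch {
    // "Data": entries stored in increasing entry-ID order.
    private final List<Long> entryIds = new ArrayList<>();
    private final List<byte[]> entryData = new ArrayList<>();
    // "Index": sparse map from entry ID to its position in the data lists.
    private final TreeMap<Long, Integer> index = new TreeMap<>();
    private final int indexInterval;

    public MapFileLayoutSketch(int indexInterval) {
        this.indexInterval = indexInterval;
    }

    public void append(long entryId, byte[] data) {
        if (!entryIds.isEmpty() && entryId <= entryIds.get(entryIds.size() - 1)) {
            throw new IllegalArgumentException("entry IDs must be appended in increasing order");
        }
        if (entryIds.size() % indexInterval == 0) {
            index.put(entryId, entryIds.size()); // index every Nth entry
        }
        entryIds.add(entryId);
        entryData.add(data);
    }

    // Returns the entry data, or null if the entry ID is not present.
    public byte[] read(long entryId) {
        Map.Entry<Long, Integer> start = index.floorEntry(entryId);
        if (start == null) {
            return null;
        }
        for (int pos = start.getValue(); pos < entryIds.size(); pos++) {
            long id = entryIds.get(pos);
            if (id == entryId) {
                return entryData.get(pos);
            }
            if (id > entryId) {
                break; // passed the place where the entry would be
            }
        }
        return null;
    }

    public int indexSize() {
        return index.size();
    }
}
```

Because ledger entry IDs increase monotonically, entries arrive already sorted, which fits a sorted-key file format without an extra sort step.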