Preview of Apache Pulsar 2.5.0
Transactional streaming
Sticky consumer
Batch receiving
Namespace change events
Messaging semantics - 1
1. At least once
Message<byte[]> msg = consumer.receive();
try {
    // processing
    consumer.acknowledge(msg);
} catch (Exception e) {
    consumer.negativeAcknowledge(msg);
}
Message<byte[]> msg = consumer.receive();
try {
    // processing
} catch (Exception e) {
    log.error("processing error", e);
} finally {
    consumer.acknowledge(msg);
}
2. At most once
3. Exactly once ?
Messaging semantics - 2
In practice, idempotent produce and idempotent consume are used more often
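An idempotent consumer approximates exactly-once by keying its side effect on the message ID, so a redelivered message is skipped but can still be acknowledged. A minimal sketch (the string message ID and the in-memory set are illustrative; a real consumer would persist the set alongside its results):

```java
import java.util.HashSet;
import java.util.Set;

public class IdempotentConsumer {
    // IDs of messages already processed; a real consumer would persist
    // this set in the same store as the processing results.
    private final Set<String> processed = new HashSet<>();
    private int effectiveCount = 0;

    // Returns true if the message was processed, false if it was a
    // duplicate redelivery (skip the side effect, but still ack).
    public boolean handle(String messageId, String payload) {
        if (!processed.add(messageId)) {
            return false;
        }
        effectiveCount++; // stand-in for the real side effect
        return true;
    }

    public int effectiveCount() {
        return effectiveCount;
    }
}
```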
Messaging semantics - 3
Effectively once
ledgerId + messageId -> sequenceId
+
Broker deduplication
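Broker deduplication works by tracking, per producer, the highest sequence id already persisted and dropping anything at or below it. A simplified sketch of that bookkeeping (names and structure are illustrative, not Pulsar's internals):

```java
import java.util.HashMap;
import java.util.Map;

public class DedupLedger {
    // Highest sequence id persisted, per producer name.
    private final Map<String, Long> lastSeq = new HashMap<>();

    // Returns true if the entry should be persisted, false if it is a
    // duplicate (drop it and ack back to the producer).
    public boolean shouldPersist(String producerName, long sequenceId) {
        long last = lastSeq.getOrDefault(producerName, -1L);
        if (sequenceId <= last) {
            return false;
        }
        lastSeq.put(producerName, sequenceId);
        return true;
    }
}
```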
Messaging semantics - 4
Limitations in effectively once
1. Only works when producing to a single partition
2. Only works when producing one message at a time
3. Only works when consuming from a single partition
4. Consumers are required to store the message id and state for restoring
Streaming processing - 1
[Diagram: Topic-1 -> A -> f(A) -> B -> Topic-2; step 1]
1. Receive message A from Topic-1 and do some processing
Streaming processing - 2
[Diagram: Topic-1 -> A -> f(A) -> B -> Topic-2; step 2]
2. Write the result message B to Topic-2
Streaming processing - 3
[Diagram: Topic-1 -> A -> f(A) -> B -> Topic-2; step 3]
3. Get the send response from Topic-2
How to handle a send-response timeout or a consumer/function crash?
Ack message A = At most once
Nack message A = At least once
Streaming processing - 4
[Diagram: Topic-1 -> A -> f(A) -> B -> Topic-2; step 4]
4. Ack message A
How to handle an ack failure or a consumer/function crash?
Transactional streaming semantics
1. Atomic multi-topic publish and acknowledge
2. Messages are dispatched to only one consumer until the transaction commits or aborts
3. Only committed messages can be read by consumers
READ_COMMITTED
https://github.com/apache/pulsar/wiki/PIP-31%3A-Transaction-Support
Transactional streaming demo
Message<String> message = inputConsumer.receive();
Transaction txn =
client.newTransaction().withTransactionTimeout(…).build().get();
CompletableFuture<MessageId> sendFuture1 =
producer1.newMessage(txn).value("output-message-1").sendAsync();
CompletableFuture<MessageId> sendFuture2 =
producer2.newMessage(txn).value("output-message-2").sendAsync();
inputConsumer.acknowledgeAsync(message.getMessageId(), txn);
txn.commit().get();
MessageId msgId1 = sendFuture1.get();
MessageId msgId2 = sendFuture2.get();
Sticky consumer
https://github.com/apache/pulsar/wiki/PIP-34%3A-Add-new-subscribe-type-Key_shared
Consumer<byte[]> consumer1 = client.newConsumer()
    .topic("my-topic")
    .subscriptionName("my-subscription")
    .subscriptionType(SubscriptionType.Key_Shared)
    .keySharedPolicy(KeySharedPolicy.sticky()
        .ranges(Range.of(0, 32767)))
    .subscribe();
Consumer<byte[]> consumer2 = client.newConsumer()
    .topic("my-topic")
    .subscriptionName("my-subscription")
    .subscriptionType(SubscriptionType.Key_Shared)
    .keySharedPolicy(KeySharedPolicy.sticky()
        .ranges(Range.of(32768, 65535)))
    .subscribe();
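Key_Shared maps each message key to a hash slot in [0, 65535] and routes it to whichever consumer's sticky range contains that slot. Pulsar hashes keys with 32-bit Murmur3; the sketch below substitutes `String.hashCode` as a placeholder to show the routing logic only:

```java
public class KeySharedRouting {
    static final int SLOTS = 65536;

    // Placeholder hash: Pulsar actually uses Murmur3 (32-bit) on the
    // key bytes, not String.hashCode.
    static int slotFor(String key) {
        return Math.floorMod(key.hashCode(), SLOTS);
    }

    // With the sticky ranges above: consumer1 took [0, 32767],
    // consumer2 took [32768, 65535].
    static int consumerFor(String key) {
        return slotFor(key) <= 32767 ? 1 : 2;
    }
}
```

Because the slot depends only on the key, all messages with the same key land on the same consumer, preserving per-key ordering.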
Batch receiving messages
Consumer<byte[]> consumer = client.newConsumer()
    .topic("my-topic")
    .subscriptionName("my-subscription")
    .batchReceivePolicy(BatchReceivePolicy.builder()
        .maxNumMessages(100)
        .maxNumBytes(2 * 1024 * 1024)
        .timeout(1, TimeUnit.SECONDS)
        .build())
    .subscribe();
Messages<byte[]> msgs = consumer.batchReceive();
// process the batch of messages
https://github.com/apache/pulsar/wiki/PIP-38%3A-Batch-Receiving-Messages
Namespace change events
https://github.com/apache/pulsar/wiki/PIP-39%3A-Namespace-Change-Events
persistent://tenant/ns/__change_events
class PulsarEvent {
EventType eventType;
ActionType actionType;
TopicEvent topicEvent;
}
Thanks
Penghui Li
Bo Cong / 丛搏
Pulsar Schema
Messaging system R&D engineer at Zhaopin (智联招聘)
Core contributor to Pulsar Schema and HDFS Offload
Schema Evolution
Data management can't escape the evolution of schema
Single version schema
message 1, message 2, message 3: all written with schema version 1
Multiple version schemas
message 1 (version 1), message 2 (version 2), message 3 (version 3)
Schema compatibility
"can read" = successful deserialization
Compatibility strategy evolution
Backward:
version 2 can read version 1; version 1 can read version 0
but version 2 may not be able to read version 0
Backward Transitive:
version 2 can read both version 1 and version 0
Evolution of the situation
// Version 1
class Person {
    @Nullable
    String name;
}

// Version 2
class Person {
    String name;
}

// Version 3
class Person {
    @Nullable
    @AvroDefault("\"Zhang San\"")
    String name;
}

Version 3 can read version 1 and version 2 data; version 2 can't read version 1 data
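The Person example boils down to one Avro resolution rule: a reader can handle a possibly-null written value only if its own field is nullable or has a default to fall back on. A toy check of just that rule (real Avro schema resolution covers many more cases; the method and parameter names are illustrative):

```java
public class BackwardCheck {
    // Can a reader with the given shape of the name field read data
    // written by a schema where the field may or may not be nullable?
    static boolean canRead(boolean readerNullable, boolean readerHasDefault,
                           boolean writerNullable) {
        // A possibly-null written value needs the reader to either
        // accept null or substitute a default value.
        if (writerNullable && !readerNullable && !readerHasDefault) {
            return false;
        }
        return true;
    }
}
```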
Compatibility check
Separate schema compatibility checks for producer and consumer:
Producer: check whether the schema already exists
Consumer: run the compatibility check
(isAllowAutoUpdateSchema = false)
Upgrade way
Different strategies require different upgrade orders:
BACKWARD / BACKWARD_TRANSITIVE: upgrade consumers first
FORWARD / FORWARD_TRANSITIVE: upgrade producers first
FULL / FULL_TRANSITIVE: any order
Produce Different Message
Producer<V1Data> p = pulsarClient.newProducer(Schema.AVRO(V1Data.class))
.topic(topic).create();
Consumer<V2Data> c = pulsarClient.newConsumer(Schema.AVRO(V2Data.class))
    .topic(topic)
    .subscriptionName("sub1").subscribe();
p.newMessage().value(data1).send();
p.newMessage(Schema.AVRO(V2Data.class)).value(data2).send();
p.newMessage(Schema.AVRO(V1Data.class)).value(data3).send();
Message<V2Data> msg1 = c.receive();
V2Data msg1Value = msg1.getValue();
Message<V2Data> msg2 = c.receive();
Message<V2Data> msg3 = c.receive();
V2Data msg3Value = msg3.getValue();
Thanks
Bo Cong
Jia Zhai / 翟佳
Kafka on Pulsar (KoP)
What is Apache Pulsar?
Flexible pub/sub messaging backed by durable log storage
Barrier for users?
Unified messaging protocol
Apps built on old systems
How Pulsar handles it?
Pulsar Kafka Wrapper on Kafka Java API
https://pulsar.apache.org/docs/en/adaptors-kafka/
Pulsar IO Connect
https://pulsar.apache.org/docs/en/io-overview/
Kafka on Pulsar (KoP)
KoP Feasibility — Log
[Diagram, built up over four slides: a topic is a log that producers append to and consumers read from; Kafka and Pulsar are built on this same log abstraction]
KoP Feasibility — Others
Producer Consumer
Topic Lookup
Produce
Consume
Offset
Consumption State
KoP Overview
[Diagram: Kafka and Pulsar producers/consumers, each using their own client lib, connect to a Pulsar broker running both a Pulsar protocol handler and a Kafka protocol handler; the broker's load balancer, managed ledger (BK client), and geo-replicator sit on a Pulsar topic backed by ZooKeeper and bookies]
KoP Implementation
Topic flat map: Broker sets `kafkaNamespace`
Message ID and Offset: LedgerId + EntryId
Message: Convert Key/value/timestamp/headers(properties)
Topic Lookup: Pulsar admin topic lookup -> owner broker
Produce: Convert, then call PulsarTopic.publishMessage
Consume: Convert, then call non-durable-cursor.readEntries
Group Coordinator: Keep in topic `public/__kafka/__offsets`
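Since the Kafka protocol needs a single 64-bit offset per message, KoP derives one from the Pulsar message id by packing ledgerId and entryId into one long. A sketch of that encoding (the 28-bit split is illustrative; KoP's exact bit layout is an implementation detail):

```java
public class OffsetCodec {
    // Bits reserved for the entry id; the split only needs to keep
    // offsets monotonically increasing as (ledgerId, entryId) grows.
    static final int ENTRY_BITS = 28;

    static long toOffset(long ledgerId, long entryId) {
        return (ledgerId << ENTRY_BITS) | entryId;
    }

    static long ledgerId(long offset) {
        return offset >>> ENTRY_BITS;
    }

    static long entryId(long offset) {
        return offset & ((1L << ENTRY_BITS) - 1);
    }
}
```

The encoding is reversible, so a Kafka consumer's committed offset can be mapped straight back to a Pulsar cursor position.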
KoP Now
Layered Architecture
Independent Scale
Instant Recovery
Rebalance-free expansion
Ordering: guaranteed ordering
Multi-tenancy: a single cluster can support many tenants and use cases
High throughput: can reach 1.8 M messages/s in a single partition
Durability: data replicated and synced to disk
Geo-replication: out-of-the-box support for geographically distributed applications
Unified messaging model: supports both streaming and queuing
Delivery guarantees: at least once, at most once, and effectively once
Low latency: publish latency under 5 ms
Highly scalable & available: can support millions of topics; HA
Demo
https://kafka.apache.org/quickstart
Demo1: Kafka Producer / Consumer
Demo2: Kafka Connect
https://archive.apache.org/dist/kafka/2.0.0/
kafka_2.12-2.0.0.tgz
Demo video: https://www.bilibili.com/video/av75540685
Demo
[Same KoP overview diagram: Kafka and Pulsar clients connect to the broker's respective protocol handlers over a shared Pulsar topic]
Demo1: K-Producer -> K-Consumer
[Diagram: Kafka producer and Kafka consumer (Kafka lib) talk to the broker's Kafka protocol handler, backed by a Pulsar topic]
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
Demo1: P-Producer -> K-Consumer
[Diagram: Pulsar producer (Pulsar lib) talks to the Pulsar protocol handler; Kafka consumer (Kafka lib) talks to the Kafka protocol handler; both share the same Pulsar topic]
bin/pulsar-client produce test -n 1 -m "Hello from Pulsar Producer, Message 1"
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
Demo1: K-Producer -> P-Consumer
[Diagram: Kafka producer (Kafka lib) talks to the Kafka protocol handler; Pulsar consumer (Pulsar lib) talks to the Pulsar protocol handler; both share the same Pulsar topic]
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
bin/pulsar-client consume -s sub-name test -n 0
Demo2: Kafka Connect
[Diagram: a Kafka Connect file source reads an input file and produces to a topic through the Kafka protocol handler; a Kafka Connect file sink consumes the topic and writes an output file]
bin/connect-standalone.sh \
    config/connect-standalone.properties \
    config/connect-file-source.properties \
    config/connect-file-sink.properties
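For reference, the file-source config shipped with the Kafka distribution looks like the fragment below (values taken from the stock quickstart; `test.txt` and `connect-test` are the defaults the demo relies on):

```properties
# config/connect-file-source.properties (Kafka quickstart defaults)
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=test.txt
topic=connect-test
```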
Demo2: Pulsar Functions
https://pulsar.apache.org/docs/en/functions-overview/
[Diagram: the same Kafka Connect pipeline, plus a Pulsar Function consuming the topic and writing to an output topic]
bin/pulsar-admin functions localrun --name pulsarExclamation \
    --jar pulsar-functions-api-examples.jar \
    --classname org…ExclamationFunction \
    --inputs connect-test-partition-0 --output out-hello
Apache Pulsar & Apache Kafka
Thanks!
StreamNative
We are hiring
Thanks

More Related Content

Preview of Apache Pulsar 2.5.0
