This document discusses querying Apache Pulsar streams using Apache Flink. It gives an overview of Pulsar and how it can be used for pub/sub messaging and streaming applications. It then describes how the Flink connectors allow querying Pulsar streams through Flink's streaming and table APIs, including support for Pulsar schemas, exactly-once processing, and integration with the Pulsar catalog. Together they form a unified data processing stack: Pulsar for streaming data storage and Flink for querying and analysis.
6. A brief history of Apache Pulsar
❏ 2012: Pulsar idea started at Yahoo!
❏ 5+ years in production, 100+ applications, 10+ data centers
❏ 2016/09 Yahoo open sourced Pulsar
❏ 2017/06 Yahoo donated Pulsar to ASF
❏ 2018/09 Pulsar graduated as a Top-Level project
❏ 25+ committers, 168 contributors, 1000+ forks, 4200+ stars
❏ Yahoo!, Yahoo! Japan, Tencent, Zhaopin, THG, OVH, …
http://pulsar.apache.org/en/powered-by/
45. Key_Shared Subscription
❏ Key based ordering
❏ Key can be the message key or a separate *order* key
❏ HashRing based routing
❏ Key based batcher
❏ Policies for messages without *keys*
https://github.com/apache/pulsar/wiki/PIP-34:-Add-new-subscribe-type-Key_shared
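The key-based ordering and hash-ring routing named on this slide can be illustrated with a small, standalone sketch. It is not Pulsar's implementation: the class `HashRing`, the helper `routing_key`, the MD5-based hash, the 100 ring points per consumer, and the fallback key for keyless messages are all assumptions chosen for illustration. What it shows is the general idea: every message with the same key lands on the same consumer, so per-key order is preserved, and removing one consumer only reassigns the keys that consumer owned.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a position on a 32-bit ring (hash choice is illustrative)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def routing_key(message_key=None, order_key=None, default="__no_key__"):
    """Pick the key used for routing.

    Assumption for this sketch: an explicit order key wins over the message
    key, and keyless messages all share one hypothetical default key.
    """
    if order_key is not None:
        return order_key
    if message_key is not None:
        return message_key
    return default


class HashRing:
    """Toy consistent-hash ring that maps keys to consumer names."""

    def __init__(self, points_per_consumer: int = 100):
        self.points_per_consumer = points_per_consumer
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> consumer name

    def add_consumer(self, name: str) -> None:
        for i in range(self.points_per_consumer):
            point = _hash(f"{name}#{i}")
            if point in self._owner:
                continue  # position already taken; keep the existing owner
            bisect.insort(self._points, point)
            self._owner[point] = name

    def remove_consumer(self, name: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != name]
        self._owner = {p: c for p, c in self._owner.items() if c != name}

    def select(self, key: str) -> str:
        """Return the consumer owning the first ring point at or after hash(key)."""
        if not self._points:
            raise LookupError("no consumers attached")
        idx = bisect.bisect_left(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


if __name__ == "__main__":
    ring = HashRing()
    for consumer in ("consumer-a", "consumer-b", "consumer-c"):
        ring.add_consumer(consumer)
    for msg_key in ("order-1", "order-2", "order-1", None):
        print(msg_key, "->", ring.select(routing_key(message_key=msg_key)))
```

Spreading many ring points per consumer is a common way to even out the share of keys each consumer receives; the value 100 here is arbitrary.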
46. Conclusion
❏ Apache Pulsar is a cloud-native streaming data store
❏ Two levels of reading API: Pub/Sub + Segment
❏ Structured Event Streams via Pulsar Schema
❏ Pulsar is the unified data storage for Flink
❏ Pulsar + Flink for streaming-first, unified data processing stack
48. Pulsar in Europe
❏ First Pulsar Meetup in Paris (@OVHCloud) on Friday 10/11
❏ https://www.meetup.com/Hadoop-User-Group-France/events/264920447/
❏ If you are looking for collaborations on Pulsar events, talk to us :-)