Optimize snapshot flow to only snapshot segments which have updates #13285
Merged: klsince merged 7 commits into apache:master from tibrewalpratik17:optimize_snapshotting on Jun 11, 2024
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #13285 +/- ##
============================================
+ Coverage 61.75% 62.11% +0.35%
+ Complexity 207 198 -9
============================================
Files 2436 2548 +112
Lines 133233 139979 +6746
Branches 20636 21735 +1099
============================================
+ Hits 82274 86941 +4667
- Misses 44911 46447 +1536
- Partials 6048 6591 +543
…nce last snapshot
tibrewalpratik17 force-pushed the optimize_snapshotting branch from e30186b to e378495 (June 6, 2024 12:48)
klsince
reviewed
Jun 6, 2024
.../src/main/java/org/apache/pinot/segment/local/upsert/BasePartitionUpsertMetadataManager.java
.../src/main/java/org/apache/pinot/segment/local/upsert/BasePartitionUpsertMetadataManager.java
Outdated
deemoliu
requested changes
Jun 6, 2024
...re/src/main/java/org/apache/pinot/core/data/manager/realtime/RealtimeSegmentDataManager.java
Outdated
tibrewalpratik17 force-pushed the optimize_snapshotting branch from d5ca6cb to 78f2a2a (June 6, 2024 23:27)
tibrewalpratik17
commented
Jun 7, 2024
.../src/main/java/org/apache/pinot/segment/local/upsert/BasePartitionUpsertMetadataManager.java
klsince
reviewed
Jun 7, 2024
...re/src/main/java/org/apache/pinot/core/data/manager/realtime/RealtimeSegmentDataManager.java
Outdated
.../src/main/java/org/apache/pinot/segment/local/upsert/BasePartitionUpsertMetadataManager.java
klsince
approved these changes
Jun 11, 2024
...re/src/main/java/org/apache/pinot/core/data/manager/realtime/RealtimeSegmentDataManager.java
Outdated
klsince
reviewed
Jun 11, 2024
.../src/main/java/org/apache/pinot/segment/local/upsert/BasePartitionUpsertMetadataManager.java
klsince
approved these changes
Jun 11, 2024
Labels: optimization, enhancement, upsert
Change 1
Inspired by @klsince's work in #12976.
This patch enhances the `doTakeSnapshot` flow so that it no longer snapshots every segment in a given partition, only the ones that have been updated since the last snapshot was taken. This particularly improves scenarios where the number of segments per partition is high. The `doTakeSnapshot` workflow runs before a new consuming segment starts consumption, so the time it takes directly adds ingestion lag.
Change 2
This patch also reorders the `takeSnapshot` and `removeDeletedPrimaryKeys` flows, running the latter before the former when `deletedKeysTTL` is set. This way all the keys and validDocIDs removed in `removeDeletedPrimaryKeys` are snapshotted immediately rather than one commit cycle later.
We were seeing scenarios where the snapshot flow took up to 30s for some tables.
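Changes 1 and 2 can be sketched together roughly as follows. This is a minimal, single-threaded illustration and not the actual `BasePartitionUpsertMetadataManager` code: the class `SnapshotSketch`, its fields, and the methods `addSegment`, `markUpdated`, and `beforeNextConsumingSegment` are hypothetical names, and segments are represented by plain strings. Only `doTakeSnapshot`, `removeDeletedPrimaryKeys`, and `deletedKeysTTL` are names taken from this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of the idea in this PR; not Pinot's real implementation. */
public class SnapshotSketch {
  // All segments tracked for this partition (assumed representation).
  private final Set<String> trackedSegments = new TreeSet<>();
  // Segments whose validDocIds changed since the last snapshot (Change 1).
  private final Set<String> updatedSegments = new TreeSet<>();
  // Stands in for "deletedKeysTTL is set".
  private final boolean deletedKeysTtlEnabled;

  public SnapshotSketch(boolean deletedKeysTtlEnabled) {
    this.deletedKeysTtlEnabled = deletedKeysTtlEnabled;
  }

  /** A newly added segment has no snapshot yet, so it starts out as updated. */
  public void addSegment(String segmentName) {
    trackedSegments.add(segmentName);
    updatedSegments.add(segmentName);
  }

  /** Called whenever an upsert or delete invalidates a doc in the segment. */
  public void markUpdated(String segmentName) {
    if (trackedSegments.contains(segmentName)) {
      updatedSegments.add(segmentName);
    }
  }

  /** Removing expired deleted keys changes validDocIds, so those segments become updated. */
  public void removeDeletedPrimaryKeys(List<String> segmentsWithExpiredDeletes) {
    for (String segmentName : segmentsWithExpiredDeletes) {
      markUpdated(segmentName);
    }
  }

  /** Snapshots only the updated segments and returns their names. */
  public List<String> doTakeSnapshot() {
    List<String> snapshotted = new ArrayList<>(updatedSegments);
    updatedSegments.clear();
    return snapshotted;
  }

  /** Change 2: with the TTL set, remove deleted keys first so the same snapshot captures them. */
  public List<String> beforeNextConsumingSegment(List<String> segmentsWithExpiredDeletes) {
    if (deletedKeysTtlEnabled) {
      removeDeletedPrimaryKeys(segmentsWithExpiredDeletes);
    }
    return doTakeSnapshot();
  }

  public static void main(String[] args) {
    SnapshotSketch manager = new SnapshotSketch(true);
    manager.addSegment("seg_0");
    manager.addSegment("seg_1");
    manager.addSegment("seg_2");
    // First cycle: nothing has been snapshotted yet, so all three segments are included.
    System.out.println(manager.beforeNextConsumingSegment(List.of()));
    // Second cycle: seg_1 got an upsert, seg_2 had expired deletes removed; seg_0 is skipped.
    manager.markUpdated("seg_1");
    System.out.println(manager.beforeNextConsumingSegment(List.of("seg_2")));
  }
}
```

In this sketch the cost of a snapshot cycle scales with the number of updated segments rather than the total number of segments in the partition, which is the effect Change 1 describes. A real implementation would also need to handle concurrent updates while the snapshot is being written, which the sketch ignores.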
Change 3
We enable snapshotting during server restart for partial-upsert tables before the first consuming segment. This was not done before, on the assumption that not all segments are loaded at that point, but for partial-upsert tables we don't start consumption until all data is loaded. This saves one segment commit cycle of snapshotting when snapshots are newly enabled for a table or after a server restart.
The screenshot below shows a dip after server restart; it takes one commit cycle for snapshots to recover.