-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Bug in Handling Equal Comparison Column Values in Upsert #12395
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #12395 +/- ##
============================================
+ Coverage 61.56% 61.63% +0.06%
- Complexity 201 207 +6
============================================
Files 2426 2436 +10
Lines 132709 133117 +408
Branches 20519 20620 +101
============================================
+ Hits 81706 82050 +344
- Misses 44994 45027 +33
- Partials 6009 6040 +31
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
ankitsultana
changed the title
[Do Not Merge] Fix Bug in Handling Equal Comparison Column Values in Upsert
Fix Bug in Handling Equal Comparison Column Values in Upsert
Feb 16, 2024
.../java/org/apache/pinot/segment/local/upsert/ConcurrentMapPartitionUpsertMetadataManager.java
Outdated
Show resolved
Hide resolved
.../java/org/apache/pinot/segment/local/upsert/ConcurrentMapPartitionUpsertMetadataManager.java
Outdated
Show resolved
Hide resolved
tibrewalpratik17
approved these changes
Feb 16, 2024
.../java/org/apache/pinot/segment/local/upsert/ConcurrentMapPartitionUpsertMetadataManager.java
Show resolved
Hide resolved
deemoliu
reviewed
Feb 20, 2024
.../src/main/java/org/apache/pinot/segment/local/upsert/BasePartitionUpsertMetadataManager.java
Show resolved
Hide resolved
Jackie-Jiang
approved these changes
Feb 21, 2024
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
See issue for explanation of the issue: #12398
This PR fixes the issue by de-duplicating the records before the map is updated. This adds a small cost in terms of memory, but it should be quite low (e.g. for 1M records, with 32 byte primary-keys, we will have 32MB + memory address and other Record info parameter costs).
There's a small latency cost also added due to the extra dedup step. But we think it's quite low compared to the rest of operations so shouldn't be too big an issue (e.g. after dedup the ConcurrentHashMap is updated for all records, which can contend with Realtime ingestion and other
addOrReplaceSegment
calls).We only do this for Partial Upsert tables right now since Full Upsert tables don't suffer from the issue mentioned in #12398.
Test Plan: Added a unit-test.