-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
12508: Feature add segment rows flush config #12681
12508: Feature add segment rows flush config #12681
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution! I feel the current way (not introduced in this PR) of setting default value for threshold is very confusing. Let me try to simplify that and you may work on the updated code
@@ -81,6 +81,7 @@ private StreamConfigProperties() { | |||
*/ | |||
public static final String DEPRECATED_SEGMENT_FLUSH_THRESHOLD_ROWS = "realtime.segment.flush.threshold.size"; | |||
public static final String SEGMENT_FLUSH_THRESHOLD_ROWS = "realtime.segment.flush.threshold.rows"; | |||
public static final String SEGMENT_FLUSH_THRESHOLD_SEGMENT_ROWS = "realtime.segment.flush.threshold.segment.rows"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add some javadoc explaining the difference between this and SEGMENT_FLUSH_THRESHOLD_ROWS
|
||
if (flushThresholdRows > 0) { | ||
_flushThresholdUpdaterMap.remove(realtimeTableName); | ||
return new DefaultFlushThresholdUpdater(flushThresholdRows); | ||
} else if (flushThresholdSegmentRows > 0) { | ||
return new SegmentRowsBasedFlushThresholdUpdater(flushThresholdSegmentRows); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest renaming it to FixedFlushThresholdUpdater
. You also need to call _flushThresholdUpdaterMap.remove(realtimeTableName);
@@ -33,16 +33,20 @@ public class FlushThresholdUpdateManager { | |||
* Check table config for flush size. | |||
* | |||
* If flush size > 0, create a new DefaultFlushThresholdUpdater with given flush size. | |||
* If flush size <= 0, create new SegmentSizeBasedFlushThresholdUpdater if not already created. Create only 1 per | |||
* table because we want to maintain tuning information for the table in the updater. | |||
* If flush size <= 0, create new SegmentRowsBasedFlushThresholdUpdater if flushThresholdSegmentRows > 0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to revise the javadoc here. Currently it is not clear.
#12684 cleans up the current threshold handling |
0835511
to
3494dbf
Compare
@Jackie-Jiang Your suggestions were helpful, I updated the docs and renamed the FixedThreshold. let me know if the docs need additional changes. Should I merge #12684 into my branch and update the default threshold calc, or should I wait till it gets merged into master? |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #12681 +/- ##
============================================
+ Coverage 61.75% 61.99% +0.24%
+ Complexity 207 198 -9
============================================
Files 2436 2462 +26
Lines 133233 134807 +1574
Branches 20636 20829 +193
============================================
+ Hits 82274 83576 +1302
- Misses 44911 45065 +154
- Partials 6048 6166 +118
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
3494dbf
to
5c4e088
Compare
@Jackie-Jiang I updated lines related to thresholdUpdateManager and Its test file |
5c4e088
to
76d86fb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks good in general. I applied a commit for some detailed handling and extra testing
Adding new config "realtime.segment.flush.threshold.segment.rows" for real-time table to configure the segment threshold independent of max partitions
Behavior:
Legacy behavior:
Issue[12508]:
#12508
When realtime.segment.flush.threshold.rows is used, the flush threshold for the new CONSUMING segment is determined by both this value and the max partitions consumed by any server. This is not very straight forward, and rebalancing a table could cause new consuming segment size to change.
In order to tackle this problem, we may add a new config (proposing realtime.segment.flush.threshold.segment.rows) which works independent of partitions consumed.