Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

12508: Feature add segment rows flush config #12681

Conversation

karthi07
Copy link
Contributor

@karthi07 karthi07 commented Mar 20, 2024

Adding new config "realtime.segment.flush.threshold.segment.rows" for real-time table to configure the segment threshold independent of max partitions

Behavior:

  • If flush rows > 0, create a new DefaultFlushThresholdUpdater with the given flush size.
  • If flush segment rows > 0, create a new FixedFlushThresholdUpdater with the given flush size.
  • If flush segment size > 0, create a new SegmentSizeBasedFlushThresholdUpdater if not already created. Create only 1 per table because we want to maintain tuning information for the table in the updater.

Legacy behavior:

  • If flush rows = 0, use segment size based flush threshold.
  • If none of the above are set, create a new DefaultFlushThresholdUpdater.
    • If flush size > 0, create a new DefaultFlushThresholdUpdater with given flush size.

Issue[12508]:
#12508
When realtime.segment.flush.threshold.rows is used, the flush threshold for the new CONSUMING segment is determined by both this value and the max partitions consumed by any server. This is not very straight forward, and rebalancing a table could cause new consuming segment size to change.

In order to tackle this problem, we may add a new config (proposing realtime.segment.flush.threshold.segment.rows) which works independent of partitions consumed.

@Jackie-Jiang Jackie-Jiang added documentation release-notes Referenced by PRs that need attention when compiling the next release notes Configuration Config changes (addition/deletion/change in behavior) labels Mar 21, 2024
Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution! I feel the current way (not introduced in this PR) of setting default value for threshold is very confusing. Let me try to simplify that and you may work on the updated code

@@ -81,6 +81,7 @@ private StreamConfigProperties() {
*/
public static final String DEPRECATED_SEGMENT_FLUSH_THRESHOLD_ROWS = "realtime.segment.flush.threshold.size";
public static final String SEGMENT_FLUSH_THRESHOLD_ROWS = "realtime.segment.flush.threshold.rows";
public static final String SEGMENT_FLUSH_THRESHOLD_SEGMENT_ROWS = "realtime.segment.flush.threshold.segment.rows";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add some javadoc explaining the difference between this and SEGMENT_FLUSH_THRESHOLD_ROWS


if (flushThresholdRows > 0) {
_flushThresholdUpdaterMap.remove(realtimeTableName);
return new DefaultFlushThresholdUpdater(flushThresholdRows);
} else if (flushThresholdSegmentRows > 0) {
return new SegmentRowsBasedFlushThresholdUpdater(flushThresholdSegmentRows);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest renaming it to FixedFlushThresholdUpdater. You also need to call _flushThresholdUpdaterMap.remove(realtimeTableName);

@@ -33,16 +33,20 @@ public class FlushThresholdUpdateManager {
* Check table config for flush size.
*
* If flush size > 0, create a new DefaultFlushThresholdUpdater with given flush size.
* If flush size <= 0, create new SegmentSizeBasedFlushThresholdUpdater if not already created. Create only 1 per
* table because we want to maintain tuning information for the table in the updater.
* If flush size <= 0, create new SegmentRowsBasedFlushThresholdUpdater if flushThresholdSegmentRows > 0.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to revise the javadoc here. Currently it is not clear.

@Jackie-Jiang
Copy link
Contributor

#12684 cleans up the current threshold handling

@karthi07 karthi07 force-pushed the feature_add-segment-rows-flush-config_12508 branch from 0835511 to 3494dbf Compare March 22, 2024 10:47
@karthi07
Copy link
Contributor Author

@Jackie-Jiang Your suggestions were helpful, I updated the docs and renamed the FixedThreshold. let me know if the docs need additional changes.

Should I merge #12684 into my branch and update the default threshold calc, or should I wait till it gets merged into master?

@codecov-commenter
Copy link

codecov-commenter commented Mar 22, 2024

Codecov Report

Attention: Patch coverage is 75.00000% with 8 lines in your changes are missing coverage. Please review.

Project coverage is 61.99%. Comparing base (59551e4) to head (76d86fb).
Report is 210 commits behind head on master.

Files Patch % Lines
...java/org/apache/pinot/spi/stream/StreamConfig.java 66.66% 3 Missing and 1 partial ⚠️
...e/realtime/segment/FixedFlushThresholdUpdater.java 60.00% 2 Missing ⚠️
...he/pinot/segment/local/utils/TableConfigUtils.java 81.81% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #12681      +/-   ##
============================================
+ Coverage     61.75%   61.99%   +0.24%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2462      +26     
  Lines        133233   134807    +1574     
  Branches      20636    20829     +193     
============================================
+ Hits          82274    83576    +1302     
- Misses        44911    45065     +154     
- Partials       6048     6166     +118     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 61.93% <75.00%> (+0.22%) ⬆️
java-21 61.86% <75.00%> (+0.23%) ⬆️
skip-bytebuffers-false 61.93% <75.00%> (+0.19%) ⬆️
skip-bytebuffers-true 61.86% <75.00%> (+34.13%) ⬆️
temurin 61.99% <75.00%> (+0.24%) ⬆️
unittests 61.99% <75.00%> (+0.24%) ⬆️
unittests1 46.70% <34.78%> (-0.19%) ⬇️
unittests2 28.00% <50.00%> (+0.27%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Jackie-Jiang
Copy link
Contributor

@karthi07 Just merged #12684. Can you please rebase this PR on top of that?

@karthi07 karthi07 force-pushed the feature_add-segment-rows-flush-config_12508 branch from 3494dbf to 5c4e088 Compare March 27, 2024 10:27
@karthi07
Copy link
Contributor Author

@Jackie-Jiang I updated lines related to thresholdUpdateManager and Its test file
image
I lost a little while merging with #12684, Please review the files and let me the changes required.

@Jackie-Jiang Jackie-Jiang force-pushed the feature_add-segment-rows-flush-config_12508 branch from 5c4e088 to 76d86fb Compare April 4, 2024 23:52
Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good in general. I applied a commit for some detailed handling and extra testing

@Jackie-Jiang Jackie-Jiang merged commit 3174e91 into apache:master Apr 5, 2024
18 of 19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Configuration Config changes (addition/deletion/change in behavior) documentation release-notes Referenced by PRs that need attention when compiling the next release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants