-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve upsert compaction threshold validations #13424
Improve upsert compaction threshold validations #13424
Conversation
Good catch for the issue. However, I'd change the config invalidRecordsThresholdCount default to 1 and continue to check that it has to be >=1, as it seems more intuitive. But the the sanity check in generator should do >= instead, i.e. allowing segments with only 1 invalid doc to be compacted.
|
The suggestion sounds good to me! Updated the patch. |
&& Double.parseDouble(taskTypeConfig.get("invalidRecordsThresholdPercent")) <= 100, | ||
"invalidRecordsThresholdPercent must be > 0 and <= 100"); | ||
"invalidRecordsThresholdPercent must be >= 0 and <= 100"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
percent is supposed to be > 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The percentage can be set to 0, as the count will ensure the value is always ≥ 1. Therefore, it's safe to keep it at 0, maintaining consistency between the default and user-defined values. What do you think?
Plus it's very non-intuitive to set something like 0.000001 to make it work with large segments with less number of invalid doc ids.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make sense. good point about the inconvenience to set values like 0.000001
to make the sanity check on percentage to pass
...t-segment-local/src/test/java/org/apache/pinot/segment/local/utils/TableConfigUtilsTest.java
Outdated
Show resolved
Hide resolved
* Improve upsert compaction threshold validations
Currently, there is no method to schedule an UpsertCompaction task for a segment that has only one invalid record. We have prevented the invalidRecordsThresholdCount from being set to 0 in the tableConfig, although it defaults to 0 in the generator flow. Additionally, during the taskGenerator process, we perform a strict increment check of the invalid threshold count.
pinot/pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/upsertcompaction/UpsertCompactionTaskGenerator.java
Lines 242 to 244 in 99f6934
Therefore, we either need to set a very low decimal value for invalidRecordsThresholdPercent, which may require precision to several decimal places based on the invalid-to-total-docs ratio of a segment, or we can allow the value to be set to 0, enabling users to perform compaction for segments with just one invalid record.