Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve upsert compaction threshold validations #13424

Merged
merged 3 commits into from
Jun 20, 2024

Conversation

tibrewalpratik17
Copy link
Contributor

@tibrewalpratik17 tibrewalpratik17 commented Jun 18, 2024

Currently, there is no method to schedule an UpsertCompaction task for a segment that has only one invalid record. We have prevented the invalidRecordsThresholdCount from being set to 0 in the tableConfig, although it defaults to 0 in the generator flow. Additionally, during the taskGenerator process, we perform a strict increment check of the invalid threshold count.

} else if (invalidRecordPercent > invalidRecordsThresholdPercent
&& totalInvalidDocs > invalidRecordsThresholdCount) {
segmentsForCompaction.add(Pair.of(segment, totalInvalidDocs));

Therefore, we either need to set a very low decimal value for invalidRecordsThresholdPercent, which may require precision to several decimal places based on the invalid-to-total-docs ratio of a segment, or we can allow the value to be set to 0, enabling users to perform compaction for segments with just one invalid record.

@klsince
Copy link
Contributor

klsince commented Jun 20, 2024

Good catch for the issue. However, I'd change the config invalidRecordsThresholdCount default to 1 and continue to check that it has to be >=1, as it seems more intuitive.

But the the sanity check in generator should do >= instead, i.e. allowing segments with only 1 invalid doc to be compacted.

invalidRecordPercent >= invalidRecordsThresholdPercent 
     && totalInvalidDocs >= invalidRecordsThresholdCount

@tibrewalpratik17
Copy link
Contributor Author

The suggestion sounds good to me! Updated the patch.

&& Double.parseDouble(taskTypeConfig.get("invalidRecordsThresholdPercent")) <= 100,
"invalidRecordsThresholdPercent must be > 0 and <= 100");
"invalidRecordsThresholdPercent must be >= 0 and <= 100");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

percent is supposed to be > 0

Copy link
Contributor Author

@tibrewalpratik17 tibrewalpratik17 Jun 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The percentage can be set to 0, as the count will ensure the value is always ≥ 1. Therefore, it's safe to keep it at 0, maintaining consistency between the default and user-defined values. What do you think?
Plus it's very non-intuitive to set something like 0.000001 to make it work with large segments with less number of invalid doc ids.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sense. good point about the inconvenience to set values like 0.000001 to make the sanity check on percentage to pass

@klsince klsince merged commit 6883a4d into apache:master Jun 20, 2024
20 checks passed
suyashpatel98 pushed a commit to suyashpatel98/pinot that referenced this pull request Jul 6, 2024
* Improve upsert compaction threshold validations
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants