fix race condition in ScalingThreadPoolExecutor #13360

Merged

itschrispeck (Collaborator) commented Jun 11, 2024

I previously introduced ScalingThreadPoolExecutor as an autoscaling thread pool; it is used to prevent interrupts from corrupting the realtime Lucene index.

There is a race condition: the logic relied on _executor.getPoolSize() and _executor.getActiveCount(), and the latter can lag behind the real number of currently idle threads. When that happens, a task is queued and not executed. This PR changes the implementation slightly to track idle threads by overriding the two methods that ThreadPoolExecutor.getTask() may use. getTask() is always executed by an idle thread to pick up the next task.
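As a rough illustration of the idea (not the code in this PR), the sketch below counts idle workers inside the work queue itself. It assumes the two methods in question are the queue's take() and poll(long, TimeUnit), which are the calls the JDK's ThreadPoolExecutor.getTask() makes on its work queue. The names ScalingPoolSketch, IdleTrackingQueue, newPool and allStartConcurrently are hypothetical, and the force-queue rejection handler is an assumption about how tasks are handled once the pool is at its maximum size.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ScalingPoolSketch {

  /** Hypothetical work queue that counts the workers currently waiting in take()/poll(timeout). */
  static class IdleTrackingQueue extends LinkedBlockingQueue<Runnable> {
    private final AtomicInteger idleThreads = new AtomicInteger();

    @Override
    public boolean offer(Runnable task) {
      // Queue only when some worker is waiting for work. Returning false makes
      // ThreadPoolExecutor.execute() try to start a new worker instead.
      return idleThreads.get() > 0 && super.offer(task);
    }

    @Override
    public Runnable take() throws InterruptedException {
      idleThreads.incrementAndGet();
      try {
        return super.take();
      } finally {
        idleThreads.decrementAndGet();
      }
    }

    @Override
    public Runnable poll(long timeout, TimeUnit unit) throws InterruptedException {
      idleThreads.incrementAndGet();
      try {
        return super.poll(timeout, unit);
      } finally {
        idleThreads.decrementAndGet();
      }
    }
  }

  /** Builds a pool that grows up to max threads before it starts queueing tasks. */
  static ThreadPoolExecutor newPool(int core, int max) {
    ThreadPoolExecutor pool =
        new ThreadPoolExecutor(core, max, 1, TimeUnit.SECONDS, new IdleTrackingQueue());
    // Assumption: once the pool is at max size, rejected tasks are forced into the
    // queue. put() does not go through the overridden offer().
    pool.setRejectedExecutionHandler((task, executor) -> {
      try {
        executor.getQueue().put(task);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    return pool;
  }

  /** Submits blocking tasks and reports whether all of them were running at the same time. */
  static boolean allStartConcurrently(int tasks) {
    ThreadPoolExecutor pool = newPool(0, tasks);
    CountDownLatch started = new CountDownLatch(tasks);
    CountDownLatch release = new CountDownLatch(1);
    try {
      for (int i = 0; i < tasks; i++) {
        pool.execute(() -> {
          started.countDown();
          try {
            release.await();
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
      }
      // True only if every task got its own thread before the timeout.
      return started.await(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      release.countDown();
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println(allStartConcurrently(4));
  }
}
```

Because the counter changes only at the point where a worker actually starts or stops waiting on the queue, it does not depend on getActiveCount(). With a plain LinkedBlockingQueue and a core size of 0, the same check would fail: the executor would queue every task behind a single worker.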

In theory, the bug could cause sporadic timeouts for searches against the realtime Lucene index, though it is hard to reproduce. I came across it while trying to use ScalingThreadPoolExecutor for another feature.

For testing, the added unit test should cover this logic. The race condition is easily reproducible when that test is run against the old implementation.

suggested tag: bugfix

codecov-commenter commented Jun 11, 2024

Codecov Report

Attention: Patch coverage is 50.00000% with 4 lines in your changes missing coverage. Please review.

Project coverage is 62.11%. Comparing base (59551e4) to head (47cd276).
Report is 608 commits behind head on master.

Files with missing lines:
.../pinot/common/utils/ScalingThreadPoolExecutor.java: patch coverage 50.00%, 3 lines missing and 1 partial

Coverage diff:
@@             Coverage Diff              @@
##             master   #13360      +/-   ##
============================================
+ Coverage     61.75%   62.11%   +0.36%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2548     +112     
  Lines        133233   139957    +6724     
  Branches      20636    21729    +1093     
============================================
+ Hits          82274    86938    +4664     
- Misses        44911    46432    +1521     
- Partials       6048     6587     +539     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 62.06% <50.00%> (+0.36%) ⬆️
java-21 61.99% <50.00%> (+0.37%) ⬆️
skip-bytebuffers-false 62.10% <50.00%> (+0.35%) ⬆️
skip-bytebuffers-true 61.95% <50.00%> (+34.22%) ⬆️
temurin 62.11% <50.00%> (+0.36%) ⬆️
unittests 62.11% <50.00%> (+0.36%) ⬆️
unittests1 46.70% <50.00%> (-0.20%) ⬇️
unittests2 27.71% <0.00%> (-0.02%) ⬇️

@Jackie-Jiang Jackie-Jiang merged commit 36ce140 into apache:master Jun 11, 2024
18 of 20 checks passed