Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add skipUnavailableServers query option #13387

Merged
merged 3 commits into from
Jun 18, 2024

Conversation

itschrispeck
Copy link
Collaborator

By default, during query submission to servers, if any exception is encountered the request will be failed. The change here adds a query option that so the server will be skipped and we can still return partial results. This is similar to the CH setting skip_unavailable_shards

For testing, the behavior looks right when blocking the port for one server with:
echo 'block return in quick on lo0 proto tcp from any to any port 7050' | sudo pfctl -f -

When the query option is missing/false the query immediately errors out. When the query option is applied the query displays results from the non-blocked server, and an error saying the blocked server did not respond.

tags: feature documentation release-notes

@itschrispeck itschrispeck added feature documentation release-notes Referenced by PRs that need attention when compiling the next release notes labels Jun 13, 2024
@codecov-commenter
Copy link

codecov-commenter commented Jun 13, 2024

Codecov Report

Attention: Patch coverage is 92.30769% with 1 line in your changes missing coverage. Please review.

Project coverage is 62.00%. Comparing base (59551e4) to head (20e7761).
Report is 620 commits behind head on master.

Files Patch % Lines
...a/org/apache/pinot/core/transport/QueryRouter.java 90.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #13387      +/-   ##
============================================
+ Coverage     61.75%   62.00%   +0.25%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2550     +114     
  Lines        133233   140268    +7035     
  Branches      20636    21803    +1167     
============================================
+ Hits          82274    86977    +4703     
- Misses        44911    46698    +1787     
- Partials       6048     6593     +545     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 61.94% <92.30%> (+0.23%) ⬆️
java-21 61.89% <92.30%> (+0.26%) ⬆️
skip-bytebuffers-false 61.99% <92.30%> (+0.25%) ⬆️
skip-bytebuffers-true 61.87% <92.30%> (+34.14%) ⬆️
temurin 62.00% <92.30%> (+0.25%) ⬆️
unittests 62.00% <92.30%> (+0.25%) ⬆️
unittests1 46.58% <92.30%> (-0.32%) ⬇️
unittests2 27.66% <0.00%> (-0.07%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Comment on lines +299 to +301
// Using a different port is a hack to avoid resource conflict with other tests, ideally queryServer.shutdown()
// should ensure there is no possibility of resource conflict.
int port = 12346;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love this workaround, but using the same port as other tests impacts test isolation. It looks like queryServer.shutDown() doesn't properly shutdown and block.

I'm open to suggestions, as I am pretty unfamiliar w/ netty and did not find a clean solution

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other tests in the same class are sharing the same port, so I guess there are something else impacting this particular test method

Copy link
Collaborator Author

@itschrispeck itschrispeck Jun 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw that too - the reason why they don't cause test failures is a mix of (1) the order of test execution, and sometimes it is okay to receive the response from another test, and (2) in some tests queryServer.shutDown() is called earlier in the tests so it has time to shut down.

The behavior should be repeatable when checking out the initial commit in this PR. The flakey test sometimes failed because it receives the response created in the test I added (to be clear, the only flakey test was the one executed right after the test I added). We can also see the responseDelayMs set in the test I added (500ms) effected the flakey test only when it failed.

A short Thread.sleep(..) between tests would also have solved the flakiness issue

@chenboat chenboat merged commit 743d6b6 into apache:master Jun 18, 2024
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation feature release-notes Referenced by PRs that need attention when compiling the next release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants