-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add skipUnavailableServers
query option
#13387
add skipUnavailableServers
query option
#13387
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #13387 +/- ##
============================================
+ Coverage 61.75% 62.00% +0.25%
+ Complexity 207 198 -9
============================================
Files 2436 2550 +114
Lines 133233 140268 +7035
Branches 20636 21803 +1167
============================================
+ Hits 82274 86977 +4703
- Misses 44911 46698 +1787
- Partials 6048 6593 +545
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
pinot-core/src/main/java/org/apache/pinot/core/transport/AsyncQueryResponse.java
Outdated
Show resolved
Hide resolved
// Using a different port is a hack to avoid resource conflict with other tests, ideally queryServer.shutdown() | ||
// should ensure there is no possibility of resource conflict. | ||
int port = 12346; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't love this workaround, but using the same port as other tests impacts test isolation. It looks like queryServer.shutDown()
doesn't properly shutdown and block.
I'm open to suggestions, as I am pretty unfamiliar w/ netty and did not find a clean solution
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other tests in the same class are sharing the same port, so I guess there are something else impacting this particular test method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw that too - the reason why they don't cause test failures is a mix of (1) the order of test execution, and sometimes it is okay to receive the response from another test, and (2) in some tests queryServer.shutDown()
is called earlier in the tests so it has time to shut down.
The behavior should be repeatable when checking out the initial commit in this PR. The flakey test sometimes failed because it receives the response created in the test I added (to be clear, the only flakey test was the one executed right after the test I added). We can also see the responseDelayMs
set in the test I added (500ms) effected the flakey test only when it failed.
A short Thread.sleep(..)
between tests would also have solved the flakiness issue
By default, during query submission to servers, if any exception is encountered the request will be failed. The change here adds a query option that so the server will be skipped and we can still return partial results. This is similar to the CH setting skip_unavailable_shards
For testing, the behavior looks right when blocking the port for one server with:
echo 'block return in quick on lo0 proto tcp from any to any port 7050' | sudo pfctl -f -
When the query option is missing/false the query immediately errors out. When the query option is applied the query displays results from the non-blocked server, and an error saying the blocked server did not respond.
tags:
feature
documentation
release-notes