-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow using 'serverReturnFinalResult' to optimize server partitioned table #13208
Allow using 'serverReturnFinalResult' to optimize server partitioned table #13208
Conversation
40e85fd
to
69c07ba
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #13208 +/- ##
============================================
+ Coverage 61.75% 62.16% +0.41%
+ Complexity 207 198 -9
============================================
Files 2436 2534 +98
Lines 133233 139008 +5775
Branches 20636 21535 +899
============================================
+ Hits 82274 86420 +4146
- Misses 44911 46126 +1215
- Partials 6048 6462 +414
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
af4159f
to
0eefa89
Compare
public ColumnDataType getFinalResultColumnType() { | ||
return ColumnDataType.LONG_ARRAY; | ||
public LongArrayList mergeFinalResult(LongArrayList finalResult1, LongArrayList finalResult2) { | ||
long[] result = new long[_numSteps]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not directly add finalResult2
to finalResult1
then return finalResult1
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good
Overall LGTM. |
@@ -38,6 +38,7 @@ | |||
@SuppressWarnings({"rawtypes", "unchecked"}) | |||
public abstract class IndexedTable extends BaseTable { | |||
protected final Map<Key, Record> _lookupMap; | |||
protected final boolean _hasFinalInput; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this IndexedTable has multiple aggregations, do you need this _hasFinalInput
also to be per agg function or it's just an indicate ?
E.g. SELECT distinctCount(pk), avg(other_column) from myTable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In V1 engine we don't have a way to make per function hint. We can make it more smart in V2 which can carry more information.
0eefa89
to
ce47def
Compare
ce47def
to
78fee19
Compare
When a column is partitioned on each server (i.e. the same value always show up on the same server), the following queries can be optimized by asking server to directly return final aggregate result instead of intermediate aggregate result.
1:
SELECT DISTINCT_COUNT(partitionedCol) FROM myTable
2:
SELECT DISTINCT_COUNT(partitionedCol) FROM myTable GROUP BY col
3:
SELECT AGG(col) FROM myTable GROUP BY partitionedCol
For all 3 queries, we can ask server to return final aggregate result, but there are some difference between 2 and 3. For 2, server can return final aggregate result, but should still keep enough groups because the aggregate result is not global final result, but only the final result for a partition; For 3, server only needs to keep
LIMIT
groups because the aggregate result is global final result for the group.In this PR, user can
SET serverReturnFinalResult = true;
to accelerate 1 and 3; user canSET serverReturnFinalResultKeyUnpartitioned = true;
to accelerate 2This PR also fixes the following bugs:
GroupByDataTableReducer
MinMaxRangeAggregationFunction