Refactor WindowFunction process rows in batched partition #12993

Merged: xiangfu0 merged 1 commit into apache:master from refactor-window-function-batch on May 7, 2024

xiangfu0

@xiangfu0 xiangfu0 commented Apr 24, 2024

Refactor WindowAggregateOperator from a row-based processing model to a batch processing model:

  1. Partition the data into containers first.
  2. For each partition, generate the window aggregation results.
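
The two steps above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the class `BatchedWindowSketch`, its method names, and the use of `SUM(value) OVER (PARTITION BY key)` as the example window function are all assumptions made for the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names are illustrative and are not the actual Pinot classes.
final class BatchedWindowSketch {
  private BatchedWindowSketch() {
  }

  // Step 1: group rows into per-partition containers, keyed by the partition column.
  // LinkedHashMap keeps partitions in first-seen order.
  static Map<Object, List<Object[]>> partition(List<Object[]> rows, int keyIndex) {
    Map<Object, List<Object[]>> partitions = new LinkedHashMap<>();
    for (Object[] row : rows) {
      partitions.computeIfAbsent(row[keyIndex], k -> new ArrayList<>()).add(row);
    }
    return partitions;
  }

  // Step 2: for each partition, compute the window result over the whole batch.
  // The example function is SUM(value) OVER (PARTITION BY key), so every row in a
  // partition is extended with the same partition-wide total.
  static List<Object[]> sumOverPartition(List<Object[]> rows, int keyIndex, int valueIndex) {
    List<Object[]> output = new ArrayList<>();
    for (List<Object[]> batch : partition(rows, keyIndex).values()) {
      long sum = 0;
      for (Object[] row : batch) {
        sum += ((Number) row[valueIndex]).longValue();
      }
      for (Object[] row : batch) {
        Object[] extended = Arrays.copyOf(row, row.length + 1);
        extended[row.length] = sum;
        output.add(extended);
      }
    }
    return output;
  }
}
```

In this sketch the output is grouped by partition rather than kept in input order; whether the real operator preserves input order is not stated in this thread.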

Motivation: follow how Presto models window functions, with one WindowFunction interface and three types:

  1. AggregateWindowFunction: a single function that tries to handle all the existing aggregation functions.
  2. Abstract RankingWindowFunction: the base for all the ranking functions. Implementations: CumulativeDistributionFunction, CustomRank, DenseRankFunction, NTileFunction, PercentRankFunction, RankFunction, RowNumberFunction.
  3. Abstract ValueWindowFunction: the base for all the row-position-based functions. Implementations: FirstValueFunction, LagFunction, LastValueFunction, LeadFunction, NthValueFunction.
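
To make the difference between the three types concrete, here is a simplified sketch with one example per type. Everything in it is hypothetical: `SketchWindowFunction` and the three classes are illustrative names, and the single-method interface is an assumption rather than the Presto or Pinot API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified interface; not the actual Pinot or Presto API.
interface SketchWindowFunction {
  // Takes all rows of one partition and returns one result per input row.
  List<Object> processRows(List<Object[]> rows);
}

// Aggregate type: one value computed over the partition, repeated for every row.
final class SumSketch implements SketchWindowFunction {
  private final int _col;

  SumSketch(int col) {
    _col = col;
  }

  @Override
  public List<Object> processRows(List<Object[]> rows) {
    long sum = 0;
    for (Object[] row : rows) {
      sum += ((Number) row[_col]).longValue();
    }
    List<Object> results = new ArrayList<>(rows.size());
    for (int i = 0; i < rows.size(); i++) {
      results.add(sum);
    }
    return results;
  }
}

// Ranking type: the result depends on the row's position in the ordered partition.
final class RowNumberSketch implements SketchWindowFunction {
  @Override
  public List<Object> processRows(List<Object[]> rows) {
    List<Object> results = new ArrayList<>(rows.size());
    for (long i = 1; i <= rows.size(); i++) {
      results.add(i);
    }
    return results;
  }
}

// Value type: the result is a column value read from another row position.
final class FirstValueSketch implements SketchWindowFunction {
  private final int _col;

  FirstValueSketch(int col) {
    _col = col;
  }

  @Override
  public List<Object> processRows(List<Object[]> rows) {
    List<Object> results = new ArrayList<>(rows.size());
    for (int i = 0; i < rows.size(); i++) {
      results.add(rows.get(0)[_col]);
    }
    return results;
  }
}
```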

@xiangfu0 xiangfu0 marked this pull request as draft April 24, 2024 02:33
@xiangfu0 xiangfu0 force-pushed the refactor-window-function-batch branch 4 times, most recently from 45c24a7 to e1a8fec Compare April 24, 2024 03:25
@codecov-commenter

codecov-commenter commented Apr 24, 2024

Codecov Report

Attention: Patch coverage is 0%, with 180 lines in your changes missing coverage. Please review.

Project coverage is 0.00%. Comparing base (59551e4) to head (b46e4ed).
Report is 412 commits behind head on master.

Files Patch % Lines
...ator/window/aggregate/AggregateWindowFunction.java 0.00% 46 Missing ⚠️
...uery/runtime/operator/WindowAggregateOperator.java 0.00% 34 Missing ⚠️
...ime/operator/window/range/RangeWindowFunction.java 0.00% 22 Missing ⚠️
...operator/window/range/DenseRankWindowFunction.java 0.00% 14 Missing ⚠️
...runtime/operator/window/WindowFunctionFactory.java 0.00% 13 Missing ⚠️
...time/operator/window/range/RankWindowFunction.java 0.00% 12 Missing ⚠️
.../query/runtime/operator/window/WindowFunction.java 0.00% 11 Missing ⚠️
...ime/operator/window/value/ValueWindowFunction.java 0.00% 10 Missing ⚠️
...operator/window/range/RowNumberWindowFunction.java 0.00% 6 Missing ⚠️
...perator/window/value/FirstValueWindowFunction.java 0.00% 3 Missing ⚠️
... and 3 more
Additional details and impacted files
@@              Coverage Diff              @@
##             master   #12993       +/-   ##
=============================================
- Coverage     61.75%    0.00%   -61.76%     
=============================================
  Files          2436     2439        +3     
  Lines        133233   134144      +911     
  Branches      20636    20769      +133     
=============================================
- Hits          82274        0    -82274     
- Misses        44911   134144    +89233     
+ Partials       6048        0     -6048     
Flag Coverage Δ
custom-integration1 ?
integration 0.00% <0.00%> (-0.01%) ⬇️
integration1 ?
integration2 0.00% <0.00%> (ø)
java-11 ?
java-21 0.00% <0.00%> (-61.63%) ⬇️
skip-bytebuffers-false 0.00% <0.00%> (-61.75%) ⬇️
skip-bytebuffers-true ?
temurin 0.00% <0.00%> (-61.76%) ⬇️
unittests ?
unittests1 ?
unittests2 ?


@xiangfu0 xiangfu0 marked this pull request as ready for review April 24, 2024 06:31
@Jackie-Jiang Jackie-Jiang added the refactor and multi-stage (Related to the multi-stage query engine) labels Apr 24, 2024
@xiangfu0 xiangfu0 force-pushed the refactor-window-function-batch branch 2 times, most recently from 8fb79de to 6ef8918 Compare April 25, 2024 00:41
@Jackie-Jiang Jackie-Jiang left a comment

Can you add some more comments in the PR describing the different types of window functions extracted? What are the differences between them?

* processRows(List<Object[]> rows) which processes a batch of rows at a time.
*
*/
public abstract class WindowFunction extends AggregationUtils.Accumulator {

Suggest adding a WindowFunctionFactory if you want to make it pluggable, maybe following the same approach used for TransformFunction.

Also consider making this a separate PR?

Added WindowFunctionFactory; will make a separate PR for the pluggable part.
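
For readers following the factory discussion, a name-based factory could look roughly like the sketch below. This is an assumption-laden illustration, not the WindowFunctionFactory added in this PR: `WindowFunctionFactorySketch`, `NamedWindowFunction`, the `register` method, and the case-insensitive lookup are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical, simplified interface; not the actual Pinot API.
interface NamedWindowFunction {
  List<Object> processRows(List<Object[]> rows);
}

// Hypothetical factory sketch: maps a function name to a constructor.
final class WindowFunctionFactorySketch {
  private static final Map<String, Supplier<NamedWindowFunction>> REGISTRY = new HashMap<>();

  static {
    register("ROW_NUMBER", WindowFunctionFactorySketch::rowNumber);
  }

  private WindowFunctionFactorySketch() {
  }

  // A pluggable design would let extensions call this to add their own functions.
  static void register(String name, Supplier<NamedWindowFunction> supplier) {
    REGISTRY.put(name.toUpperCase(Locale.ROOT), supplier);
  }

  // Looks up the function by name (case-insensitive in this sketch).
  static NamedWindowFunction create(String name) {
    Supplier<NamedWindowFunction> supplier = REGISTRY.get(name.toUpperCase(Locale.ROOT));
    if (supplier == null) {
      throw new IllegalArgumentException("Unsupported window function: " + name);
    }
    return supplier.get();
  }

  // Example built-in: numbers the rows of a partition starting from 1.
  private static NamedWindowFunction rowNumber() {
    return rows -> {
      List<Object> results = new ArrayList<>(rows.size());
      for (long i = 1; i <= rows.size(); i++) {
        results.add(i);
      }
      return results;
    };
  }
}
```

Keeping the registry behind `register` and `create` means the operator only depends on the function name, which is one way the pluggable follow-up could be layered on later.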

@xiangfu0

xiangfu0 commented May 2, 2024

Can you add some more comments in the PR describing about the different types of window functions extracted? What is their difference?

This follows how Presto models window functions, with one WindowFunction interface and three types:

  1. AggregateWindowFunction: a single function that tries to handle all the existing aggregation functions.
  2. Abstract RankingWindowFunction: the base for all the ranking functions. Implementations: CumulativeDistributionFunction, CustomRank, DenseRankFunction, NTileFunction, PercentRankFunction, RankFunction, RowNumberFunction.
  3. Abstract ValueWindowFunction: the base for all the row-position-based functions. Implementations: FirstValueFunction, LagFunction, LastValueFunction, LeadFunction, NthValueFunction.

@xiangfu0 xiangfu0 force-pushed the refactor-window-function-batch branch from 6ef8918 to 0a125ee Compare May 2, 2024 11:34
@xiangfu0 xiangfu0 force-pushed the refactor-window-function-batch branch from 0a125ee to b46e4ed Compare May 7, 2024 08:53
@xiangfu0 xiangfu0 merged commit 70bfd41 into apache:master May 7, 2024
20 checks passed
@xiangfu0 xiangfu0 deleted the refactor-window-function-batch branch May 7, 2024 10:44