[null-aggr] Add null handling support in mode aggregation #12227

Merged on Mar 4, 2024 (35 commits)

gortiz
Contributor

@gortiz gortiz commented Jan 5, 2024

This PR adds null handling support to the mode aggregation. Specifically, it changes the code to ignore null values when mode is evaluated with nullHandlingEnabled, following the behavior of Postgres:

postgres=# create table myTable(myInt INT);
CREATE TABLE
postgres=# insert into myTable(myInt) VALUES (null), (1), (null);
INSERT 0 3
postgres=# SELECT mode() WITHIN GROUP (ORDER BY myInt) AS mode FROM myTable;
 mode 
------
    1
(1 row)

It is recommended to review this PR assuming PR #12226 is merged, which is equivalent to comparing this PR from commit f966b1b295134a08a3b201dadcedd91ca7d01c0a to head. As you can see there, this PR also changes the test to show that nulls are ignored when null handling is enabled.
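
To make the intended semantics concrete, here is a minimal, standalone sketch of a mode computation that skips nulls, mirroring the Postgres session above. It is not the code from this PR: the class name `NullAwareModeSketch`, the `mode` helper, and the tie-breaking rule (smallest value wins) are assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not the ModeAggregationFunction from the PR.
final class NullAwareModeSketch {
  /**
   * Returns the most frequent non-null value, or null when the input holds
   * no non-null values. Ties go to the smallest value (an assumption here).
   */
  static Integer mode(List<Integer> values) {
    Map<Integer, Long> counts = new HashMap<>();
    for (Integer value : values) {
      if (value == null) {
        continue; // nulls are ignored, as with null handling enabled
      }
      counts.merge(value, 1L, Long::sum);
    }
    Integer best = null;
    long bestCount = 0;
    for (Map.Entry<Integer, Long> entry : counts.entrySet()) {
      long count = entry.getValue();
      if (best == null || count > bestCount || (count == bestCount && entry.getKey() < best)) {
        best = entry.getKey();
        bestCount = count;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    // Same data as the Postgres example: (null), (1), (null)
    System.out.println(mode(Arrays.<Integer>asList(null, 1, null))); // prints 1
    System.out.println(mode(Arrays.<Integer>asList(null, null)));    // prints null
  }
}
```

Even though null appears twice and 1 only once, the sketch reports 1, which matches the Postgres output shown above.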

@codecov-commenter

codecov-commenter commented Jan 5, 2024

Codecov Report

Attention: Patch coverage is 76.66667%, with 28 lines in your changes missing coverage. Please review.

Project coverage is 61.73%. Comparing base (59551e4) to head (c8da30c).
Report is 30 commits behind head on master.

| Files | Patch % | Lines |
|---|---|---|
| ...nction/NullableSingleInputAggregationFunction.java | 56.75% | 15 Missing and 1 partial ⚠️ |
| ...n/function/BaseSingleInputAggregationFunction.java | 0.00% | 10 Missing ⚠️ |
| .../aggregation/function/ModeAggregationFunction.java | 97.22% | 1 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #12227      +/-   ##
============================================
- Coverage     61.75%   61.73%   -0.03%     
  Complexity      207      207              
============================================
  Files          2436     2451      +15     
  Lines        133233   133592     +359     
  Branches      20636    20684      +48     
============================================
+ Hits          82274    82467     +193     
- Misses        44911    45044     +133     
- Partials       6048     6081      +33     
| Flag | Coverage Δ |
|---|---|
| custom-integration1 | <0.01% <0.00%> (-0.01%) ⬇️ |
| integration | <0.01% <0.00%> (-0.01%) ⬇️ |
| integration1 | <0.01% <0.00%> (-0.01%) ⬇️ |
| integration2 | 0.00% <0.00%> (ø) |
| java-11 | 61.66% <76.66%> (-0.05%) ⬇️ |
| java-21 | 61.61% <76.66%> (-0.01%) ⬇️ |
| skip-bytebuffers-false | 61.71% <76.66%> (-0.04%) ⬇️ |
| skip-bytebuffers-true | 61.58% <76.66%> (+33.85%) ⬆️ |
| temurin | 61.73% <76.66%> (-0.03%) ⬇️ |
| unittests | 61.72% <76.66%> (-0.03%) ⬇️ |
| unittests1 | 46.89% <76.66%> (+<0.01%) ⬆️ |
| unittests2 | 27.68% <0.00%> (-0.06%) ⬇️ |

@gortiz
Contributor Author

gortiz commented Feb 28, 2024

@Jackie-Jiang this should be ready to merge. Can you take a look? As far as I can see, the only pending discussion is whether NullableSingleInputAggregationFunction should be merged with BaseSingleInputAggregationFunction.

Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Mostly good. I saw you added mode into the AllNullQueriesTest, but somehow I didn't find where the null intermediate result is handled.

@@ -467,7 +501,11 @@ public ColumnDataType getFinalResultColumnType() {
   @Override
   public Double extractFinalResult(Map<? extends Number, Long> intermediateResult) {
     if (intermediateResult.isEmpty()) {
-      return DEFAULT_FINAL_RESULT;
+      if (_nullHandlingEnabled) {
Contributor

When _nullHandlingEnabled is true and all input values are null, will this map be null?
The same question applies to merge().

Contributor Author

By "this map", do you mean intermediateResult?

In that case, no. intermediateResult is created in extractAggregationResult(AggregationResultHolder), which calls extractIntermediateResult, which does not return null.

Contributor Author

I'm adding a new test to show the result when all stored data is null.
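
The exchange above can be sketched in isolation. The class below is hypothetical and only reuses the names visible in the diff (`extractFinalResult`, `_nullHandlingEnabled`, `DEFAULT_FINAL_RESULT`, `merge`). The value of `DEFAULT_FINAL_RESULT`, the choice to return null when null handling is enabled, and the tie-breaking rule are all assumptions of the sketch, since the diff excerpt is truncated. The point it illustrates is that an all-null input yields an empty map rather than a null map, both before and after merging.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; not the class changed in the PR.
final class ModeFinalResultSketch {
  // Placeholder for the DEFAULT_FINAL_RESULT constant seen in the diff; the value is an assumption.
  static final Double DEFAULT_FINAL_RESULT = Double.NEGATIVE_INFINITY;

  private final boolean _nullHandlingEnabled;

  ModeFinalResultSketch(boolean nullHandlingEnabled) {
    _nullHandlingEnabled = nullHandlingEnabled;
  }

  // Merges two per-value count maps. Two empty maps merge into an empty map, never into null.
  static Map<Double, Long> merge(Map<Double, Long> left, Map<Double, Long> right) {
    Map<Double, Long> merged = new HashMap<>(left);
    right.forEach((value, count) -> merged.merge(value, count, Long::sum));
    return merged;
  }

  Double extractFinalResult(Map<? extends Number, Long> intermediateResult) {
    if (intermediateResult.isEmpty()) {
      if (_nullHandlingEnabled) {
        // Assumption of this sketch: no non-null input means a null mode.
        return null;
      }
      return DEFAULT_FINAL_RESULT;
    }
    Double best = null;
    long bestCount = 0;
    for (Map.Entry<? extends Number, Long> entry : intermediateResult.entrySet()) {
      double value = entry.getKey().doubleValue();
      long count = entry.getValue();
      // Ties go to the smaller value; that choice is arbitrary here.
      if (best == null || count > bestCount || (count == bestCount && value < best)) {
        best = value;
        bestCount = count;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    // All inputs were null on both sides, so both intermediate maps are empty.
    Map<Double, Long> allNulls = merge(Collections.emptyMap(), Collections.emptyMap());
    System.out.println(allNulls.isEmpty());                                             // prints true
    System.out.println(new ModeFinalResultSketch(true).extractFinalResult(allNulls));  // prints null
    System.out.println(new ModeFinalResultSketch(false).extractFinalResult(allNulls)); // prints -Infinity
  }
}
```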

@gortiz
Contributor Author

gortiz commented Feb 29, 2024

> Mostly good. I saw you added mode into the AllNullQueriesTest, but somehow I didn't find where the null intermediate result is handled

The change I made there was to create one test per query instead of executing all of them in the same test method. The only reason for the change is to improve the experience when there is at least one failure. Before, only the first error was reported, and once you fixed that issue you had to run the tests again to check whether there were other failures. Now all failures are reported at the same time.

I modified the code long ago, so I don't remember the exact changes, but reading the diff, it doesn't look like I changed anything related to the null intermediate result.
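
The reporting difference described above can be illustrated without any test framework. The sketch below is an assumption-laden stand-in, not the mechanism used in AllNullQueriesTest (the conversation does not show it): `PerQueryReportingSketch`, `runEach`, and the example queries are hypothetical. Each query gets its own check, and a failing check does not prevent the remaining ones from running.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of per-query reporting; not the PR's test code.
final class PerQueryReportingSketch {
  /** Runs every check and returns the queries whose check failed, in order. */
  static List<String> runEach(Map<String, Runnable> checksByQuery) {
    List<String> failedQueries = new ArrayList<>();
    for (Map.Entry<String, Runnable> entry : checksByQuery.entrySet()) {
      try {
        entry.getValue().run();
      } catch (AssertionError | RuntimeException failure) {
        // Record the failure and keep going instead of aborting the whole run.
        failedQueries.add(entry.getKey());
      }
    }
    return failedQueries;
  }

  public static void main(String[] args) {
    Map<String, Runnable> checks = new LinkedHashMap<>();
    checks.put("SELECT MODE(myInt) FROM myTable", () -> { });
    checks.put("SELECT MIN(myInt) FROM myTable", () -> { throw new AssertionError("unexpected value"); });
    checks.put("SELECT MAX(myInt) FROM myTable", () -> { throw new AssertionError("unexpected value"); });
    // Both failures are visible after a single run.
    System.out.println(runEach(checks).size()); // prints 2
  }
}
```

With a single test method looping over all queries and asserting inline, the first AssertionError would end the method, so the second failure would only surface on a later run.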

Contributor

@Jackie-Jiang Jackie-Jiang left a comment

LGTM

4 participants