BugFix: Fix merge result from more than one server #12778

Merged (3 commits) on Apr 3, 2024


davecromberge
Member

Summary

The merge operation throws a null reference error when the set operation is not set properly for Tuple Sketches.

Description

When more than one server returns an intermediate result to the broker, the custom object accumulators need to perform a merge. When some of the state on the object accumulator is not re-initialised, this can lead to unpredictable behaviour.
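The failure mode described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code in this PR: `SketchAccumulator`, `BrokerMerge`, and `fromPayload` are hypothetical names, a `long` bitmask stands in for a real sketch, and bitwise OR stands in for a sketch union.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.Optional;
import java.util.function.BinaryOperator;

/** Hypothetical stand-in for a sketch accumulator; not Pinot's actual class. */
final class SketchAccumulator {
  // A long bitmask stands in for a real sketch payload.
  private final List<Long> sketches = new ArrayList<>();
  // Neither field below survives the (simulated) ser/de round trip.
  private BinaryOperator<Long> setOperation;
  private int threshold;

  void setSetOperation(BinaryOperator<Long> op) { this.setOperation = op; }
  void setThreshold(int threshold) { this.threshold = threshold; }

  /** Simulates deserialization that restores only the sketch payloads. */
  static SketchAccumulator fromPayload(List<Long> payload) {
    SketchAccumulator acc = new SketchAccumulator();
    acc.sketches.addAll(payload);
    return acc;
  }

  /** Buffers the other accumulator's sketches, compacting once past the threshold. */
  void merge(SketchAccumulator other) {
    sketches.addAll(other.sketches);
    if (sketches.size() > threshold) {
      compact();
    }
  }

  long getResult() {
    compact();
    return sketches.isEmpty() ? 0L : sketches.get(0);
  }

  private void compact() {
    // Fails here if the caller forgot to re-initialise the set operation.
    Objects.requireNonNull(setOperation, "set operation not re-initialised after deserialization");
    Optional<Long> combined = sketches.stream().reduce(setOperation);
    sketches.clear();
    combined.ifPresent(sketches::add);
  }
}

/** Hypothetical broker-side merge that re-applies the transient state first. */
final class BrokerMerge {
  static SketchAccumulator merge(SketchAccumulator a, SketchAccumulator b,
      BinaryOperator<Long> op, int threshold) {
    a.setSetOperation(op);
    a.setThreshold(threshold);
    b.setSetOperation(op);
    b.setThreshold(threshold);
    a.merge(b);
    return a;
  }
}
```

Calling `merge` directly on two accumulators built with `fromPayload` throws a `NullPointerException` in this sketch, because both the set operation and the threshold are still at their defaults. Going through `BrokerMerge.merge` re-applies them before any sketches are combined.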

The serialisation of these custom objects (for sketches) is somewhat cumbersome. Ideally, we would encode the fields in binary together with the underlying sketch payload. However, I came across a test case where sketches were fetched directly from the servers, and I'm not sure whether any users in the community expect these server responses to be directly usable as sketches.
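For illustration only, encoding the accumulator fields together with the sketch payload could look roughly like the following. `AccumulatorCodec`, `State`, and the choice of fields (`threshold`, `nominalEntries`) are assumptions made for this sketch; the actual fields and layout would depend on the accumulator in question.

```java
import java.nio.ByteBuffer;

/** Hypothetical codec that writes accumulator fields ahead of the sketch bytes. */
final class AccumulatorCodec {

  /** Hypothetical intermediate state: two config fields plus the raw sketch bytes. */
  static final class State {
    final int threshold;
    final int nominalEntries;
    final byte[] sketchBytes;

    State(int threshold, int nominalEntries, byte[] sketchBytes) {
      this.threshold = threshold;
      this.nominalEntries = nominalEntries;
      this.sketchBytes = sketchBytes;
    }
  }

  /** Layout: [threshold:int][nominalEntries:int][payloadLength:int][payload bytes]. */
  static byte[] serialize(State state) {
    ByteBuffer buf = ByteBuffer.allocate(3 * Integer.BYTES + state.sketchBytes.length);
    buf.putInt(state.threshold);
    buf.putInt(state.nominalEntries);
    buf.putInt(state.sketchBytes.length);
    buf.put(state.sketchBytes);
    return buf.array();
  }

  static State deserialize(byte[] bytes) {
    ByteBuffer buf = ByteBuffer.wrap(bytes);
    int threshold = buf.getInt();
    int nominalEntries = buf.getInt();
    byte[] payload = new byte[buf.getInt()];
    buf.get(payload);
    return new State(threshold, nominalEntries, payload);
  }
}
```

The trade-off is the one raised in this description: bytes produced this way are no longer a plain sketch, so any client that reads server responses directly as sketches would need to strip the header first.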

Finally, I was not able to find a good way to test multiple server responses with an aggregation function - please let me know if there is a good example.

Tags

bugfix

Contributor

@Jackie-Jiang Jackie-Jiang left a comment


Do you also need to set threshold?

The root cause of this is that the intermediate result is not fully serialized

@codecov-commenter

codecov-commenter commented Apr 2, 2024

Codecov Report

Attention: Patch coverage is 28.57143%, with 5 lines in your changes missing coverage. Please review.

Project coverage is 62.00%. Comparing base (59551e4) to head (f6702ea).
Report is 198 commits behind head on master.

Files                                                    Patch %   Lines
...unction/IntegerTupleSketchAggregationFunction.java    0.00%     3 Missing ⚠️
...ion/DistinctCountCPCSketchAggregationFunction.java    0.00%     2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #12778      +/-   ##
============================================
+ Coverage     61.75%   62.00%   +0.25%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2461      +25     
  Lines        133233   134729    +1496     
  Branches      20636    20818     +182     
============================================
+ Hits          82274    83535    +1261     
- Misses        44911    45040     +129     
- Partials       6048     6154     +106     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 61.97% <28.57%> (+0.26%) ⬆️
java-21 61.84% <28.57%> (+0.22%) ⬆️
skip-bytebuffers-false 61.98% <28.57%> (+0.23%) ⬆️
skip-bytebuffers-true 61.83% <28.57%> (+34.10%) ⬆️
temurin 62.00% <28.57%> (+0.25%) ⬆️
unittests 61.99% <28.57%> (+0.25%) ⬆️
unittests1 46.70% <28.57%> (-0.19%) ⬇️
unittests2 27.99% <0.00%> (+0.26%) ⬆️


@davecromberge
Member Author

davecromberge commented Apr 2, 2024

> Do you also need to set threshold?

Good catch - you are correct. This has now been included.

> The root cause of this is that the intermediate result is not fully serialized

If by this you mean that it is falling back on the sketch serialiser, then yes. The reason for this is described above. I found a use case in the tests where the results from servers are inspected directly / programmatically, and I'm unsure whether this represents a real-world use case. If we can discard this use case, we can then consider how best to provide a backward-compatible serialiser for the current (Theta) implementation and fully serialise the intermediate representations.

What do you think?

@Jackie-Jiang
Contributor

What I meant is that the ser/de doesn't carry all the information from the intermediate result, so we need to set the extra parameters after it is deserialized. I think it is okay for now.

@Jackie-Jiang Jackie-Jiang merged commit 721e655 into apache:master Apr 3, 2024
18 of 19 checks passed
3 participants