Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serialize V2 Plan using Protobufs instead of reflection. #13221

Merged
merged 26 commits into from
May 30, 2024

Conversation

vrajat
Copy link
Collaborator

@vrajat vrajat commented May 24, 2024

Plans are serialized using reflection and are identified by class names. Serialization using reflection is brittle to changes across versions e.g. class names may change.
Change V2 plan serialization to use protobufs. Using protobufs automatically provides backward and forward compatibility of changes made to PlanNodes.

Tags:

  • backward-incompat

@codecov-commenter
Copy link

codecov-commenter commented May 25, 2024

Codecov Report

Attention: Patch coverage is 71.27273% with 158 lines in your changes are missing coverage. Please review.

Project coverage is 62.15%. Comparing base (59551e4) to head (93b6c88).
Report is 511 commits behind head on master.

Files Patch % Lines
.../pinot/query/planner/serde/PlanNodeSerializer.java 71.77% 37 Missing and 9 partials ⚠️
...not/query/planner/serde/StageNodeDeserializer.java 74.44% 36 Missing and 10 partials ⚠️
.../planner/serde/ProtoExpressionToRexExpression.java 50.87% 25 Missing and 3 partials ⚠️
.../planner/serde/RexExpressionToProtoExpression.java 58.06% 21 Missing and 5 partials ⚠️
...che/pinot/query/planner/plannode/ExchangeNode.java 0.00% 10 Missing ⚠️
.../query/planner/logical/RelToPlanNodeConverter.java 80.00% 0 Missing and 1 partial ⚠️
...not/query/planner/plannode/MailboxReceiveNode.java 92.30% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #13221      +/-   ##
============================================
+ Coverage     61.75%   62.15%   +0.40%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2536     +100     
  Lines        133233   139390    +6157     
  Branches      20636    21545     +909     
============================================
+ Hits          82274    86644    +4370     
- Misses        44911    46268    +1357     
- Partials       6048     6478     +430     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 62.10% <71.27%> (+0.40%) ⬆️
java-21 62.03% <71.27%> (+0.41%) ⬆️
skip-bytebuffers-false 62.14% <71.27%> (+0.39%) ⬆️
skip-bytebuffers-true 62.01% <71.27%> (+34.28%) ⬆️
temurin 62.15% <71.27%> (+0.40%) ⬆️
unittests 62.15% <71.27%> (+0.40%) ⬆️
unittests1 46.73% <71.27%> (-0.16%) ⬇️
unittests2 27.74% <0.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@gortiz gortiz added incompatible Indicate PR that introduces backward incompatibility backward-incompat Referenced by PRs that introduce or fix backward compat issues multi-stage Related to the multi-stage query engine labels May 28, 2024
Copy link
Contributor

@gortiz gortiz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a couple of suggestions, but they are style based and can be done in a following PR

Comment on lines 93 to 99
Plan.JoinNode.Builder joinNodeBuilder =
Plan.JoinNode.newBuilder().setJoinRelType(convertJoinRelType(node.getJoinRelType())).setJoinKeys(joinKeyBuilder)
.setJoinClause(convertExpressions(node.getJoinClauses()))
.setJoinHints(getNodeHintBuilder(node.getJoinHints())).addAllLeftColumnNames(node.getLeftColumnNames())
.addAllRightColumnNames(node.getRightColumnNames());

builder.setJoinNode(joinNodeBuilder);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This is completely subjective, but I find very difficult to read this kind of calls. Don't you find it easier to read when there is one single fluent method call per line? Something like:

Plan.JoinNode.Builder joinNodeBuilder = Plan.JoinNode.newBuilder()
        .setJoinRelType(convertJoinRelType(node.getJoinRelType()))
        .setJoinKeys(joinKeyBuilder)
        .setJoinClause(convertExpressions(node.getJoinClauses()))
        .setJoinHints(getNodeHintBuilder(node.getJoinHints()))
        .addAllLeftColumnNames(node.getLeftColumnNames())
        .addAllRightColumnNames(node.getRightColumnNames());

We don't have any code style rule that force use to use one or the other mode, so it is just a nit

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not write the code this way. This is formatted by intellij using the project's coding rules.

I ran an experiment where I modified the code so that there is one fn. per line as you have pasted above. Then I formatted the file (Code -> Reformat File ...). The editor changed the format back to fitting in as many function calls as possible into a line. It is too much of an hassle to disallow a developer to reformat this file. The best option is to add this rule in the future and then make the change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intellij convention is not using the project code styles. Or being more precise, the project code style does not define this situation, so intellij is free to do whatever it does by default.

TBH this is not even defined in Google style guide, which basically says to do whatever it considered better. The fact that Intellij just tries to add as many methods as possible in the same line is a cheap decision by their side. I would love that by default intellij would have decided to do not modify these lines given it there is no clear winner in 100% of the cases.

So my suggestion is to disable that option in intellij code style, but again, this is not a strong policy we have in the project.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This link says that it does use the coding style setup in the config: https://www.jetbrains.com/help/idea/reformat-and-rearrange-code.html

Previously I was told to setup the Pinot code style and reformat the file as the code did not meet some of the standards.

Comment on lines 55 to 78
case TABLESCANNODE:
return visitTableScanNode(protoNode);
case RECEIVENODE:
return visitMailboxReceiveNode(protoNode);
case SENDNODE:
return visitMailboxSendNode(protoNode);
case SETNODE:
return visitSetNode(protoNode);
case EXCHANGENODE:
return visitExchangeNode(protoNode);
case SORTNODE:
return visitSortNode(protoNode);
case WINDOWNODE:
return visitWindowNode(protoNode);
case VALUENODE:
return visitValueNode(protoNode);
case PROJECTNODE:
return visitProjectNode(protoNode);
case FILTERNODE:
return visitFilterNode(protoNode);
case AGGREGATENODE:
return visitAggregateNode(protoNode);
case JOINNODE:
return visitJoinNode(protoNode);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't really seem to be implementing the visitor pattern so the naming feels a little odd here. Maybe processTableScanNode or deserializeTableScanNode might be more appropriate than visitTableScanNode (and likewise for the other node types), WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I complained about visitor naming before, but it was on the Java class, which I consider more problematic.
I think naming a class whateverVisitor implies to use the visitor pattern, but I don't think the same can be said for methods with visit prefix. Specially in this case where methods are private.

I mean, I'm fine if we change the name, but I don't think it is needed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed the names.

@gortiz gortiz merged commit 3b57116 into apache:master May 30, 2024
18 of 19 checks passed
@vrajat vrajat deleted the rv_plan_proto branch May 30, 2024 11:45
@Jackie-Jiang
Copy link
Contributor

Left some comments here

gortiz pushed a commit to gortiz/pinot that referenced this pull request Jun 14, 2024
* Prototype with TableScanNode serialized using a protobuf message.

* Serialize Mailbox Send and Receive.

* Serialize SetOpNode.

* Serialize Exchange and Sort

* Add serialization support for all nodes

* Compiles

* Add license header

* Allocate a hashmap within nodehints.

* Fix trailing whitespace

* Handle nulls correctly in direction keys and Literal Expressions

* Create new context for every plan node.

* Improve structure of visitors.

* Fix long line.

* Only support specific types of literals.

* Do not assume data type and underlying Java type matches.

* Support object literal.

* Add javadocs to a couple of files.

* Set distinct flag

* Address review comments.

* Undo style change.

* Remove StageNodeSerDeUtils.java

* Do not prematurely serialize Stagenode.

* Revert "Do not prematurely serialize Stagenode."

This reverts commit 3969f63.

* Make visitor a nested class.

* Remove use of visit prefix.

* Fix name
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backward-incompat Referenced by PRs that introduce or fix backward compat issues incompatible Indicate PR that introduces backward incompatibility multi-stage Related to the multi-stage query engine
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants