Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a post-validator visitor that verifies there are no cast to bytes #12475

Merged
merged 4 commits into from
Mar 9, 2024

Conversation

gortiz
Copy link
Contributor

@gortiz gortiz commented Feb 22, 2024

This PR adds a check that fails if there is an implicit or explicit cast from to Bytes.

I still need to add some automatic tests, but I've tested manually. In a query like:

select * from starbucksStores where location_st_point = 
'80c062bc98021f94f1404e9bda0f6b0202'
limit 10

The message shown is: From line 0, column 0 to line 2, column 36: Cannot cast '80c062bc98021f94f1404e9bda0f6b0202' as VARBINARY. Try to use binary literal instead (like x'80c062bc98021f94f1404e9bda0f6b0202')

In a query like:

select * from starbucksStores 
where OCTET_LENGTH(
  substring('80c062bc98021f94f1404e9bda0f6b0202', 2)
) > 0 limit 10

The message shown is: From line 0, column 0 to line 3, column 52: Cannot cast SUBSTRING('80c062bc98021f94f1404e9bda0f6b0202' FROM 2) as VARBINARY. Try to wrap the expression in hexToBytes (like hexToBytes(SUBSTRING('80c062bc98021f94f1404e9bda0f6b0202' FROM 2)))

Two things that we can try to improve (although priority doesn't seem high):

  • Given in this cases the cast expression was implicit, start position seem to be line 0 column 0. It would be great to find a way to show a more precise start position
  • As shown in the second case, the suggested expression is not exactly like the original one.

Related to #12457

@gortiz gortiz added the multi-stage Related to the multi-stage query engine label Feb 22, 2024
@gortiz gortiz marked this pull request as ready for review February 28, 2024 09:40
@codecov-commenter
Copy link

codecov-commenter commented Feb 28, 2024

Codecov Report

Attention: Patch coverage is 0% with 27 lines in your changes are missing coverage. Please review.

Project coverage is 0.00%. Comparing base (59551e4) to head (0dde4a0).
Report is 34 commits behind head on master.

Files Patch % Lines
.../apache/pinot/query/validate/BytesCastVisitor.java 0.00% 24 Missing ⚠️
...che/pinot/query/validate/InvalidCastException.java 0.00% 2 Missing ⚠️
.../java/org/apache/pinot/query/QueryEnvironment.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##             master   #12475       +/-   ##
=============================================
- Coverage     61.75%    0.00%   -61.76%     
=============================================
  Files          2436     2376       -60     
  Lines        133233   129827     -3406     
  Branches      20636    20138      -498     
=============================================
- Hits          82274        0    -82274     
- Misses        44911   129827    +84916     
+ Partials       6048        0     -6048     
Flag Coverage Δ
custom-integration1 ?
integration 0.00% <0.00%> (-0.01%) ⬇️
integration1 ?
integration2 0.00% <0.00%> (ø)
java-11 ?
java-21 0.00% <0.00%> (-61.63%) ⬇️
skip-bytebuffers-false ?
skip-bytebuffers-true 0.00% <0.00%> (-27.73%) ⬇️
temurin 0.00% <0.00%> (-61.76%) ⬇️
unittests ?
unittests1 ?
unittests2 ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly good

_originalValidator = originalValidator;
}

@SuppressWarnings("checkstyle:WhitespaceAround")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(minor) which warning do we violate here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^^ Should we remove this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

Preconditions.checkState(sqlNode instanceof SqlDataTypeSpec);
RelDataType toType = ((SqlDataTypeSpec) sqlNode).deriveType(_originalValidator);
if (!SqlTypeUtil.isBinary(toType)) {
return super.visit(call);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for other short circuit call

Suggested change
return super.visit(call);
return null;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not the same, right?

return null just aborts the execution here while return super.visit(call) does in fact return the same value, but has the desired side effect of visiting the operators.

For example, an expression like: CAST (someOperation(CAST ('someliteral' as BINARY)) as STRING) will be converted into:

  1. CAST
    1. someOperation
      1. CAST
        1. 'someliteral'
        2. BINARY
    2. STRING

In case we changed this line to return null we won't be able to detect the inner cast to BINARY because we would just abort the visitor once the first CAST to STRING was detected.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, make sense!

}
SqlNode srcNode = operands.get(0);
RelDataType fromType = _originalValidator.getValidatedNodeTypeIfKnown(srcNode);
if (fromType != null && SqlTypeUtil.isBinary(fromType)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will we every cast from binary to binary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Calcite there is BINARY and VARBINARY. In both cases SqlTypeUtil.isBinary return true. I'm assuming we want to allow castings between them.

message += " Try to wrap the expression in hexToBytes (like hexToBytes(" + srcNode + "))";
}
SqlParserPos pos = call.getParserPosition();
RuntimeException ex = new RuntimeException(message);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we throw a more specific exception?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to throw a new runtime exception called InvalidCastException

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

_originalValidator = originalValidator;
}

@SuppressWarnings("checkstyle:WhitespaceAround")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^^ Should we remove this?

@Jackie-Jiang Jackie-Jiang merged commit b2689bc into apache:master Mar 9, 2024
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
multi-stage Related to the multi-stage query engine
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants