Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add some missing geospatial scalar functions to support use in intermediate stages with the v2 query engine #13457

Merged
merged 1 commit into from
Jun 22, 2024

Conversation

yashmayya
Copy link
Collaborator

  • Currently, a query like Select ST_Contains(ST_GeomFromText('MULTIPOINT (20 20, 25 25)'), ST_GeomFromText('POINT (25 25)')) from GeoSpatialTest LIMIT 5; will work on both the v1 and v2 query engine.
  • However, a query like Select ST_Contains(ST_GeomFromText('MULTIPOINT (20 20, 25 25)'), ST_GeomFromText('POINT (25 25)')) from GeoSpatialTest a CROSS JOIN GeoSpatialTest b LIMIT 5 results in the following error: IllegalStateException: Cannot find function with name: ST_CONTAINS.
  • The reason is that the function is executed in the leaf stage in the first query, and is able to make use of the ST_Contains transform function. In the second query, however, the function is executed in an intermediate stage where transform functions aren't currently supported (scalar functions are supported though).
  • Explain plans for the queries -
    • First:
       LogicalSort(offset=[0], fetch=[5])
         PinotLogicalSortExchange(distribution=[hash], collation=[[]], isSortOnSender=[false], isSortOnReceiver=[false])
           LogicalSort(fetch=[5])
             LogicalProject(EXPR$0=[1])
               LogicalTableScan(table=[[default, GeoSpatialTest]])
      
    • Second:
       LogicalSort(offset=[0], fetch=[5])
         PinotLogicalSortExchange(distribution=[hash], collation=[[]], isSortOnSender=[false], isSortOnReceiver=[false])
           LogicalSort(fetch=[5])
             LogicalProject(EXPR$0=[1])
               LogicalJoin(condition=[true], joinType=[inner])
                 PinotLogicalExchange(distribution=[random])
                   LogicalProject(DUMMY=[0])
                     LogicalTableScan(table=[[default, GeoSpatialTest]])
                 PinotLogicalExchange(distribution=[broadcast])
                   LogicalProject(DUMMY=[0])
                     LogicalTableScan(table=[[default, GeoSpatialTest]])
      
  • Note that the queries are only representative literal selection queries to demonstrate the issue which can occur in numerous other real queries on the v2 query engine. Also note that the second query type could be optimized in the future to not execute the function in an intermediate stage.
  • Most of the geospatial transform functions are also implemented as scalar functions here and so this issue is limited to only some functions. This patch adds some of the simpler missing functions as a short term fix. However, the long term fix would be to allow the use of transform functions in intermediate stages as well and avoid these sorts of duplicate implementations.

if (GeometryUtils.isGeography(firstGeometry) != GeometryUtils.isGeography(secondGeometry)) {
throw new RuntimeException("The first and second arguments should either both be geometry or both be geography");
}
// TODO: to fully support Geography contains operation.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the relation between the TODO and the linked GH comment? Anyway, if you think it is important enough to add it here as a comment in the PR, I guess you should also add something likeSee #8620 in the comment in the code

Copy link
Collaborator Author

@yashmayya yashmayya Jun 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same TODO was added to the corresponding transform functions in #8620 and the explanation is in the linked GH comment. Although the limitation is present in the public documentation as well so I can remove the TODO comment here if you'd like (since the original PR is over 2 years old at this point so not sure if this is important enough to ever be implemented).

@yashmayya yashmayya added multi-stage Related to the multi-stage query engine geo labels Jun 21, 2024
@Jackie-Jiang Jackie-Jiang merged commit 8df722a into apache:master Jun 22, 2024
20 checks passed
suyashpatel98 pushed a commit to suyashpatel98/pinot that referenced this pull request Jul 6, 2024
yashmayya added a commit to yashmayya/pinot that referenced this pull request Jul 24, 2024
…ediate stages with the v2 query engine (apache#13457)

(cherry picked from commit 8df722a)
@npawar npawar added the v1v2 label Aug 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
geo multi-stage Related to the multi-stage query engine v1v2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants