Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for nested exclusive predicates with JSON index; fix semantics for exclusive predicates #13139

Merged

Conversation

yashmayya
Copy link
Collaborator

@yashmayya yashmayya commented May 13, 2024

  • The JSON index currently doesn't support nested (within AND / OR) exclusive predicates (NOT IN, !=, IS NULL) with the JSON_MATCH filter operator.
  • The reason for this is the existing semantics for exclusive predicates that are inconsistent and confusing.
  • A document like:
    •  [
         {
           "key1": "value1",
           "key2": 1
         },
         {
           "key1": "value2",
           "key2": 2
         }
       ]
      
      matches "$[*].key2" = 1, but not "$[*].key2" != 2 for example.
  • Essentially, the wildcard array access for inclusive predicates means that we match the array when ANY value matches, but for exclusive predicates, we match the array when ALL values match.
  • Furthermore, a document like:
    •  [
         {
            "key1": "value1",
         }
       ]
      
      matches "$[0].key2" != 2 which is counterintuitive since the object doesn’t contain key2 at all.
  • The reason for both of these inconsistencies and confusing semantics is the way that exclusive predicates are implemented. Exclusive predicates are currently handled by finding the flattened doc IDs of the corresponding inclusive predicate, mapping them back to the corresponding unflattened doc IDs, and then inverting / flipping the resultant bitmap. And that is also the reason why nested exclusive predicates aren’t supported currently since the intermediate flattened doc ID results can’t be combined with flattened doc ID results from other child predicates while maintaining the same semantics.
  • This patch fixes the above inconsistencies so that [*] now means the same for both exclusive (NOT IN, !=) and inclusive predicates, and documents where a key doesn't exist aren't matched for the exclusive predicates NOT IN, !=. It also adds support for nested exclusive predicates.
  • The new implementation for NOT IN, != make use of similar logic to the implementation for the regex and range predicates (Enhance json index to support regexp and range predicate evaluation #12568).
  • Essentially, we first get all the values for a corresponding key. Then, we iterate through the values - and for each value that matches the predicate, we add the list of corresponding flattened doc IDs to the result.
  • The IS NULL predicate continues to remain an "exclusive" predicate and won't be supported for use in nested predicates as of now, because there isn't a straightforward way to efficiently treat it as an inclusive predicate (i.e., without scanning all the flattened docs). We can't simply flip the flattened doc ID result for the corresponding IS NOT NULL predicate either as that leads to incorrect results when disableCrossArrayUnnest is set to true, as well as certain other scenarios involving chained array accesses and wildcard accesses.
  • Suggested labels: feature, backward-incompat, release-notes

@codecov-commenter
Copy link

codecov-commenter commented May 13, 2024

Codecov Report

Attention: Patch coverage is 79.81221% with 43 lines in your changes are missing coverage. Please review.

Project coverage is 62.17%. Comparing base (59551e4) to head (c6c54a6).
Report is 452 commits behind head on master.

Files Patch % Lines
...local/realtime/impl/json/MutableJsonIndexImpl.java 77.06% 14 Missing and 11 partials ⚠️
...t/index/readers/json/ImmutableJsonIndexReader.java 82.69% 10 Missing and 8 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #13139      +/-   ##
============================================
+ Coverage     61.75%   62.17%   +0.42%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2515      +79     
  Lines        133233   137927    +4694     
  Branches      20636    21347     +711     
============================================
+ Hits          82274    85753    +3479     
- Misses        44911    45789     +878     
- Partials       6048     6385     +337     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 62.11% <79.81%> (+0.40%) ⬆️
java-21 62.05% <79.81%> (+0.43%) ⬆️
skip-bytebuffers-false 62.16% <79.81%> (+0.41%) ⬆️
skip-bytebuffers-true 62.01% <79.81%> (+34.29%) ⬆️
temurin 62.17% <79.81%> (+0.42%) ⬆️
unittests 62.16% <79.81%> (+0.42%) ⬆️
unittests1 46.73% <0.00%> (-0.16%) ⬇️
unittests2 27.79% <79.81%> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@yashmayya yashmayya force-pushed the json-match-nested-exclusive-predicate branch from 031f9b1 to fc6493f Compare May 13, 2024 09:01
@Jackie-Jiang Jackie-Jiang added feature release-notes Referenced by PRs that need attention when compiling the next release notes backward-incompat Referenced by PRs that introduce or fix backward compat issues enhancement labels May 13, 2024
@yashmayya yashmayya marked this pull request as ready for review May 15, 2024 05:32
@yashmayya yashmayya force-pushed the json-match-nested-exclusive-predicate branch from 35d3e6e to c6c54a6 Compare May 16, 2024 05:54
Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done!

@Jackie-Jiang Jackie-Jiang merged commit 2015d6e into apache:master May 17, 2024
20 checks passed
gortiz pushed a commit to gortiz/pinot that referenced this pull request Jun 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backward-incompat Referenced by PRs that introduce or fix backward compat issues documentation enhancement feature release-notes Referenced by PRs that need attention when compiling the next release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants