Extract individual JSON array elements from the JSON index for the transform function jsonExtractIndex #12466
@@ -209,41 +210,11 @@ private RoaringBitmap getMatchingFlattenedDocIds(Predicate predicate) {
} else {
key = JsonUtils.KEY_SEPARATOR + key;
}
Note: Move the array-index processing into the static function processArrayIndex so the code can be reused.
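A minimal sketch of what a reusable array-index helper could look like. The class name, the method names and the constants (KEY_SEPARATOR = ".", ARRAY_INDEX_KEY = ".$index", a NUL key-value separator) are assumptions chosen to reproduce the key shapes shown in the PR description; this is not Pinot's actual code. Each concrete index such as [0] becomes its own search key, and the path itself is rewritten so that array elements sit one level deeper.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: names and constants are illustrative assumptions.
final class ArrayIndexKeys {
  static final String KEY_SEPARATOR = ".";
  static final String ARRAY_INDEX_KEY = ".$index";
  static final char KEY_VALUE_SEPARATOR = '\0';

  /** The key with array indexes rewritten, plus one search key per concrete index. */
  static final class Parsed {
    final String baseKey;
    final List<String> indexSearchKeys;

    Parsed(String baseKey, List<String> indexSearchKeys) {
      this.baseKey = baseKey;
      this.indexSearchKeys = indexSearchKeys;
    }
  }

  /** Key preprocessing: "$.foo" -> ".foo", "foo" -> ".foo". */
  static String normalizeKey(String jsonPath) {
    return jsonPath.startsWith("$") ? jsonPath.substring(1) : KEY_SEPARATOR + jsonPath;
  }

  static Parsed processArrayIndex(String key) {
    List<String> searchKeys = new ArrayList<>();
    int left;
    while ((left = key.indexOf('[')) >= 0) {
      int right = key.indexOf(']', left + 1);
      if (right < 0) {
        throw new IllegalArgumentException("Missing right bracket in key: " + key);
      }
      String leftPart = key.substring(0, left);
      String arrayIndex = key.substring(left + 1, right);
      String rightPart = key.substring(right + 1);
      if (!arrayIndex.equals("*")) {
        // A concrete index becomes its own search key: ".foo.$index" + SEP + "0"
        searchKeys.add(leftPart + ARRAY_INDEX_KEY + KEY_VALUE_SEPARATOR + arrayIndex);
      }
      // Array elements sit one level deeper: ".foo[0].bar" -> ".foo..bar"
      key = leftPart + KEY_SEPARATOR + rightPart;
    }
    return new Parsed(key, searchKeys);
  }

  public static void main(String[] args) {
    Parsed p = processArrayIndex(normalizeKey("$.foo[0].bar[1]"));
    System.out.println(p.baseKey);  // prints .foo..bar.
    for (String k : p.indexSearchKeys) {
      // Show the NUL separator as '|' for readability
      System.out.println(k.replace(KEY_VALUE_SEPARATOR, '|'));
    }
  }
}
```

For $.foo[0].bar[1] this yields the base key .foo..bar. and the search keys .foo.$index and .foo..bar.$index, each followed by the separator and its index (0 and 1). A wildcard [*] only rewrites the path and adds no search key.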
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             master   #12466      +/-   ##
============================================
+ Coverage     61.75%   61.77%    +0.02%
+ Complexity      207      198        -9
============================================
  Files          2436     2450       +14
  Lines        133233   133781      +548
  Branches      20636    20753      +117
============================================
+ Hits          82274    82643      +369
- Misses        44911    45034      +123
- Partials       6048     6104       +56
Flags with carried forward coverage won't be shown.
key = key.substring(1);
} else {
key = JsonUtils.KEY_SEPARATOR + key;
}

// Process the array index within the key if exists
Note: Keep the key preprocessing here, but move the generation of matchingDocIds for array-index JSON paths into the static function processArrayIndex.
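Generating matchingDocIds for the array-index keys amounts to ANDing their posting lists. The sketch below uses java.util.BitSet as a stand-in for the RoaringBitmap types and a plain Map as the inverted index; matchingFlattenedDocIds is a hypothetical name. The two keys and their bitmaps are the ones from the example in the PR description. Returning null when the path has no concrete index lets a caller skip the extra filter, in the spirit of the arrayIndexFlattenDocIds == null check in the dictId loop.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; BitSet stands in for RoaringBitmap.
final class ArrayIndexMatch {
  /**
   * Intersects the posting lists of all array-index search keys.
   * Returns null when there are no index keys, meaning "no extra filter".
   */
  static BitSet matchingFlattenedDocIds(Map<String, BitSet> invertedIndex, List<String> indexSearchKeys) {
    BitSet result = null;
    for (String searchKey : indexSearchKeys) {
      BitSet postings = invertedIndex.get(searchKey);
      if (postings == null) {
        return new BitSet();  // key absent from the dictionary: nothing can match
      }
      if (result == null) {
        result = (BitSet) postings.clone();  // never mutate the index's own bitmap
      } else {
        result.and(postings);
      }
    }
    return result;
  }

  static BitSet bits(int... ids) {
    BitSet b = new BitSet();
    for (int id : ids) {
      b.set(id);
    }
    return b;
  }

  static BitSet demo() {
    Map<String, BitSet> index = new HashMap<>();
    String fooIndex0 = ".foo.$index" + '\0' + "0";
    String barIndex1 = ".foo..bar.$index" + '\0' + "1";
    index.put(fooIndex0, bits(0, 1, 4, 5));
    index.put(barIndex1, bits(1, 3, 5));
    return matchingFlattenedDocIds(index, Arrays.asList(fooIndex0, barIndex1));
  }

  public static void main(String[] args) {
    System.out.println(demo());  // prints {1, 5}
  }
}
```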
@itschrispeck Could you please take a look?
...ava/org/apache/pinot/core/operator/transform/function/JsonExtractIndexTransformFunction.java
...c/test/java/org/apache/pinot/core/operator/transform/function/BaseTransformFunctionTest.java
for (int dictId = dictIds[0]; dictId < dictIds[1]; dictId++) {
// get docIds from posting list, convert these to the actual docIds
ImmutableRoaringBitmap flattenedDocIds = _invertedIndex.getDocIds(dictId);
- PeekableIntIterator it = flattenedDocIds.getIntIterator();
+ PeekableIntIterator it = arrayIndexFlattenDocIds == null ? flattenedDocIds.getIntIterator()
+ : intersect(arrayIndexFlattenDocIds.clone(), flattenedDocIds);
RoaringBitmap.and(arrayIndexFlattenDocIds, flattenedDocIds).getIntIterator()
Thanks, RoaringBitmap.and works for MutableJsonIndex, but it doesn't work for ImmutableJsonIndexReader: RoaringBitmap.and() requires both inputs to be RoaringBitmap, and neither MutableRoaringBitmap (arrayIndexFlattenDocIds) nor ImmutableRoaringBitmap (flattenedDocIds) is a subclass of RoaringBitmap.
How about MutableRoaringBitmap.and(arrayIndexFlattenDocIds, flattenedDocIds)?
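The point of this thread is to intersect without mutating either input: the static and() methods mentioned above return a new bitmap, while the current change clones arrayIndexFlattenDocIds first. Below is a sketch of the same dictId loop with java.util.BitSet as a stand-in (BitSet has no static and(), so a small helper clones and ANDs). A null filter means the path had no concrete array index. collectDocIds, the posting lists and the docIdMapping values are hypothetical.

```java
import java.util.BitSet;

// Hypothetical sketch; BitSet stands in for the RoaringBitmap types.
final class DictRangeScan {
  /** Non-mutating intersection, analogous in spirit to a static and() returning a new bitmap. */
  static BitSet and(BitSet a, BitSet b) {
    BitSet result = (BitSet) a.clone();
    result.and(b);
    return result;
  }

  /**
   * Collects original docIds for dictIds in [dictIdStart, dictIdEnd).
   * arrayIndexFilter may be null, meaning the JSON path had no concrete array index.
   */
  static BitSet collectDocIds(BitSet[] postingLists, int dictIdStart, int dictIdEnd,
      BitSet arrayIndexFilter, int[] docIdMapping) {
    BitSet docIds = new BitSet();
    for (int dictId = dictIdStart; dictId < dictIdEnd; dictId++) {
      BitSet flattened = arrayIndexFilter == null
          ? postingLists[dictId]                          // read-only iteration, no copy needed
          : and(arrayIndexFilter, postingLists[dictId]);  // fresh bitmap, inputs untouched
      // Convert flattened docIds to the original docIds
      for (int flat = flattened.nextSetBit(0); flat >= 0; flat = flattened.nextSetBit(flat + 1)) {
        docIds.set(docIdMapping[flat]);
      }
    }
    return docIds;
  }

  static BitSet bits(int... ids) {
    BitSet b = new BitSet();
    for (int id : ids) {
      b.set(id);
    }
    return b;
  }

  public static void main(String[] args) {
    BitSet[] postingLists = {bits(0, 2), bits(1, 3), bits(4, 5)};  // hypothetical
    int[] docIdMapping = {0, 0, 1, 1, 2, 2};                         // hypothetical
    System.out.println(collectDocIds(postingLists, 0, 3, null, docIdMapping));        // {0, 1, 2}
    System.out.println(collectDocIds(postingLists, 0, 3, bits(1, 5), docIdMapping));  // {0, 2}
  }
}
```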
...al/src/main/java/org/apache/pinot/segment/local/realtime/impl/json/MutableJsonIndexImpl.java
...al/src/main/java/org/apache/pinot/segment/local/realtime/impl/json/MutableJsonIndexImpl.java
...java/org/apache/pinot/segment/local/segment/index/readers/json/ImmutableJsonIndexReader.java
bd8d2ce to d9d319c
The integration test fails on a multi-stage query unrelated to JSON; the temurin-11 run succeeds.
e15c1c3 to 12ddc72
Take the following JSON records as an example:
Given the JSON path $.foo[0].bar[1], we want to extract all values under this path together with their associated docIds (not the flattened docIds): {"y":[0], "z":[1]}.
When setDisableCrossArrayUnnest(true) is set, the conceptual flattened doc model is listed below:
The data structure of the JSON index is shown below:
Solution
Step 1: Get the flattened docIds associated with the given JSON path. We reuse the existing code that processes JSON paths containing array indexes. In the example above, the array-index keys are .foo.$index\u00000 and .foo..bar.$index\u00001 (\u0000 separates the key from the value), so the bitmap is the intersection of {0,1,4,5} and {1,3,5}, i.e. {1,5}.
Step 2: Traverse the inverted-index dictionary and find all keys with the prefix .foo..bar.\u0000. If the intersection of a key's bitmap with the bitmap from Step 1 is not empty, parse the value from the key and add the intersection to the result map.
Step 3: Convert the flattened docIds into the original docIds by looking them up in docIdMapping: {"y":1, "z":5} => {"y":0, "z":1}
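To make Steps 2 and 3 concrete, here is a minimal sketch. java.util.BitSet stands in for RoaringBitmap and a TreeMap for the sorted inverted-index dictionary; the class and method names are hypothetical. The Step 1 bitmap {1,5} and the expected result {"y":[0], "z":[1]} come from the description above, but the postings for x, y and z and the docIdMapping array are made up, chosen only to be consistent with that result, since the original records are not shown here.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical sketch of Steps 2 and 3; BitSet stands in for RoaringBitmap.
final class ExtractIndexSketch {
  static final char KEY_VALUE_SEPARATOR = '\0';

  // Step 2: scan keys sharing the prefix and keep values whose postings hit the Step 1 bitmap.
  static Map<String, BitSet> valuesToFlattenedDocIds(
      TreeMap<String, BitSet> dictionary, String prefix, BitSet step1DocIds) {
    Map<String, BitSet> result = new TreeMap<>();
    for (Map.Entry<String, BitSet> entry : dictionary.tailMap(prefix, true).entrySet()) {
      String key = entry.getKey();
      if (!key.startsWith(prefix)) {
        break;  // keys are sorted, so no later key can share the prefix
      }
      BitSet hits = (BitSet) entry.getValue().clone();
      hits.and(step1DocIds);
      if (!hits.isEmpty()) {
        result.put(key.substring(prefix.length()), hits);  // the value follows the prefix
      }
    }
    return result;
  }

  // Step 3: translate flattened docIds into original docIds.
  static Map<String, List<Integer>> toOriginalDocIds(Map<String, BitSet> flattened, int[] docIdMapping) {
    Map<String, List<Integer>> result = new TreeMap<>();
    for (Map.Entry<String, BitSet> entry : flattened.entrySet()) {
      BitSet docIds = new BitSet();
      entry.getValue().stream().forEach(flatDocId -> docIds.set(docIdMapping[flatDocId]));
      result.put(entry.getKey(), docIds.stream().boxed().collect(Collectors.toList()));
    }
    return result;
  }

  static BitSet bits(int... ids) {
    BitSet b = new BitSet();
    for (int id : ids) {
      b.set(id);
    }
    return b;
  }

  static Map<String, List<Integer>> demo() {
    String prefix = ".foo..bar." + KEY_VALUE_SEPARATOR;
    TreeMap<String, BitSet> dictionary = new TreeMap<>();
    dictionary.put(prefix + "x", bits(0, 4));  // hypothetical postings
    dictionary.put(prefix + "y", bits(1, 3));
    dictionary.put(prefix + "z", bits(2, 5));
    dictionary.put(".foo..bar.$index" + KEY_VALUE_SEPARATOR + "1", bits(1, 3, 5));
    BitSet step1DocIds = bits(1, 5);           // Step 1 result from the description
    int[] docIdMapping = {0, 0, 0, 1, 1, 1};   // hypothetical: flattened 0-2 -> doc 0, 3-5 -> doc 1
    return toOriginalDocIds(valuesToFlattenedDocIds(dictionary, prefix, step1DocIds), docIdMapping);
  }

  public static void main(String[] args) {
    System.out.println(demo());  // prints {y=[0], z=[1]}
  }
}
```

Because the dictionary is sorted, the prefix scan can stop at the first non-matching key; the $index keys sort after the value keys here since '$' compares greater than the NUL separator.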