-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add instrumentation to json index getMatchingFlattenedDocsMap() #13164
add instrumentation to json index getMatchingFlattenedDocsMap() #13164
Conversation
…se lazy initialization
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #13164 +/- ##
============================================
+ Coverage 61.75% 62.16% +0.41%
+ Complexity 207 198 -9
============================================
Files 2436 2515 +79
Lines 133233 137876 +4643
Branches 20636 21335 +699
============================================
+ Hits 82274 85713 +3439
- Misses 44911 45776 +865
- Partials 6048 6387 +339
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the improvement @itschrispeck, and apologies for the newbie questions!
/** | ||
* Lazily initialize _valueToMatchingDocsMap, so that map generation is skipped when filtering excludes all values | ||
*/ | ||
private Map<String, RoaringBitmap> getValueToMatchingDocsMap() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for my understanding, this isn't referring to the filtering done by the JsonExtractIndexTransformFunction
itself right (because that's only done after the map is generated in the JsonIndexReader
)? So is this referring to query level filters that might result in none of this transform function's transformToXyz
methods being called (and I presume the transform function's init
method is still called beforehand)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep exactly!
try (JsonIndexCreator offHeapIndexCreator = new OffHeapJsonIndexCreator(indexDir, colName, new JsonIndexConfig()); | ||
MutableJsonIndexImpl mutableJsonIndex = new MutableJsonIndexImpl(new JsonIndexConfig())) { | ||
// build json indexes | ||
for (int i = 0; i < 1000000; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need such a large value here even though we're configuring the query killer's heap usage ratio to kill queries on to be 0.00f
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My guess is that the more frequent GCs in github runners make measuring thread memory difficult. When logs are enabled, I can see But all queries are below quota, no query killed
a couple times before the query is picked. So my assumption is that GCs are causing the thread memory measurements to be lower than the initial/previous thread memory measurements.
Locally, it works reliably at 100k but failed when I initially pushed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, that seems like a plausible explanation, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
try (JsonIndexCreator offHeapIndexCreator = new OffHeapJsonIndexCreator(indexDir, colName, new JsonIndexConfig()); | ||
MutableJsonIndexImpl mutableJsonIndex = new MutableJsonIndexImpl(new JsonIndexConfig())) { | ||
// build json indexes | ||
for (int i = 0; i < 1000000; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, that seems like a plausible explanation, thanks!
* Lazily initialize _valueToMatchingDocsMap, so that map generation is skipped when filtering excludes all values | ||
*/ | ||
private Map<String, RoaringBitmap> getValueToMatchingDocsMap() { | ||
if (_valueToMatchingDocsMap == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when this condition is true, does it mean the "filtering excludes all values"? Please clarify to match the method javadoc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a lazy initialization pattern, so the first time the method is called it will build the map. Subsequent calls will use the already built map. The pattern lets us avoid pre-building the map in transform function's init
method since we might never need it, instead it is built when transforming the first ValueBlock
This PR contains two changes:
json_extract_index
transform functionFor the first change, we've seen the map can be very large and sampling/interruption check previously could not occur during map generation.
For the second, lazy initialization skips map generation entirely when filters exclude all results which can save significant resources depending on the query.
UTs cover the logic, but also deployed in our prod instances and we do not see heap saturation from
json_extract_index
queries anymore.tags:
enhancement
performance
(?)