-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce Heap Usage of OnHeapStringDictionary #12223
Conversation
cbca843
to
df697a8
Compare
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #12223 +/- ##
============================================
+ Coverage 61.50% 61.57% +0.06%
- Complexity 207 1151 +944
============================================
Files 2416 2419 +3
Lines 131179 131295 +116
Branches 20246 20266 +20
============================================
+ Hits 80686 80840 +154
+ Misses 44595 44548 -47
- Partials 5898 5907 +9
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
8a12af0
to
c7a4502
Compare
c7a4502
to
1e08648
Compare
...c/main/java/org/apache/pinot/segment/local/segment/index/dictionary/DictionaryIndexType.java
Outdated
Show resolved
Hide resolved
pinot-spi/src/main/java/org/apache/pinot/spi/config/table/IndexingConfig.java
Outdated
Show resolved
Hide resolved
...c/main/java/org/apache/pinot/segment/local/segment/index/dictionary/DictionaryIndexType.java
Outdated
Show resolved
Hide resolved
...c/main/java/org/apache/pinot/segment/local/segment/index/dictionary/DictionaryIndexType.java
Outdated
Show resolved
Hide resolved
...c/main/java/org/apache/pinot/segment/local/segment/index/dictionary/DictionaryIndexType.java
Outdated
Show resolved
Hide resolved
...al/src/main/java/org/apache/pinot/segment/local/segment/index/loader/IndexLoadingConfig.java
Outdated
Show resolved
Hide resolved
...al/src/main/java/org/apache/pinot/segment/local/segment/index/loader/IndexLoadingConfig.java
Outdated
Show resolved
Hide resolved
...st/java/org/apache/pinot/segment/local/segment/index/dictionary/DictionaryIndexTypeTest.java
Outdated
Show resolved
Hide resolved
This is a very nice feature! Nice contribution @vvivekiyer! I have some comments to add: The concept itselfI understand that the issue was detected in on heap dictionaries, but I think we can use this approach also in offheap dictionaries as well. Specifically, interning could be applied when the value is read from the dictionary. The configI find this config too repetitive. Instead of: {
"indexes": {
"dictionary": {
"onHeap": true,
"useVarLengthDictionary": true,
"onHeapConfig": {
"enableInterning":true,
"internerCapacity":32000000
}
}
}
} I would suggest something like: {
"indexes": {
"dictionary": {
"onHeap": true,
"useVarLengthDictionary": true,
"intern": {
"capacity":32000000
}
}
}
} With an optional implicit field I would use this approach even if we do not support interning in offheap dictionaries (which is something we may decide to change in future). This approach is simpler and easier to read. Support in older syntaxAs said in the comments, I recommend against adding new features in the old syntax (in this case, in indexingConfig). For compatibility reasons Users can migrate to the new syntax in case they want to use new features. The translation can even be done automatically. Is not like we want to force people to use the new syntax, but the index config logic is already too complex and 2 ways to configure each new feature will make it even more complex. |
37896ba
to
2ad5ff9
Compare
2ad5ff9
to
cbb556e
Compare
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/DictionaryIndexConfig.java
Show resolved
Hide resolved
...c/main/java/org/apache/pinot/segment/local/segment/index/dictionary/DictionaryIndexType.java
Outdated
Show resolved
Hide resolved
...n/java/org/apache/pinot/segment/local/segment/index/dictionary/DictionaryInternerHolder.java
Outdated
Show resolved
Hide resolved
...al/src/main/java/org/apache/pinot/segment/local/segment/index/loader/IndexLoadingConfig.java
Outdated
Show resolved
Hide resolved
pinot-spi/src/test/java/org/apache/pinot/spi/config/table/IndexingConfigTest.java
Show resolved
Hide resolved
Approved, but it would be great to add a JMH benchmark. I would expect that in case we have two segments with dictionary enabled and large strings, a query that groups by them should be quite faster. |
7cea79d
to
d8af5e2
Compare
pinot-common/src/test/java/org/apache/pinot/common/utils/FALFInternerTest.java
Show resolved
Hide resolved
pinot-common/src/test/java/org/apache/pinot/common/utils/FALFInternerTest.java
Show resolved
Hide resolved
pinot-common/src/test/java/org/apache/pinot/common/utils/FALFInternerTest.java
Show resolved
Hide resolved
...c/main/java/org/apache/pinot/segment/local/segment/index/dictionary/DictionaryIndexType.java
Show resolved
Hide resolved
...c/main/java/org/apache/pinot/segment/local/segment/index/dictionary/DictionaryIndexType.java
Show resolved
Hide resolved
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/DictionaryIndexConfig.java
Show resolved
Hide resolved
Haven't gone through all the tests yet so asking if we have covered the following ?
|
(nit) Please pretty print the sample config used in PR description for readability |
How does the concurrency aspect come into picture during query processing and segment reloads ? |
@somandal - can you also help take a look if you get a chance ? |
This PR corresponds to issue 12078
Heap Usage Before Interning
Heap Usage After Interning