Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use ArrayList instead of LinkedList in SortOperator #12783

Merged
merged 2 commits into from
Apr 25, 2024

Conversation

gortiz
Copy link
Contributor

@gortiz gortiz commented Apr 3, 2024

A small PR that changes SortOperator to buffer entries in an ArrayList instead of a LinkedList.

In general LinkedList performance is horrible, even in cases when theoretically (by Big-O) they are fine, usually the performance cost is worse than ArrayList due to memory amplification and cache issues. In the specific case this PR is changing, there was no actual reason to use LinkedList apart from a slightly nicer code. But given the size of the final result is well know, it was very easy to change it to an ArrayList implementation where the ArrayList is initialized to all values being null and then set values in the reverse direction. Amortized BigO cost is still linear, but locallity and allocation should be quite better in this implementation.

A secondary but difficult to prove improvement is related to megamorphic calls. As we mostly always use ArrayList in our code, the JIT can generate code that assumes that. Sometimes we need to use different lists, but if we can avoid that we theoretically can produce better code.

Probably the performance cost will be negligible in most cases and some of the actual cases (like the megamorphic calls) cannot be easilly benchmarked with JMH, so I'm not adding these benchmarks in this PR.

@codecov-commenter
Copy link

codecov-commenter commented Apr 3, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 62.16%. Comparing base (59551e4) to head (153c3d6).
Report is 364 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #12783      +/-   ##
============================================
+ Coverage     61.75%   62.16%   +0.41%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2502      +66     
  Lines        133233   136586    +3353     
  Branches      20636    21145     +509     
============================================
+ Hits          82274    84908    +2634     
- Misses        44911    45401     +490     
- Partials       6048     6277     +229     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 62.12% <100.00%> (+0.41%) ⬆️
java-21 62.04% <100.00%> (+0.42%) ⬆️
skip-bytebuffers-false 62.15% <100.00%> (+0.40%) ⬆️
skip-bytebuffers-true 62.02% <100.00%> (+34.30%) ⬆️
temurin 62.16% <100.00%> (+0.41%) ⬆️
unittests 62.16% <100.00%> (+0.41%) ⬆️
unittests1 46.71% <100.00%> (-0.18%) ⬇️
unittests2 27.94% <0.00%> (+0.21%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@siddharthteotia
Copy link
Contributor

Theoretically, I am aligned on leveraging ArrayList for memory efficiency and cache friendliness if we know size upfront.

Are you aware of any tools that can be used to test cache improvements ?

Have you done some tests for the memory part via VisualVM / Yourkit ?

PS - I am ok with the PR. Just curious in general about tools that can be used to test such PRs in future.

@gortiz
Copy link
Contributor Author

gortiz commented Apr 4, 2024

Are you aware of any tools that can be used to test cache improvements ?

We can easily create a JMH benchmark that tests list vs array list in this kind of code. I think it is not worth to create a JMH benchmark that test the operator itself because it would be quite complex.

I'm not aware of other tools, but each linked list node contains a reference to its value and a reference to the next node in the link. That means we need at least 8 bytes (assuming compressed pointers) for each node. In theory these nodes may be spread in the heap, although my experience tell me that they are almost always to be one after the other in the heap because we allocate them by the same thread with no allocations in the middle. Future GCs may move them, but sounds pretty unlikely.

While LinkedList requires 8 bytes per element that will probably but not for sure be contiguous in memory, Array list requires just 4 bytes per element that are going to be contiguous in memory for sure. Therefore cache usage is always better with ArrayList than with LinkedList.

@Jackie-Jiang Jackie-Jiang added the multi-stage Related to the multi-stage query engine label Apr 5, 2024
@gortiz
Copy link
Contributor Author

gortiz commented Apr 25, 2024

I've applied the suggested changes. Please take a look.

@Jackie-Jiang Jackie-Jiang merged commit fc98ce1 into apache:master Apr 25, 2024
19 of 20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
multi-stage Related to the multi-stage query engine performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants