Lucene IndexOutOfBounds bugfix, and use NRTCachingDirectory for realtime segment #13308
This PR contains two changes:
The first change fixes a symptom of a crash after `.commit()`: leftover Lucene index files are appended to rather than overwritten. I had seen `IndexOutOfBounds` exceptions caused by `mappingBuffer.getInt(luceneDocId)`, but the mapping file is loaded only in the range `[0, numDocsFromSegment * 4 bytes]`. Therefore, if the Lucene index contains duplicates, we will eventually call `getInt` for a `luceneDocId` that is greater than or equal to `numDocsFromSegment`, causing the exception.

For the second change, NRT functionality is beneficial when the refresh rate is high, because a high refresh rate results in many tiny files being written. `NRTCachingDirectory` provides a configurable in-memory buffer that caches these small writes, avoiding many small files and a high number of open file descriptors (FDs).
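The failure mode behind the first change can be reproduced with a small, self-contained model. This is a sketch, not Pinot's code: it uses a plain `java.nio.ByteBuffer` in place of the real mapping buffer and assumes the buffer is byte-addressed, so each doc id maps to offset `luceneDocId * 4`. The names `DocIdMappingSketch`, `buildMapping`, and `lookup`, and the doc counts (3 segment docs, 5 Lucene docs after stale files were appended), are hypothetical and only illustrative.

```java
import java.nio.ByteBuffer;

// Hypothetical model of the Lucene-docId -> segment-docId mapping, not Pinot's implementation.
class DocIdMappingSketch {

    // One 4-byte int per segment document, so the buffer spans numDocsFromSegment * 4 bytes.
    static ByteBuffer buildMapping(int numDocsFromSegment) {
        ByteBuffer buf = ByteBuffer.allocate(numDocsFromSegment * Integer.BYTES);
        for (int docId = 0; docId < numDocsFromSegment; docId++) {
            buf.putInt(docId * Integer.BYTES, docId); // absolute put; position is unchanged
        }
        return buf;
    }

    // Unchecked lookup: ByteBuffer.getInt(int) throws IndexOutOfBoundsException
    // once the byte offset passes limit - 4, i.e. for luceneDocId >= numDocsFromSegment.
    static int lookup(ByteBuffer mapping, int luceneDocId) {
        return mapping.getInt(luceneDocId * Integer.BYTES);
    }

    public static void main(String[] args) {
        int numDocsFromSegment = 3;
        ByteBuffer mapping = buildMapping(numDocsFromSegment);

        // Assumed scenario: stale index files were appended to, so Lucene reports
        // 3 original docs plus 2 duplicates.
        int numLuceneDocs = 5;
        for (int luceneDocId = 0; luceneDocId < numLuceneDocs; luceneDocId++) {
            try {
                System.out.println(luceneDocId + " -> " + lookup(mapping, luceneDocId));
            } catch (IndexOutOfBoundsException e) {
                System.out.println(luceneDocId + " -> IndexOutOfBoundsException");
            }
        }
    }
}
```

Running `main` prints `0 -> 0`, `1 -> 1`, `2 -> 2`, then `IndexOutOfBoundsException` for doc ids 3 and 4: every doc id past the segment's document count falls outside the loaded mapping range, which is why preventing duplicated Lucene docs removes the exception.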
Tested in an internal cluster.
Suggested tags: bugfix, enhancement