-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bugfix: use consumerDir during lucene realtime segment conversion #13094
Conversation
LOGGER.warn("TEXT_MATCH query timeout on realtime consuming segment {}, column {}, search query {}", _segmentName, | ||
_column, searchQuery); | ||
throw new RuntimeException("TEXT_MATCH query timeout on realtime consuming segment"); | ||
LOGGER.warn("TEXT_MATCH query interrupted while querying the consuming segment {}, column {}, search query {}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed this message to timeout
due to some concerns in the original PR, but it's also used during early termination - and calling those queries timeouts is misleading.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #13094 +/- ##
============================================
+ Coverage 61.75% 62.08% +0.32%
+ Complexity 207 198 -9
============================================
Files 2436 2514 +78
Lines 133233 137869 +4636
Branches 20636 21324 +688
============================================
+ Hits 82274 85590 +3316
- Misses 44911 45883 +972
- Partials 6048 6396 +348
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
@@ -106,7 +106,7 @@ public static HashSet<String> getDefaultEnglishStopWordsSet() { | |||
* @param config the text index config | |||
*/ | |||
public LuceneTextIndexCreator(String column, File segmentIndexDir, boolean commit, boolean realtimeConversion, | |||
@Nullable int[] immutableToMutableIdMap, TextIndexConfig config) { | |||
File consumerDir, @Nullable int[] immutableToMutableIdMap, TextIndexConfig config) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: add @Nullable
before File consumerDir
.
Since this PR is a bug fix, can you explain in a high level what is the main issue causing the problem and how you fix it? |
@@ -121,6 +121,7 @@ public enum TimeColumnType { | |||
private boolean _optimizeDictionaryForMetrics = false; | |||
private double _noDictionarySizeRatioThreshold = IndexingConfig.DEFAULT_NO_DICTIONARY_SIZE_RATIO_THRESHOLD; | |||
private boolean _realtimeConversion = false; | |||
private File _consumerDir; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a javadoc to explain what this directory is about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for making the hotfix!
@@ -216,6 +217,7 @@ public boolean isMutableSegment() { | |||
_partitionFunction = config.getPartitionFunction(); | |||
_mainPartitionId = config.getPartitionId(); | |||
_nullHandlingEnabled = config.isNullHandlingEnabled(); | |||
_consumerDir = config.getConsumerDir() != null ? new File(config.getConsumerDir()) : null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this ever be null
? Currently the abstraction is not very clear, and I don't really see proper null handling. Can we ensure it is always set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point, it can be null in the MemoryEstimator
path. I think it's probably clearer to set a dummy dir there, and remove the @Nullable
annotation for consumerDir
For the realtime flow RealtimeTableDataManager.getConsumerDir()
ensures it will not be null.
2cdeb5d
to
07d8500
Compare
07d8500
to
79e3352
Compare
@@ -78,7 +78,7 @@ public RealtimeLuceneTextIndex(String column, File segmentIndexDir, String segme | |||
// for realtime | |||
_indexCreator = | |||
new LuceneTextIndexCreator(column, new File(segmentIndexDir.getAbsolutePath() + "/" + segmentName), | |||
false /* commitOnClose */, true, null, config); | |||
false /* commitOnClose */, true, null, null, config); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like you are passing in null
as consumerDir
here. Do we have a test covering this path? I'd imagine it should throw NPE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do - it's covered in LuceneMutableTextIndexTest
. consumerDir
isn't used for the realtime index, and is only used by the LuceneTextIndexCreator initialized during conversion of the realtime segment. Added a comment clarifying this
Addresses #13039. Previously the consumers dir was derived from indexDir, which was not accurate in the case that
pinot.server.instance.consumerDir
was set. This passes consumerDir through to ensure the path is accurate in all cases.Unrelated: also close
Analyzer
objects, and clarify some exception/log.Testing: Unit tests cover, also sanity tested via quickstart w/
pinot.server.instance.consumerDir
set.tag:
bugfix