Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bugfix: use consumerDir during lucene realtime segment conversion #13094

Merged
merged 8 commits into from
May 11, 2024

Conversation

itschrispeck
Copy link
Collaborator

@itschrispeck itschrispeck commented May 7, 2024

Addresses #13039. Previously the consumers dir was derived from indexDir, which was not accurate in the case that pinot.server.instance.consumerDir was set. This passes consumerDir through to ensure the path is accurate in all cases.

Unrelated: also close Analyzer objects, and clarify some exception/log.

Testing: Unit tests cover, also sanity tested via quickstart w/ pinot.server.instance.consumerDir set.

tag: bugfix

@itschrispeck itschrispeck changed the title Consumers dir fix bugfix: use consumerDir during lucene realtime segment conversion May 7, 2024
LOGGER.warn("TEXT_MATCH query timeout on realtime consuming segment {}, column {}, search query {}", _segmentName,
_column, searchQuery);
throw new RuntimeException("TEXT_MATCH query timeout on realtime consuming segment");
LOGGER.warn("TEXT_MATCH query interrupted while querying the consuming segment {}, column {}, search query {}",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this message to timeout due to some concerns in the original PR, but it's also used during early termination - and calling those queries timeouts is misleading.

@codecov-commenter
Copy link

codecov-commenter commented May 7, 2024

Codecov Report

Attention: Patch coverage is 92.85714% with 2 lines in your changes are missing coverage. Please review.

Project coverage is 62.08%. Comparing base (59551e4) to head (69d0aeb).
Report is 432 commits behind head on master.

Files Patch % Lines
...ot/segment/spi/creator/SegmentGeneratorConfig.java 33.33% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #13094      +/-   ##
============================================
+ Coverage     61.75%   62.08%   +0.32%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2514      +78     
  Lines        133233   137869    +4636     
  Branches      20636    21324     +688     
============================================
+ Hits          82274    85590    +3316     
- Misses        44911    45883     +972     
- Partials       6048     6396     +348     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 27.77% <75.00%> (-33.94%) ⬇️
java-21 35.23% <17.85%> (-26.40%) ⬇️
skip-bytebuffers-false 62.04% <92.85%> (+0.30%) ⬆️
skip-bytebuffers-true 35.21% <17.85%> (+7.48%) ⬆️
temurin 62.08% <92.85%> (+0.32%) ⬆️
unittests 62.07% <92.85%> (+0.32%) ⬆️
unittests1 46.68% <20.83%> (-0.21%) ⬇️
unittests2 27.77% <75.00%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -106,7 +106,7 @@ public static HashSet<String> getDefaultEnglishStopWordsSet() {
* @param config the text index config
*/
public LuceneTextIndexCreator(String column, File segmentIndexDir, boolean commit, boolean realtimeConversion,
@Nullable int[] immutableToMutableIdMap, TextIndexConfig config) {
File consumerDir, @Nullable int[] immutableToMutableIdMap, TextIndexConfig config) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add @Nullable before File consumerDir.

@chenboat
Copy link
Contributor

chenboat commented May 7, 2024

Since this PR is a bug fix, can you explain in a high level what is the main issue causing the problem and how you fix it?

@@ -121,6 +121,7 @@ public enum TimeColumnType {
private boolean _optimizeDictionaryForMetrics = false;
private double _noDictionarySizeRatioThreshold = IndexingConfig.DEFAULT_NO_DICTIONARY_SIZE_RATIO_THRESHOLD;
private boolean _realtimeConversion = false;
private File _consumerDir;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a javadoc to explain what this directory is about.

Copy link
Member

@jackjlli jackjlli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for making the hotfix!

@@ -216,6 +217,7 @@ public boolean isMutableSegment() {
_partitionFunction = config.getPartitionFunction();
_mainPartitionId = config.getPartitionId();
_nullHandlingEnabled = config.isNullHandlingEnabled();
_consumerDir = config.getConsumerDir() != null ? new File(config.getConsumerDir()) : null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this ever be null? Currently the abstraction is not very clear, and I don't really see proper null handling. Can we ensure it is always set?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point, it can be null in the MemoryEstimator path. I think it's probably clearer to set a dummy dir there, and remove the @Nullable annotation for consumerDir

For the realtime flow RealtimeTableDataManager.getConsumerDir() ensures it will not be null.

@itschrispeck itschrispeck force-pushed the consumers_dir_fix branch 2 times, most recently from 2cdeb5d to 07d8500 Compare May 8, 2024 02:35
@@ -78,7 +78,7 @@ public RealtimeLuceneTextIndex(String column, File segmentIndexDir, String segme
// for realtime
_indexCreator =
new LuceneTextIndexCreator(column, new File(segmentIndexDir.getAbsolutePath() + "/" + segmentName),
false /* commitOnClose */, true, null, config);
false /* commitOnClose */, true, null, null, config);
Copy link
Contributor

@Jackie-Jiang Jackie-Jiang May 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like you are passing in null as consumerDir here. Do we have a test covering this path? I'd imagine it should throw NPE

Copy link
Collaborator Author

@itschrispeck itschrispeck May 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do - it's covered in LuceneMutableTextIndexTest. consumerDir isn't used for the realtime index, and is only used by the LuceneTextIndexCreator initialized during conversion of the realtime segment. Added a comment clarifying this

@Jackie-Jiang Jackie-Jiang merged commit fe8d395 into apache:master May 11, 2024
19 of 20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants