Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLP as a compressionCodec #12504

Merged
merged 39 commits into from
Mar 21, 2024
Merged

CLP as a compressionCodec #12504

merged 39 commits into from
Mar 21, 2024

Conversation

saurabhd336
Copy link
Contributor

Allows using CLP as a compression type for string columns. This hides all internal details of clp encoding, without having to expose the individual parts of a clp encoded log line externally. Giving a clean forwardIndex.getString() interface for the operators.

Enable using

"fieldConfigList": [
      {
        "name": "logLine",
        "encodingType": "RAW",
        "compressionCodec": "CLP",
      }
      

@saurabhd336 saurabhd336 changed the title Log data type CLP as a compressionCodec Feb 27, 2024
@codecov-commenter
Copy link

codecov-commenter commented Feb 27, 2024

Codecov Report

Attention: Patch coverage is 77.63158% with 85 lines in your changes are missing coverage. Please review.

Project coverage is 61.34%. Comparing base (59551e4) to head (78532fe).
Report is 150 commits behind head on master.

Files Patch % Lines
...index/readers/forward/CLPForwardIndexReaderV1.java 63.04% 34 Missing ⚠️
...ent/creator/impl/fwd/CLPForwardIndexCreatorV1.java 90.65% 8 Missing and 2 partials ⚠️
.../realtime/impl/forward/CLPMutableForwardIndex.java 84.74% 9 Missing ⚠️
...impl/stats/StringColumnPreIndexStatsCollector.java 83.67% 5 Missing and 3 partials ⚠️
...ot/segment/local/io/util/VarLengthValueReader.java 0.00% 7 Missing ⚠️
...verter/stats/MutableNoDictionaryColStatistics.java 0.00% 4 Missing ⚠️
...he/pinot/segment/local/utils/TableConfigUtils.java 0.00% 2 Missing and 2 partials ⚠️
...gment/index/forward/ForwardIndexReaderFactory.java 40.00% 1 Missing and 2 partials ⚠️
...ment/index/forward/ForwardIndexCreatorFactory.java 0.00% 1 Missing and 1 partial ⚠️
.../local/segment/index/forward/ForwardIndexType.java 0.00% 1 Missing and 1 partial ⚠️
... and 2 more
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #12504      +/-   ##
============================================
- Coverage     61.75%   61.34%   -0.41%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2456      +20     
  Lines        133233   134240    +1007     
  Branches      20636    20762     +126     
============================================
+ Hits          82274    82356      +82     
- Misses        44911    45705     +794     
- Partials       6048     6179     +131     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 ?
integration2 0.00% <0.00%> (ø)
java-11 27.88% <73.94%> (-33.83%) ⬇️
java-21 61.34% <77.63%> (-0.29%) ⬇️
skip-bytebuffers-false 27.87% <73.94%> (-33.88%) ⬇️
skip-bytebuffers-true 61.34% <77.63%> (+33.61%) ⬆️
temurin 61.34% <77.63%> (-0.41%) ⬇️
unittests 61.34% <77.63%> (-0.41%) ⬇️
unittests1 45.97% <3.68%> (-0.92%) ⬇️
unittests2 27.88% <73.94%> (+0.15%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@saurabhd336
Copy link
Contributor Author

cc: @kirkrodrigues @jackluo923 @npawar

Copy link
Contributor

@kirkrodrigues kirkrodrigues left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is neat! I'm no Pinot expert, so I can't comment on the fine-grained validity of the code, but it roughly makes sense to me. I added a few minor suggestions. I would also suggest renaming any new classes as ClpXXX rather than CLPXXX, so that the classes play better with serialization/deserialization libraries (if ever necessary).

Perhaps this is a silly question, but how can we support CLP's query logic (like clpMatch) on top of this?

logLines.add(
"2023/10/27 16:35:10.470 INFO [ControllerResponseFilter] [grizzly-http-server-2] Handled request from 0.0"
+ ".0.0 GET https://0.0.0.0:8443/health?checkType=liveness, content-type null status code 200 OK");
logLines.add(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we include a test for a null row?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

Comment on lines 129 to 130
// CLP is a special type of compression codec that isn't generally applicable to all RAW columns and has a
// special handling (see {@link CLPForwardIndexCreatorV1})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"a special handling"

It sounds like this is an incomplete sentence? Did you mean to say "special handling for log events" or something like that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified

@@ -69,6 +71,10 @@ public static ForwardIndexCreator createIndexCreator(IndexCreationContext contex
} else {
// Dictionary disabled columns
DataType storedType = fieldSpec.getDataType().getStoredType();
if (indexConfig.getCompressionCodec() == FieldConfig.CompressionCodec.CLP) {
// CLP compression codec
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment seems a little redundant with the if condition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

if (_sealed) {
return _clpStatsCollector.getCLPStats();
}
throw new IllegalStateException("you must seal the collector first before asking for clp stats");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
throw new IllegalStateException("you must seal the collector first before asking for clp stats");
throw new IllegalStateException("The collector must be sealed before calling getCLPStats");

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

@Override
public void close()
throws IOException {
// Delete all temp file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Delete all temp file
// Delete all temp files

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

@Jackie-Jiang Jackie-Jiang added documentation release-notes Referenced by PRs that need attention when compiling the next release notes Configuration Config changes (addition/deletion/change in behavior) labels Mar 21, 2024
import org.testng.annotations.Test;


public class CLPEncodingRealtimeIntegrationTest extends BaseClusterIntegrationTestSet {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel an integration test is a little bit overkilling here since all we want to verify is reading from the forward index. Suggest removing this one and using unit test to cover it

@@ -58,6 +58,7 @@ public ForwardIndexConfig(@Nullable Boolean disabled, @Nullable CompressionCodec
if (compressionCodec != null) {
switch (compressionCodec) {
case PASS_THROUGH:
case CLP:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a hack. Should we count CLP as a raw value compression? We don't use dictionary encoding on the values

@saurabhd336 saurabhd336 merged commit 02d1c12 into apache:master Mar 21, 2024
19 of 20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Configuration Config changes (addition/deletion/change in behavior) documentation feature release-notes Referenced by PRs that need attention when compiling the next release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants