-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CLP as a compressionCodec #12504
CLP as a compressionCodec #12504
Conversation
* Refactoring the upsert compaction related code 1. Fix the issue with fetching validDocId metadata for table with a large number of segments. (Added POST API with list of segments to be part of the request body) 2. Added POST support for MultiHttpRequest to cover 1. 3. Added GET /tables/<tableName>/validDocIdMetadata API on the controller for improving debuggability. * Addressing comments
0251131
to
fea9fdb
Compare
fea9fdb
to
835559e
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #12504 +/- ##
============================================
- Coverage 61.75% 61.34% -0.41%
+ Complexity 207 198 -9
============================================
Files 2436 2456 +20
Lines 133233 134240 +1007
Branches 20636 20762 +126
============================================
+ Hits 82274 82356 +82
- Misses 44911 45705 +794
- Partials 6048 6179 +131
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
6b0be01
to
c81fa0d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is neat! I'm no Pinot expert, so I can't comment on the fine-grained validity of the code, but it roughly makes sense to me. I added a few minor suggestions. I would also suggest renaming any new classes as ClpXXX
rather than CLPXXX
, so that the classes play better with serialization/deserialization libraries (if ever necessary).
Perhaps this is a silly question, but how can we support CLP's query logic (like clpMatch) on top of this?
logLines.add( | ||
"2023/10/27 16:35:10.470 INFO [ControllerResponseFilter] [grizzly-http-server-2] Handled request from 0.0" | ||
+ ".0.0 GET https://0.0.0.0:8443/health?checkType=liveness, content-type null status code 200 OK"); | ||
logLines.add( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we include a test for a null row?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
// CLP is a special type of compression codec that isn't generally applicable to all RAW columns and has a | ||
// special handling (see {@link CLPForwardIndexCreatorV1}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"a special handling"
It sounds like this is an incomplete sentence? Did you mean to say "special handling for log events" or something like that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified
@@ -69,6 +71,10 @@ public static ForwardIndexCreator createIndexCreator(IndexCreationContext contex | |||
} else { | |||
// Dictionary disabled columns | |||
DataType storedType = fieldSpec.getDataType().getStoredType(); | |||
if (indexConfig.getCompressionCodec() == FieldConfig.CompressionCodec.CLP) { | |||
// CLP compression codec |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment seems a little redundant with the if condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
if (_sealed) { | ||
return _clpStatsCollector.getCLPStats(); | ||
} | ||
throw new IllegalStateException("you must seal the collector first before asking for clp stats"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
throw new IllegalStateException("you must seal the collector first before asking for clp stats"); | |
throw new IllegalStateException("The collector must be sealed before calling getCLPStats"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
@Override | ||
public void close() | ||
throws IOException { | ||
// Delete all temp file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Delete all temp file | |
// Delete all temp files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
import org.testng.annotations.Test; | ||
|
||
|
||
public class CLPEncodingRealtimeIntegrationTest extends BaseClusterIntegrationTestSet { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel an integration test is a little bit overkilling here since all we want to verify is reading from the forward index. Suggest removing this one and using unit test to cover it
@@ -58,6 +58,7 @@ public ForwardIndexConfig(@Nullable Boolean disabled, @Nullable CompressionCodec | |||
if (compressionCodec != null) { | |||
switch (compressionCodec) { | |||
case PASS_THROUGH: | |||
case CLP: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a hack. Should we count CLP as a raw value compression? We don't use dictionary encoding on the values
Allows using CLP as a compression type for string columns. This hides all internal details of clp encoding, without having to expose the individual parts of a clp encoded log line externally. Giving a clean forwardIndex.getString() interface for the operators.
Enable using