-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add config to skip record ingestion on string column length exceeding configured max schema length #13103
Add config to skip record ingestion on string column length exceeding configured max schema length #13103
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #13103 +/- ##
============================================
+ Coverage 61.75% 62.14% +0.39%
+ Complexity 207 198 -9
============================================
Files 2436 2536 +100
Lines 133233 139488 +6255
Branches 20636 21566 +930
============================================
+ Hits 82274 86681 +4407
- Misses 44911 46311 +1400
- Partials 6048 6496 +448
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
0375dab
to
17f2224
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@snleee @swaminathanmanish Can you help also review this?
.../src/main/java/org/apache/pinot/segment/local/recordtransformer/SanitizationTransformer.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @tibrewalpratik17 for the PR, overall LGTM. Left a minor comment.
...al/src/test/java/org/apache/pinot/segment/local/recordtransformer/RecordTransformerTest.java
Show resolved
Hide resolved
...al/src/test/java/org/apache/pinot/segment/local/recordtransformer/RecordTransformerTest.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @tibrewalpratik17 for addressing the review comments. LGTM.
00b6917
to
016d6ee
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM otherwise
.../src/main/java/org/apache/pinot/segment/local/recordtransformer/SanitizationTransformer.java
Show resolved
Hide resolved
.../src/main/java/org/apache/pinot/segment/local/recordtransformer/SanitizationTransformer.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest extending this to make it applicable to BYTES and JSON as well
pinot-spi/src/main/java/org/apache/pinot/spi/data/FieldSpec.java
Outdated
Show resolved
Hide resolved
pinot-spi/src/main/java/org/apache/pinot/spi/data/FieldSpec.java
Outdated
Show resolved
Hide resolved
I see #13181 already covering JSON scenario. |
0bbd7de
to
9183264
Compare
9183264
to
e990348
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall direction looks good!
If we allow maxLengthExceedStrategy
to be null
, a lot of test changes can be simplified
pinot-common/src/test/java/org/apache/pinot/common/data/FieldSpecTest.java
Outdated
Show resolved
Hide resolved
pinot-spi/src/main/java/org/apache/pinot/spi/data/FieldSpec.java
Outdated
Show resolved
Hide resolved
pinot-spi/src/main/java/org/apache/pinot/spi/data/FieldSpec.java
Outdated
Show resolved
Hide resolved
pinot-spi/src/main/java/org/apache/pinot/spi/data/FieldSpec.java
Outdated
Show resolved
Hide resolved
50ceb6d
to
97c0167
Compare
Yes this simplifies the PR a lot! Updated accordingly! Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly good. Let's also add support for BYTES and JSON since we have all the required strategy supported
pinot-spi/src/main/java/org/apache/pinot/spi/data/FieldSpec.java
Outdated
Show resolved
Hide resolved
.../src/main/java/org/apache/pinot/segment/local/recordtransformer/SanitizationTransformer.java
Outdated
Show resolved
Hide resolved
.../src/main/java/org/apache/pinot/segment/local/recordtransformer/SanitizationTransformer.java
Outdated
Show resolved
Hide resolved
.../src/main/java/org/apache/pinot/segment/local/recordtransformer/SanitizationTransformer.java
Outdated
Show resolved
Hide resolved
3b24d12
to
2c3c923
Compare
2c3c923
to
01b43e6
Compare
.../src/main/java/org/apache/pinot/segment/local/recordtransformer/SanitizationTransformer.java
Outdated
Show resolved
Hide resolved
case BYTES: | ||
maxLengthExceedStrategy = fieldSpec.getMaxLengthExceedStrategy() == null | ||
? FieldSpec.MaxLengthExceedStrategy.NO_ACTION : fieldSpec.getMaxLengthExceedStrategy(); | ||
_columnToColumnInfoMap.put(fieldSpec.getName(), new SanitizedColumnInfo(fieldSpec.getName(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to include the column when it is NO_ACTION
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case of string do we want to ingest NULL_CHARACTER
in case of NO_ACTION
?
.../src/main/java/org/apache/pinot/segment/local/recordtransformer/SanitizationTransformer.java
Outdated
Show resolved
Hide resolved
.../src/main/java/org/apache/pinot/segment/local/recordtransformer/SanitizationTransformer.java
Outdated
Show resolved
Hide resolved
bda8321
to
266eba7
Compare
266eba7
to
e2a63f8
Compare
hey @Jackie-Jiang addressed comments! Can you please help with the review? |
… configured max schema length (apache#13103)
label:
ingestion
feature
release-notes
maxLengthExceedStrategy
. Default value is to TRIM_LENGTH to ensure backward compatibility. If configured to beERROR
, in case of incoming-string-value exceeding the configured max length, it will skip the record, return error and log accordingly. If set asSUBSTITUTE_DEFAULT_VALUE
, it will substitute the configured default value in place of the huge string. There is another strategy asNO_ACTION
which justs ingests the string.Sample FieldSpec: