-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add isJson UDF #12603
Add isJson UDF #12603
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #12603 +/- ##
============================================
+ Coverage 61.75% 61.77% +0.02%
+ Complexity 207 198 -9
============================================
Files 2436 2453 +17
Lines 133233 133828 +595
Branches 20636 20753 +117
============================================
+ Hits 82274 82676 +402
- Misses 44911 45043 +132
- Partials 6048 6109 +61
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
pinot-common/src/main/java/org/apache/pinot/common/function/scalar/StringFunctions.java
Outdated
Show resolved
Hide resolved
c2a4f27
to
4a55b7c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @tibrewalpratik17 for the PR, left a comment for the clarification.
|
||
/** | ||
* Checks whether the input string can be parsed into a json node or not. Useful for scenarios where we want | ||
* to filter out malformed json. In case of nulls we return null itself, as null can be treated as valid json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This statement does not seem to be aligned with what is implemented and merged here. If the argument is null, it returns false as JsonUtils.stringToJsonNode(inputStr);
throws an Exception.
The discussion in the comment thread is also confusing to what is finally added here. It is good to add the conclusion in that thread for readers if it is decided based on offline discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tibrewalpratik17 We may update the javadoc. null
values are handled by the caller (function invoker), instead of within this function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack.
* in partial-upsert scenarios. Upto the user to use null response accordingly. | ||
* | ||
* @param inputStr Input string to test for valid json | ||
* @return in case of null value, it returns null. In case of non-null, it returns true in case of valid json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as earlier for a null argument.
label:
feature
WHY ARE WE ADDING THIS?
skipInvalidJson
in json-index config introduced in skip invalid json string rather than throwing error during json indexing #12238 then the ingestion stops in case of malformed / truncated json.skipInvalidJson
in json-index, then ingestion in not stopped but the record becomes un-queryable with json_match queries anyways. We also end up persisting malformed data in the table which can potentially cause that particular primary-key to not come in query response at all (if using upsert). This is again an issue. Additionally, there is no log / metric to track such scenarios whereskipInvalidJson
got applied.Adding a json-validator as a UDF so that we can use it in filterTransform config, add alerts on
NUMBER_ROWS_FILTERED
metric and using this #12602 immediately track the events which are malformed / truncated.UDFS IN OTHER DBS
Saw few DBs have similar relatable functionalities too.
Did not find anything similar in Postgresql though.
Null value
Putting nulls as a valid json here as during partial-upserts we are okay with having nulls. They will be overwritten.
But open to suggestions on how can we handle nulls differently.