-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow user configurable regex library for queries #13005
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #13005 +/- ##
============================================
+ Coverage 61.75% 62.20% +0.44%
+ Complexity 207 198 -9
============================================
Files 2436 2521 +85
Lines 133233 137901 +4668
Branches 20636 21327 +691
============================================
+ Hits 82274 85776 +3502
- Misses 44911 45734 +823
- Partials 6048 6391 +343
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Why not just replace the implementation to keep the interface simple and easy to use? If Java's built in regular expressions have the edge somewhere, it's probably a pretty small edge, and it would be nice to avoid backtracking by default. |
I don't have a good understanding of how Pinot's users use regex, but from searching it seems that the median latency/memory usage of re2j is higher than Java's built in library. It'd also become a backwards incompatible change due to feature gaps. In general I don't think users would be primarily using Pinot for regex filtering (i.e. should we have an optimized path for the average case or focus on worst case performance?), but that's my conjecture. Some references: If there is a consensus that an outright replacement is acceptable I'm happy to go that route. |
+1 on making this configurable in the first version. And this change, it will be easier for us to benchmark with different library. There are some known cases re2j doesn't outperform the current implementation, so it's a good idea to on hold the replacement of the existing library. We can do another thorough analysis on when re2j is more efficient and when it's not, and make a proposal on the default behavior of regex. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great feature to add to Pinot.
* allow user configurable regex library * fix import ordering
This PR addresses #12628
This provides functionality to configure a regex library that will be used during query execution.
PatternFactory.compile(...)
is used in place ofPattern.compile(...)
Pattern
interface added maintains the same semantics asjava.util.Pattern
Matcher
interface added maintains the same semantics asjava.util.Matcher
A new config is added:
pinot.server.query.regex.class
Valid values for this config are:
JAVA_UTIL
orRE2J
(it might be better to accept class names instead?).Some testing showed regex libraries have very different strengths/weaknesses, so it seemed best to allow users to choose an implementation that works well for them.
Default behavior is unchanged.
tag:
feature