Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix metric rule pattern regex #12856

Merged
merged 3 commits into from
Apr 16, 2024

Conversation

shounakmk219
Copy link
Collaborator

Description

The . requires escaping throughout the pattern string if it is intended to represent a . literal. The only reason its working right now is because its surrounded by specific patterns, so the match all . regex is only matching actual . character.
More discussion around this issue (with an example where it breaks) here.

This PR tries to fix this by escaping all . literals.
It also replaces *? with * for custom metric rules as the surrounding pattern matching is well defined.

labels

bugfix refactor

@zhtaoxiang
Copy link
Contributor

It also replaces *? with * for custom metric rules as the surrounding pattern matching is well defined.

Can you please elaborate more on this? Maybe you can use an example to explain why it is needed?

@@ -1,173 +1,173 @@
rules:
- pattern: "\"org.apache.pinot.common.metrics\"<type=\"BrokerMetrics\", name=\"pinot.broker.(([^\\.]+)\\.)?([^\\.]*?).authorization\"><>(\\w+)"
- pattern: "\"org\\.apache\\.pinot\\.common\\.metrics\"<type=\"BrokerMetrics\", name=\"pinot\\.broker\\.(([^\\.]+)\\.)?([^\\.]*)\\.authorization\"><>(\\w+)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this follow java regex? If so [^\\.] should be equivalent to [^.]
Can you share the documentation for the prometheus config? Do all fields take regex as input?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just validated both metrics with [^\\.] and [^.], you are right both are equivalent. Will make that update too.
Regarding the documentation for the prometheus config here is a link I referred.
According to the doc, only pattern field takes a regex while capture groups from pattern can be used in labels.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@shounakmk219
Copy link
Collaborator Author

It also replaces *? with * for custom metric rules as the surrounding pattern matching is well defined.

Can you please elaborate more on this? Maybe you can use an example to explain why it is needed?

@zhtaoxiang
As @Jackie-Jiang pointed out in point 3 here,
Let's say a metric is of the format somePrefix.tableName.someMetric
Case 1 using *? :
pattern = somePrefix\\.([^.]*?)\\.someMetric
The parser will do conservative iterations of

1 . somePrefix.t -> fails as . is not encountered after it
2 . somePrefix.ta -> fails as . is not encountered after it
.
.
n . somePrefix.tableName -> success as . is encountered after it

Case 1 using * :
pattern = somePrefix\\.([^.]*)\\.someMetric
The parser will match somePrefix.tableName in the first iteration itself as its greedy and only breaks at the . in .someMetric

@zhtaoxiang
Copy link
Contributor

It also replaces *? with * for custom metric rules as the surrounding pattern matching is well defined.

Can you please elaborate more on this? Maybe you can use an example to explain why it is needed?

@zhtaoxiang As @Jackie-Jiang pointed out in point 3 here, Let's say a metric is of the format somePrefix.tableName.someMetric Case 1 using *? : pattern = somePrefix\\.([^.]*?)\\.someMetric The parser will do conservative iterations of

1 . somePrefix.t -> fails as . is not encountered after it 2 . somePrefix.ta -> fails as . is not encountered after it . . n . somePrefix.tableName -> success as . is encountered after it

Case 1 using * : pattern = somePrefix\\.([^.]*)\\.someMetric The parser will match somePrefix.tableName in the first iteration itself as its greedy and only breaks at the . in .someMetric

I get the log for *?. For Case 1 using *, if it's greedy, I feel * should match tableName.someMetric first, then tableName.someMetri, then tableName.someMetr.

If my understanding is correct, isn't greedy matching slower? Because in most cases, the suffix will be much longer than the prefix, so greedy is slower. For example, if the pattern is somePrefix\\.([^.]*?)\\.someMetric\\.suffix1\\.suffix2\\.suffix3, and the metric is somePrefix.sometable.someMetric.suffix1.suffix2.suffix3, we need to try 11 times; if the pattern is somePrefix\\.([^.]*)\\.someMetric\\.suffix1\\.suffix2\\.suffix3, we need to try 25 times.

@shounakmk219
Copy link
Collaborator Author

For Case 1 using *, if it's greedy, I feel * should match tableName.someMetric first, then tableName.someMetri, then tableName.someMetr.

@zhtaoxiang ([^.]*) won't match tableName.someMetric as the matching will break at . (based on the regex).

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. @zhtaoxiang let me know if you have concern

@Jackie-Jiang Jackie-Jiang merged commit d4cb93d into apache:master Apr 16, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants