Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi stage metrics #13035

Merged
merged 2 commits into from
Jun 7, 2024
Merged

Multi stage metrics #13035

merged 2 commits into from
Jun 7, 2024

Conversation

gortiz
Copy link
Contributor

@gortiz gortiz commented Apr 30, 2024

This PR adds some multi-stage metrics. The whole PR is built on top of #12704 so it should only be merged once (and if) that other PR is merged.

All new metrics are server based. Specifically, they are:

  • Meters:

    • HASH_JOIN_TIMES_MAX_ROWS_REACHED (global): How many times the max number of rows per join has been reached. If this happens multiple times in a single query, the meter will be increased as many times as this limit is reached.
    • AGGREGATE_TIMES_NUM_GROUPS_LIMIT_REACHED (global): How many times the max number of groups has been reached. If this happens multiple times in a single query, the meter will be increased as many times as this limit is reached.
    • MAX_ROWS_IN_WINDOW_REACHED (global): How many times the max number of elements in window has been reached. If this happens multiple times in a single query, the meter will be increased as many times as the limit is reached.
    • MULTI_STAGE_IN_MEMORY_MESSAGES (global): How many messages have been sent from one stage to the other without being serialized. Optimizations like collocated joins can increase this number.
    • MULTI_STAGE_RAW_MESSAGES (global): How many times messages have been sent from one stage to the other in serialized way. Optimizations like collocated joins can reduce this number.
    • MULTI_STAGE_RAW_BYTES (global): How many bytes have been sent from one stage to the other. The smaller this number, the better.
  • Timers

    • HASH_JOIN_CPU_TIME_BUILDING_HASH_TABLE_MS (global): The CPU time (in ms) spent building the hash table on each hash join.
    • MULTI_STAGE_SERIALIZATION_CPU_TIME_MS (global): The CPU time (in ms) spent when messages are being serialized from one stage to the other.
    • MULTI_STAGE_DESERIALIZATION_CPU_TIME_MS (global): The CPU time (in ms) spent when messages are being deserialized after being received by other stage.
    • RECEIVE_DOWNSTREAM_CPU_TIME_MS (global): The CPU time (in ms) spent by sender mailboxes waiting the receiver (upstream) to read the messages.
    • RECEIVE_UPSTREAM_CPU_WAIT_MS (global): The CPU time (in ms) spent by receiver mailboxes waiting for new data from the sender (upstream).

@gortiz gortiz force-pushed the multi-stage-metrics branch 2 times, most recently from 44017a0 to 14d0b58 Compare June 5, 2024 08:27
@codecov-commenter
Copy link

codecov-commenter commented Jun 5, 2024

Codecov Report

Attention: Patch coverage is 89.18919% with 8 lines in your changes missing coverage. Please review.

Project coverage is 62.05%. Comparing base (59551e4) to head (58676cb).
Report is 564 commits behind head on master.

Files Patch % Lines
...not/query/runtime/operator/MultiStageOperator.java 89.47% 2 Missing and 2 partials ⚠️
...ot/query/runtime/operator/MailboxSendOperator.java 86.66% 1 Missing and 1 partial ⚠️
...pinot/query/runtime/plan/MultiStageQueryStats.java 75.00% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #13035      +/-   ##
============================================
+ Coverage     61.75%   62.05%   +0.30%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2544     +108     
  Lines        133233   139740    +6507     
  Branches      20636    21611     +975     
============================================
+ Hits          82274    86720    +4446     
- Misses        44911    46532    +1621     
- Partials       6048     6488     +440     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 62.02% <89.18%> (+0.31%) ⬆️
java-21 61.94% <89.18%> (+0.32%) ⬆️
skip-bytebuffers-false 62.04% <89.18%> (+0.29%) ⬆️
skip-bytebuffers-true 61.92% <89.18%> (+34.20%) ⬆️
temurin 62.05% <89.18%> (+0.30%) ⬆️
unittests 62.05% <89.18%> (+0.30%) ⬆️
unittests1 46.61% <89.18%> (-0.28%) ⬇️
unittests2 27.73% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly good

}
}
}
}

public List<StageStats.Closed> getClosedStats() {
return Collections.unmodifiableList(_closedStats);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is internal only code, we can skip wrapping into unmodifiable to reduce overhead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A unmodifiable list is very light (just an indirection) and TBH I prefer to be a bit paranoic here so we don't end up modifying it in the future, which would be difficult to catch.

This method is only called once per query on the broker. I don't think creating a single small object (that can be probably scalar replaced) would be problematic.

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@Jackie-Jiang Jackie-Jiang merged commit 0f92742 into apache:master Jun 7, 2024
20 checks passed
@gortiz gortiz deleted the multi-stage-metrics branch June 7, 2024 12:46
gortiz added a commit to gortiz/pinot that referenced this pull request Jun 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation metrics multi-stage Related to the multi-stage query engine observability
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants