-
Notifications
You must be signed in to change notification settings - Fork 581
Query Frontend : dynamic step middleware #10958
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
What are the next steps for this PR ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this PR but changing the step of a range query contrary to the step included in the request is:
- Surprising behavior that changes the response returned to users from what they would normally expect
- Isn't allowed for Prometheus compatible APIs as far as I can tell.
Because of this, we're unlikely to merge this PR.
@Dhouib-Mohamed do you have any data on this? A lot of compute is spent regardless of the step. Some comparison between smaller and larger steps would help inform this decision. @56quarters do you think we can keep this as an opt-in feature for operators to decide if they want it or do you think it doesn't have a place in Mimir? |
|
||
if complexity > d.complexityThreshold { | ||
// Calculate the new step based on the complexity | ||
step = 100 * int64(10*math.Pow(complexity/d.complexityThreshold, 1.2)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how did you come up with this formula?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was mainly created by try and refine , I started by reading the cardinality formula, trying some ideas , testing it with some examples with mimirir continious test ( tweaked it a little to generate some queries with different parameters ) and some manual tests but there is no direct logic to it
About the data , I have none for now but i was planning to set up a test cluster with production metrics to test this functionality |
I'm leaning towards "no" because I think it makes the behavior of Mimir harder to understand. It also has the potential to make performance worse. For example, Grafana dashboards adjust start and end times for range queries to be multiples of the step and we cache results based on that. If the step is being adjusted after Grafana has already picked something to help with caching, we run the risk of reducing our ability to cache queries. I think the correct place for adjusting the step based on a heuristic is in Grafana, if it doesn't already do something like this. |
Though I should point out, if the goal of this is to improve query performance I'd expect to see some numbers from a proof-of-concept that proves this is a valid approach to improving query performance before a PR to merge something to main. |
7c663b4
to
693f660
Compare
Sorry for the delay — I had to take care of a few other things. I've now completed the benchmarking and documented the results to assess the impact of this PR. |
Co-authored-by: Taylor C <41653732+tacole02@users.noreply.github.com>
Co-authored-by: Taylor C <41653732+tacole02@users.noreply.github.com>
What this PR does
This PR introduces a middleware for the Mimir frontend to dynamically optimize query step sizes for complex queries.
Problem Statement
Many queries from Grafana dashboards, especially those involving high-cardinality data, can take more than 10 seconds to execute. This is often due to:
Proposed Solution
This middleware detects "complex" queries—specifically, high-cardinality queries with a low step size—and dynamically adjusts the step parameter to improve query performance. By increasing the step size where appropriate, we can:
Notes
The mathematical function and step-size limits used in this implementation are preliminary estimates and have not yet been validated in a high-load production environment. Further tuning and real-world testing will be required.
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]
.about-versioning.md
updated with experimental features.