Skip to content

Query Frontend : dynamic step middleware #10958

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

Dhouib-Mohamed
Copy link

@Dhouib-Mohamed Dhouib-Mohamed commented Mar 21, 2025

What this PR does

This PR introduces a middleware for the Mimir frontend to dynamically optimize query step sizes for complex queries.

Problem Statement

Many queries from Grafana dashboards, especially those involving high-cardinality data, can take more than 10 seconds to execute. This is often due to:

  • Queries with a low step size (step) and a long time range, which significantly increase the amount of data fetched.
  • Users importing large dashboards without adjusting the step size to match their dataset's growth over time. Initially, when data volume was lower, these settings may have worked fine, but as data complexity increases, queries become slower.

Proposed Solution

This middleware detects "complex" queries—specifically, high-cardinality queries with a low step size—and dynamically adjusts the step parameter to improve query performance. By increasing the step size where appropriate, we can:

  • Reduce query execution time.
  • Improve dashboard responsiveness.
  • Allow users to retrieve more granular data on demand rather than by default.

Notes

The mathematical function and step-size limits used in this implementation are preliminary estimates and have not yet been validated in a high-load production environment. Further tuning and real-world testing will be required.

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

@Dhouib-Mohamed Dhouib-Mohamed requested review from tacole02 and a team as code owners March 21, 2025 11:37
@CLAassistant
Copy link

CLAassistant commented Mar 21, 2025

CLA assistant check
All committers have signed the CLA.

@Dhouib-Mohamed Dhouib-Mohamed changed the title Feat/dynamic step Query Frontend : dynamic step middleware Mar 21, 2025
@Dhouib-Mohamed
Copy link
Author

What are the next steps for this PR ?
I Can't merge since i am not a collaborator ( this is my first PR )
Merging is blocked You're not authorized to push to this branch. Visit https://docs.github.com/repositories/configuring-branches-and-merges-in-your-repository/managing-protected-branches/about-protected-branches for more information.

@Dhouib-Mohamed Dhouib-Mohamed requested a review from tacole02 March 24, 2025 10:25
Copy link
Contributor

@56quarters 56quarters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this PR but changing the step of a range query contrary to the step included in the request is:

  • Surprising behavior that changes the response returned to users from what they would normally expect
  • Isn't allowed for Prometheus compatible APIs as far as I can tell.

Because of this, we're unlikely to merge this PR.

@dimitarvdimitrov
Copy link
Contributor

  • Reduce query execution time.

@Dhouib-Mohamed do you have any data on this? A lot of compute is spent regardless of the step. Some comparison between smaller and larger steps would help inform this decision.

@56quarters do you think we can keep this as an opt-in feature for operators to decide if they want it or do you think it doesn't have a place in Mimir?


if complexity > d.complexityThreshold {
// Calculate the new step based on the complexity
step = 100 * int64(10*math.Pow(complexity/d.complexityThreshold, 1.2))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how did you come up with this formula?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was mainly created by try and refine , I started by reading the cardinality formula, trying some ideas , testing it with some examples with mimirir continious test ( tweaked it a little to generate some queries with different parameters ) and some manual tests but there is no direct logic to it

@Dhouib-Mohamed
Copy link
Author

About the data , I have none for now but i was planning to set up a test cluster with production metrics to test this functionality
But I need to see how likely is it to be considered since this is relatively time consuming

@56quarters
Copy link
Contributor

@56quarters do you think we can keep this as an opt-in feature for operators to decide if they want it or do you think it doesn't have a place in Mimir?

I'm leaning towards "no" because I think it makes the behavior of Mimir harder to understand.

It also has the potential to make performance worse. For example, Grafana dashboards adjust start and end times for range queries to be multiples of the step and we cache results based on that. If the step is being adjusted after Grafana has already picked something to help with caching, we run the risk of reducing our ability to cache queries.

I think the correct place for adjusting the step based on a heuristic is in Grafana, if it doesn't already do something like this.

@56quarters
Copy link
Contributor

I think the correct place for adjusting the step based on a heuristic is in Grafana, if it doesn't already do something like this.

Though I should point out, if the goal of this is to improve query performance I'd expect to see some numbers from a proof-of-concept that proves this is a valid approach to improving query performance before a PR to merge something to main.

@Dhouib-Mohamed
Copy link
Author

Sorry for the delay — I had to take care of a few other things. I've now completed the benchmarking and documented the results to assess the impact of this PR.
Let me know if you have any feedback!
Dynamic Step Middleware Performance Report (1).pdf

Dhouib-Mohamed and others added 2 commits April 17, 2025 13:29
Co-authored-by: Taylor C <41653732+tacole02@users.noreply.github.com>
Co-authored-by: Taylor C <41653732+tacole02@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants