-
Notifications
You must be signed in to change notification settings - Fork 580
Add query limiter middleware for instant queries #11097
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This runs once in each query-frontend. So in effect the rate limit will scale as the query-frontend deployment scales. Any thoughts on doing this in the query-scheduler where we have fewer replicas? Or perhaps using memcached for this? |
I'll look into using memcached for this. I think it makes more sense than implementing at the query-scheduler; we want to block requests before they're split, sharded, etc., and it's not intuitive that a block like this is the responsibility of the scheduler anyways. We do other blocking at the frontend. |
bf01f5e
to
9c13691
Compare
75b9cb9
to
95ea26a
Compare
Hey, sorry to jump into the middle of this PR but this looks like it's reading and writing a structure to cache. I was hoping we could consider some approaches that are a little less race-prone. Option 1: If the requirement is only run a particular query every Option 2: If the requirement is to only run a particular query N times every minute, we could use a key with 60 second buckets in it and Both of these would only require a single operation against the cache and would avoid races (AFAICT) in our code. |
@56quarters no need to apologize! These are good suggestions, thanks for the feedback. I'll work on a change next week. |
95ea26a
to
d92c579
Compare
1ae3aaa
to
31c59df
Compare
577a319
to
d0b2cd0
Compare
d0b2cd0
to
9b8dd8e
Compare
What this PR does
This PR introduces a middleware (modeled after the query blocker and error caching middlewares) which checks if a query should not be run more than a configured
AllowedFrequency
. If the last time the query was allowed less thanAllowedFrequency
time ago, it is blocked with aquery-limited
error, and returns a 429 status. It works with the range query, instant query, and remote read paths, though this PR only adds it to the instant query middlewares, since frequent, expensive instant queries can easily overwhelm the system due to the lack of results caching on that path.Limitations:
I'm not 100% convinced that 429 is the appropriate status code to return, but it is a retryable error.
Which issue(s) this PR fixes or relates to
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]
.about-versioning.md
updated with experimental features.