-
Notifications
You must be signed in to change notification settings - Fork 580
Query Frontend: add cortex_query_samples_processed_cache_adjusted_total metric #11164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Query Frontend: add cortex_query_samples_processed_cache_adjusted_total metric #11164
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice work! I just left a couple of nits. 👏
@@ -67,6 +67,13 @@ message Extent { | |||
// If the response is combination of multiple queries over time, all of which had timestamp set, this is the timestamp of oldest query. | |||
// When merging extents and some of them have 0 query timestamp, we keep non-zero timestamp, if possible. | |||
int64 query_timestamp_ms = 6; | |||
// Number of samples processed per step to create this Extent. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we sort it by timestamp by contract? If so, I would document it here, given it's a design decision.
return result | ||
} | ||
|
||
func mergeSamplesProcessedPerStep(steps ...[]StepStat) []StepStat { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we simply define that StepStat
must be ordered by timestamp, then you don't need to use a map here. You can just go building the merged slice in a single pass. Keep it simple, just take 2 inputs. Don't need to be variadic.
if details := QueryDetailsFromContext(ctx); details != nil { | ||
details.SamplesProcessedFromCache = cachedSamplesProcessed | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Design-wise, I don't think it's partitionCacheExtents()
responsibility taking the context in input and updating it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, thanks. Lifted stats updates up.
cachedSamplesProcessed += uint64(stepStat.Value) | ||
} | ||
if details := QueryDetailsFromContext(ctx); details != nil { | ||
details.SamplesProcessedFromCache = cachedSamplesProcessed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused. partitionCacheExtents()
can be called multiple times for a single input query. Here we overwrite SamplesProcessedFromCache
each time 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, my bad; thanks for catching it. Pushed a fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic LGTM.
It tracks oriignal number of samples used to populate the cache. The value is approximate, as cache extents might be merged or truncated. Since MQE does not expose per-step sample statistics, a heuristic is applied: each extent stores the total number of ProcessedSamples, and when only a sub-range of an extent is used in a query, the number of samples is estimated proportionally to the overlap between the requested range and the extent's time range.
…nSubrange more clear - it receives range borders, not he the range itself.
1. TestMergeCacheExtentsForRequest now expectsExtents, not samples and timeranges separately 2. Style fixes in the TestFilterRecentCacheExtents
Co-authored-by: Marco Pracucci <marco@pracucci.com>
Co-authored-by: Marco Pracucci <marco@pracucci.com>
1. Introduce new approximation algoritm. Instead of operating with SamplesProcessed count per whole extent, use approximate perStepStats, to avoid cumulative error when resizing extent 2. Move SamplesProcessedFromCache stat to query-frontend stats from the Querier stats, since they are tracked & reported in frontend.
Co-authored-by: Marco Pracucci <marco@pracucci.com>
Co-authored-by: Marco Pracucci <marco@pracucci.com>
Co-authored-by: Marco Pracucci <marco@pracucci.com>
Sum samples processed from cache, rather that overwrite. Also, mergePerStepStats in one pass, under the assumption that they are alwats sorted by timestamp
split_and_cache_middleware doesn't update it concurrently.
71a4c29
to
2706b22
Compare
What this PR does
The metric tracks the number of samples processed to execute a query if no query results cache was enabled.
How does it work: (Updated)
The extent now has a SamplesProcessedPerStep field. splitAndCacheMiddleware tracks partial stats of each downstream response it made and when downstreamRequest is converted to an extent. perStepStats are approximated from totalSamplesProcessed for this request. When extents are used to respond to a query in the partitionCacheExtents, the corresponding perStepStats are extracted and merged.
The same happens when extents are merged in or truncated when filtering recent cache extents filterRecentCacheExtents.
This number of samples processed in Extent is approximate since MQE does not expose per-step sample statistics. When extracting only part of the Extent's range, perStepStats are estimated based on TotalSamplesProcessed and number of step in the request range.
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]
.about-versioning.md
updated with experimental features.