-
Notifications
You must be signed in to change notification settings - Fork 65
reduce log level in metrics logger not to trash the log #708
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: nirrozenbaum The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
cc @kfswain |
✅ Deploy Preview for gateway-api-inference-extension ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
+1 on changing the log level to logger.V(logutil.DEFAULT).Info("pool is not initialized, skipping flushing metrics") In reviewing |
Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
@danehans updated the log level and capitalized log message. |
return | ||
} | ||
|
||
var kvCacheTotal float64 | ||
var queueTotal int | ||
|
||
podMetrics := datastore.PodGetAll() | ||
logger.V(logutil.VERBOSE).Info("Flushing Prometheus Metrics", "ReadyPods", len(podMetrics)) | ||
logger.V(logutil.TRACE).Info("Refreshing Prometheus Metrics", "ReadyPods", len(podMetrics)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an alternative log in verbose or default level that logs the number of ready pods when they change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't seen any other place logging the ready pods (it's reported as a metric though).
I completely agree this is information we would like to know.
having said that, I don't think it belongs here.
the refreshMetrics function runs every few seconds to update metrics, which makes perfect sense since we would like to update metrics very frequently.
BUT, printing to the EPP log the number of ready pods every 5 seconds is spamming the log and making it very hard to debug (see example above).
IMO the number of pods in the pool shouldn't be printed as part of metrics recalculation function, it should be part of the Inf Pool status (kinda similar to ready pods status in deployment).
we could easily add this field to Inf Pool status + additional printer columns to show it when running kubectl get InferencePool
. it's pretty easy to implement in the controllers reconcile event which also updates only when something changes (as opposed to this log printing, which happens every 5 seconds even if nothing has changed).
If this sounds reasonable I'm happy to implement it this week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
proposed this change in #714
Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
metrics logger is trashing the epp log. so instead of allowing a focused view of what changed, the epp log looks like this:
since EPP default log level is 4, this PR reduces the log level of the metrics logger (only this specific message) to TRACE to reduce this noise in the log.
additionally, as a followup to previous PR, change in few more places to use NamespacedName instead of passing around name + namespace as separate args.