-
Notifications
You must be signed in to change notification settings - Fork 639
Annotate throttled profiles #3956
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pkg/phlaredb/schemas/v1/profiles.go
Outdated
@@ -311,6 +325,8 @@ type InMemoryProfile struct { | |||
DefaultSampleType int64 | |||
|
|||
Samples Samples | |||
|
|||
Annotations []string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory we could achieve the same thing with just using the Comments field above in L322.
I guess comment strings would be in the string table, unsure if that is a good or a bad thing.
I think if we would go for a separate type/field, I think we should make it Key,Value, because that feels more extensible to me than a []string
,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we go with this, maybe at some point we can stop exploiting build Ids for github integration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for the delay, I had to switch focus.
In theory we could achieve the same thing with just using the Comments field above in L322.
I guess comment strings would be in the string table, unsure if that is a good or a bad thing.
I considered using Comments
at first, but because the strings end up in the string table there is no efficient way to read them back.
I think if we would go for a separate type/field, I think we should make it Key,Value, because that feels more extensible to me than a []string,
I agree. In parquet the data is stored in a more structured way:
phlareparquet.NewGroupField(AnnotationsColumnName, parquet.List(
phlareparquet.Group{
phlareparquet.NewGroupField("Body", parquet.Encoded(parquet.String(), &parquet.DeltaByteArray)),
})),
[]string
is only used in the in-memory representation currently, but a key/value will likely make merging data somewhat easier. I will give that a try.
0c05f6c
to
7655d6b
Compare
da2ba8c
to
20eb89b
Compare
20eb89b
to
8093954
Compare
I plan to add more tests but the critical pieces should be in place now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice one! I couldn't think of an areas that you forgot about. I also haven't tried this in action, but so far so good. A couple of thoughts I had inline.
api/push/v1/push.proto
Outdated
// annotations provide generic metadata for profiles | ||
repeated types.v1.ProfileAnnotation annotations = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we seem to except annotations from external sources, should we limit a certain prefix to internal use? That might help to trust the them slightly more
We could say everything starting with __
, we wouldn't accept in the distributors.
Also potentially we might want to remove the annotations in distributors all together. Wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a great point.
I didn't intend for annotations to be exposed in the API but we have to do it this way because we reuse the proto messages between distributors and ingesters.
I like the idea with prefixing "trustworthy" annotations, as well as (at least temporarily) rejecting external annotations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leave this up to you, both of them are better than not checking at all.
I guess we could always loosen the criteria.
repeated ProfileAnnotation annotations = 3; | ||
} | ||
|
||
message ProfileAnnotation { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is also worth documenting, how annotations are different from labels.
- They can't be used at query time to select on.
- There can be multiple annotations with the same key (with labels we don't allow duplicates).
With #3879 and #3914, distributors reject profiles when an externally controlled ingest limit is reached.
This change annotates the few profiles that are let through in this state, so that users of the profiling data can be made aware that data is being rejected. The information is stored in an extra column in the parquet table, and is applied to v1 and v2 read and write paths.
A few things are still not finished, namely: