Skip to content

Annotate throttled profiles #3956

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 23 commits into from
Apr 15, 2025
Merged

Annotate throttled profiles #3956

merged 23 commits into from
Apr 15, 2025

Conversation

aleks-p
Copy link
Contributor

@aleks-p aleks-p commented Feb 27, 2025

With #3879 and #3914, distributors reject profiles when an externally controlled ingest limit is reached.

This change annotates the few profiles that are let through in this state, so that users of the profiling data can be made aware that data is being rejected. The information is stored in an extra column in the parquet table, and is applied to v1 and v2 read and write paths.

A few things are still not finished, namely:

  • deduplication in the read path
  • increase test coverage
  • solidify naming and API contract

@@ -311,6 +325,8 @@ type InMemoryProfile struct {
DefaultSampleType int64

Samples Samples

Annotations []string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory we could achieve the same thing with just using the Comments field above in L322.

I guess comment strings would be in the string table, unsure if that is a good or a bad thing.

I think if we would go for a separate type/field, I think we should make it Key,Value, because that feels more extensible to me than a []string,

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we go with this, maybe at some point we can stop exploiting build Ids for github integration

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the delay, I had to switch focus.

In theory we could achieve the same thing with just using the Comments field above in L322.
I guess comment strings would be in the string table, unsure if that is a good or a bad thing.

I considered using Comments at first, but because the strings end up in the string table there is no efficient way to read them back.

I think if we would go for a separate type/field, I think we should make it Key,Value, because that feels more extensible to me than a []string,

I agree. In parquet the data is stored in a more structured way:

phlareparquet.NewGroupField(AnnotationsColumnName, parquet.List(
			phlareparquet.Group{
				phlareparquet.NewGroupField("Body", parquet.Encoded(parquet.String(), &parquet.DeltaByteArray)),
			})),

[]string is only used in the in-memory representation currently, but a key/value will likely make merging data somewhat easier. I will give that a try.

@aleks-p aleks-p force-pushed the feat/tag-sampled-ingest branch 2 times, most recently from 0c05f6c to 7655d6b Compare April 7, 2025 23:47
@aleks-p aleks-p force-pushed the feat/tag-sampled-ingest branch from da2ba8c to 20eb89b Compare April 11, 2025 19:59
@aleks-p aleks-p force-pushed the feat/tag-sampled-ingest branch from 20eb89b to 8093954 Compare April 14, 2025 16:51
@aleks-p aleks-p marked this pull request as ready for review April 14, 2025 16:53
@aleks-p aleks-p requested a review from a team as a code owner April 14, 2025 16:53
@aleks-p
Copy link
Contributor Author

aleks-p commented Apr 14, 2025

I plan to add more tests but the critical pieces should be in place now.

Copy link
Contributor

@simonswine simonswine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one! I couldn't think of an areas that you forgot about. I also haven't tried this in action, but so far so good. A couple of thoughts I had inline.

Comment on lines 27 to 28
// annotations provide generic metadata for profiles
repeated types.v1.ProfileAnnotation annotations = 3;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we seem to except annotations from external sources, should we limit a certain prefix to internal use? That might help to trust the them slightly more

We could say everything starting with __, we wouldn't accept in the distributors.

Also potentially we might want to remove the annotations in distributors all together. Wdyt?

Copy link
Contributor Author

@aleks-p aleks-p Apr 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a great point.

I didn't intend for annotations to be exposed in the API but we have to do it this way because we reuse the proto messages between distributors and ingesters.

I like the idea with prefixing "trustworthy" annotations, as well as (at least temporarily) rejecting external annotations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leave this up to you, both of them are better than not checking at all.

I guess we could always loosen the criteria.

repeated ProfileAnnotation annotations = 3;
}

message ProfileAnnotation {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is also worth documenting, how annotations are different from labels.

  • They can't be used at query time to select on.
  • There can be multiple annotations with the same key (with labels we don't allow duplicates).

@aleks-p aleks-p merged commit 3ff7247 into main Apr 15, 2025
20 checks passed
@aleks-p aleks-p deleted the feat/tag-sampled-ingest branch April 15, 2025 19:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants