Skip to content

feat: add support for per check alerts #2128

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

VikaCep
Copy link

@VikaCep VikaCep commented Apr 11, 2025

Closes https://github.com/grafana/synthetic-monitoring/issues/228

Adds support for Synthetic Monitoring - Per Check Alerts (docs)

Requires grafana/synthetic-monitoring-api-go-client#272

Example:

  • defining the resources
resource "grafana_synthetic_monitoring_check" "test" {
  job     = "Test Check with Alerts"
  target  = "https://grafana.com"
  enabled = true
  probes  = [1]
  labels  = {}
  settings {
    http {
      ip_version = "V4"
      method     = "GET"
    }
  }
}

resource "grafana_synthetic_monitoring_check_alerts" "test" {
  check_id = grafana_synthetic_monitoring_check.test.id
  alerts = [{
    name      = "ProbeFailedExecutionsTooHigh"
    threshold = 1
    period    = "15m"
  },
  {
    name      = "TLSTargetCertificateCloseToExpiring"
    threshold = 14
    period    = ""
  }]
} 
  • running terraform apply
image image
  • seeing the generated resources in the app

image
image

@VikaCep VikaCep self-assigned this Apr 11, 2025
Copy link

In order to lower resource usage and have a faster runtime, PRs will not run Cloud tests automatically.
To do so, a Grafana Labs employee must trigger the cloud acceptance tests workflow manually.

@VikaCep VikaCep force-pushed the sm-alerts-per-check-support branch from 2fa6b13 to ec61fd6 Compare April 14, 2025 22:03
Copy link

@ka3de ka3de left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not an expert on Terraform or providers implementation, so someone else can probably provide better feedback on that.

Reviewing this mostly from the Synthetic Monitoring point of view I think overall looks OK, but I made a few comments, particularly on validation and on whether we should make this more generic and avoid referencing particular alert types so we don't have to make changes to the provider implementation or the docs whenever we add new alerts. Let me know what do you think.

Manages alerts for a check in Grafana Synthetic Monitoring.

* [Official documentation](https://grafana.com/docs/grafana-cloud/testing/synthetic-monitoring/configure-alerts/configure-per-check-alerts/)
* [API documentation](https://github.com/grafana/synthetic-monitoring-api-go-client/blob/main/docs/API.md#alerts)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a comment on this in the PR for the API client that this docs currently don't exist.

* `alerts` - (Required) List of alerts for the check. Each alert has the following fields:
* `name` - (Required) Name of the alert. Must be one of: `ProbeFailedExecutionsTooHigh`, `TLSTargetCertificateCloseToExpiring`.
* `threshold` - (Required) Threshold value for the alert.
* `period` - (Required) Period for the alert threshold. Required and must be one of: `1m`, `2m`, `5m`, `10m`, `15m`, `20m`, `30m`, `1h` for `ProbeFailedExecutionsTooHigh` alerts. For other alerts, set to `""` (empty string).
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the period is more related with the alert type, more than "with the threshold". Also, the min period is by now 5m. So should this be:

Suggested change
* `period` - (Required) Period for the alert threshold. Required and must be one of: `1m`, `2m`, `5m`, `10m`, `15m`, `20m`, `30m`, `1h` for `ProbeFailedExecutionsTooHigh` alerts. For other alerts, set to `""` (empty string).
* `period` - (Required) Period for the alert. Required and must be one of: `5m`, `10m`, `15m`, `20m`, `30m`, `1h` for `ProbeFailedExecutionsTooHigh` alerts. For other alerts, set to `""` (empty string).

?

I'm also wondering if we should mention specific alert names or considerations for these, as that means that we'll have to update these docs everytime we add a new alert, or we can just rely on the link to the Grafana Cloud docs and expect to provide the full documentation there?

Description: `
Manages alerts for a check in Grafana Synthetic Monitoring.

* [Official documentation](https://grafana.com/docs/grafana-cloud/synthetic-monitoring/configure-alerts/)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this link is incorrect:

Suggested change
* [Official documentation](https://grafana.com/docs/grafana-cloud/synthetic-monitoring/configure-alerts/)
* [Official documentation](https://grafana.com/docs/grafana-cloud/testing/synthetic-monitoring/configure-alerts/configure-per-check-alerts/)

Comment on lines +95 to +98
// Validate that the check exists
if err := validateCheckExists(ctx, c, checkID); err != nil {
return diag.FromErr(err)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have to do this from the client considering that the backend will already perform this validation as part of the "update alerts" request?

Comment on lines +204 to +206
if name == AlertNameProbeFailedExecutionsTooHigh && !hasPeriod {
return nil, fmt.Errorf("period is required when name is %s", AlertNameProbeFailedExecutionsTooHigh)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is OK, but this validation is already done in the backend, so I wonder if we should just pass the fields set by the user and delegate the validation. Otherwise with this approach we'll have to modify the Terraform provider every time we add new alert types. What's your take on this?

return diag.FromErr(err)
}

d.SetId(fmt.Sprintf("%d", checkID))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
d.SetId(fmt.Sprintf("%d", checkID))
d.SetId(strconv.FormatInt(checkID, 10))

Beyond that suggestion, I'm a little bit confused on this, probably because I don't know much about the Terraform provider implementation or how it actually works. In this function we receive a ResourceData that I understand matches the definition written by the user in the .tf file. So we expect that definition to contain a check_id as defined in the docs. Is this correct?

But then here we set the ID to the same check_id. I guess we have to do that due to how Terraform works? Reading the doc for the SetId method:

// SetId sets the ID of the resource. If the value is blank, then the
// resource is destroyed.

Comment on lines +120 to +123
// Validate that the check exists
if err := validateCheckExists(ctx, c, checkID); err != nil {
return diag.FromErr(err)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above. Since this is already done by the API, should rely on that and remove this from the client?

Comment on lines +156 to +159
// Validate that the check exists
if err := validateCheckExists(ctx, c, checkID); err != nil {
return diag.FromErr(err)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above regarding duplicated validation in client vs API backend.

Comment on lines +180 to +183
// Validate that the check exists
if err := validateCheckExists(ctx, c, checkID); err != nil {
return diag.FromErr(err)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above regarding duplicated validation in client vs API backend.

Comment on lines +223 to +226
func validateCheckExists(ctx context.Context, c *smapi.Client, checkID int64) error {
_, err := c.GetCheck(ctx, checkID)
return err
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related with the other comments, I'm not sure we need this explicit method in the client for alerts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants