top-level labels aren't applied to alerts #786

@JamesGuthrie

	// Labels are the Prometheus labels that will have all the recording
	// and alerting rules generated for the service SLOs.
	Labels map[string]string `json:"labels,omitempty"`

https://github.com/slok/sloth/blob/v0.15.0/pkg/prometheus/api/v1/v1.go#L69-L71

Based on the above description, I expected labels defined at the top level to cascade down to both recording rules and alerts. In my tests, they cascade down only to recording rules, not to alerts.

This is also evident in the examples here: https://sloth.dev/examples/default/getting-started/#__tabbed_1_2.

The source file defines the following top-level labels:

labels:
  owner: "myteam"
  repo: "myorg/myservice"
  tier: "2"

And they are reflected in a recording rule:

  - record: slo:sli_error:ratio_rate5m
    expr: |
      (sum(rate(http_request_duration_seconds_count{job="myservice",code=~"(5..|429)"}[5m])))
      /
      (sum(rate(http_request_duration_seconds_count{job="myservice"}[5m])))
    labels:
      cmd: examplesgen.sh
      owner: myteam
      repo: myorg/myservice
      sloth_id: myservice-requests-availability
      sloth_service: myservice
      sloth_slo: requests-availability
      sloth_window: 5m
      tier: "2"

But they are not reflected in the alert:

  - alert: MyServiceHighErrorRate
    expr: |
      (
          max(slo:sli_error:ratio_rate5m{sloth_id="myservice-requests-availability", sloth_service="myservice", sloth_slo="requests-availability"} > (14.4 * 0.0009999999999999432)) without (sloth_window)
          and
          max(slo:sli_error:ratio_rate1h{sloth_id="myservice-requests-availability", sloth_service="myservice", sloth_slo="requests-availability"} > (14.4 * 0.0009999999999999432)) without (sloth_window)
      )
      or
      (
          max(slo:sli_error:ratio_rate30m{sloth_id="myservice-requests-availability", sloth_service="myservice", sloth_slo="requests-availability"} > (6 * 0.0009999999999999432)) without (sloth_window)
          and
          max(slo:sli_error:ratio_rate6h{sloth_id="myservice-requests-availability", sloth_service="myservice", sloth_slo="requests-availability"} > (6 * 0.0009999999999999432)) without (sloth_window)
      )
    labels:
      category: availability
      routing_key: myteam
      severity: pageteam
      sloth_severity: page
    annotations:
      summary: High error rate on 'myservice' requests responses
      title: (page) {{$labels.sloth_service}} {{$labels.sloth_slo}} SLO error budget
        burn rate is too fast.
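
For comparison, here is a sketch of the alert's labels block as I would expect it if top-level labels cascaded to alerts. This is an assumption about the intended behavior based on the field's doc comment, not actual Sloth output; it simply merges the three top-level labels from the source file into the alert labels shown above. The `cmd` label seen on the recording rule is not a top-level label in the source file, so it is left out here.

```yaml
    # Hypothetical expected output, not generated by Sloth.
    labels:
      category: availability
      owner: myteam              # from top-level labels
      repo: myorg/myservice      # from top-level labels
      routing_key: myteam
      severity: pageteam
      sloth_severity: page
      tier: "2"                  # from top-level labels
```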
