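Use configuration files to provision alerting resources
Manage your alerting resources using configuration files that can be version controlled. When Grafana starts, it provisions the resources defined in your configuration files. Provisioning can create, update, or delete existing resources in your Grafana instance.
This guide outlines the steps and reference information for provisioning alerting resources using YAML files. For a practical demo, you can clone and try this example using Grafana OSS and Docker Compose.
Note
Provisioning Grafana using configuration files is not available in Grafana Cloud.
You cannot edit file-provisioned resources in Grafana. You can only change their properties by changing the provisioning file and then restarting Grafana or performing a hot reload. This prevents edits that would be overwritten the next time the file is provisioned or a hot reload runs.
Provisioning from configuration files takes place during the initial setup of your Grafana system, but you can re-run it at any time using the Grafana Admin API.
Importing an alerting resource that already exists results in a conflict. First, remove the resources you plan to import if they are present.
The sections below describe how to set up the files and which fields are required for each object, depending on the resource you are provisioning.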
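Import alert rules
Create or delete alert rules using provisioning files in your Grafana instance.
1. Find the alert rule group in Grafana.
2. Export and download a provisioning file for your alert rules.
3. Copy the contents into a YAML or JSON configuration file and add it to the provisioning/alerting directory of the Grafana instance you want to import the alerting resources to. Example configuration files are shown below.
4. Restart your Grafana instance (or reload the provisioned files using the Admin API).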
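As a rough illustration of the "add it to the provisioning/alerting directory" step, the sketch below mounts a local directory of exported files into a Grafana container with Docker Compose. The file name, host path, image tag, and port mapping are assumptions for illustration, and it assumes the container uses the default provisioning path /etc/grafana/provisioning.

```yaml
# docker-compose.yaml (hypothetical file name and layout)
services:
  grafana:
    # Image tag is an assumption; pin the version you actually run.
    image: grafana/grafana-oss:latest
    ports:
      - '3000:3000'
    volumes:
      # Host directory holding the exported alerting files (assumed path),
      # mounted at the container's default provisioning/alerting directory.
      - ./provisioning/alerting:/etc/grafana/provisioning/alerting
```

With a layout like this, restarting the container after editing a file in ./provisioning/alerting should cause Grafana to provision the changed resources on startup.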
Here is an example of a configuration file for creating alert rules.
# config file version
apiVersion: 1
# List of rule groups to import or update
groups:
  # <int> organization ID, default = 1
  - orgId: 1
    # <string, required> name of the rule group
    name: my_rule_group
    # <string, required> name of the folder the rule group will be stored in
    folder: my_first_folder
    # <duration, required> interval that the rule group should be evaluated at
    interval: 60s
    # <list, required> list of rules that are part of the rule group
    rules:
      # <string, required> unique identifier for the rule. Should not exceed 40 symbols. Only letters, numbers, - (hyphen), and _ (underscore) allowed.
      - uid: my_id_1
        # <string, required> title of the rule that will be displayed in the UI
        title: my_first_rule
        # <string, required> which query should be used for the condition
        condition: A
        # <list, required> list of query objects that should be executed on each
        #                  evaluation - should be obtained through the API
        data:
          - refId: A
            datasourceUid: '__expr__'
            model:
              conditions:
                - evaluator:
                    params:
                      - 3
                    type: gt
                  operator:
                    type: and
                  query:
                    params:
                      - A
                  reducer:
                    type: last
                  type: query
              datasource:
                type: __expr__
                uid: '__expr__'
              expression: 1==0
              intervalMs: 1000
              maxDataPoints: 43200
              refId: A
              type: math
        # <string> UID of a dashboard that the alert rule should be linked to
        dashboardUid: my_dashboard
        # <int> ID of the panel that the alert rule should be linked to
        panelId: 123
        # <string> the state the alert rule will have when no data is returned
        #          possible values: "NoData", "Alerting", "OK", default = NoData
        noDataState: Alerting
        # <string> the state the alert rule will have when the query execution
        #          failed - possible values: "Error", "Alerting", "OK"
        #          default = Alerting
        execErrState: Alerting
        # <duration, required> for how long should the alert fire before alerting
        for: 60s
        # <map<string, string>> a map of strings to pass around any data
        annotations:
          some_key: some_value
        # <map<string, string>> a map of strings that can be used to filter and
        #                       route alerts
        labels:
          team: sre_team_1
Here is an example of a configuration file for deleting alert rules.
# config file version
apiVersion: 1
# List of alert rule UIDs that should be deleted
deleteRules:
  # <int> organization ID, default = 1
  - orgId: 1
    # <string, required> unique identifier for the rule
    uid: my_id_1
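Import contact points
Create or delete contact points using provisioning files in your Grafana instance.
1. Find the contact point in Grafana.
2. Export and download a provisioning file for your contact point.
3. Copy the contents into a YAML or JSON configuration file and add it to the provisioning/alerting directory of the Grafana instance you want to import the alerting resources to. Example configuration files are shown below.
4. Restart your Grafana instance (or reload the provisioned files using the Admin API).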
Here is an example of a configuration file for creating contact points.
# config file version
apiVersion: 1
# List of contact points to import or update
contactPoints:
  # <int> organization ID, default = 1
  - orgId: 1
    # <string, required> name of the contact point
    name: cp_1
    receivers:
      # <string, required> unique identifier for the receiver. Should not exceed 40 symbols. Only letters, numbers, - (hyphen), and _ (underscore) allowed.
      - uid: first_uid
        # <string, required> type of the receiver
        type: prometheus-alertmanager
        # <bool, optional> Disable the additional [Incident Resolved] follow-up alert, default = false
        disableResolveMessage: false
        # <object, required> settings for the specific receiver type
        settings:
          url: http://test:9000
Here is an example of a configuration file for deleting contact points.
# config file version
apiVersion: 1
# List of receivers that should be deleted
deleteContactPoints:
  # <int> organization ID, default = 1
  - orgId: 1
    # <string, required> unique identifier for the receiver
    uid: first_uid
Settings
Here are some examples of settings you can use for the different contact point integrations.
Alertmanager
type: prometheus-alertmanager
settings:
  # <string, required>
  url: https://:9093
  # <string>
  basicAuthUser: abc
  # <string>
  basicAuthPassword: abc123
DingDing
type: dingding
settings:
  # <string, required>
  url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxx
  # <string> options: link, actionCard
  msgType: link
  # <string>
  message: |
    {{ template "default.message" . }}
Discord
type: discord
settings:
  # <string, required>
  url: https://discord/webhook
  # <string>
  avatar_url: https://my_avatar
  # <bool>
  use_discord_username: false
  # <string>
  message: |
    {{ template "default.message" . }}
Email
type: email
settings:
  # <string, required>
  addresses: me@example.com;you@example.com
  # <bool>
  singleEmail: false
  # <string>
  message: my optional message to include
  # <string>
  subject: |
    {{ template "default.title" . }}
Google Chat
type: googlechat
settings:
  # <string, required>
  url: https://google/webhook
  # <string>
  message: |
    {{ template "default.message" . }}
Kafka
type: kafka
settings:
  # <string, required>
  kafkaRestProxy: https://:8082
  # <string, required>
  kafkaTopic: topic1
LINE
type: line
settings:
  # <string, required>
  token: xxx
MQTT
type: mqtt
settings:
  # <string, required>
  brokerUrl: tcp://127.0.0.1:1883
  # <string>
  clientId: grafana
  # <string, required>
  topic: grafana/alerts
  # <string>
  messageFormat: json
  # <string>
  username: grafana
  # <string>
  password: password1
  # <string>
  qos: 0
  # <bool>
  retain: false
  # <map>
  tlsConfig:
    # <bool>
    insecureSkipVerify: false
    # <string>
    clientCertificate: certificate in PEM format
    # <string>
    clientKey: key in PEM format
    # <string>
    caCertificate: CA certificate in PEM format
Microsoft Teams
type: teams
settings:
  # <string, required>
  url: https://ms_teams_url
  # <string>
  title: |
    {{ template "default.title" . }}
  # <string>
  sectiontitle: ''
  # <string>
  message: |
    {{ template "default.message" . }}
OpsGenie
type: opsgenie
settings:
  # <string, required>
  apiKey: xxx
  # <string, required>
  apiUrl: https://api.opsgenie.com/v2/alerts
  # <string>
  message: |
    {{ template "default.title" . }}
  # <string>
  description: some descriptive description
  # <bool>
  autoClose: false
  # <bool>
  overridePriority: false
  # <string> options: tags, details, both
  sendTagsAs: both
PagerDuty
type: pagerduty
settings:
  # <string, required> the 32-character Events API key https://support.pagerduty.com/docs/api-access-keys#events-api-keys
  integrationKey: XXX
  # <string> options: critical, error, warning, info
  severity: critical
  # <string>
  class: ping failure
  # <string>
  component: Grafana
  # <string>
  group: app-stack
  # <string>
  summary: |
    {{ template "default.message" . }}
Pushover
type: pushover
settings:
  # <string, required>
  apiToken: XXX
  # <string, required>
  userKey: user1,user2
  # <string>
  device: device1,device2
  # <string> options (high to low): 2,1,0,-1,-2
  priority: '2'
  # <string>
  retry: '30'
  # <string>
  expire: '120'
  # <string> the number of seconds before a message expires and is deleted automatically. Examples: 10s, 5m30s, 8h.
  ttl:
  # <string>
  sound: siren
  # <string>
  okSound: magic
  # <string>
  message: |
    {{ template "default.message" . }}
Slack
type: slack
settings:
  # <string, required>
  recipient: alerting-dev
  # <string, required>
  token: xxx
  # <string>
  username: grafana_bot
  # <string>
  icon_emoji: heart
  # <string>
  icon_url: https://icon_url
  # <string>
  mentionUsers: user_1,user_2
  # <string>
  mentionGroups: group_1,group_2
  # <string> options: here, channel
  mentionChannel: here
  # <string> Optionally provide a Slack incoming webhook URL for sending messages, in this case the token isn't necessary
  url: https://some_webhook_url
  # <string>
  endpointUrl: https://custom_url/api/chat.postMessage
  # <string> quoted so that the leading {{ is not parsed as a YAML flow mapping
  color: '{{ if eq .Status "firing" }}#D63232{{ else }}#36a64f{{ end }}'
  # <string>
  title: |
    {{ template "slack.default.title" . }}
  # <string>
  text: |
    {{ template "slack.default.text" . }}
Sensu Go
type: sensugo
settings:
  # <string, required>
  url: http://sensu-api.local:8080
  # <string, required>
  apikey: xxx
  # <string>
  entity: default
  # <string>
  check: default
  # <string>
  handler: some_handler
  # <string>
  namespace: default
  # <string>
  message: |
    {{ template "default.message" . }}
Telegram
type: telegram
settings:
  # <string, required>
  bottoken: xxx
  # <string, required>
  chatid: some_chat_id
  # <string>
  message: |
    {{ template "default.message" . }}
Threema Gateway
type: threema
settings:
  # <string, required>
  api_secret: xxx
  # <string, required>
  gateway_id: A5K94S9
  # <string, required>
  recipient_id: A9R4KL4S
VictorOps
type: victorops
settings:
  # <string, required>
  url: XXX
  # <string> options: CRITICAL, WARNING
  messageType: CRITICAL
Webhook
type: webhook
settings:
  # <string, required>
  url: https://endpoint_url
  # <string> options: POST, PUT
  httpMethod: POST
  # <string>
  username: abc
  # <string>
  password: abc123
  # <string>
  authorization_scheme: Bearer
  # <string>
  authorization_credentials: abc123
  # <string>
  maxAlerts: '10'
  # <map>
  tlsConfig:
    # <bool>
    insecureSkipVerify: false
    # <string>
    clientCertificate: certificate in PEM format
    # <string>
    clientKey: key in PEM format
    # <string>
    caCertificate: CA certificate in PEM format
  hmacConfig:
    # <string>
    secret: secret-key
    # <string>
    header: X-Grafana-Alerting-Signature
    # <string>
    timestampHeader: X-Grafana-Alerting-Signature-Timestamp
WeCom
type: wecom
settings:
  # <string, required>
  url: https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxxxxx
  # <string>
  message: |
    {{ template "default.message" . }}
  # <string>
  title: |
    {{ template "default.title" . }}
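Import notification template groups
Create or delete notification template groups using provisioning files in your Grafana instance.
1. Find the notification template group in Grafana.
2. Export a template group by copying the template content and name.
3. Copy the contents into a YAML or JSON configuration file and add it to the provisioning/alerting directory of the Grafana instance you want to import the alerting resources to. Example configuration files are shown below.
4. Restart your Grafana instance (or reload the provisioned files using the Admin API).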
Here is an example of a configuration file for creating notification template groups.
# config file version
apiVersion: 1
# List of templates to import or update
templates:
  # <int> organization ID, default = 1
  - orgId: 1
    # <string, required> name of the template group, must be unique
    name: my_first_template
    # <string, required> content of the template group
    template: |
      {{ define "my_first_template" }}
      Custom notification message
      {{ end }}
Here is an example of a configuration file for deleting notification template groups.
# config file version
apiVersion: 1
# List of template groups that should be deleted
deleteTemplates:
  # <int> organization ID, default = 1
  - orgId: 1
    # <string, required> name of the template group, must be unique
    name: my_first_template
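Import notification policies
Create or reset the notification policy tree using provisioning files in your Grafana instance.
In Grafana, the entire notification policy tree is considered a single, large resource. Add new specific policies as sub-policies under the root policy. Because specific policies may depend on each other, you cannot provision a subset of the policy tree; the entire tree must be defined in one place.
Warning
Because the policy tree is a single resource, provisioning it overwrites all policies in the notification policy tree. However, it does not affect internal policies created when alert rules directly select a contact point.
1. Find the notification policy tree in Grafana.
2. Export and download a provisioning file for your notification policy tree.
3. Copy the contents into a YAML or JSON configuration file and add it to the provisioning/alerting directory of the Grafana instance you want to import the alerting resources to. Example configuration files are shown below.
4. Restart your Grafana instance (or reload the provisioned files using the Admin API).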
Here is an example of a configuration file for creating notification policies.
# config file version
apiVersion: 1
# List of notification policies
policies:
  # <int> organization ID, default = 1
  - orgId: 1
    # <string> name of the contact point that should be used for this route
    receiver: grafana-default-email
    # <list> The labels by which incoming alerts are grouped together. For example,
    #        multiple alerts coming in for cluster=A and alertname=LatencyHigh would
    #        be batched into a single group.
    #
    #        To aggregate by all possible labels use the special value '...' as
    #        the sole label name, for example:
    #        group_by: ['...']
    #        This effectively disables aggregation entirely, passing through all
    #        alerts as-is. This is unlikely to be what you want, unless you have
    #        a very low alert volume or your upstream notification system performs
    #        its own grouping.
    group_by: ['...']
    # <list> a list of prometheus-like matchers that an alert rule has to fulfill to match the node (allowed chars
    #        [a-zA-Z_:])
    matchers:
      - alertname = Watchdog
      - service_id_X = serviceX
      - severity =~ "warning|critical"
    # <list> a list of grafana-like matchers that an alert rule has to fulfill to match the node
    object_matchers:
      - ['alertname', '=', 'CPUUsage']
      - ['service_id-X', '=', 'serviceX']
      - ['severity', '=~', 'warning|critical']
    # <list> Times when the route should be muted. These must match the name of a
    #        mute time interval.
    #        Additionally, the root node cannot have any mute times.
    #        When a route is muted it will not send any notifications, but
    #        otherwise acts normally (including ending the route-matching process
    #        if the `continue` option is not set)
    mute_time_intervals:
      - abc
    # <duration> How long to initially wait to send a notification for a group
    #            of alerts. Allows to collect more initial alerts for the same group.
    #            (Usually ~0s to few minutes), default = 30s
    group_wait: 30s
    # <duration> How long to wait before sending a notification about new alerts that
    #            are added to a group of alerts for which an initial notification has
    #            already been sent. (Usually ~5m or more), default = 5m
    group_interval: 5m
    # <duration> How long to wait before sending a notification again if it has already
    #            been sent successfully for an alert. (Usually ~3h or more), default = 4h
    repeat_interval: 4h
    # <list> Zero or more child policies. The schema is the same as the root policy.
    # routes:
    #   # Another recursively nested policy...
    #   - receiver: another-receiver
    #     matchers:
    #       - ...
    #     ...
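The reference example above leaves the child policies commented out. As a minimal sketch of a tree with one nested policy, the file below keeps a root policy without matchers or mute times and routes alerts labeled team=sre_team_1 to a child policy. It assumes that a contact point named cp_1 and a mute timing named mti_1 (the names used in the other examples in this guide) are also provisioned; the group_by labels are illustrative.

```yaml
# config file version
apiVersion: 1
policies:
  - orgId: 1
    # Root policy: catches every alert that no child policy handles.
    receiver: grafana-default-email
    # Illustrative grouping labels (assumption).
    group_by: ['grafana_folder', 'alertname']
    routes:
      # Child policy: uses the same schema as the root policy.
      - receiver: cp_1
        object_matchers:
          - ['team', '=', 'sre_team_1']
        # Only child policies may reference mute time intervals.
        mute_time_intervals:
          - mti_1
        # Stop evaluating sibling policies once this one matches.
        continue: false
```

Because the whole tree is one resource, any additional child policies would go in this same routes list rather than in a separate file.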
Here is an example of a configuration file for resetting the policy tree back to its default value.
# config file version
apiVersion: 1
# List of orgIds that should be reset to the default policy
resetPolicies:
  - 1
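Import mute timings
Create or delete mute timings using provisioning files in your Grafana instance.
1. Find the mute timing in Grafana.
2. Export and download a provisioning file for your mute timing.
3. Copy the contents into a YAML or JSON configuration file and add it to the provisioning/alerting directory of the Grafana instance you want to import the alerting resources to. Example configuration files are shown below.
4. Restart your Grafana instance (or reload the provisioned files using the Admin API).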
Here is an example of a configuration file for creating mute timings.
# config file version
apiVersion: 1
# List of mute time intervals to import or update
muteTimes:
  # <int> organization ID, default = 1
  - orgId: 1
    # <string, required> name of the mute time interval, must be unique
    name: mti_1
    # <list> time intervals that should trigger the muting
    #        refer to https://prometheus.ac.cn/docs/alerting/latest/configuration/#time_interval-0
    time_intervals:
      - times:
          - start_time: '06:00'
            end_time: '23:59'
        location: 'UTC'
        weekdays: ['monday:wednesday', 'saturday', 'sunday']
        months: ['1:3', 'may:august', 'december']
        years: ['2020:2022', '2030']
        days_of_month: ['1:5', '-3:-1']
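The reference example above sets every field at once, and because all specified fields of a single time interval must match for it to apply, that interval is quite narrow. As a smaller sketch using only fields that appear in the example, the hypothetical mute timing below (the name weekends is an assumption) mutes notifications on Saturdays and Sundays in UTC; fields left out match any value.

```yaml
# config file version
apiVersion: 1
muteTimes:
  - orgId: 1
    # Hypothetical name; notification policies reference it in mute_time_intervals.
    name: weekends
    time_intervals:
      # No times, months, or years are given, so the whole day matches
      # on each listed weekday.
      - weekdays: ['saturday', 'sunday']
        location: 'UTC'
```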
Here is an example of a configuration file for deleting mute timings.
# config file version
apiVersion: 1
# List of mute time intervals that should be deleted
deleteMuteTimes:
  # <int> organization ID, default = 1
  - orgId: 1
    # <string, required> name of the mute time interval, must be unique
    name: mti_1
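Template variable interpolation
Provisioning interpolates environment variables using the $variable syntax.
contactPoints:
  - orgId: 1
    name: My Contact Email Point
    receivers:
      - uid: 1
        type: email
        settings:
          addresses: $EMAIL
In this example, provisioning replaces $EMAIL with the value of the EMAIL environment variable, or with an empty string if the variable is not set. For more information, refer to Using environment variables in the provisioning documentation.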
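As a sketch of where the EMAIL value could come from, the Docker Compose fragment below sets the variable in the Grafana container's environment so that it is visible when the provisioning files are read. The service layout, image tag, host path, and address are assumptions for illustration.

```yaml
# docker-compose.yaml (hypothetical file name and layout)
services:
  grafana:
    image: grafana/grafana-oss:latest
    environment:
      # Substituted for $EMAIL in the contact point settings (example address).
      EMAIL: alerts@example.com
    volumes:
      # Assumed host path, mounted at the default provisioning/alerting directory.
      - ./provisioning/alerting:/etc/grafana/provisioning/alerting
```

If EMAIL were left out of the environment block, the addresses setting would be provisioned as an empty string, as described above.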
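In alerting resources, most properties support template variable interpolation, with a few exceptions:
- Alert rule annotations: groups[].rules[].annotations
- Alert rule time range: groups[].rules[].relativeTimeRange
- Alert rule query model: groups[].rules[].data.model
- Mute timing name: muteTimes[].name
- Mute timing time intervals: muteTimes[].time_intervals[]
- Notification template group name: templates[].name
- Notification template group content: templates[].template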
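Note: For properties that support interpolation, you might unintentionally replace a template variable. To avoid this, escape $variable as $$variable.
For example, consider a subject property in the contactPoints.receivers.settings object that is meant to use the $labels variable:
- subject: '{{ $labels }}' is interpolated, incorrectly defining the subject as subject: '{{ }}'.
- subject: '{{ $$labels }}' is not interpolated, correctly defining the subject as subject: '{{ $labels }}'.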
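Putting both cases side by side, the sketch below shows one contact point where one setting is meant to be interpolated and another is escaped. It reuses the names from the examples above; the receiver uid is a hypothetical value.

```yaml
# config file version
apiVersion: 1
contactPoints:
  - orgId: 1
    name: My Contact Email Point
    receivers:
      # Hypothetical uid for illustration.
      - uid: email_receiver_1
        type: email
        settings:
          # Interpolated: replaced with the EMAIL environment variable.
          addresses: $EMAIL
          # Escaped: provisioned as '{{ $labels }}' and left for the
          # notification template engine to evaluate.
          subject: '{{ $$labels }}'
```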
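More examples
For more examples of the concepts in this guide:
- Try provisioning alerting resources in Grafana OSS with YAML files through a demo project deployed with Docker Compose or Kubernetes.
- Review the different options for how Grafana provisions resources in the Grafana provisioning documentation.
- For Helm support, review the examples of provisioning alerting resources in the Grafana Helm Chart documentation.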