2.6 Configuring Alerts
Configuring alerting rules
Alerting rules are defined with PromQL expressions:
groups:
  - name: example
    rules:
      - alert: HighRequestLatency
        expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
        for: 10m
        labels:
          severity: page
        annotations:
          summary: High request latency
          description: description info
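The expression above reads job:request_latency_seconds:mean5m, a name that follows the recording-rule naming convention, so the series has to be produced by a recording rule. A minimal sketch of such a rule follows. It assumes (the original does not say so) that the application exports a summary or histogram named request_latency_seconds, so that the _sum and _count series exist. The group name is illustrative only.

```yaml
groups:
  - name: example_recording   # hypothetical group name
    rules:
      # Mean request latency per job over the last 5 minutes.
      # Assumes request_latency_seconds_sum and _count exist
      # (that is, the metric is a summary or a histogram).
      - record: job:request_latency_seconds:mean5m
        expr: |
          sum by (job) (rate(request_latency_seconds_sum[5m]))
            /
          sum by (job) (rate(request_latency_seconds_count[5m]))
```

Dividing the rate of the sum by the rate of the count gives the average observed value over the window, which is why the threshold of 0.5 in the alert can be read as 0.5 seconds.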
Specify the path to the alerting rule files in the Prometheus configuration:
rule_files:
  - /etc/prometheus/rules/*.rules
A rule file matching this pattern might look like:
groups:
  - name: hostStatsAlert
    rules:
      - alert: hostCpuUsageAlert
        expr: sum(avg without (cpu)(irate(node_cpu_seconds_total{mode!='idle'}[5m]))) by (instance) > 0.85
        for: 1m
        labels:
          severity: page
        annotations:
          summary: "Instance {{ $labels.instance }} CPU usage high"
          description: "{{ $labels.instance }} CPU usage above 85% (current value: {{ $value }})"
      - alert: hostMemUsageAlert
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)/node_memory_MemTotal_bytes > 0.85
        for: 1m
        labels:
          severity: page
        annotations:
          summary: "Instance {{ $labels.instance }} MEM usage high"
          description: "{{ $labels.instance }} MEM usage above 85% (current value: {{ $value }})"
The loaded rules can be viewed at http://127.0.0.1:9090/rules.
Alerts that are pending or firing are also exposed as a time series of the form:
ALERTS{alertname="<alert name>", alertstate="pending|firing", <additional alert labels>}
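Because ALERTS is an ordinary time series, it can be queried with PromQL like any other metric. As an illustrative sketch, a recording rule could count the currently firing alerts per alert name. The group name and the recorded series name below are hypothetical and not part of the original setup.

```yaml
groups:
  - name: alert_stats   # hypothetical group name
    rules:
      # Number of firing alert instances, grouped by alert name.
      - record: alertname:alerts_firing:count   # hypothetical series name
        expr: count by (alertname) (ALERTS{alertstate="firing"})
```

The same expression can also be run ad hoc in the Prometheus expression browser without defining a rule.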
Alertmanager
Download and unpack Alertmanager:
$ wget https://github.com/prometheus/alertmanager/releases/download/v0.16.2/alertmanager-0.16.2.linux-amd64.tar.gz
$ tar zxvf alertmanager-0.16.2.linux-amd64.tar.gz
$ ln -sf alertmanager-0.16.2.linux-amd64 alertmanager
$ cd alertmanager
Configuration file: alertmanager.yml
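The original does not show the contents of alertmanager.yml. The following is a minimal sketch of what such a file could contain, assuming a single webhook receiver. The receiver name, the webhook URL, and the timing values are illustrative assumptions and should be adapted to the actual notification channel.

```yaml
global:
  resolve_timeout: 5m          # treat an alert as resolved if not re-sent within 5m

route:
  receiver: default-webhook    # hypothetical receiver name
  group_by: ['alertname', 'instance']
  group_wait: 30s              # wait before sending the first notification of a group
  group_interval: 5m           # wait before notifying about new alerts in a group
  repeat_interval: 4h          # wait before re-sending an unchanged notification

receivers:
  - name: default-webhook
    webhook_configs:
      - url: http://127.0.0.1:5001/   # hypothetical endpoint
```

A single top-level route with one receiver sends every alert to the same destination, which is usually enough to confirm that Prometheus and Alertmanager are connected before adding more specific routes.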
Start Alertmanager:
$ ./alertmanager --config.file=./alertmanager.yml --storage.path=/data/alertmanager
Check that it is running:
$ curl http://localhost:9093
Add the following to the Prometheus configuration:
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093