
2.6 Configuring Alerts

Configuring alerting rules

Alerting rules are defined with PromQL expressions:

groups:
  - name: example
    rules:
      - alert: HighRequestLatency
        expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
        for: 10m
        labels:
          severity: page
        annotations:
          summary: High request latency
          description: description info

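Specify the path to the alerting rule files in the Prometheus configuration:

rule_files:
  - /etc/prometheus/rules/*.rules

A rule file matched by this pattern might look like the following, which alerts on host CPU and memory usage:

groups: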
- name: hostStatsAlert
  rules:
  - alert: hostCpuUsageAlert
    expr: sum(avg without (cpu)(irate(node_cpu_seconds_total{mode!='idle'}[5m]))) by (instance) > 0.85
    for: 1m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} CPU usage high"
      description: "{{ $labels.instance }} CPU usage above 85% (current value: {{ $value }})"
  - alert: hostMemUsageAlert
    expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)/node_memory_MemTotal_bytes > 0.85
    for: 1m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} MEM usage high"
      description: "{{ $labels.instance }} MEM usage above 85% (current value: {{ $value }})"
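
Before Prometheus loads the rule files, their syntax can be checked offline. A minimal sketch, assuming promtool (shipped in the Prometheus release tarball) is on PATH and the rules live under the path used above:

```shell
# Pattern taken from the rule_files entry above.
RULES='/etc/prometheus/rules/*.rules'

# Validate every matching rule file.
# $RULES is left unquoted on purpose so the shell expands the glob.
# Assumption: promtool is on PATH.
promtool check rules $RULES \
  || echo "rule check failed, or promtool is not on PATH"
```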

The loaded rules can be viewed at http://127.0.0.1:9090/rules

Alerts in the pending or firing state can be looked up in the following time series:

ALERTS{alertname="<alert name>", alertstate="pending|firing", <additional alert labels>}
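
The ALERTS series can be queried like any other metric, for example through the Prometheus HTTP API. A minimal sketch, assuming Prometheus listens on 127.0.0.1:9090 as above; the regex matcher is only one possible selector:

```shell
# Assumption: Prometheus is reachable at this address.
PROM_URL="http://127.0.0.1:9090"

# Select alerts that are either pending or firing.
QUERY='ALERTS{alertstate=~"pending|firing"}'

# -G sends the URL-encoded query as GET parameters.
curl -sG "$PROM_URL/api/v1/query" --data-urlencode "query=$QUERY" \
  || echo "Prometheus not reachable at $PROM_URL"
```

An empty result list means no alert is currently pending or firing.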

Alertmanager

Download and unpack the release:

$ wget https://github.com/prometheus/alertmanager/releases/download/v0.16.2/alertmanager-0.16.2.linux-amd64.tar.gz
$ tar zxvf alertmanager-0.16.2.linux-amd64.tar.gz
$ ln -sf alertmanager-0.16.2.linux-amd64 alertmanager
$ cd alertmanager

The configuration file is alertmanager.yml
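
The contents of the file are not given here. As a rough sketch only, a minimal configuration could route every alert to a single webhook receiver. The receiver name, the grouping and timing values, and the webhook URL below are illustrative assumptions, not settings from this guide. Note that the command overwrites any alertmanager.yml already in the current directory:

```shell
# Write a minimal alertmanager.yml into the current directory.
# All values are placeholders to adapt (hypothetical receiver and URL).
cat > alertmanager.yml <<'EOF'
global:
  resolve_timeout: 5m
route:
  receiver: default              # every alert goes to this receiver
  group_by: ['alertname', 'instance']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
receivers:
  - name: default
    webhook_configs:
      - url: http://127.0.0.1:5001/   # hypothetical webhook endpoint
EOF
```

The amtool binary shipped in the same tarball can validate the result with ./amtool check-config alertmanager.yml.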

Start Alertmanager:

$ ./alertmanager --config.file=./alertmanager.yml --storage.path=/data/alertmanager

Check that it is running:

$ curl http://localhost:9093

Add the following to the Prometheus configuration:

alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - localhost:9093
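
Once Prometheus has been restarted or has reloaded its configuration (for example on SIGHUP), one way to confirm that it has discovered the Alertmanager is to query its HTTP API. A minimal sketch, assuming Prometheus listens on 127.0.0.1:9090:

```shell
# Assumption: Prometheus is reachable at this address.
PROM_URL="http://127.0.0.1:9090"

# List the Alertmanager instances Prometheus currently sends alerts to.
curl -s "$PROM_URL/api/v1/alertmanagers" \
  || echo "Prometheus not reachable at $PROM_URL"
```

If the configuration was picked up, localhost:9093 should appear under activeAlertmanagers in the response.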