The Prometheus configuration file prometheus.yml: its four sections explained

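To understand how a tool works, it helps a great deal to know its configuration parameters, both for understanding the tool as a whole and for finding bottlenecks and tuning it later.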

I. Configuration file format
Official documentation: https://prometheus.io/docs/prometheus/latest/configuration/configuration/
The file is written in YAML format.

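Notes:
Both .yml and .yaml are extensions for YAML files.
Advantage of YAML: it interoperates easily with JSON.
Python, Go, Java, and PHP all have YAML libraries, so the format is easy to parse across languages, and it can express rich, nested data.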
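
As a small illustration of how close YAML and JSON are, the sketch below writes the same target list twice: once in YAML block style and once in JSON-like flow style. The key names block_style and flow_style are made up for this comparison and are not Prometheus settings.

```yaml
# Block style: one list item per line
block_style:
  targets:
    - 'localhost:9090'

# Flow style: the same data, written like JSON
flow_style: {"targets": ["localhost:9090"]}
```

Both keys hold identical data after parsing, which is why the stock prometheus.yml can write targets: ['localhost:9090'] on a single line.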

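II. Configuration file sections
global: global settings (a value set inside an individual section overrides the global one)
alerting: alerting component definition. This is where the Alertmanager instances are configured.
rule_files: alert rules. The listed files are loaded and evaluated; they hold your custom alert rules, while notification channels and routing are implemented by Alertmanager.
scrape_configs: scrape configuration. Defines the data sources, including the job_name grouping and the concrete targets. Targets come from either static configuration or service discovery.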

Original configuration file:

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']

1. global

# my global config
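global:
  scrape_interval: 15s # How often to scrape targets globally. Set to 15s here; the default is 1m.
  evaluation_interval: 15s # How often to evaluate rules. Set to 15s here; the default when omitted is 1m.
  scrape_timeout: 5s # Per-scrape timeout; the default is 10s. Must not exceed scrape_interval.
  external_labels: # Labels added to time series and alerts when talking to external systems (federation, remote storage, Alertmanager); they are not attached to the locally stored metrics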
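
A minimal sketch of a filled-in external_labels block. The label names cluster and region and their values are assumptions chosen for illustration; use whatever identifies this Prometheus server in your environment.

```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  scrape_timeout: 5s
  external_labels:
    cluster: 'cn-hz-21yunwei'   # hypothetical value
    region: 'cn-hz'             # hypothetical value
```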

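Common command-line flags (Prometheus and its companion components are mostly shipped as binaries; run the binary with -h to see the flags available for customizing startup, otherwise the defaults apply)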

./prometheus -h
--config.file="/opt/config/prometheus.yml"  # Path of the configuration file to load
--web.listen-address="0.0.0.0:9090"  # Address and port Prometheus listens on
--log.level=info # Log level
--alertmanager.timeout=10s # Timeout for sending alerts to Alertmanager

2. alerting

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

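This defines the Alertmanager instances that Prometheus is integrated with for alerting. Alertmanager setup, its configuration options, notification channels, and route rules will be covered separately in a later post.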
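
For reference, a sketch of the same block with one Alertmanager target enabled. The address alertmanager:9093 comes from the commented-out line in the default file; scheme and timeout are written out with their documented defaults and can be omitted.

```yaml
alerting:
  alertmanagers:
  - scheme: http        # default
    timeout: 10s        # per-request timeout when pushing alerts (default)
    static_configs:
    - targets:
      - 'alertmanager:9093'
```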

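3. rule_files
This section lists the files that contain alert rules, i.e. which metric conditions should raise an alert (similar to a trigger). Once the rules are loaded, Prometheus evaluates them at the evaluation_interval set in global. Changes to rule files are not picked up automatically; they take effect after Prometheus reloads its configuration (for example on SIGHUP) or restarts. Notification channels and routing are implemented by Alertmanager.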

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
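
A rule file referenced from rule_files could look like the sketch below. The group name, alert name, duration, and label values are assumptions for illustration; up is the metric Prometheus records for every scrape target (1 when the scrape succeeded, 0 when it failed).

```yaml
# first_rules.yml (illustrative content)
groups:
  - name: example-availability        # hypothetical group name
    rules:
      - alert: InstanceDown            # hypothetical alert name
        expr: up == 0                  # the target failed its last scrape
        for: 5m                        # condition must hold for 5 minutes before firing
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
```

Running promtool check rules first_rules.yml before reloading is a cheap way to catch syntax errors in a file like this.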

4. scrape_configs: configuring the endpoints to scrape

Default scrape_configs:
scrape_configs:
  # The job name is added as a label `job=` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']

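Supported settings:
job_name: name of the scrape job. Think of it as a group; each group contains its member targets.
scrape_interval: 5s # When set on a job, this overrides the global value; targets are scraped every 5s
metrics_path # URL path the metrics are served from, e.g. https://prometheus.21yunwei.com/metrics (a front-end web server reverse-proxies to the backend)
targets: Endpoint # Address of the target to scrape
Note: the above is a static configuration with no service discovery. Adding a host means editing the configuration by hand and then running supervisor reload for the corresponding job. That is the drawback: every static change requires editing the main configuration and reloading or restarting the Prometheus service, which works against operations automation.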
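
Put together, a job that uses these settings might look like the sketch below. The job name is hypothetical, and the host is taken from the example URL above. Note that scrape_timeout may not be greater than scrape_interval, so a job that lowers the interval to 5s needs a timeout of 5s or less, either here or in global.

```yaml
scrape_configs:
  - job_name: 'cn-hz-21yunwei-web'     # hypothetical job name
    scrape_interval: 5s                # overrides global scrape_interval for this job
    scrape_timeout: 5s                 # must not be greater than scrape_interval
    metrics_path: /metrics             # the default, written out for clarity
    scheme: https                      # the example URL above is served over https
    static_configs:
      - targets: ['prometheus.21yunwei.com:443']
```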

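Prometheus supports service discovery (the approach operations best practice usually adopts):
File-based service discovery
File-based service discovery does not depend on any other platform or third-party service. You add new target information to a target file in YAML or JSON format; Prometheus watches the listed files for changes and also re-reads them periodically (refresh_interval, 5 minutes by default), updating its targets accordingly.
Benefits:
(1) No need to add targets one by one to the main configuration file; you only drop JSON or YAML files into the directory being loaded.
(2) Easier to maintain, and the Prometheus server does not need to be restarted each time.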
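
Benefit (1) depends on pointing file_sd_configs at a pattern rather than a single file. A sketch, assuming the target files live under file_config/21yunwei/; the 1m refresh value is an arbitrary choice for illustration.

```yaml
scrape_configs:
  - job_name: 'cn-hz-21yunwei-other'
    file_sd_configs:
      - files:
          - 'file_config/21yunwei/*.json'   # a glob is allowed in the last path segment
          - 'file_config/21yunwei/*.yml'
        refresh_interval: 1m                # periodic re-read; the default is 5m
```

Target files must end in .json, .yml, or .yaml to be accepted.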

Example:

scrape_configs:
  # The job name is added as a label `job=` to any timeseries scraped from this config.
  - job_name: 'cn-hz-21yunwei-devops'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    # Static configuration
    static_configs:
    - targets: ['localhost:9090']
    
    # Get targets from files via file_sd_configs, using the 21yunwei project as an example
  - job_name: 'cn-hz-21yunwei-other'
    file_sd_configs:
    - files:
      - file_config/21yunwei/host.json

JSON file content:

[root@cn-hz-21yunwei-devops 21yunwei]# cat host.json  |jq .
[
  {
    "targets": [
      "1.1.1.1:9010"
    ],
    "labels": {
      "group": "21yunwei",
      "app": "web",
      "hostname": "cn-hz-21yunwei-web"
    }
  },
  {
    "targets": [
      "2.2.2.2:9010"
    ],
    "labels": {
      "group": "21yunwei",
      "app": "devops",
      "hostname": "cn-hz-21yunwei-devops"
    }
  }
]
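
Since target files may also be written in YAML, the same two targets could be expressed as below. The file name host.yml is an assumption; it would only be loaded if the files list under file_sd_configs references it.

```yaml
# host.yml (illustrative YAML equivalent of host.json)
- targets:
    - '1.1.1.1:9010'
  labels:
    group: '21yunwei'
    app: 'web'
    hostname: 'cn-hz-21yunwei-web'
- targets:
    - '2.2.2.2:9010'
  labels:
    group: '21yunwei'
    app: 'devops'
    hostname: 'cn-hz-21yunwei-devops'
```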

After running supervisor reload prometheus, check the result: