Query your Prometheus metrics - Amazon Managed Service for Prometheus

Query your Prometheus metrics

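Now that metrics are being ingested into your workspace, you can query them.

To create dashboards that visualize your metrics, you can use a service such as Amazon Managed Grafana. Amazon Managed Grafana (or a standalone instance of Grafana) can build a graphical interface that shows your metrics in a wide variety of presentation styles. For more information about Amazon Managed Grafana, see the Amazon Managed Grafana User Guide.

You can also use direct queries to run one-off queries, explore your data, or write your own applications that consume your metrics. Direct queries use the Amazon Managed Service for Prometheus API and the standard Prometheus query language, PromQL, to get data from your Prometheus workspace. For more information about PromQL and its syntax, see Querying Prometheus in the Prometheus documentation.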
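As a rough illustration of a direct query, the following Python sketch builds the URL for a Prometheus-compatible instant query against a workspace. The workspace ID `ws-EXAMPLE`, the Region, and the helper names are hypothetical placeholders. Requests to the service must be signed with AWS Signature Version 4 for the `aps` service, so the sketch includes an optional signing helper that assumes `botocore` is installed and AWS credentials are configured. It is not called here.

```python
from urllib.parse import urlencode


def build_instant_query_url(region, workspace_id, promql, time=None):
    """Return the URL for a Prometheus-compatible instant query (hypothetical helper)."""
    base = f"https://aps-workspaces.{region}.amazonaws.com/workspaces/{workspace_id}"
    params = {"query": promql}
    if time is not None:
        # Optional evaluation timestamp (Unix seconds or RFC 3339)
        params["time"] = str(time)
    return f"{base}/api/v1/query?{urlencode(params)}"


def sigv4_headers(url, region):
    """Sign a GET request for the "aps" service.

    Assumes botocore is installed and credentials are available
    through the default credential chain.
    """
    import botocore.session
    from botocore.auth import SigV4Auth
    from botocore.awsrequest import AWSRequest

    credentials = botocore.session.get_session().get_credentials()
    request = AWSRequest(method="GET", url=url)
    SigV4Auth(credentials, "aps", region).add_auth(request)
    return dict(request.headers.items())


# "ws-EXAMPLE" is a placeholder, not a real workspace ID
url = build_instant_query_url("us-east-1", "ws-EXAMPLE", 'up{job="prometheus"}')
print(url)
```

The signed headers returned by `sigv4_headers` would then be sent with the GET request by whichever HTTP client your application uses.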

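PromQL cheat sheet

Use this PromQL (Prometheus Query Language) cheat sheet as a quick reference when querying metrics in your Amazon Managed Service for Prometheus workspace. PromQL is a functional query language that lets you select and aggregate time series data in real time.

For more details about PromQL, see the PromQL Cheat Sheet on the PromLabs website.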

Basic selectors

Select time series by metric name and label matchers:

    # Select all time series with the metric name http_requests_total
    http_requests_total

    # Select time series with specific label values
    http_requests_total{job="prometheus", method="GET"}

    # Use label matchers
    http_requests_total{status_code!="200"}   # Not equal
    http_requests_total{status_code=~"2.."}   # Regex match
    http_requests_total{status_code!~"4.."}   # Negative regex match
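To make the matcher semantics concrete, here is a minimal Python sketch of how the four matcher types decide whether a series is selected. It relies on two PromQL rules: regex matchers are fully anchored (the pattern must match the whole label value), and a missing label behaves like an empty string. The `matches` helper is hypothetical, and Python's `re` module stands in for the RE2 syntax that Prometheus uses, so unusual patterns may behave differently.

```python
import re


def matches(labels, name, op, value):
    """Return True if a series with these labels satisfies one label matcher."""
    actual = labels.get(name, "")  # a missing label is treated as ""
    if op == "=":
        return actual == value
    if op == "!=":
        return actual != value
    if op == "=~":
        # Fully anchored: the regex must match the entire label value
        return re.fullmatch(value, actual) is not None
    if op == "!~":
        return re.fullmatch(value, actual) is None
    raise ValueError(f"unknown matcher operator: {op}")


series_labels = {"job": "prometheus", "status_code": "200"}
print(matches(series_labels, "status_code", "=~", "2.."))
print(matches(series_labels, "status_code", "!~", "4.."))
```

Because of the anchoring, `status_code=~"2.."` selects `200` but not `2000`.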

Range vector selectors

Select a range of samples over a period of time:

    # Select 5 minutes of data
    http_requests_total[5m]

    # Time units: s (seconds), m (minutes), h (hours), d (days), w (weeks), y (years)
    cpu_usage[1h]
    memory_usage[30s]

Aggregation operators

Aggregate data across multiple time series:

    # Sum all values
    sum(http_requests_total)

    # Sum by specific labels
    sum by (job) (http_requests_total)
    sum without (instance) (http_requests_total)

    # Other aggregation operators
    avg(cpu_usage)        # Average
    min(response_time)    # Minimum
    max(response_time)    # Maximum
    count(up)             # Count of series
    stddev(cpu_usage)     # Standard deviation
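The difference between `by` and `without` can be sketched in Python as two ways of building the grouping key: `by` keeps only the listed labels, while `without` keeps every label except the listed ones (and the metric name). The series representation and the `sum_by` and `sum_without` helpers are assumptions made for illustration, not part of any Prometheus client library.

```python
from collections import defaultdict


def sum_by(series, keep):
    """Group by the listed labels only, then sum each group."""
    groups = defaultdict(float)
    for labels, value in series:
        key = tuple((name, labels.get(name, "")) for name in keep)
        groups[key] += value
    return dict(groups)


def sum_without(series, drop):
    """Group by all labels except the listed ones and the metric name."""
    groups = defaultdict(float)
    for labels, value in series:
        key = tuple(sorted(
            (k, v) for k, v in labels.items()
            if k not in drop and k != "__name__"
        ))
        groups[key] += value
    return dict(groups)


# Illustrative data: (labels, current value) pairs
series = [
    ({"job": "api", "instance": "a"}, 3.0),
    ({"job": "api", "instance": "b"}, 4.0),
    ({"job": "web", "instance": "a"}, 5.0),
]
print(sum_by(series, ["job"]))
print(sum_without(series, ["instance"]))
```

With this sample data both calls produce the same groups, because `job` and `instance` are the only labels. They diverge as soon as a series carries additional labels.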

Common functions

Apply functions to transform data:

    # Rate of increase per second (for counters)
    rate(http_requests_total[5m])

    # Increase over time range
    increase(http_requests_total[1h])

    # Derivative (for gauges)
    deriv(cpu_temperature[5m])

    # Mathematical functions
    abs(cpu_usage - 50)      # Absolute value
    round(cpu_usage, 0.1)    # Round to nearest 0.1
    sqrt(memory_usage)       # Square root

    # Time functions
    time()           # Current Unix timestamp
    hour()           # Hour of day (0-23)
    day_of_week()    # Day of week (0-6, Sunday=0)
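The idea behind `increase()` and `rate()` on counters can be approximated with the following Python sketch. It is a simplification: it handles counter resets (a drop in value is treated as a restart from zero), but it omits the extrapolation to the window boundaries that Prometheus applies, so real results will differ slightly. The helper names and sample data are illustrative.

```python
def simple_increase(samples):
    """Total counter increase over the samples, tolerating counter resets.

    samples: list of (timestamp_seconds, counter_value), oldest first.
    """
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        if cur >= prev:
            total += cur - prev
        else:
            # Counter reset: assume it restarted from 0 and climbed to cur
            total += cur
    return total


def simple_rate(samples):
    """Average per-second increase between the first and last sample."""
    elapsed = samples[-1][0] - samples[0][0]
    return simple_increase(samples) / elapsed


# One sample per minute; the counter resets between the 2nd and 3rd sample
samples = [(0, 10.0), (60, 15.0), (120, 3.0), (180, 8.0)]
print(simple_increase(samples))
print(simple_rate(samples))
```

Here the increase is 5 + 3 + 5 = 13 over 180 seconds, even though the last raw value is lower than the first.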

Binary operators

Perform arithmetic and logical operations:

    # Arithmetic operators
    cpu_usage + 10
    memory_total - memory_available
    disk_usage / disk_total * 100

    # Comparison operators (filter series by default; add the bool modifier,
    # for example cpu_usage > bool 80, to return 0 or 1 instead)
    cpu_usage > 80
    memory_usage < 1000
    response_time >= 0.5

    # Logical (set) operators
    (cpu_usage > 80) and (memory_usage > 1000)
    (status_code == 200) or (status_code == 201)
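In PromQL, a comparison between an instant vector and a scalar acts as a filter by default: series that fail the comparison are dropped and the rest keep their values. With the `bool` modifier, every series is kept and its value becomes 1 or 0. The Python sketch below models both behaviors for `>`. The `compare_gt` helper and the data are hypothetical.

```python
def compare_gt(series, threshold, as_bool=False):
    """Model `vector > scalar` (filter) and `vector > bool scalar` (0 or 1)."""
    if as_bool:
        # bool modifier: keep every series, replace the value with 1 or 0
        return [(labels, 1.0 if value > threshold else 0.0)
                for labels, value in series]
    # Default: keep only the series that satisfy the comparison
    return [(labels, value) for labels, value in series if value > threshold]


cpu_usage = [({"instance": "a"}, 91.0), ({"instance": "b"}, 42.0)]
print(compare_gt(cpu_usage, 80))                # like: cpu_usage > 80
print(compare_gt(cpu_usage, 80, as_bool=True))  # like: cpu_usage > bool 80
```

The filtering form is what makes expressions such as `cpu_usage > 80` useful in alerting rules, since an empty result means nothing is firing.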

Practical query examples

Common monitoring queries that you can use in your Amazon Managed Service for Prometheus workspace:

    # CPU usage percentage
    100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

    # Memory usage percentage
    (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100

    # Request rate per second
    sum(rate(http_requests_total[5m])) by (job)

    # Error rate percentage
    sum(rate(http_requests_total{status_code=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) * 100

    # 95th percentile response time
    histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))

    # Top 5 instances by CPU usage
    topk(5, avg by (instance) (cpu_usage))
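The percentile query relies on `histogram_quantile`, which estimates a quantile from cumulative bucket counts by interpolating linearly inside the bucket that contains the target rank. The Python sketch below illustrates that idea for classic histograms. It assumes buckets sorted by upper bound, at least one finite bucket, a final `+Inf` bucket, a non-zero total, and 0 < q <= 1. The function name and the bucket data are illustrative, and edge cases that Prometheus handles are omitted.

```python
import math


def quantile_from_buckets(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count), sorted by upper_bound,
    with the last entry being the +Inf bucket.
    """
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0  # the lowest bucket is assumed to start at 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the highest finite bound
                return prev_bound
            # Linear interpolation within the bucket
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# Illustrative per-bucket request rates keyed by the `le` upper bound
buckets = [(0.1, 50.0), (0.5, 90.0), (1.0, 98.0), (math.inf, 100.0)]
print(quantile_from_buckets(0.95, buckets))
```

With these sample buckets the 95th percentile lands in the 0.5 to 1.0 bucket, which shows why the accuracy of the estimate depends on how finely the bucket boundaries are chosen.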