Query Prometheus metrics
Now that your metrics have been ingested into the workspace, you can query them.
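To create dashboards that visualize your metrics, you can use a service such as Amazon Managed Grafana. Amazon Managed Grafana, or a standalone Grafana instance, can build graphical interfaces that display your metrics in a variety of presentation styles. For more information about Amazon Managed Grafana, see the Amazon Managed Grafana User Guide.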
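You can also use direct queries to run one-off queries, explore your data, or write your own applications that consume your metrics. Direct queries use the Amazon Managed Service for Prometheus API and the standard Prometheus query language, PromQL, to retrieve data from your Prometheus workspace. For more information about PromQL and its syntax, see Querying Prometheus in the Prometheus documentation.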
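The following Python sketch illustrates what a direct query looks like. It builds the request URL for the Prometheus-compatible query endpoints (`/api/v1/query` for instant queries and `/api/v1/query_range` for range queries). It assumes the endpoint format `https://aps-workspaces.<region>.amazonaws.com/workspaces/<workspace-id>/api/v1/...`, and the Region and workspace ID shown are hypothetical placeholders. The sketch does not send anything. A real request must also be signed with AWS Signature Version 4 (service name `aps`), for example through an AWS SDK, and that step is omitted here.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_query_url(region, workspace_id, promql, start=None, end=None, step=None):
    """Return an (unsigned) query URL for a workspace's Prometheus-compatible API.

    With only a PromQL expression this targets the instant-query endpoint.
    If start, end, and step are all given, it targets the range-query endpoint.
    The endpoint format is an assumption; the request still needs SigV4 signing.
    """
    base = f"https://aps-workspaces.{region}.amazonaws.com/workspaces/{workspace_id}/api/v1"
    params = {"query": promql}
    if start is not None and end is not None and step is not None:
        params.update(start=start, end=end, step=step)
        return f"{base}/query_range?{urlencode(params)}"
    return f"{base}/query?{urlencode(params)}"

# Hypothetical Region and workspace ID, for illustration only.
url = build_query_url("us-east-1", "ws-EXAMPLE", "sum(rate(http_requests_total[5m]))")
print(url)
```

The PromQL expression is URL-encoded because characters such as `(`, `[`, and `{` are common in queries and cannot appear unescaped in a query string.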
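PromQL cheat sheet
Use this PromQL (Prometheus Query Language) cheat sheet as a quick reference when you query metrics in your Amazon Managed Service for Prometheus workspace. PromQL is a functional query language that lets you select and aggregate time series data in real time.
For more details about PromQL, see the PromQL cheat sheet on the website.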
Basic selectors
Select time series by metric name and label matchers:
```
# Select all time series with the metric name http_requests_total
http_requests_total

# Select time series with specific label values
http_requests_total{job="prometheus", method="GET"}

# Use label matchers
http_requests_total{status_code!="200"}   # Not equal
http_requests_total{status_code=~"2.."}   # Regex match
http_requests_total{status_code!~"4.."}   # Negative regex match
```
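The following Python sketch shows how a set of label matchers selects series. It is a simplified model and does not reproduce Prometheus's implementation. Series are plain dicts, and Python's `re` stands in for the RE2 syntax that Prometheus uses, so some advanced regex features may behave differently. The sketch does follow two PromQL rules: regex matchers are fully anchored, and a missing label compares as the empty string. The function name `matches` and the sample series are hypothetical.

```python
import re

def matches(labels, matchers):
    """Return True if a series' labels satisfy every (name, op, value) matcher."""
    for name, op, value in matchers:
        actual = labels.get(name, "")  # a missing label behaves like ""
        if op == "=":
            ok = actual == value
        elif op == "!=":
            ok = actual != value
        elif op == "=~":
            ok = re.fullmatch(value, actual) is not None  # fully anchored, like PromQL
        elif op == "!~":
            ok = re.fullmatch(value, actual) is None
        else:
            raise ValueError(f"unknown matcher operator: {op!r}")
        if not ok:
            return False
    return True

# Hypothetical series, each identified by its full label set.
series = [
    {"__name__": "http_requests_total", "job": "prometheus", "method": "GET", "status_code": "200"},
    {"__name__": "http_requests_total", "job": "prometheus", "method": "POST", "status_code": "404"},
    {"__name__": "http_requests_total", "job": "api", "method": "GET", "status_code": "500"},
]

# Rough equivalent of http_requests_total{status_code=~"2.."}
selected = [
    s for s in series
    if matches(s, [("__name__", "=", "http_requests_total"), ("status_code", "=~", "2..")])
]
```

Because of the anchoring, `status_code=~"2.."` matches `200` but would not match a hypothetical value such as `1200`.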
Range vector selectors
Select a range of samples over a period of time:
```
# Select 5 minutes of data
http_requests_total[5m]

# Time units: ms (milliseconds), s (seconds), m (minutes), h (hours), d (days), w (weeks), y (years)
cpu_usage[1h]
memory_usage[30s]
```
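Window sizes are easier to reason about when you convert them to seconds. The following hypothetical Python helper does that conversion for the units listed above. It performs only basic validation and treats `y` as a fixed 365 days. It also accepts compound durations such as `1h30m`, which recent Prometheus versions allow.

```python
import re

# Milliseconds per PromQL duration unit ("y" is treated as a fixed 365 days).
_UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000,
            "d": 86_400_000, "w": 604_800_000, "y": 31_536_000_000}
# "ms" is listed before "m" so that "500ms" is not read as "500m" followed by "s".
_PART = r"(\d+)(ms|s|m|h|d|w|y)"

def duration_seconds(text):
    """Convert a PromQL-style duration such as "5m" or "1h30m" to seconds."""
    if not re.fullmatch(f"(?:{_PART})+", text):
        raise ValueError(f"invalid duration: {text!r}")
    total_ms = sum(int(n) * _UNIT_MS[unit] for n, unit in re.findall(_PART, text))
    return total_ms / 1000

window = duration_seconds("5m")  # 300.0
```

The helper sums whole milliseconds and divides once at the end, so values such as `500ms` come out exactly as `0.5`.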
Aggregation operators
Aggregate data across multiple time series:
```
# Sum all values
sum(http_requests_total)

# Sum by specific labels
sum by (job) (http_requests_total)
sum without (instance) (http_requests_total)

# Other aggregation operators
avg(cpu_usage)          # Average
min(response_time)      # Minimum
max(response_time)      # Maximum
count(up)               # Count of series
stddev(cpu_usage)       # Standard deviation
```
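The difference between `by` and `without` is which labels form the grouping key. The following Python sketch models `sum` over an instant vector. It uses a simplified, hypothetical representation in which the vector is a list of `(labels, value)` pairs. With `by`, only the listed labels are kept. With `without`, every label is kept except the listed ones and the metric name (`__name__`).

```python
from collections import defaultdict

def aggregate_sum(vector, by=None, without=None):
    """Model PromQL sum, sum by (...), and sum without (...).

    vector is a list of (labels, value) pairs. Returns a dict that maps each
    group's label set (a sorted tuple of (name, value) pairs) to its sum.
    """
    if by is not None and without is not None:
        raise ValueError("use either by or without, not both")
    groups = defaultdict(float)
    for labels, value in vector:
        if without is not None:
            # Keep every label except the listed ones; the metric name is always dropped.
            kept = {k: v for k, v in labels.items() if k not in without and k != "__name__"}
        else:
            # Keep only the listed labels (none at all for a plain sum).
            kept = {k: v for k, v in labels.items() if k in (by or [])}
        groups[tuple(sorted(kept.items()))] += value
    return dict(groups)

# Hypothetical samples of http_requests_total.
vector = [
    ({"__name__": "http_requests_total", "job": "api", "instance": "a"}, 10),
    ({"__name__": "http_requests_total", "job": "api", "instance": "b"}, 5),
    ({"__name__": "http_requests_total", "job": "web", "instance": "c"}, 7),
]
```

In this data, `job` is the only label left after `instance` and the metric name are removed, so `sum without (instance)` and `sum by (job)` produce the same groups. With additional labels, the two forms would diverge.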
Common functions
Apply functions to transform data:
```
# Per-second rate of increase (for counters)
rate(http_requests_total[5m])

# Total increase over the time range (for counters)
increase(http_requests_total[1h])

# Per-second derivative (for gauges)
deriv(cpu_temperature[5m])

# Mathematical functions
abs(cpu_usage - 50)     # Absolute value
round(cpu_usage, 0.1)   # Round to nearest 0.1
sqrt(memory_usage)      # Square root

# Time functions
time()                  # Evaluation time as a Unix timestamp (seconds)
hour()                  # Hour of day in UTC (0-23)
day_of_week()           # Day of week in UTC (0-6, Sunday=0)
```
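At its core, `rate()` measures how much a counter grew, divided by the elapsed time, with counter resets taken into account. The following Python sketch computes that simplified quantity from a list of `(timestamp, value)` samples. It is only an approximation. Real Prometheus also extrapolates the result toward the edges of the range window, so its output can differ slightly. The function name `simple_rate` and the sample data are hypothetical.

```python
def simple_rate(samples):
    """Approximate PromQL rate() for one counter series.

    samples: list of (unix_seconds, counter_value) pairs inside the range
    window, sorted by time. Returns the per-second increase, or None when
    there are fewer than two samples. No boundary extrapolation is done.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A decrease means the counter was reset (for example, a restart),
        # so the new value is counted as growth from zero.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed

# Hypothetical scrapes every 15 s; the counter resets between t=30 and t=45.
samples = [(0, 100), (15, 115), (30, 130), (45, 10), (60, 25)]
per_second = simple_rate(samples)  # 55 / 60
```

Without the reset handling, the drop from 130 to 10 would register as a large negative change. This is why `rate()` and `increase()` are meant for counters, while `deriv()` is the better fit for gauges, which can legitimately go down.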
Binary operators
Perform arithmetic, comparison, and logical operations:
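```
# Arithmetic operators
cpu_usage + 10
memory_total - memory_available
disk_usage / disk_total * 100

# Comparison operators filter: they keep only the series for which the comparison is true
cpu_usage > 80
memory_usage < 1000
response_time >= 0.5

# Add the bool modifier to return 0 or 1 for every series instead of filtering
cpu_usage > bool 80

# Logical (set) operators
(cpu_usage > 80) and (memory_usage > 1000)
http_requests_total{status_code="200"} or http_requests_total{status_code="201"}
```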
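Comparison operators filter by default, so it helps to compare the two modes directly. The following Python sketch models an instant vector as a dict that maps a label set to a value. As a hypothetical simplification, each label set is represented by just an instance name. `vector_and` mirrors how `and` keeps the left-hand samples whose label set also appears on the right-hand side.

```python
def compare_gt(vector, threshold, as_bool=False):
    """Model `vector > threshold` and `vector > bool threshold`."""
    if as_bool:
        # bool modifier: every series is kept, with value 1.0 (true) or 0.0 (false).
        return {k: 1.0 if v > threshold else 0.0 for k, v in vector.items()}
    # Default: filter, keeping only series where the comparison holds, with original values.
    return {k: v for k, v in vector.items() if v > threshold}

def vector_and(left, right):
    """Model `left and right`: keep left-hand samples whose label set appears on the right."""
    return {k: v for k, v in left.items() if k in right}

# Hypothetical instant vectors keyed by instance.
cpu_usage = {"a": 92.0, "b": 40.0, "c": 85.0}
memory_usage = {"a": 1500.0, "b": 2000.0, "c": 800.0}

busy = vector_and(compare_gt(cpu_usage, 80), compare_gt(memory_usage, 1000))
```

Only instance `a` exceeds both thresholds, so it is the only series that survives the `and`. It keeps its original CPU value (92.0), not 1.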
Practical query examples
Common monitoring queries that you can use in your Amazon Managed Service for Prometheus workspace:
```
# CPU usage percentage
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Memory usage percentage
(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100

# Request rate per second
sum(rate(http_requests_total[5m])) by (job)

# Error rate percentage
sum(rate(http_requests_total{status_code=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) * 100

# 95th percentile response time
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))

# Top 5 instances by CPU usage
topk(5, avg by (instance) (cpu_usage))
```
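`histogram_quantile()` estimates a quantile in two steps. It finds the bucket that contains the target rank, then interpolates linearly inside that bucket. The following Python sketch reproduces this idea for classic histograms, given as cumulative `(le, count)` pairs that end in a `+Inf` bucket. It assumes positive bucket bounds and 0 < q < 1, and it skips some edge-case handling that Prometheus performs. Treat it as an explanation, not a drop-in replacement. The bucket values are hypothetical.

```python
import math

def histogram_quantile(q, buckets):
    """Estimate quantile q (0 < q < 1) from cumulative (le, count) buckets.

    buckets must be sorted by upper bound and end with an le=+Inf bucket.
    """
    if not 0 < q < 1:
        raise ValueError("this sketch only handles 0 < q < 1")
    if not buckets or buckets[-1][0] != math.inf:
        raise ValueError("buckets must end with an le=+Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total  # the observation rank we are looking for
    prev_bound, prev_count = 0.0, 0.0  # lower edge of the first bucket is assumed to be 0
    for bound, count in buckets:
        if count >= rank:
            if bound == math.inf:
                # Rank falls in the +Inf bucket: return the highest finite bound.
                return prev_bound
            # Linear interpolation inside the bucket [prev_bound, bound].
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count

# Hypothetical request-duration buckets (seconds), already summed across series.
buckets = [(0.1, 50), (0.25, 80), (0.5, 95), (1.0, 99), (math.inf, 100)]
p95 = histogram_quantile(0.95, buckets)
```

The sketch needs each bucket's upper bound to do this interpolation. That is why the 95th percentile query above keeps the `le` label in `sum(...) by (le)`: without `le`, the buckets could not be told apart.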