有关与适用于 LiveAnalytics 的 Amazon Timestream 类似的功能,可以考虑使用适用于 InfluxDB 的 Amazon Timestream。适用于 InfluxDB 的 Amazon Timestream 提供简化的数据摄取和个位数毫秒级的查询响应时间,以实现实时分析。点击此处了解更多信息。
将聚合控制面板转换为计划查询
假设您正在计算整个实例集的统计信息,例如按五个微服务和部署服务的六个区域计算实例集中的主机数量。从下图快照中可见,共有 50 万台服务器正在输出指标数据,其中部分大型区域(如 us-east-1)的服务器数量超过 20 万台。
计算这些聚合,即计算数百 GB 数据的不同实例名称,除数据扫描成本外,还可能导致数十秒的查询延迟。
原始控制面板查询
控制面板中显示的聚合数据通过使用以下查询从原始数据计算得出。该查询使用多个 SQL 构造,例如不同的计数和多个聚合函数。
SELECT CASE WHEN microservice_name = 'apollo' THEN num_instances ELSE NULL END AS apollo, CASE WHEN microservice_name = 'athena' THEN num_instances ELSE NULL END AS athena, CASE WHEN microservice_name = 'demeter' THEN num_instances ELSE NULL END AS demeter, CASE WHEN microservice_name = 'hercules' THEN num_instances ELSE NULL END AS hercules, CASE WHEN microservice_name = 'zeus' THEN num_instances ELSE NULL END AS zeus FROM ( SELECT microservice_name, SUM(num_instances) AS num_instances FROM ( SELECT microservice_name, COUNT(DISTINCT instance_name) as num_instances FROM "raw_data"."devops" WHERE time BETWEEN from_milliseconds(1636526171043) AND from_milliseconds(1636612571043) AND measure_name = 'metrics' GROUP BY region, cell, silo, availability_zone, microservice_name ) GROUP BY microservice_name )
转换为计划查询
前面的查询可以按以下方式转换为计划查询。首先在区域、单元格、筒仓、可用区和微服务内,计算给定部署中的唯一主机名。然后,将主机数相加,计算出每小时每个微服务的主机数。通过使用计划查询支持的 @scheduled_runtime 参数,您可以在调用查询时重新计算过去一小时的数据。内部查询 WHERE 子句中的 bin(@scheduled_runtime, 1h) 确保即使将查询安排在整点中间时段执行,仍能获取完整小时的数据。
尽管该查询计算的是每小时聚合数据,但正如您将在计划计算配置中看到的那样,该查询设置为每半小时刷新一次,以便您能在派生表中更快地获取更新。您可以根据数据刷新要求进行调整,例如每 15 分钟重新计算一次聚合,或在整点时重新计算。
SELECT microservice_name, hour, SUM(num_instances) AS num_instances FROM ( SELECT microservice_name, bin(time, 1h) AS hour, COUNT(DISTINCT instance_name) as num_instances FROM raw_data.devops WHERE time BETWEEN bin(@scheduled_runtime, 1h) - 1h AND @scheduled_runtime AND measure_name = 'metrics' GROUP BY region, cell, silo, availability_zone, microservice_name, bin(time, 1h) ) GROUP BY microservice_name, hour
{ "Name": "MultiPT30mHostCountMicroservicePerHr", "QueryString": "SELECT microservice_name, hour, SUM(num_instances) AS num_instances FROM ( SELECT microservice_name, bin(time, 1h) AS hour, COUNT(DISTINCT instance_name) as num_instances FROM raw_data.devops WHERE time BETWEEN bin(@scheduled_runtime, 1h) - 1h AND @scheduled_runtime AND measure_name = 'metrics' GROUP BY region, cell, silo, availability_zone, microservice_name, bin(time, 1h) ) GROUP BY microservice_name, hour", "ScheduleConfiguration": { "ScheduleExpression": "cron(0/30 * * * ? *)" }, "NotificationConfiguration": { "SnsConfiguration": { "TopicArn": "******" } }, "TargetConfiguration": { "TimestreamConfiguration": { "DatabaseName": "derived", "TableName": "host_count_pt1h", "TimeColumn": "hour", "DimensionMappings": [ { "Name": "microservice_name", "DimensionValueType": "VARCHAR" } ], "MultiMeasureMappings": { "TargetMultiMeasureName": "num_instances", "MultiMeasureAttributeMappings": [ { "SourceColumn": "num_instances", "MeasureValueType": "BIGINT" } ] } } }, "ErrorReportConfiguration": { "S3Configuration" : { "BucketName" : "******", "ObjectKeyPrefix": "errors", "EncryptionOption": "SSE_S3" } }, "ScheduledQueryExecutionRoleArn": "******" }
在新控制面板中使用预先计算的结果
现在,您将了解如何使用已创建计划查询生成的派生表来创建聚合视图控制面板。通过控制面板快照,您还将能够验证从派生表和基础表计算出的聚合结果是否一致。使用派生表创建控制面板后,您会发现其加载速度显著提升,且成本远低于直接从原始数据计算聚合的方式。以下是使用预先计算数据的控制面板快照,以及用于呈现此面板的查询语句,该查询调用存储在 "derived"."host_count_pt1h" 表中的预计算数据。请注意,该查询的结构与控制面板中用于原始数据的查询非常相似,区别在于是否使用已计算出独立计数的派生表,而该查询会对派生表进行聚合操作。
SELECT CASE WHEN microservice_name = 'apollo' THEN num_instances ELSE NULL END AS apollo, CASE WHEN microservice_name = 'athena' THEN num_instances ELSE NULL END AS athena, CASE WHEN microservice_name = 'demeter' THEN num_instances ELSE NULL END AS demeter, CASE WHEN microservice_name = 'hercules' THEN num_instances ELSE NULL END AS hercules, CASE WHEN microservice_name = 'zeus' THEN num_instances ELSE NULL END AS zeus FROM ( SELECT microservice_name, AVG(num_instances) AS num_instances FROM ( SELECT microservice_name, bin(time, 1h), SUM(num_instances) as num_instances FROM "derived"."host_count_pt1h" WHERE time BETWEEN from_milliseconds(1636567785421) AND from_milliseconds(1636654185421) AND measure_name = 'num_instances' GROUP BY microservice_name, bin(time, 1h) ) GROUP BY microservice_name )