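Auditing and logging
Collecting and analyzing audit logs is useful for several reasons. Logs help with root cause analysis and with attribution, that is, ascribing a change to a particular user. Once enough logs have been collected, they can also be used to detect anomalous behavior. On EKS, the audit logs are sent to Amazon CloudWatch Logs. The audit policy for EKS is as follows: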
apiVersion: audit.k8s.io/v1beta1
kind: Policy
rules:
  # Log full request and response for changes to aws-auth ConfigMap in kube-system namespace
  - level: RequestResponse
    namespaces: ["kube-system"]
    verbs: ["update", "patch", "delete"]
    resources:
      - group: "" # core
        resources: ["configmaps"]
        resourceNames: ["aws-auth"]
    omitStages:
      - "RequestReceived"
  # Do not log watch operations performed by kube-proxy on endpoints and services
  - level: None
    users: ["system:kube-proxy"]
    verbs: ["watch"]
    resources:
      - group: "" # core
        resources: ["endpoints", "services", "services/status"]
  # Do not log get operations performed by kubelet on nodes and their statuses
  - level: None
    users: ["kubelet"] # legacy kubelet identity
    verbs: ["get"]
    resources:
      - group: "" # core
        resources: ["nodes", "nodes/status"]
  # Do not log get operations performed by the system:nodes group on nodes and their statuses
  - level: None
    userGroups: ["system:nodes"]
    verbs: ["get"]
    resources:
      - group: "" # core
        resources: ["nodes", "nodes/status"]
  # Do not log get and update operations performed by controller manager, scheduler, and endpoint-controller on endpoints in kube-system namespace
  - level: None
    users:
      - system:kube-controller-manager
      - system:kube-scheduler
      - system:serviceaccount:kube-system:endpoint-controller
    verbs: ["get", "update"]
    namespaces: ["kube-system"]
    resources:
      - group: "" # core
        resources: ["endpoints"]
  # Do not log get operations performed by apiserver on namespaces and their statuses/finalizations
  - level: None
    users: ["system:apiserver"]
    verbs: ["get"]
    resources:
      - group: "" # core
        resources: ["namespaces", "namespaces/status", "namespaces/finalize"]
  # Do not log get and list operations performed by controller manager on metrics.k8s.io resources
  - level: None
    users:
      - system:kube-controller-manager
    verbs: ["get", "list"]
    resources:
      - group: "metrics.k8s.io"
  # Do not log access to health, version, and swagger non-resource URLs
  - level: None
    nonResourceURLs:
      - /healthz*
      - /version
      - /swagger*
  # Do not log events resources
  - level: None
    resources:
      - group: "" # core
        resources: ["events"]
  # Log request for updates/patches to nodes and pods statuses by kubelet and node problem detector
  - level: Request
    users: ["kubelet", "system:node-problem-detector", "system:serviceaccount:kube-system:node-problem-detector"]
    verbs: ["update", "patch"]
    resources:
      - group: "" # core
        resources: ["nodes/status", "pods/status"]
    omitStages:
      - "RequestReceived"
  # Log request for updates/patches to nodes and pods statuses by system:nodes group
  - level: Request
    userGroups: ["system:nodes"]
    verbs: ["update", "patch"]
    resources:
      - group: "" # core
        resources: ["nodes/status", "pods/status"]
    omitStages:
      - "RequestReceived"
  # Log delete collection requests by namespace-controller in kube-system namespace
  - level: Request
    users: ["system:serviceaccount:kube-system:namespace-controller"]
    verbs: ["deletecollection"]
    omitStages:
      - "RequestReceived"
  # Log metadata for secrets, configmaps, and tokenreviews to protect sensitive data
  - level: Metadata
    resources:
      - group: "" # core
        resources: ["secrets", "configmaps"]
      - group: authentication.k8s.io
        resources: ["tokenreviews"]
    omitStages:
      - "RequestReceived"
  # Log requests for serviceaccounts/token resources
  - level: Request
    resources:
      - group: "" # core
        resources: ["serviceaccounts/token"]
  # Log get, list, and watch requests for various resource groups
  - level: Request
    verbs: ["get", "list", "watch"]
    resources:
      - group: "" # core
      - group: "admissionregistration.k8s.io"
      - group: "apiextensions.k8s.io"
      - group: "apiregistration.k8s.io"
      - group: "apps"
      - group: "authentication.k8s.io"
      - group: "authorization.k8s.io"
      - group: "autoscaling"
      - group: "batch"
      - group: "certificates.k8s.io"
      - group: "extensions"
      - group: "metrics.k8s.io"
      - group: "networking.k8s.io"
      - group: "policy"
      - group: "rbac.authorization.k8s.io"
      - group: "scheduling.k8s.io"
      - group: "settings.k8s.io"
      - group: "storage.k8s.io"
    omitStages:
      - "RequestReceived"
  # Default logging level for known APIs to log request and response
  - level: RequestResponse
    resources:
      - group: "" # core
      - group: "admissionregistration.k8s.io"
      - group: "apiextensions.k8s.io"
      - group: "apiregistration.k8s.io"
      - group: "apps"
      - group: "authentication.k8s.io"
      - group: "authorization.k8s.io"
      - group: "autoscaling"
      - group: "batch"
      - group: "certificates.k8s.io"
      - group: "extensions"
      - group: "metrics.k8s.io"
      - group: "networking.k8s.io"
      - group: "policy"
      - group: "rbac.authorization.k8s.io"
      - group: "scheduling.k8s.io"
      - group: "settings.k8s.io"
      - group: "storage.k8s.io"
    omitStages:
      - "RequestReceived"
  # Default logging level for all other requests to log metadata only
  - level: Metadata
    omitStages:
      - "RequestReceived"
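Kubernetes evaluates audit policy rules in order, and the first rule that matches a request determines its audit level. The Python sketch below illustrates that first-match behavior on a hand-copied subset of the policy. The helper names (`rule_matches`, `audit_level`) and the flattened request dictionary are hypothetical simplifications; the sketch ignores `userGroups`, `nonResourceURLs`, and `omitStages`, so treat it as a reading aid rather than a faithful reimplementation of the API server.

```python
# Simplified illustration of first-match audit rule evaluation.
# RULES is a hand-copied subset of the EKS policy, not the full policy.
RULES = [
    {
        "level": "RequestResponse",
        "namespaces": ["kube-system"],
        "verbs": ["update", "patch", "delete"],
        "resources": [
            {"group": "", "resources": ["configmaps"], "resourceNames": ["aws-auth"]}
        ],
    },
    {
        "level": "None",
        "users": ["system:kube-proxy"],
        "verbs": ["watch"],
        "resources": [
            {"group": "", "resources": ["endpoints", "services", "services/status"]}
        ],
    },
    {
        "level": "Metadata",
        "resources": [{"group": "", "resources": ["secrets", "configmaps"]}],
    },
    {"level": "Metadata"},  # catch-all default
]


def _resource_matches(selector, request):
    """Match one entry of a rule's `resources` list against the request."""
    if selector.get("group", "") != request.get("group", ""):
        return False
    # An omitted or empty `resources` list matches every resource in the group.
    if selector.get("resources") and request.get("resource") not in selector["resources"]:
        return False
    if selector.get("resourceNames") and request.get("name") not in selector["resourceNames"]:
        return False
    return True


def rule_matches(rule, request):
    """Return True if every selector present in the rule matches the request."""
    if "users" in rule and request.get("user") not in rule["users"]:
        return False
    if "verbs" in rule and request.get("verb") not in rule["verbs"]:
        return False
    if "namespaces" in rule and request.get("namespace") not in rule["namespaces"]:
        return False
    if "resources" in rule:
        return any(_resource_matches(sel, request) for sel in rule["resources"])
    return True


def audit_level(request, rules=RULES):
    """Return the level of the first matching rule, or "None" if nothing matches."""
    for rule in rules:
        if rule_matches(rule, request):
            return rule["level"]
    return "None"
```

With these rules, a `patch` of the `aws-auth` ConfigMap in `kube-system` resolves to `RequestResponse`, while a `get` on any Secret falls through to the `Metadata` rule, so its payload is never written to the log.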
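Recommendations
Enable audit logs
The audit logs are part of the Kubernetes control plane logs that EKS manages. Instructions for enabling and disabling the control plane logs, which include the logs for the Kubernetes API server, the controller manager, and the scheduler, along with the audit log, can be found at https://docs.aws.amazon.com/eks/latest/userguide/control-plane-logs.html#enabling-control-plane-log-export.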
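As one possible way to script this, the sketch below uses the boto3 EKS client's `update_cluster_config` call to turn on selected control plane log types. The helper names, the cluster name, and the region are placeholders chosen for illustration; credentials and permissions are assumed to be configured already.

```python
def build_logging_config(log_types=("audit",), enabled=True):
    """Build the `logging` argument expected by EKS update_cluster_config."""
    return {"clusterLogging": [{"types": list(log_types), "enabled": enabled}]}


def enable_control_plane_logs(cluster_name, region_name, log_types=("audit",)):
    """Enable the given control plane log types on an existing cluster."""
    # Imported lazily so build_logging_config can be used without the SDK installed.
    import boto3

    eks = boto3.client("eks", region_name=region_name)
    return eks.update_cluster_config(
        name=cluster_name,
        logging=build_logging_config(log_types),
    )


if __name__ == "__main__":
    # "my-cluster" and "us-west-2" are hypothetical values.
    all_types = ("api", "audit", "authenticator", "controllerManager", "scheduler")
    print(build_logging_config(all_types))
```

Enabling only `audit` keeps CloudWatch ingestion lower than enabling all five log types, at the cost of losing API server and authenticator logs during an investigation.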
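Note
When you enable control plane logging, you incur CloudWatch charges for storing the logs.
Warning
The maximum size of a CloudWatch Logs entry is 256KB, whereas the maximum Kubernetes API request size is 1.5MiB. Log entries larger than 256KB are either truncated or include only the request metadata.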
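Utilize audit metadata
Kubernetes audit logs include two annotations: authorization.k8s.io/decision, which indicates whether a request was authorized, and authorization.k8s.io/reason, which gives the reason for the decision. Use these attributes to ascertain why a particular API call was allowed.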
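A minimal Python sketch of reading these two annotations from a single audit event follows. The sample event is invented for illustration (including the reason text), and `authorization_outcome` is a hypothetical helper name.

```python
import json

# Invented sample event; real events carry many more fields.
SAMPLE_EVENT = json.dumps(
    {
        "kind": "Event",
        "verb": "get",
        "user": {"username": "example-user"},
        "objectRef": {"resource": "pods", "namespace": "default"},
        "responseStatus": {"code": 200},
        "annotations": {
            "authorization.k8s.io/decision": "allow",
            "authorization.k8s.io/reason": "example reason text",
        },
    }
)


def authorization_outcome(raw_event):
    """Return (decision, reason) from an audit event, or (None, None) if absent."""
    event = json.loads(raw_event)
    annotations = event.get("annotations") or {}
    return (
        annotations.get("authorization.k8s.io/decision"),
        annotations.get("authorization.k8s.io/reason"),
    )


if __name__ == "__main__":
    decision, reason = authorization_outcome(SAMPLE_EVENT)
    print(decision, "-", reason)
```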
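Create alarms for suspicious events
Create an alarm that automatically alerts you to an increase in 403 Forbidden and 401 Unauthorized responses, and then use attributes such as host, sourceIPs, and k8s_user.username to find out where those requests are coming from.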
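Once an alarm fires, the triage step can be sketched as a simple aggregation over already-parsed audit events. The example below assumes the raw audit event field names `responseStatus.code`, `user.username`, and `sourceIPs`; the helper name and the sample data are hypothetical.

```python
from collections import Counter


def denied_request_sources(events, codes=(401, 403)):
    """Count denied requests per (username, source IP), most frequent first."""
    counts = Counter()
    for event in events:
        code = (event.get("responseStatus") or {}).get("code")
        if code not in codes:
            continue
        username = (event.get("user") or {}).get("username", "unknown")
        # An event may list several source IPs (for example, behind proxies).
        for ip in event.get("sourceIPs") or ["unknown"]:
            counts[(username, ip)] += 1
    return counts.most_common()


if __name__ == "__main__":
    sample = [
        {"responseStatus": {"code": 403}, "user": {"username": "system:anonymous"}, "sourceIPs": ["203.0.113.7"]},
        {"responseStatus": {"code": 403}, "user": {"username": "system:anonymous"}, "sourceIPs": ["203.0.113.7"]},
        {"responseStatus": {"code": 401}, "user": {"username": "dev"}, "sourceIPs": ["198.51.100.2"]},
        {"responseStatus": {"code": 200}, "user": {"username": "dev"}, "sourceIPs": ["198.51.100.2"]},
    ]
    for (username, ip), count in denied_request_sources(sample):
        print(username, ip, count)
```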
Analyze logs with Logs Insights
Use CloudWatch Logs Insights to monitor changes to RBAC objects such as Roles, RoleBindings, ClusterRoles, and ClusterRoleBindings. A few sample queries follow:
Lists updates to the aws-auth ConfigMap:
fields @timestamp, @message | filter @logStream like "kube-apiserver-audit" | filter verb in ["update", "patch"] | filter objectRef.resource = "configmaps" and objectRef.name = "aws-auth" and objectRef.namespace = "kube-system" | sort @timestamp desc
Lists newly created validating webhooks or changes to validating webhooks:
fields @timestamp, @message | filter @logStream like "kube-apiserver-audit" | filter verb in ["create", "update", "patch"] and responseStatus.code = 201 | filter objectRef.resource = "validatingwebhookconfigurations" | sort @timestamp desc
Lists create, update, and delete operations on Roles:
fields @timestamp, @message | filter objectRef.resource="roles" and verb in ["create", "update", "patch", "delete"] | sort @timestamp desc | limit 100
Lists create, update, and delete operations on RoleBindings:
fields @timestamp, @message | filter objectRef.resource="rolebindings" and verb in ["create", "update", "patch", "delete"] | sort @timestamp desc | limit 100
Lists create, update, and delete operations on ClusterRoles:
fields @timestamp, @message | filter objectRef.resource="clusterroles" and verb in ["create", "update", "patch", "delete"] | sort @timestamp desc | limit 100
Lists create, update, and delete operations on ClusterRoleBindings:
fields @timestamp, @message | filter objectRef.resource="clusterrolebindings" and verb in ["create", "update", "patch", "delete"] | sort @timestamp desc | limit 100
Plots unauthorized read operations against Secrets:
fields @timestamp, @message | filter objectRef.resource="secrets" and verb in ["get", "watch", "list"] and responseStatus.code="401" | stats count() by bin(1m)
Lists failed anonymous requests:
fields @timestamp, @message, sourceIPs.0 | filter user.username="system:anonymous" and responseStatus.code in ["401", "403"] | sort @timestamp desc | limit 100
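The four RBAC queries differ only in the resource name, so they can be generated and run from a script. The sketch below builds the query string and runs it with the boto3 CloudWatch Logs calls `start_query` and `get_query_results`. The helper names are hypothetical, and the log group `/aws/eks/my-cluster/cluster` assumes a cluster named `my-cluster`.

```python
import time


def rbac_change_query(resource, limit=100):
    """Build a Logs Insights query listing changes to one RBAC resource type."""
    verbs = '["create", "update", "patch", "delete"]'
    return (
        "fields @timestamp, @message"
        f' | filter objectRef.resource="{resource}" and verb in {verbs}'
        " | sort @timestamp desc"
        f" | limit {limit}"
    )


def run_query(log_group, query, lookback_seconds=3600, region_name=None):
    """Run a Logs Insights query and poll until it is no longer in progress."""
    # Imported lazily so rbac_change_query can be used without the SDK installed.
    import boto3

    logs = boto3.client("logs", region_name=region_name)
    end = int(time.time())
    query_id = logs.start_query(
        logGroupName=log_group,
        startTime=end - lookback_seconds,
        endTime=end,
        queryString=query,
    )["queryId"]
    while True:
        response = logs.get_query_results(queryId=query_id)
        if response["status"] not in ("Scheduled", "Running"):
            return response
        time.sleep(1)


if __name__ == "__main__":
    for resource in ("roles", "rolebindings", "clusterroles", "clusterrolebindings"):
        print(rbac_change_query(resource))
    # Example call (requires AWS credentials; log group name is an assumption):
    # run_query("/aws/eks/my-cluster/cluster", rbac_change_query("roles"))
```

The sketch places the filter ahead of sort and limit so that the limit applies to matching events rather than to the most recent 100 events of any kind.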
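Audit your CloudTrail logs
AWS APIs called by pods that use IAM Roles for Service Accounts (IRSA) are automatically logged to CloudTrail along with the name of the service account. If the name of a service account that was not explicitly authorized to call an API appears in the log, it may indicate that the IAM role's trust policy is misconfigured. Generally speaking, CloudTrail is a good way to ascribe AWS API calls to specific IAM principals.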
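The check described above can be sketched as a comparison between the service accounts seen in CloudTrail and an allow list. This example assumes that the service account appears in `userIdentity.userName` in the form `system:serviceaccount:<namespace>:<name>`, as it commonly does for `AssumeRoleWithWebIdentity` events; verify the field against your own trail before relying on it. The helper name and sample records are hypothetical.

```python
def unexpected_service_accounts(cloudtrail_events, allowed):
    """Return service accounts seen in the events that are not in the allow list."""
    unexpected = set()
    for event in cloudtrail_events:
        identity = event.get("userIdentity") or {}
        # Assumed location of the service account name; confirm on real events.
        name = identity.get("userName", "")
        if name.startswith("system:serviceaccount:") and name not in allowed:
            unexpected.add(name)
    return sorted(unexpected)


if __name__ == "__main__":
    sample = [
        {"eventName": "AssumeRoleWithWebIdentity",
         "userIdentity": {"userName": "system:serviceaccount:default:s3-reader"}},
        {"eventName": "AssumeRoleWithWebIdentity",
         "userIdentity": {"userName": "system:serviceaccount:dev:unknown-app"}},
    ]
    allowed = {"system:serviceaccount:default:s3-reader"}
    print(unexpected_service_accounts(sample, allowed))
```

Any name returned by the helper is a candidate for reviewing the corresponding IAM role's trust policy conditions.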
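Use CloudTrail Insights to unearth suspicious activity
CloudTrail Insights automatically analyzes write management events from CloudTrail trails and alerts you to unusual activity. This can help you identify an increase in the volume of calls to write APIs in your AWS account, including calls from pods that use IRSA to assume an IAM role. For more information, see Announcing CloudTrail Insights: Identify and Respond to Unusual API Activity.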
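Additional resources
As the volume of logs increases, parsing and filtering them with Logs Insights or another log analysis tool may become ineffective. As an alternative, you might want to consider running Sysdig Falco and ekscloudwat
Another option is to store the audit logs in S3 and use the SageMaker Random Cut Forest algorithm to detect anomalous behavior that warrants further investigation.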
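Tools and resources
The following commercial and open source projects can be used to assess your cluster's alignment with established best practices:
- kube-scan: Assigns a risk score to the workloads running in your cluster in accordance with the Kubernetes Common Configuration Scoring System framework.
- Kubescape: An open source Kubernetes security tool that scans clusters, YAML files, and Helm charts. It detects misconfigurations according to multiple frameworks, including NSA-CISA and MITRE ATT&CK®.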