Auditing and logging - Amazon EKS

This page is a machine translation of the English version. In the event of any ambiguity or inconsistency, the English version prevails.

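Auditing and logging

Collecting and analyzing audit logs is useful for a variety of reasons. Logs can help with root cause analysis and attribution, that is, ascribing a change to a particular user. When enough logs have been collected, they can also be used to detect anomalous behavior. On EKS, audit logs are sent to Amazon CloudWatch Logs. The audit policy for EKS is as follows: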

apiVersion: audit.k8s.io/v1beta1
kind: Policy
rules:
  # Log full request and response for changes to aws-auth ConfigMap in kube-system namespace
  - level: RequestResponse
    namespaces: ["kube-system"]
    verbs: ["update", "patch", "delete"]
    resources:
      - group: "" # core
        resources: ["configmaps"]
        resourceNames: ["aws-auth"]
    omitStages:
      - "RequestReceived"
  # Do not log watch operations performed by kube-proxy on endpoints and services
  - level: None
    users: ["system:kube-proxy"]
    verbs: ["watch"]
    resources:
      - group: "" # core
        resources: ["endpoints", "services", "services/status"]
  # Do not log get operations performed by kubelet on nodes and their statuses
  - level: None
    users: ["kubelet"] # legacy kubelet identity
    verbs: ["get"]
    resources:
      - group: "" # core
        resources: ["nodes", "nodes/status"]
  # Do not log get operations performed by the system:nodes group on nodes and their statuses
  - level: None
    userGroups: ["system:nodes"]
    verbs: ["get"]
    resources:
      - group: "" # core
        resources: ["nodes", "nodes/status"]
  # Do not log get and update operations performed by controller manager, scheduler, and endpoint-controller on endpoints in kube-system namespace
  - level: None
    users:
      - system:kube-controller-manager
      - system:kube-scheduler
      - system:serviceaccount:kube-system:endpoint-controller
    verbs: ["get", "update"]
    namespaces: ["kube-system"]
    resources:
      - group: "" # core
        resources: ["endpoints"]
  # Do not log get operations performed by apiserver on namespaces and their statuses/finalizations
  - level: None
    users: ["system:apiserver"]
    verbs: ["get"]
    resources:
      - group: "" # core
        resources: ["namespaces", "namespaces/status", "namespaces/finalize"]
  # Do not log get and list operations performed by controller manager on metrics.k8s.io resources
  - level: None
    users:
      - system:kube-controller-manager
    verbs: ["get", "list"]
    resources:
      - group: "metrics.k8s.io"
  # Do not log access to health, version, and swagger non-resource URLs
  - level: None
    nonResourceURLs:
      - /healthz*
      - /version
      - /swagger*
  # Do not log events resources
  - level: None
    resources:
      - group: "" # core
        resources: ["events"]
  # Log request for updates/patches to nodes and pods statuses by kubelet and node problem detector
  - level: Request
    users: ["kubelet", "system:node-problem-detector", "system:serviceaccount:kube-system:node-problem-detector"]
    verbs: ["update", "patch"]
    resources:
      - group: "" # core
        resources: ["nodes/status", "pods/status"]
    omitStages:
      - "RequestReceived"
  # Log request for updates/patches to nodes and pods statuses by system:nodes group
  - level: Request
    userGroups: ["system:nodes"]
    verbs: ["update", "patch"]
    resources:
      - group: "" # core
        resources: ["nodes/status", "pods/status"]
    omitStages:
      - "RequestReceived"
  # Log delete collection requests by namespace-controller in kube-system namespace
  - level: Request
    users: ["system:serviceaccount:kube-system:namespace-controller"]
    verbs: ["deletecollection"]
    omitStages:
      - "RequestReceived"
  # Log metadata for secrets, configmaps, and tokenreviews to protect sensitive data
  - level: Metadata
    resources:
      - group: "" # core
        resources: ["secrets", "configmaps"]
      - group: authentication.k8s.io
        resources: ["tokenreviews"]
    omitStages:
      - "RequestReceived"
  # Log requests for serviceaccounts/token resources
  - level: Request
    resources:
      - group: "" # core
        resources: ["serviceaccounts/token"]
  # Log get, list, and watch requests for various resource groups
  - level: Request
    verbs: ["get", "list", "watch"]
    resources:
      - group: "" # core
      - group: "admissionregistration.k8s.io"
      - group: "apiextensions.k8s.io"
      - group: "apiregistration.k8s.io"
      - group: "apps"
      - group: "authentication.k8s.io"
      - group: "authorization.k8s.io"
      - group: "autoscaling"
      - group: "batch"
      - group: "certificates.k8s.io"
      - group: "extensions"
      - group: "metrics.k8s.io"
      - group: "networking.k8s.io"
      - group: "policy"
      - group: "rbac.authorization.k8s.io"
      - group: "scheduling.k8s.io"
      - group: "settings.k8s.io"
      - group: "storage.k8s.io"
    omitStages:
      - "RequestReceived"
  # Default logging level for known APIs to log request and response
  - level: RequestResponse
    resources:
      - group: "" # core
      - group: "admissionregistration.k8s.io"
      - group: "apiextensions.k8s.io"
      - group: "apiregistration.k8s.io"
      - group: "apps"
      - group: "authentication.k8s.io"
      - group: "authorization.k8s.io"
      - group: "autoscaling"
      - group: "batch"
      - group: "certificates.k8s.io"
      - group: "extensions"
      - group: "metrics.k8s.io"
      - group: "networking.k8s.io"
      - group: "policy"
      - group: "rbac.authorization.k8s.io"
      - group: "scheduling.k8s.io"
      - group: "settings.k8s.io"
      - group: "storage.k8s.io"
    omitStages:
      - "RequestReceived"
  # Default logging level for all other requests to log metadata only
  - level: Metadata
    omitStages:
      - "RequestReceived"
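Kubernetes evaluates audit policy rules in order, and the first rule that matches a request sets its audit level; a request that matches no rule is not logged. The following is a minimal Python sketch of that first-match logic, assuming a simplified request model and only three rules taken from the policy above (the aws-auth rule, the events rule, and the secrets/configmaps Metadata rule) plus the final catch-all. The `Request`, `Rule`, and `audit_level` names are hypothetical and not part of any Kubernetes library, and the sketch ignores non-resource URLs, wildcards, and omitStages.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Request:
    """Hypothetical, simplified view of an API request."""
    user: str
    groups: List[str]
    verb: str
    api_group: str
    resource: str
    namespace: str = ""
    name: str = ""


@dataclass
class Rule:
    """Subset of PolicyRule fields; an empty list matches anything."""
    level: str
    users: List[str] = field(default_factory=list)
    user_groups: List[str] = field(default_factory=list)
    verbs: List[str] = field(default_factory=list)
    namespaces: List[str] = field(default_factory=list)
    # (api group, resources, resourceNames); empty lists match everything in that group
    resources: List[Tuple[str, List[str], List[str]]] = field(default_factory=list)

    def matches(self, req: Request) -> bool:
        if self.users and req.user not in self.users:
            return False
        if self.user_groups and not set(self.user_groups) & set(req.groups):
            return False
        if self.verbs and req.verb not in self.verbs:
            return False
        if self.namespaces and req.namespace not in self.namespaces:
            return False
        if self.resources:
            return any(
                group == req.api_group
                and (not names or req.resource in names)
                and (not resource_names or req.name in resource_names)
                for group, names, resource_names in self.resources
            )
        return True


# A few rules from the EKS policy above, kept in their original relative order.
RULES = [
    Rule("RequestResponse", namespaces=["kube-system"], verbs=["update", "patch", "delete"],
         resources=[("", ["configmaps"], ["aws-auth"])]),
    Rule("None", resources=[("", ["events"], [])]),
    Rule("Metadata", resources=[("", ["secrets", "configmaps"], []),
                                ("authentication.k8s.io", ["tokenreviews"], [])]),
    Rule("Metadata"),  # catch-all, like the last rule of the EKS policy
]


def audit_level(req: Request, rules: List[Rule] = RULES) -> str:
    """Return the level of the first matching rule; "None" if nothing matches."""
    for rule in rules:
        if rule.matches(req):
            return rule.level
    return "None"
```

For example, a `patch` of the `aws-auth` ConfigMap in `kube-system` hits the first rule and is logged at RequestResponse, while the same patch to a ConfigMap in `default` falls through to the Metadata rule. This ordering is why the specific aws-auth rule has to appear before the broader configmaps rule in the real policy.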

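Recommendations

Enable audit logs

Audit logs are part of the EKS managed Kubernetes control plane logs that are managed by EKS. For instructions on enabling and disabling the control plane logs, which include the logs for the Kubernetes API server, the controller manager, and the scheduler as well as the audit log, see https://docs.aws.amazon.com/eks/latest/userguide/control-plane-logs.html#enabling-control-plane-log-export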
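Control plane log types can also be switched on programmatically through the EKS UpdateClusterConfig API, which boto3 exposes as `update_cluster_config` with a `logging` argument. The sketch below assumes boto3 as the SDK; the client is passed in so the function can be exercised without AWS credentials, and `my-cluster` is a placeholder name. The helper names `logging_config` and `enable_audit_logs` are hypothetical.

```python
from typing import Any, Dict, List

# Control plane log types that EKS can export to CloudWatch Logs.
LOG_TYPES = ["api", "audit", "authenticator", "controllerManager", "scheduler"]


def logging_config(types: List[str], enabled: bool = True) -> Dict[str, Any]:
    """Build the `logging` argument for update_cluster_config for the given types."""
    unknown = sorted(set(types) - set(LOG_TYPES))
    if unknown:
        raise ValueError(f"unknown control plane log types: {unknown}")
    return {"clusterLogging": [{"types": list(types), "enabled": enabled}]}


def enable_audit_logs(eks_client: Any, cluster_name: str) -> Dict[str, Any]:
    """Request that the audit log type be enabled; eks_client is e.g. boto3.client("eks")."""
    return eks_client.update_cluster_config(
        name=cluster_name, logging=logging_config(["audit"])
    )


# With real credentials (not run here):
#   import boto3
#   enable_audit_logs(boto3.client("eks"), "my-cluster")
```

The call starts an asynchronous cluster update, so the response describes an update in progress rather than the final state.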

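Note

When you enable control plane logging, you will incur costs for storing the logs in CloudWatch. This raises a broader question about the ongoing cost of security. Ultimately, you will have to weigh those costs against the cost of a security breach, such as financial loss or damage to your reputation. You may find that you can adequately secure your environment by implementing only some of the recommendations in this guide.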

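Warning

The maximum size of a CloudWatch Logs entry is 256KB, whereas the maximum Kubernetes API request size is 1.5MiB. Log entries larger than 256KB will either be truncated or include only the request metadata.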
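When reviewing an incident it can help to know in advance which audit events were likely too large to arrive intact. The sketch below is a rough pre-check, assuming "256KB" means 256 × 1024 bytes of UTF-8 encoded JSON; the exact way CloudWatch Logs measures an event may differ, so treat results near the limit with caution. The function names are hypothetical.

```python
import json
from typing import Any, Dict

# Assumption: the 256KB CloudWatch Logs entry limit is read as 256 * 1024 bytes.
CLOUDWATCH_MAX_EVENT_BYTES = 256 * 1024
# 1.5MiB, the Kubernetes API request size limit cited above.
K8S_MAX_REQUEST_BYTES = int(1.5 * 1024 * 1024)


def event_size_bytes(event: Dict[str, Any]) -> int:
    """Size of an audit event serialized as compact JSON and encoded as UTF-8."""
    return len(json.dumps(event, separators=(",", ":")).encode("utf-8"))


def fits_in_cloudwatch(event: Dict[str, Any]) -> bool:
    """True if the serialized event is within the limit (assumed inclusive)."""
    return event_size_bytes(event) <= CLOUDWATCH_MAX_EVENT_BYTES
```

A request body close to the 1.5MiB API limit, such as a large ConfigMap update logged at RequestResponse, is roughly six times the CloudWatch limit, which is why such entries may show up truncated or with metadata only.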

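Utilize audit metadata

Kubernetes audit logs include two annotations: authorization.k8s.io/decision, which indicates whether a request was authorized, and authorization.k8s.io/reason, which gives the reason for that decision. Use these attributes to determine why a specific API call was allowed.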
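The sketch below pulls those two annotations out of a single audit event, assuming the event is available as one JSON document (for example, the @message field of a CloudWatch Logs entry). The `authz_details` helper is hypothetical; the decision annotation takes the values `allow` or `forbid`.

```python
import json
from typing import Dict, Optional


def authz_details(raw_event: str) -> Dict[str, Optional[str]]:
    """Extract who did what, and the authorization decision and reason, from one audit event."""
    event = json.loads(raw_event)
    annotations = event.get("annotations") or {}
    return {
        "user": (event.get("user") or {}).get("username"),
        "verb": event.get("verb"),
        "decision": annotations.get("authorization.k8s.io/decision"),
        "reason": annotations.get("authorization.k8s.io/reason"),
    }
```

For an allowed request authorized by RBAC, the reason typically names the binding and role that granted access, which is the detail you need to trace an unexpected permission back to its source.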

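Create alarms for suspicious events

Create an alarm that automatically alerts you when there is an increase in 403 Forbidden and 401 Unauthorized responses, then use attributes such as host, sourceIPs, and k8s_user.username to find out where those requests are coming from.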
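An alarm like this is usually built on CloudWatch metrics, but the underlying logic is simple enough to sketch offline over exported audit events: bucket 401/403 responses by minute, keep minutes above a threshold, and attribute them to their sources. This sketch reads the raw audit event fields `stageTimestamp`, `responseStatus.code`, `sourceIPs`, and `user.username`; the `find_spikes` name and the `threshold` value are hypothetical tuning choices, not an EKS default.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import Any, Dict, Iterable

DENIED_CODES = {401, 403}


def _minute(ev: Dict[str, Any]) -> datetime:
    """Truncate the event's stageTimestamp (RFC 3339, UTC 'Z') to the minute."""
    ts = datetime.fromisoformat(ev["stageTimestamp"].replace("Z", "+00:00"))
    return ts.replace(second=0, microsecond=0)


def find_spikes(events: Iterable[Dict[str, Any]], threshold: int) -> Dict[datetime, Counter]:
    """Map each minute whose 401/403 count exceeds `threshold` to a Counter of
    (first source IP, username) pairs responsible for those responses."""
    per_minute: Dict[datetime, Counter] = defaultdict(Counter)
    for ev in events:
        if (ev.get("responseStatus") or {}).get("code") not in DENIED_CODES:
            continue
        source_ip = (ev.get("sourceIPs") or ["unknown"])[0]
        user = (ev.get("user") or {}).get("username", "unknown")
        per_minute[_minute(ev)][(source_ip, user)] += 1
    return {m: c for m, c in per_minute.items() if sum(c.values()) > threshold}
```

Returning the per-source breakdown alongside each flagged minute mirrors the two-step workflow described above: first notice the spike, then see who caused it.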

Analyze logs with Logs Insights

Use CloudWatch Logs Insights to monitor changes to RBAC objects, such as Roles, RoleBindings, ClusterRoles, and ClusterRoleBindings. Some sample queries are shown below:

List updates to the aws-auth ConfigMap:

fields @timestamp, @message | filter @logStream like "kube-apiserver-audit" | filter verb in ["update", "patch"] | filter objectRef.resource = "configmaps" and objectRef.name = "aws-auth" and objectRef.namespace = "kube-system" | sort @timestamp desc

List the creation of new validating webhooks or changes to existing ones:

fields @timestamp, @message | filter @logStream like "kube-apiserver-audit" | filter verb in ["create", "update", "patch"] and responseStatus.code = 201 | filter objectRef.resource = "validatingwebhookconfigurations" | sort @timestamp desc

List create, update, and delete operations on Roles:

fields @timestamp, @message | sort @timestamp desc | limit 100 | filter objectRef.resource="roles" and verb in ["create", "update", "patch", "delete"]

List create, update, and delete operations on RoleBindings:

fields @timestamp, @message | sort @timestamp desc | limit 100 | filter objectRef.resource="rolebindings" and verb in ["create", "update", "patch", "delete"]

List create, update, and delete operations on ClusterRoles:

fields @timestamp, @message | sort @timestamp desc | limit 100 | filter objectRef.resource="clusterroles" and verb in ["create", "update", "patch", "delete"]

List create, update, and delete operations on ClusterRoleBindings:

fields @timestamp, @message | sort @timestamp desc | limit 100 | filter objectRef.resource="clusterrolebindings" and verb in ["create", "update", "patch", "delete"]

Plot unauthorized read operations against Secrets:

fields @timestamp, @message | sort @timestamp desc | limit 100 | filter objectRef.resource="secrets" and verb in ["get", "watch", "list"] and responseStatus.code="401" | stats count() by bin(1m)

List failed anonymous requests:

fields @timestamp, @message, sourceIPs.0 | sort @timestamp desc | limit 100 | filter user.username="system:anonymous" and responseStatus.code in ["401", "403"]
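Any of the queries above can also be run on a schedule or from a script. The sketch below assumes boto3's CloudWatch Logs client, whose `start_query` and `get_query_results` calls start a Logs Insights query and poll for its results, and the EKS control plane log group naming `/aws/eks/<cluster-name>/cluster`. The `run_insights_query` helper is hypothetical and the client is passed in so it can be tested without AWS access; `AWS_AUTH_QUERY` is the aws-auth query from above.

```python
import time
from typing import Any, Dict, List

AWS_AUTH_QUERY = (
    'fields @timestamp, @message '
    '| filter @logStream like "kube-apiserver-audit" '
    '| filter verb in ["update", "patch"] '
    '| filter objectRef.resource = "configmaps" and objectRef.name = "aws-auth" '
    'and objectRef.namespace = "kube-system" '
    '| sort @timestamp desc'
)


def run_insights_query(logs_client: Any, cluster_name: str, query: str,
                       start: int, end: int, poll_seconds: float = 1.0) -> List[Dict[str, str]]:
    """Run a Logs Insights query over the cluster's control plane log group.

    `start` and `end` are epoch seconds. Each result row is returned as {field: value}.
    """
    log_group = f"/aws/eks/{cluster_name}/cluster"
    query_id = logs_client.start_query(
        logGroupName=log_group, startTime=start, endTime=end, queryString=query
    )["queryId"]
    while True:
        resp = logs_client.get_query_results(queryId=query_id)
        if resp["status"] not in ("Scheduled", "Running"):
            break
        time.sleep(poll_seconds)  # query still executing; poll again
    if resp["status"] != "Complete":
        raise RuntimeError(f"query {query_id} ended with status {resp['status']}")
    return [{cell["field"]: cell["value"] for cell in row} for row in resp["results"]]


# With real credentials (not run here):
#   import boto3
#   now = int(time.time())
#   rows = run_insights_query(boto3.client("logs"), "my-cluster", AWS_AUTH_QUERY, now - 86400, now)
```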

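Audit CloudTrail logs

AWS API calls made by pods that use IAM Roles for Service Accounts (IRSA) are automatically logged to CloudTrail along with the name of the service account. If the name of a service account that was not explicitly authorized to call an API appears in the log, it may indicate that the IAM role's trust policy is misconfigured. Generally speaking, CloudTrail is a great way to attribute AWS API calls to specific IAM principals.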
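One way to act on this is to compare the service accounts that assume a given role against the ones you intended to allow. The sketch below assumes CloudTrail records have already been loaded (for example from a log file's top-level "Records" list), that `AssumeRoleWithWebIdentity` records carry the role in `requestParameters.roleArn`, and that `userIdentity.userName` holds the token subject in the form `system:serviceaccount:<namespace>:<name>`. Verify these field names against records from your own trail; the `unexpected_service_accounts` helper is hypothetical.

```python
from typing import Dict, Iterable, List, Set

SA_PREFIX = "system:serviceaccount:"


def unexpected_service_accounts(records: Iterable[Dict], role_arn: str,
                                allowed: Set[str]) -> List[str]:
    """List service accounts (in first-seen order) that assumed `role_arn` via
    AssumeRoleWithWebIdentity but are not in `allowed`."""
    found: List[str] = []
    for rec in records:
        if rec.get("eventName") != "AssumeRoleWithWebIdentity":
            continue
        if (rec.get("requestParameters") or {}).get("roleArn") != role_arn:
            continue
        # Assumption: the web identity subject (the service account) is in userName.
        user = (rec.get("userIdentity") or {}).get("userName", "")
        if user.startswith(SA_PREFIX) and user not in allowed and user not in found:
            found.append(user)
    return found
```

A non-empty result suggests that the role's trust policy conditions (for example on the `sub` claim) are broader than intended.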

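Use CloudTrail Insights to uncover suspicious activity

CloudTrail Insights automatically analyzes write management events from CloudTrail trails and alerts you to unusual activity. This can help you identify when there is an increase in call volume on write APIs in your AWS account, including calls from pods that use IRSA to assume an IAM role. For more information, see Announcing CloudTrail Insights: Identify and Respond to Unusual API Activity.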
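CloudTrail Insights is enabled per trail. The sketch below assumes boto3's CloudTrail client and its `put_insight_selectors` call with the `ApiCallRateInsight` type, which corresponds to the call-volume analysis described above. The trail name `demo` and the helper name are placeholders, and the client is injected so the function can be tested without AWS access.

```python
from typing import Any, Dict


def enable_call_rate_insights(cloudtrail_client: Any, trail_name: str) -> Dict[str, Any]:
    """Turn on API call rate Insights for an existing trail.

    cloudtrail_client is e.g. boto3.client("cloudtrail").
    """
    return cloudtrail_client.put_insight_selectors(
        TrailName=trail_name,
        InsightSelectors=[{"InsightType": "ApiCallRateInsight"}],
    )


# With real credentials (not run here):
#   import boto3
#   enable_call_rate_insights(boto3.client("cloudtrail"), "demo")
```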

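Additional resources

As the volume of logs increases, parsing and filtering them with Logs Insights or another log analysis tool may become ineffective. As an alternative, you might want to consider running Sysdig Falco and ekscloudwatch. Falco analyzes audit logs and flags anomalies or abuse over an extended period of time. The ekscloudwatch project forwards audit log events from CloudWatch to Falco for analysis. Falco provides a set of default audit rules along with the ability to add your own.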

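Yet another option is to store the audit logs in S3 and use the SageMaker Random Cut Forest algorithm to detect anomalous behavior that warrants further investigation.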
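Random Cut Forest works on numeric feature vectors, so audit events first have to be aggregated into rows of numbers. The sketch below shows one hypothetical featurization: one row per (hour, username) with request count, write count, 401/403 count, and number of distinct resources touched, written as header-less numeric CSV. The feature choice and helper names are illustrative assumptions, not a recommended set; uploading to S3 and training the model are not shown.

```python
import csv
import io
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

WRITE_VERBS = {"create", "update", "patch", "delete", "deletecollection"}


def hourly_features(events: Iterable[Dict[str, Any]]) -> Dict[Tuple[str, str], List[int]]:
    """One row per (hour, username): [requests, writes, 401/403 responses, distinct resources]."""
    stats = defaultdict(lambda: {"n": 0, "writes": 0, "denied": 0, "resources": set()})
    for ev in events:
        hour = ev["stageTimestamp"][:13]  # "YYYY-MM-DDTHH"
        user = (ev.get("user") or {}).get("username", "unknown")
        s = stats[(hour, user)]
        s["n"] += 1
        s["writes"] += int(ev.get("verb") in WRITE_VERBS)
        s["denied"] += int((ev.get("responseStatus") or {}).get("code") in (401, 403))
        resource = (ev.get("objectRef") or {}).get("resource")
        if resource:
            s["resources"].add(resource)
    return {k: [v["n"], v["writes"], v["denied"], len(v["resources"])] for k, v in stats.items()}


def to_csv(features: Dict[Tuple[str, str], List[int]]) -> str:
    """Numeric-only CSV without a header, rows ordered by (hour, username)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for key in sorted(features):
        writer.writerow(features[key])
    return buf.getvalue()
```

Keeping the sorted (hour, username) keys alongside the CSV lets you map a high anomaly score back to the user and time window that need investigation.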

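Tools and resources

The following commercial and open source projects can be used to assess your cluster's alignment with established best practices: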