# Tiered storage for Standard brokers

Tiered storage is a low-cost storage tier for Amazon MSK that scales to virtually unlimited storage, making it cost-effective to build streaming data applications. You can create an Amazon MSK cluster configured with tiered storage that balances performance and cost. Amazon MSK stores streaming data in a performance-optimized primary storage tier until it reaches the Apache Kafka topic retention limits. Then, Amazon MSK automatically moves data into the new low-cost storage tier. When your application starts reading data from the tiered storage, you can expect an increase in read latency for the first few bytes. As you start reading the remaining data sequentially from the low-cost tier, you can expect latencies that are similar to the primary storage tier. You don't need to provision any storage for the low-cost tiered storage or manage the infrastructure. You can store any amount of data and pay only for what you use. This feature is compatible with the APIs introduced in [KIP-405: Kafka Tiered Storage](https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage).

For information about sizing, monitoring, and optimizing your MSK tiered storage cluster, see [Best practices for running production workloads using Amazon MSK tiered storage](https://aws.amazon.com/blogs/big-data/best-practices-for-running-production-workloads-using-amazon-msk-tiered-storage/).

Here are some of the features of tiered storage:
+ You can scale to virtually unlimited storage. You don't have to guess how to scale your Apache Kafka infrastructure.
+ You can retain data longer in your Apache Kafka topics, or increase your topic storage, without the need to increase the number of brokers.
+ It provides a longer duration safety buffer to handle unexpected delays in processing.
+ You can reprocess old data in its exact production order with your existing stream processing code and Kafka APIs.
+ Partitions rebalance faster because data on secondary storage doesn't require replication across broker disks.
+ Data between brokers and the tiered storage moves within the VPC and doesn't travel through the internet.
+ A client machine can use the same process to connect to new clusters with tiered storage enabled as it does to connect to a cluster without tiered storage enabled. See [Create a client machine](https://docs.aws.amazon.com/msk/latest/developerguide/create-client-machine.html).

## Tiered storage requirements for Amazon MSK clusters
+ You must use Apache Kafka client version 3.0.0 or higher to create a new topic with tiered storage enabled. To transition an existing topic to tiered storage, you can reconfigure a client machine that uses a Kafka client version lower than 3.0.0 (minimum supported Apache Kafka version is 2.8.2.tiered) to enable tiered storage. See [Step 4: Create a topic in the Amazon MSK cluster](create-topic.md).
+ The Amazon MSK cluster with tiered storage enabled must use version 3.6.0 or higher, or 2.8.2.tiered.

## Tiered storage constraints and limitations for Amazon MSK clusters

Tiered storage has the following constraints and limitations:
+ Make sure clients are not configured to `read_committed` when reading from the remote tier in Amazon MSK, unless the application is actively using the transactions feature.
+ Tiered storage isn't available in AWS GovCloud (US) regions.
+ Tiered storage applies only to provisioned mode clusters.
+ Tiered storage doesn't support broker size t3.small.
+ The minimum retention period in low-cost storage is 3 days. There is no minimum retention period for primary storage.
+ Tiered storage doesn't support multiple log directories on a broker (JBOD-related features).
+ Tiered storage doesn't support compacted topics. Make sure that all topics that have tiered storage turned on have their cleanup.policy set to `delete` only.
+ A tiered storage cluster doesn't support altering the log.cleanup.policy setting for a topic after the topic is created.
+ Tiered storage can be disabled for individual topics but not for the entire cluster. Once disabled, tiered storage cannot be re-enabled for a topic.
+ If you use Amazon MSK version 2.8.2.tiered, you can migrate only to another tiered storage-supported Apache Kafka version. If you don't want to continue using a tiered storage-supported version, create a new MSK cluster and migrate your data to it.
+ The kafka-log-dirs tool can't report tiered storage data size. The tool only reports the size of the log segments in primary storage.

For information about default settings and constraints you must be mindful of when you configure tiered storage at the topic level, see [Guidelines for Amazon MSK tiered storage topic-level configuration](msk-guidelines-tiered-storage-topic-level-config.md).

# How log segments are copied to tiered storage for an Amazon MSK topic

When you enable tiered storage for a new or existing topic, Apache Kafka copies closed log segments from primary storage to tiered storage.
+ Apache Kafka only copies closed log segments. It copies all messages within the log segment to tiered storage.
+ Active segments are not eligible for tiering. The log segment size (segment.bytes) or the segment roll time (segment.ms) controls the rate at which segments close, and therefore the rate at which Apache Kafka copies them to tiered storage.

Retention settings for a topic with tiered storage enabled are different from settings for a topic without tiered storage enabled. The following rules control the retention of messages in topics with tiered storage enabled:
+ You define retention in Apache Kafka with two settings: log.retention.ms (time) and log.retention.bytes (size). These settings determine the total duration and size of the data that Apache Kafka retains in the cluster. Whether or not you enable tiered storage mode, you set these configurations at the cluster level.
You can override the settings at the topic level with topic configurations.
+ When you enable tiered storage, you can additionally specify how long the primary high-performance storage tier stores data. For example, if a topic has an overall retention (log.retention.ms) setting of 7 days and a local retention (local.retention.ms) setting of 12 hours, then the cluster primary storage retains data for only the first 12 hours. The low-cost storage tier retains the data for the full 7 days.
+ The usual retention settings apply to the full log. This includes its tiered and primary parts.
+ The local.retention.ms or local.retention.bytes settings control the retention of messages in primary storage. Apache Kafka copies closed log segments to tiered storage as soon as they close (based on segment.bytes or segment.ms), independent of local retention settings. After segments are copied to tiered storage, they remain in primary storage until the local.retention.ms or local.retention.bytes thresholds are reached. At that point, the data is deleted from primary storage but remains available in tiered storage. This allows you to keep recent data on high-performance primary storage for fast access while older data is served from the low-cost tiered storage.
+ After Apache Kafka copies the messages in a log segment to tiered storage, it removes them from the cluster entirely only when the retention.ms or retention.bytes thresholds are reached.

## Example Amazon MSK tiered storage scenario

This scenario illustrates how an existing topic that has messages in primary storage behaves when tiered storage is enabled. You enable tiered storage on this topic when you set remote.storage.enable to `true`. In this example, retention.ms is set to 5 days and local.retention.ms is set to 2 days. The following is the sequence of events when a segment expires.

**Time T0 - Before you enable tiered storage.** Before you enable tiered storage for this topic, there are two log segments.
One of the segments is active for an existing topic partition 0.

![\[Time T0 - Before you enable tiered storage.\]](http://docs.aws.amazon.com/msk/latest/developerguide/images/tiered-storage-segments-1.png)

**Time T1 (< 2 days) - Tiered storage enabled. Segment 0 copied to tiered storage.** After you enable tiered storage for this topic, Apache Kafka copies closed log segment 0 to tiered storage as soon as it closes. The segment closes based on segment.bytes or segment.ms settings, not based on retention settings. Apache Kafka retains a copy in primary storage as well. The active segment 1 is not eligible to copy to tiered storage yet because it is still active and hasn't closed. At this point in the timeline, Amazon MSK doesn't yet apply any of the retention settings (local.retention.ms/bytes, retention.ms/bytes) to the messages in segment 0 and segment 1.

![\[Time T1 (< 2 days) - Tiered storage enabled. Segment 0 copied to tiered storage.\]](http://docs.aws.amazon.com/msk/latest/developerguide/images/tiered-storage-segments-2.png)

**Time T2 - Local retention in effect.** After 2 days, the local retention threshold is reached for segment 0. The local.retention.ms setting of 2 days determines this. Segment 0 is now deleted from primary storage, but it remains available in tiered storage. Note that segment 0 was already copied to tiered storage at Time T1 when it closed, not at Time T2 when local retention expired. Active segment 1 is neither eligible for deletion nor eligible to copy to tiered storage yet because it is still active.

![\[Time T2 - Local retention in effect.\]](http://docs.aws.amazon.com/msk/latest/developerguide/images/tiered-storage-segments-3.png)

**Time T3 - Overall retention in effect.** After 5 days, retention settings take effect, and Kafka clears log segment 0 and associated messages from tiered storage. Segment 1 is neither eligible for expiration nor eligible to copy over to tiered storage yet because it is active.
Segment 1 is not yet closed, so it is ineligible for segment roll.

![\[Time T3 - Overall retention in effect.\]](http://docs.aws.amazon.com/msk/latest/developerguide/images/tiered-storage-segments-4.png)

# Create an Amazon MSK cluster with tiered storage with the AWS Management Console

This process describes how to create a tiered storage Amazon MSK cluster using the AWS Management Console.

1. Open the Amazon MSK console at [https://console.aws.amazon.com/msk/](https://console.aws.amazon.com/msk/).
1. Choose **Create cluster**.
1. Choose **Custom create** for tiered storage.
1. Specify a name for the cluster.
1. In the **Cluster type**, select **Provisioned**.
1. Choose an Apache Kafka version that supports tiered storage for Amazon MSK to use to create the cluster.
1. Specify a broker size other than **kafka.t3.small**.
1. Select the number of brokers that you want Amazon MSK to create in each Availability Zone. The minimum is one broker per Availability Zone, and the maximum is 30 brokers per cluster.
1. Specify the number of zones that brokers are distributed across.
1. Specify the number of Apache Kafka brokers that are deployed per zone.
1. Select **Storage options**. This includes **Tiered storage and EBS storage** to enable tiered storage mode.
1. Follow the remaining steps in the cluster creation wizard. When complete, **Tiered storage and EBS storage** appears as the cluster storage mode in the **Review and create** view.
1. Select **Create cluster**.

# Create an Amazon MSK cluster with tiered storage with the AWS CLI

To enable tiered storage on a cluster, create the cluster with the correct Apache Kafka version and attribute for tiered storage. Follow the code example below. Also, complete the steps in the next section to [Create a Kafka topic with tiered storage enabled with the AWS CLI](#msk-create-topic-tiered-storage-cli).
See [create-cluster](https://docs.aws.amazon.com//cli/latest/reference/kafka/create-cluster.html) for a complete list of supported attributes for cluster creation.

```
aws kafka create-cluster \
 --cluster-name "MessagingCluster" \
 --broker-node-group-info file://brokernodegroupinfo.json \
 --number-of-broker-nodes 3 \
 --kafka-version "3.6.0" \
 --storage-mode "TIERED"
```

## Create a Kafka topic with tiered storage enabled with the AWS CLI

To complete the process that you started when you created a cluster with tiered storage enabled, also create a topic with tiered storage enabled, using the attributes in the code example later in this section. The attributes specifically for tiered storage are the following:
+ `local.retention.ms` (for example, 10 minutes) for time-based retention settings or `local.retention.bytes` for size-based retention settings.
+ `remote.storage.enable` set to `true` to enable tiered storage.

The following configuration uses local.retention.ms, but you can replace this attribute with local.retention.bytes. This attribute controls how much time can pass, or how many bytes can accumulate, before Apache Kafka deletes data from primary storage. Closed log segments are copied to tiered storage as soon as they close, independent of this setting. See [Topic-level configuration](https://docs.aws.amazon.com//msk/latest/developerguide/msk-configuration-properties.html#msk-topic-confinguration) for more details on supported configuration attributes.

**Note**
You must use Apache Kafka client version 3.0.0 or higher. Only those client versions of `kafka-topics.sh` support the `remote.storage.enable` setting. To enable tiered storage on an existing topic that uses an earlier version of Apache Kafka, see the section [Enabling tiered storage on an existing Amazon MSK topic](msk-enable-disable-topic-tiered-storage-cli.md#msk-enable-topic-tiered-storage-cli).

```
# $bs is assumed to hold the cluster's bootstrap broker string.
bin/kafka-topics.sh --create --bootstrap-server $bs \
  --replication-factor 2 --partitions 6 --topic MSKTutorialTopic \
  --config remote.storage.enable=true \
  --config local.retention.ms=100000 \
  --config retention.ms=604800000 \
  --config segment.bytes=134217728
```

# Enable and disable tiered storage on an existing Amazon MSK topic

These sections cover how to enable and disable tiered storage on a topic that you've already created. To create a new cluster and topic with tiered storage enabled, see [Creating a cluster with tiered storage using the AWS Management Console](https://docs.aws.amazon.com//msk/latest/developerguide/msk-create-cluster-tiered-storage-console).

## Enabling tiered storage on an existing Amazon MSK topic

To enable tiered storage on an existing topic, use the `alter` command syntax in the following example. When you enable tiered storage on an already existing topic, you aren't restricted to a certain Apache Kafka client version.

```
# $bsrv is assumed to hold the cluster's bootstrap broker string.
bin/kafka-configs.sh --bootstrap-server $bsrv --alter \
  --entity-type topics --entity-name msk-ts-topic \
  --add-config 'remote.storage.enable=true, local.retention.ms=604800000, retention.ms=15550000000'
```

## Disable tiered storage on an existing Amazon MSK topic

To disable tiered storage on an existing topic, use the same `alter` command syntax that you use to enable tiered storage.

```
bin/kafka-configs.sh --bootstrap-server $bs --alter \
  --entity-type topics --entity-name MSKTutorialTopic \
  --add-config 'remote.log.msk.disable.policy=Delete, remote.storage.enable=false'
```

**Note**
When you disable tiered storage, you completely delete the topic data in tiered storage. Apache Kafka retains primary storage data, but it still applies the primary retention rules based on `local.retention.ms`. After you disable tiered storage on a topic, you can't re-enable it. If you want to disable tiered storage on an existing topic, you aren't restricted to a certain Apache Kafka client version.
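
The retention values in the topic commands above are expressed in milliseconds, which are easy to mistype. The following is a minimal sketch, assuming a POSIX-compatible shell, that derives the values from a number of days and prints a `kafka-configs.sh` command for review instead of running it. The helper `days_to_ms`, the variable names, and the 180-day overall retention are illustrative assumptions and are not part of Amazon MSK or Apache Kafka.

```shell
#!/usr/bin/env bash
# Sketch: derive retention settings for a tiered storage topic.
# days_to_ms and the variable names are hypothetical, for illustration only.

days_to_ms() {
  # 1 day = 24 h * 60 min * 60 s * 1000 ms = 86,400,000 ms
  echo $(( $1 * 86400000 ))
}

topic="msk-ts-topic"                 # topic name from the enable example
local_retention_ms=$(days_to_ms 7)   # how long data stays in primary storage
retention_ms=$(days_to_ms 180)       # assumed overall retention (primary + tiered)

add_config="remote.storage.enable=true,local.retention.ms=${local_retention_ms},retention.ms=${retention_ms}"

# Print the command so it can be reviewed before it is run by hand.
echo bin/kafka-configs.sh --bootstrap-server "\$bsrv" --alter \
  --entity-type topics --entity-name "$topic" \
  --add-config "'${add_config}'"
```

With these inputs, the printed `--add-config` string contains `local.retention.ms=604800000` and `retention.ms=15552000000`.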
# Enable tiered storage on an existing Amazon MSK cluster using AWS CLI

**Note**
You can enable tiered storage only if your cluster's log.cleanup.policy is set to `delete`, as compacted topics are not supported on tiered storage. Later, you can configure an individual topic's log.cleanup.policy to `compact` if tiered storage is not enabled on that particular topic. See [Topic-level configuration](https://docs.aws.amazon.com//msk/latest/developerguide/msk-configuration-properties.html#msk-topic-confinguration) for more details on supported configuration attributes.

1. **Update the Kafka version** – Cluster versions aren't simple integers. To find the current version of the cluster, use the `DescribeCluster` operation or the `describe-cluster` AWS CLI command. An example version is `KTVPDKIKX0DER`.

   ```
   aws kafka update-cluster-kafka-version --cluster-arn ClusterArn --current-version Current-Cluster-Version --target-kafka-version 3.6.0
   ```

1. Edit the cluster storage mode. The following code example shows editing the cluster storage mode to `TIERED` using the [update-storage](https://docs.aws.amazon.com/cli/latest/reference/kafka/update-storage.html) command. Because the cluster version changes after each update, retrieve the current version again before you run this command.

   ```
   aws kafka update-storage --current-version Current-Cluster-Version --cluster-arn ClusterArn --storage-mode TIERED
   ```

# Update tiered storage on an existing Amazon MSK cluster using the console

This process describes how to update a tiered storage Amazon MSK cluster using the AWS Management Console.

Make sure the current Apache Kafka version of your MSK cluster supports tiered storage (3.6.0 or higher, or 2.8.2.tiered). Refer to [updating the Apache Kafka version](https://docs.aws.amazon.com/msk/latest/developerguide/version-upgrades.html) if you need to upgrade your MSK cluster to a supported version.

**Note**
You can enable tiered storage only if your cluster's log.cleanup.policy is set to `delete`, as compacted topics are not supported on tiered storage.
Later, you can configure an individual topic's log.cleanup.policy to `compact` if tiered storage is not enabled on that particular topic. See [Topic-level configuration](https://docs.aws.amazon.com//msk/latest/developerguide/msk-configuration-properties.html#msk-topic-confinguration) for more details on supported configuration attributes.

1. Open the Amazon MSK console at [https://console.aws.amazon.com/msk/](https://console.aws.amazon.com/msk/).
1. Go to the cluster summary page and choose **Properties**.
1. Go to the **Storage** section and choose **Edit cluster storage mode**.
1. Choose **Tiered storage and EBS storage** and **Save changes**.
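
After the storage mode is changed, whether through the console or the AWS CLI, the result can be read back with the `describe-cluster` command. The following is a minimal sketch, assuming the AWS CLI is configured for the cluster's account and Region and that the `describe-cluster` output exposes `ClusterInfo.StorageMode` and `ClusterInfo.CurrentVersion`. The function names and the `CLUSTER_ARN` variable are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: read back cluster properties after changing the storage mode.
# Function names and CLUSTER_ARN are hypothetical; the --query field names
# are assumed from the describe-cluster response shape.

cluster_storage_mode() {
  # Expected to print TIERED once the storage update has completed.
  aws kafka describe-cluster --cluster-arn "$1" \
    --query 'ClusterInfo.StorageMode' --output text
}

cluster_current_version() {
  # Prints the value to pass as --current-version in update commands.
  aws kafka describe-cluster --cluster-arn "$1" \
    --query 'ClusterInfo.CurrentVersion' --output text
}

# Usage (uncomment after setting CLUSTER_ARN to your cluster's ARN):
# cluster_storage_mode "$CLUSTER_ARN"
# cluster_current_version "$CLUSTER_ARN"
```

Wrapping the two queries in functions keeps the cluster ARN in one place and makes it easy to fetch a fresh current version before each update operation.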