AOSPERF03-BP02 Evenly distribute data across data nodes in your OpenSearch Service domain
Maintain efficient resource utilization and optimal query performance by distributing data evenly across data nodes.
Level of risk exposed if this best practice is not established: High
Desired outcome: Data is evenly distributed across data nodes in the OpenSearch Service domain, ensuring efficient resource utilization and optimal query performance.
Benefits of establishing this best practice:
- Prevents bottlenecks and hotspots: Evenly distributing data across the data nodes in your OpenSearch Service domain helps prevent bottlenecks and hotspots, reducing the risk of node overload.
- Improves availability: When shards are distributed evenly across data nodes, no single node is overwhelmed with shards or storage, which helps maintain high availability within your domain and minimizes the risk of a full node failure.
Implementation guidance
Node shard skew occurs when one or more nodes within a domain have significantly more shards than the other nodes. Node storage skew occurs when one or more nodes within a cluster have significantly more storage (disk.indices) than the other nodes. Both conditions can occur temporarily, such as when a domain has replaced a node and is still allocating shards to it, but you should address them if they persist: uneven distribution can create bottlenecks and hotspots that cause queries to slow down or even fail, and in some cases a node with a much higher shard density than its peers can fail entirely.
You need to identify node shard and index shard skew, and use shard counts that are multiples of the data node count so that each index is distributed evenly across data nodes.
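As a minimal sketch, assuming a hypothetical domain with six data nodes and an illustrative index named my-index (both are assumptions, not part of this guidance), the following request creates the index with 12 primary shards, a multiple of the node count:

```
PUT my-index
{
  "settings": {
    "index": {
      "number_of_shards": 12,
      "number_of_replicas": 1
    }
  }
}
```

With one replica, this index has 24 shards in total, so each of the six data nodes can hold exactly four of its shards.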
Implementation steps
- Run the GET _cat/allocation API operation in OpenSearch Service to retrieve shard allocation information (see the example after these steps).
- Compare the shards and disk.indices entries in the response to identify skew.
- Note the average storage usage across all shards and nodes.
- Determine whether storage skew is normal (within 10% of the average) or significant (more than 10% above the average).
- Use shard counts that are multiples of the data node count to evenly distribute each index across data nodes. If you still see index storage or shard skew, you might need to force a shard reallocation, which occurs with every blue/green deployment of your OpenSearch Service domain.
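As an illustration of the first four steps, the following request and an abbreviated, hypothetical response show what significant skew can look like on a three-data-node domain (the node names and all values are invented for this example):

```
GET _cat/allocation?v

shards disk.indices disk.used disk.avail disk.total node
    40       38.2gb    45.1gb    954.9gb     1000gb  node-1
    40       39.5gb    46.3gb    953.7gb     1000gb  node-2
    64       58.8gb    66.0gb    934.0gb     1000gb  node-3
```

In this sample response, the average disk.indices value is about 45.5 GiB, so node-3 sits roughly 29% above the average and holds 24 more shards than either of its peers, which is significant skew under the 10% guideline above.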