diff --git a/docs/reference/how-to/size-your-shards.asciidoc b/docs/reference/how-to/size-your-shards.asciidoc
index 5f67014d5bb4a..3b3891b43500e 100644
--- a/docs/reference/how-to/size-your-shards.asciidoc
+++ b/docs/reference/how-to/size-your-shards.asciidoc
@@ -1,17 +1,40 @@
 [[size-your-shards]]
 == Size your shards
+[discrete]
+[[what-is-a-shard]]
+=== What is a shard?
+
+A shard is a basic unit of storage in {es}. Every index is divided into one or more shards to help distribute data and workload across nodes in a cluster. This division allows {es} to handle large datasets and perform operations like searches and indexing efficiently. For more detailed information on shards, see <>.
+
+[discrete]
+[[sizing-shard-guidelines]]
+=== General guidelines
+
+Balancing the number and size of your shards is important for the performance and stability of an {es} cluster:
+
+* Too many shards can degrade search performance and make the cluster unstable. This is referred to as _oversharding_.
+* Very large shards can slow down search operations and prolong recovery times after failures.
+
+To avoid either of these states, implement the following guidelines:
+
+[discrete]
+[[general-sizing-guidelines]]
+==== General sizing guidelines
+
+* Aim for shard sizes between 10GB and 50GB
+* Keep the number of documents on each shard below 200 million
+
+[discrete]
+[[shard-distribution-guidelines]]
+==== Shard distribution guidelines
 
-Each index in {es} is divided into one or more shards, each of which may be
-replicated across multiple nodes to protect against hardware failures. If you
-are using <> then each data stream is backed by a sequence of
-indices. There is a limit to the amount of data you can store on a single node
-so you can increase the capacity of your cluster by adding nodes and increasing
-the number of indices and shards to match. However, each index and shard has
-some overhead and if you divide your data across too many shards then the
-overhead can become overwhelming. A cluster with too many indices or shards is
-said to suffer from _oversharding_. An oversharded cluster will be less
-efficient at responding to searches and in extreme cases it may even become
-unstable.
+To ensure that each node is working optimally, distribute shards evenly across nodes. Uneven distribution can cause some nodes to work harder than others, leading to performance degradation and instability.
+
+While {es} automatically balances shards, you need to configure indices with an appropriate number of shards and replicas to allow for even distribution across nodes.
+
+If you are using <>, each data stream is backed by a sequence of indices, each index potentially having multiple shards.
+
+In addition to these general guidelines, you should develop a tailored <> that considers your specific infrastructure, use case, and performance expectations.
 
 [discrete]
 [[create-a-sharding-strategy]]
@@ -208,6 +231,7 @@ index can be <>. You may then consider setting
 <> against the destination index for the source
 index's name to point to it for continuity.
 
+See this https://www.youtube.com/watch?v=sHyNYnwbYro[fixing shard sizes video] for an example troubleshooting walkthrough.
 
 [discrete]
 [[shard-count-recommendation]]
@@ -571,6 +595,8 @@ PUT _cluster/settings
 }
 ----
 
+See this https://www.youtube.com/watch?v=tZKbDegt4-M[fixing "max shards open" video] for an example troubleshooting walkthrough. For more information, see <>.
+
 [discrete]
 [[troubleshooting-max-docs-limit]]
 ==== Number of documents in the shard cannot exceed [2147483519]