How to resolve unassigned shards in Elasticsearch

Last updated: January 8, 2019


Editor’s note: Elasticsearch uses the term “master” to describe its architecture and certain metric names. Datadog does not use this term. Within this blog post, we will refer to this term as “primary”.

In Elasticsearch, a healthy cluster is a balanced cluster: primary and replica shards are distributed across all nodes for durable reliability in case of node failure.

But what should you do when you see shards lingering in an UNASSIGNED state?

Before we dive into some solutions, let’s verify that the unassigned shards contain data that we need to preserve (if not, deleting these shards is the most straightforward way to resolve the issue). If you already know the data’s worth saving, jump to the solutions:

  • Shard allocation is purposefully delayed
  • Too many shards, not enough nodes
  • You need to re-enable shard allocation
  • Shard data no longer exists in the cluster
  • Low disk watermark
  • Multiple Elasticsearch versions

The commands in this post are formatted under the assumption that you are running each Elasticsearch instance's HTTP service on the default port (9200). They are also directed to localhost, which assumes that you are submitting the request locally; otherwise, replace localhost with your node's IP address.


Pinpointing problematic shards

Elasticsearch’s cat shards API will tell you which shards are unassigned, and why:
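
For example (a sketch against a local node; the column list passed via h= is optional but makes the unassignment reason easy to spot):

    curl -s "localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason" | grep UNASSIGNED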

Each row lists the name of the index, the shard number, whether it is a primary (p) or replica (r) shard, and the reason it is unassigned:

If you’re running version 5+ of Elasticsearch, you can also use the cluster allocation explain API to try to garner more information about shard allocation issues:
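A sketch of the call; with no request body, the API reports on the first unassigned shard it finds:

    curl -s -XGET "localhost:9200/_cluster/allocation/explain?pretty"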

The resulting output will provide helpful details about why certain shards in your cluster remain unassigned:

In this case, the API clearly explains why the replica shard remains unassigned: “the shard cannot be allocated to the same node on which a copy of the shard already exists”. To view more details about this particular issue and how to resolve it, skip ahead to a later section of this post.

If it looks like the unassigned shards belong to an index you thought you deleted already, or an outdated index that you don’t need anymore, then you can delete the index to restore your cluster status to green:
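A sketch of the delete call, with <INDEX_NAME> standing in for the index you want to remove:

    curl -XDELETE "localhost:9200/<INDEX_NAME>"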

If that didn’t solve the issue, read on to try other solutions.

Reason 1: Shard allocation is purposefully delayed

When a node leaves the cluster, the primary node temporarily delays shard reallocation to avoid needlessly wasting resources on rebalancing shards, in the event the original node is able to recover within a certain period of time (one minute, by default). If this is the case, your logs should look something like this:

You can dynamically modify the delay period like so:
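A sketch of such an update, assuming a five-minute delay; the relevant setting is index.unassigned.node_left.delayed_timeout:

    curl -XPUT "localhost:9200/<INDEX_NAME>/_settings" -H 'Content-Type: application/json' -d '{
      "index.unassigned.node_left.delayed_timeout": "5m"
    }'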

Replacing <INDEX_NAME> with _all will update the threshold for all indices in your cluster.

After the delay period is over, you should start seeing the primary assigning those shards. If not, keep reading to explore solutions to other potential causes.

Reason 2: Too many shards, not enough nodes

As nodes join and leave the cluster, the primary node reassigns shards automatically, ensuring that multiple copies of a shard aren't assigned to the same node. In other words, the primary node will not assign a primary shard to the same node as its replica, nor will it assign two replicas of the same shard to the same node. A shard may linger in an unassigned state if there are not enough nodes to distribute the shards accordingly.

To avoid this issue, make sure that every index in your cluster is initialized with fewer replicas per primary shard than the number of nodes in your cluster by following the formula below:
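
    N >= R + 1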

Where N is the number of nodes in your cluster, and R is the largest shard replication factor across all indices in your cluster.

In the screenshot below, the many-shards index is stored on three primary shards and each primary has four replicas. Six of the index’s 15 shards are unassigned because our cluster only contains three nodes. Two replicas of each primary shard haven’t been assigned because each of the three nodes already contains a copy of that shard.

To resolve this issue, you can either add more data nodes to the cluster or reduce the number of replicas. In our example, we either need to add at least two more nodes in the cluster or reduce the replication factor to two, like so:
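A sketch of the replica update (applied per index; <INDEX_NAME> is a placeholder):

    curl -XPUT "localhost:9200/<INDEX_NAME>/_settings" -H 'Content-Type: application/json' -d '{
      "index": { "number_of_replicas": 2 }
    }'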

After reducing the number of replicas, take a peek at Cerebro to see if all shards have been assigned.

Reason 3: You need to re-enable shard allocation

In the Cerebro screenshot below, an index has just been added to the cluster, but its shards haven’t been assigned.

Shard allocation is enabled by default on all nodes, but you may have disabled shard allocation at some point (for example, in order to perform a rolling restart) and forgotten to re-enable it.

To enable shard allocation, use the Cluster Update Settings API:
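A sketch of the call (cluster-wide, so any node will accept it):

    curl -XPUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d '{
      "transient": { "cluster.routing.allocation.enable": "all" }
    }'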

If this solved the problem, your Cerebro or Datadog dashboard should show the number of unassigned shards decreasing as they are successfully assigned to nodes.

It looks like this solved the issue for all of our unassigned shards, with one exception: shard 0 of the constant-updates index. Let’s explore other possible reasons why the shard remains unassigned.

Reason 4: Shard data no longer exists in the cluster

In this case, primary shard 0 of the constant-updates index is unassigned. It may have been created on a node without any replicas (a technique used to speed up the initial indexing process), and the node left the cluster before the data could be replicated. The primary detects the shard in its global cluster state file, but can’t locate the shard’s data in the cluster.

Another possibility is that a node may have encountered an issue while rebooting. Normally, when a node resumes its connection to the cluster, it relays information about its on-disk shards to the primary node, which then transitions those shards from “unassigned” to “assigned/started”. When this process fails for some reason (e.g. the node’s storage has been damaged in some way), the shards may remain unassigned.

In this scenario, you have to decide how to proceed: try to get the original node to recover and rejoin the cluster (and do not force allocate the primary shard), or force allocate the shard using the Cluster Reroute API and reindex the missing data using the original data source, or from a backup.

If you decide to go with the latter (forcing allocation of a primary shard), the caveat is that you will be assigning an "empty" shard. If the node that contained the original primary shard data were to rejoin the cluster later, its data would be overwritten by the newly created (empty) primary shard, because it would be considered a "newer" version of the data. Before proceeding with this action, you may want to retry allocation instead, which would allow you to preserve the data stored on that shard.

If you understand the implications and still want to force allocate the unassigned primary shard, you can do so by using the allocate_empty_primary flag. The following command reroutes primary shard 0 in the constant-updates index to a specific node:
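A sketch of the reroute command; <NODE_NAME> is a placeholder for the destination node:

    curl -XPOST "localhost:9200/_cluster/reroute" -H 'Content-Type: application/json' -d '{
      "commands": [{
        "allocate_empty_primary": {
          "index": "constant-updates",
          "shard": 0,
          "node": "<NODE_NAME>",
          "accept_data_loss": true
        }
      }]
    }'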

Note that you'll need to specify "accept_data_loss": true to confirm that you are prepared to lose the data on the shard. If you don't include this parameter, you will see an error like the one below:

You will now need to reindex the missing data, or restore as much as you can from a backup snapshot using the Snapshot and Restore API.

Reason 5: Low disk watermark

The primary node may not be able to assign shards if there are not enough nodes with sufficient disk space (it will not assign shards to nodes that have over 85 percent disk in use). Once a node has reached this level of disk usage, or what Elasticsearch calls a "low disk watermark", it will not be assigned more shards.

You can check the disk space on each node in your cluster (and see which shards are stored on each of those nodes) by querying the cat API:
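For example (a sketch; _cat/allocation summarizes disk use per node, and _cat/shards shows where each shard lives):

    curl -s "localhost:9200/_cat/allocation?v"
    curl -s "localhost:9200/_cat/shards?v"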

Consult this article for options on what to do if any particular node is running low on disk space (remove outdated data and store it off-cluster, add more nodes, upgrade your hardware, etc.).

If your nodes have large disk capacities, the default low watermark (85 percent disk usage) may be too low. You can use the Cluster Update Settings API to change cluster.routing.allocation.disk.watermark.low and/or cluster.routing.allocation.disk.watermark.high. For example, this Stack Overflow thread points out that if your nodes have 5TB disk capacity, you can probably safely increase the low disk watermark to 90 percent:
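A sketch of that change as a transient cluster setting:

    curl -XPUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d '{
      "transient": { "cluster.routing.allocation.disk.watermark.low": "90%" }
    }'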

If you want your configuration changes to persist upon cluster restart, replace "transient" with "persistent", or update these values in your configuration file. You can choose to update these settings using either byte or percentage values, but be sure to remember this important note from the Elasticsearch documentation: "Percentage values refer to used disk space, while byte values refer to free disk space."

Reason 6: Multiple Elasticsearch versions

This problem only arises in clusters running more than one version of Elasticsearch (perhaps in the middle of a rolling upgrade). According to the Elasticsearch documentation, the primary node will not assign a primary shard's replicas to any node running an older version. For example, if a primary shard is running on version 1.4, the primary node will not be able to assign that shard's replicas to any node that is running any version prior to 1.4.

If you try to manually reroute a shard from a newer-version node to an older-version node, you will see an error like the one below:

Elasticsearch does not support rollbacks to previous versions, only upgrades. Upgrading the nodes running the older version should solve the problem if this is indeed the issue at hand.

Have you tried turning it off and on again?

If none of the scenarios above apply to your situation, you still have the option of reindexing the missing data from the original data source, or restoring the affected index from an old snapshot, as explained here.

Monitoring for unassigned shards

It's important to fix unassigned shards as soon as possible, as they indicate that data is missing/unavailable, or that your cluster is not configured for optimal reliability. If you're already using Datadog, turn on the Elasticsearch integration and you'll immediately begin monitoring for unassigned shards and other key Elasticsearch performance and health metrics. If you don't use Datadog but would like to, sign up for a free trial.


ElasticSearch: Unassigned Shards, how to fix?

I have an ES cluster with 4 nodes:

I had to restart search03, and when it came back, it rejoined the cluster no problem, but left 7 unassigned shards laying about.

Now my cluster is in yellow state. What is the best way to resolve this issue?

  • Delete (cancel) the shards?
  • Move the shards to another node?
  • Allocate the shards to the node?
  • Update 'number_of_replicas' to 2?
  • Something else entirely?

Interestingly, when a new index was added, that node started working on it and played nice with the rest of the cluster, it just left the unassigned shards laying about.

Follow on question: am I doing something wrong to cause this to happen in the first place? I don't have much confidence in a cluster that behaves this way when a node is restarted.

NOTE: If you're running a single node cluster for some reason, you might simply need to do the following:
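The original snippet is missing from this excerpt; on a single-node cluster the usual fix is dropping the replica count to zero, since a replica can never live on the same node as its primary (see the answers below). A sketch, applied to all indices:

    curl -XPUT "localhost:9200/_settings" -H 'Content-Type: application/json' -d '{
      "index": { "number_of_replicas": 0 }
    }'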


28 Answers

By default, Elasticsearch will re-assign shards to nodes dynamically. However, if you've disabled shard allocation (perhaps you did a rolling restart and forgot to re-enable it), you can re-enable shard allocation.
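A sketch of the re-enable call (assuming the default localhost:9200; it is a cluster-wide setting, so any node will do):

    curl -XPUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d '{
      "transient": { "cluster.routing.allocation.enable": "all" }
    }'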

Elasticsearch will then reassign shards as normal. This can be slow; consider raising indices.recovery.max_bytes_per_sec and cluster.routing.allocation.node_concurrent_recoveries to speed it up.

If you're still seeing issues, something else is probably wrong, so look in your Elasticsearch logs for errors. If you see EsRejectedExecutionException, your thread pools may be too small.

Finally, you can explicitly reassign a shard to a node with the reroute API.

– Wilfred Hughes

  • 3 When I did that I got: { "error" : "ElasticsearchIllegalArgumentException[[allocate] failed to find [logstash-2015.01.05][1] on the list of unassigned shards]", "status" : 400 } Even though I can see that shard is one of the unallocated ones in ES-Head –  wjimenez5271 Jan 12, 2015 at 18:19
  • Incidentally, other shards did work that were listed as unallocated, and then the remaining ones fixed themselves. –  wjimenez5271 Jan 12, 2015 at 18:28
  • @willbradley I do use these commands as written, and settings is without an underscore. This works on v1.4, what version are you using? –  Wilfred Hughes Jan 6, 2016 at 12:02
  • 1 Since release 5.0, the "allocate" command has changed to provide more options - the example above would now be "allocate_empty_primary", omitting the "allow_primary" parameter. –  jmb May 8, 2017 at 14:58
  • 7 you need to add -H 'Content-Type: application/json' if you get the error Content-Type header [application/x-www-form-urlencoded] is not supported –  luckydonald Dec 7, 2017 at 14:34

OK, I've solved this with some help from ES support. Issue the following command to the API on all nodes (or the nodes you believe to be the cause of the problem):

where <index> is the index you believe to be the culprit. If you have no idea, just run this on all nodes:
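The original commands did not survive extraction; based on the comments below, they most likely toggled the legacy index.routing.allocation.disable_allocation setting (removed in later Elasticsearch releases). A sketch of both forms:

    # per-index form
    curl -XPUT "localhost:9200/<index>/_settings" -H 'Content-Type: application/json' -d '{
      "index.routing.allocation.disable_allocation": false
    }'

    # applied to all indices
    curl -XPUT "localhost:9200/_settings" -H 'Content-Type: application/json' -d '{
      "index.routing.allocation.disable_allocation": false
    }'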

I also added this line to my yaml config and since then, any restarts of the server/service have been problem free. The shards re-allocated back immediately.

FWIW, to answer an oft-asked question: set MAX_HEAP_SIZE to 30G unless your machine has less than 60G RAM, in which case set it to half the available memory.

  • Shard Allocation Awareness

– slm

  • 2 to solve this in version 1.1.1, should I use cluster.routing.allocation.enable = none? –  user3175226 May 14, 2014 at 13:41
  • 1 Allocation disable is no longer documented there, at least not as of Nov 20. –  user153275 Nov 20, 2014 at 17:27
  • 3 Note that routing allocation is a cluster-wide setting, so it doesn't matter which node you send the command to. –  Wilfred Hughes Jan 8, 2015 at 13:43
  • I added both in my es yml file. index.routing.allocation.disable_allocation : false cluster.routing.allocation.enable: none But still the unassigned shards are showing.. What can be the reason ? –  bagui Jan 10, 2015 at 19:42
  • 6 In version 6.8 I get an error: { "type": "illegal_argument_exception", "reason": "unknown setting [index.routing.allocation.disable_allocation] please check that any required plugins are installed, or check the breaking changes documentation for removed settings" } ], –  Janac Meena Jan 13, 2020 at 14:17

This little bash script will brute-force reassign shards; you may lose data.
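The script itself did not survive extraction. A minimal sketch of the approach, not the author's original: it assumes ES 5+ syntax (allocate_empty_primary with accept_data_loss, as the comments below note) and a single target node; set ES_HOST and NODE to match your cluster first.

    #!/bin/bash
    ES_HOST="localhost:9200"   # assumes the default HTTP port
    NODE="<NODE_NAME>"         # target node name, e.g. from _cat/nodes

    # For every UNASSIGNED shard, force-allocate an empty primary on $NODE.
    # This discards whatever data the shard used to hold.
    curl -s "$ES_HOST/_cat/shards" | grep UNASSIGNED | while read index shard prirep state rest; do
      curl -s -XPOST "$ES_HOST/_cluster/reroute" -H 'Content-Type: application/json' -d "{
        \"commands\": [{
          \"allocate_empty_primary\": {
            \"index\": \"$index\", \"shard\": $shard,
            \"node\": \"$NODE\", \"accept_data_loss\": true
          }
        }]
      }"
      echo
    done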

– W. Andrew Loe III

  • I got this error: <br> {"error":"JsonParseException[Unexpected characte r (',' (code 44)): expected a valid value (number, String, array, object, 'true' , 'false' or 'null')\n at [Source: [B@3b1fadfb; line: 6, column: 27]]","status": 500} <br> what should i do to fix it –  biolinh Mar 30, 2015 at 14:45
  • Thanks a ton! It saved precious time!! –  Sathish May 17, 2018 at 8:42
  • 2 The script throws the error: {"error":"Content-Type header [application/x-www-form-urlencoded] is not supported","status":406}{"error":"Content-Type header [application/x-www-form-urlencoded] is not supported","status":406} –  Janac Meena Jan 13, 2020 at 14:18
  • Thanks ! Worked for me (ElasticSearch 1.4.x). –  David Goodwin Oct 17, 2020 at 11:13

I also encountered a similar error. It happened to me because one of my data nodes was full, which caused shard allocation to fail. If unassigned shards are present and your cluster is RED (and a few indices are also RED), then the steps below worked like a champ for me. In the Kibana dev tools:
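The elided request was most likely the cluster allocation explain API (GET _cluster/allocation/explain in the Dev Tools console); as a curl sketch:

    curl -s -XGET "localhost:9200/_cluster/allocation/explain?pretty"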

If there are any unassigned shards, this will return details about them; otherwise it will throw an error.

Simply running the command below will solve everything:
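A sketch of the call; the usual fix for shards whose allocation has failed is asking the cluster to retry the failed allocations (see also the answers below that use retry_failed):

    curl -XPOST "localhost:9200/_cluster/reroute?retry_failed=true"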

Thanks to - https://github.com/elastic/elasticsearch/issues/23199#issuecomment-280272888

– Yogesh Jilhawar

  • thanks very helpful, saved me lots of hours. –  Dudi Boy Nov 16, 2021 at 16:54
  • Really you saved my lots of hours. Thanks. Better than elasticsearch docs. –  Hassan Ketabi Aug 24, 2022 at 14:38

The only thing that worked for me was changing the number_of_replicas (I had 2 replicas, so I changed it to 1 and then changed back to 2).

(I already answered it in this question.)

– Community

  • This seems like would create heavy load on the network and on the processing on data intensive clusters. Did you try this on a big data system? Could you share the rough numbers? –  Ricardo Sep 10, 2020 at 23:55

Elasticsearch automatically allocates shards if cluster.routing.allocation.enable is set to all. This config can also be set using the REST API.

If, even after applying this config, ES fails to assign the shards automatically, then you have to force-assign the shards yourself. See the official ES documentation for this.

I have written a script to force assign all unassigned shards across cluster.

The array below contains the list of nodes among which you want to balance the unassigned shards:
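The script itself did not survive extraction; a minimal sketch of the idea, assuming ES 5+ reroute commands and hypothetical node names (list yours with curl localhost:9200/_cat/nodes):

    #!/bin/bash
    NODES=("node1" "node2" "node3")   # replace with your actual node names
    i=0

    curl -s "localhost:9200/_cat/shards" | grep UNASSIGNED | while read index shard prirep state rest; do
      node="${NODES[i % ${#NODES[@]}]}"
      if [ "$prirep" = "p" ]; then
        # unassigned primary: allocate an empty one (accepts data loss)
        body="{\"commands\":[{\"allocate_empty_primary\":{\"index\":\"$index\",\"shard\":$shard,\"node\":\"$node\",\"accept_data_loss\":true}}]}"
      else
        # unassigned replica: allocate a fresh copy on the chosen node
        body="{\"commands\":[{\"allocate_replica\":{\"index\":\"$index\",\"shard\":$shard,\"node\":\"$node\"}}]}"
      fi
      curl -s -XPOST "localhost:9200/_cluster/reroute" -H 'Content-Type: application/json' -d "$body"
      echo
      i=$((i + 1))
    done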

– Nischal Kumar

  • This script did not work, that is, after I ran it, i still had UNASSIGNED shards. –  Chris F Feb 21, 2017 at 18:08
  • @ChrisF In line1: you need to replace node1, node2, node3 with the actual node names. You can get them with a curl localhost:9200/_cat/nodes. –  siddharthlatest Mar 12, 2017 at 11:20

I got stuck today with the same shard allocation issue. The script that W. Andrew Loe III proposed in his answer didn't work for me, so I modified it a little and it finally worked.

Now, I'm not exactly a Bash guru, but the script really worked for my case. Note that you'll need to specify appropriate values for the "ES_HOST" and "NODE" variables.

– Splanger

  • unfortunately the ES5x broke compatibility: elastic.co/guide/en/elasticsearch/reference/5.1/… –  Fawix Jan 12, 2017 at 20:10
  • 2 In order for the script above to work with ES5x replace allocate with allocate_empty_primary and replace \"allow_primary\": true with \"accept_data_loss\": true –  Fawix Jan 12, 2017 at 20:19
  • Getting {"error":"Content-Type header [application/x-www-form-urlencoded] is not supported","status":406} even after applying Fawix's suggestion –  Janac Meena Jan 13, 2020 at 14:28

In my case, when I created a new index, the default number_of_replicas was set to 1. And since the number of nodes in my cluster was only one, there was no extra node to create the replica on, so the health was turning yellow. When I created the index with a settings property and set number_of_replicas to 0, it worked fine. Hope this helps.

– Apoorv Nag

In my case, the hard disk space upper bound was reached.

Look at this article: https://www.elastic.co/guide/en/elasticsearch/reference/current/disk-allocator.html

Basically, I ran:
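The command itself is missing from this excerpt; a sketch matching the description below (90% low watermark, 95% high watermark, checked every minute):

    curl -XPUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d '{
      "transient": {
        "cluster.routing.allocation.disk.watermark.low": "90%",
        "cluster.routing.allocation.disk.watermark.high": "95%",
        "cluster.info.update.interval": "1m"
      }
    }'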

So that it will allocate if <90% hard disk space used, and move a shard to another machine in the cluster if >95% hard disk space used; and it checks every 1 minute.

– manyways

Maybe it helps someone, but I had the same issue and it was due to a lack of storage space caused by a log getting way too big.

Hope it helps someone! :)

– Juanjo Lainez Reche

I was having this issue as well, and I found an easy way to resolve it.

Get the index of unassigned shards

Install the Curator tool, and use it to delete the index:
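A sketch along the lines of the curator_cli invocation mentioned in the comments below (the index prefix follows the usual logstash-YYYY.MM.DD pattern and is only an example):

    pip install elasticsearch-curator
    curator_cli --host 127.0.0.1 delete_indices \
      --filter_list '[{"filtertype":"pattern","kind":"prefix","value":"logstash-2016.04.21"}]'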

NOTE: In my case, the index is the logstash index for the day 2016-04-21.

  • Then check the shards again; all the unassigned shards are gone!

– user3391471

  • 1 @sim, very thanks for your edit for my answer. I am very poor at edit, will pay more attention to it. –  user3391471 May 27, 2016 at 0:30
  • For me, it was: curator_cli --host 127.0.0.1 delete_indices --filter_list '[{"filtertype":"pattern","kind":"prefix","value":"logstash-"}]' –  Gaui Feb 20, 2018 at 1:46

I had the same problem but the root cause was a difference in version numbers (1.4.2 on two nodes (with problems) and 1.4.4 on two nodes (ok)). The first and second answers (setting "index.routing.allocation.disable_allocation" to false and setting "cluster.routing.allocation.enable" to "all") did not work.

However, the answer by @Wilfred Hughes (setting "cluster.routing.allocation.enable" to "all" using transient) gave me an error with the following statement:

[NO(target node version [1.4.2] is older than source node version [1.4.4])]

After updating the old nodes to 1.4.4, these nodes started to resync with the other good nodes.

– Jörg Rech

I also ran into this situation and finally fixed it.

First, I will describe my situation. I have two nodes in an Elasticsearch cluster; they can find each other, but when I created an index with the settings "number_of_replicas": 2, "number_of_shards": 5, ES showed a yellow signal and unassigned_shards was 5.

The problem occurred because of the value of number_of_replicas; when I set its value to 1, all was fine.

– Armstrongya

  • 4 The number of replicas should always be N-1 the number of nodes you have. So in your scenario with 2 nodes, 1 of the nodes contains the primary shard, while he other node has the replica, hence your number of replicas should be set to 1. N = 2, N - 1 = 1. –  slm May 9, 2016 at 15:05

For me, this was resolved by running this from the dev console: "POST /_cluster/reroute?retry_failed"

I started by looking at the index list to see which indices were red and then ran

"get /_cat/shards?h=[INDEXNAME],shard,prirep,state,unassigned.reason"

and saw that it had shards stuck in ALLOCATION_FAILED state, so running the retry above caused them to re-try the allocation.

– ScottFoster1000

  • As of version 5.6.3 the comand should be get /_cat/shards/[INDEXNAME]?h=,shard,prirep,state,unassigned.reason –  fasantos Oct 20, 2018 at 21:29

In my case an old node with old shards was joining the cluster, so we had to shut down the old node and delete the indices with unassigned shards.

– alwe

I tried several of the suggestions above and unfortunately none of them worked. We have a "Log" index in our lower environment where apps write their errors. It is a single node cluster. What solved it for me was checking the YML configuration file for the node and seeing that it still had the default setting "gateway.expected_nodes: 2". This was overriding any other settings we had. Whenever we would create an index on this node it would try to spread 3 out of 5 shards to the phantom 2nd node. These would therefore appear as unassigned and they could never be moved to the 1st and only node.

The solution was editing the config, changing the setting "gateway.expected_nodes" to 1, so it would quit looking for its never-to-be-found brother in the cluster, and restarting the Elastic service instance. Also, I had to delete the index, and create a new one. After creating the index, the shards all showed up on the 1st and only node, and none were unassigned.

– Daniel Knowlton

Similar problem on ES 7.4.2; the commands have changed. As already mentioned in other answers, the first thing to check is GET _cluster/allocation/explain?pretty and then POST _cluster/reroute?retry_failed.

Primary: you have to pass "accept_data_loss": true for a primary shard.

cluster-reroute doc

– Pierre-Damien

This might help: I had this issue when trying to run ES in embedded mode. The fix was to make sure the node had local(true) set.

– JARC

Another possible reason for unassigned shards is that your cluster is running more than one version of the Elasticsearch binary.

shard replication from the more recent version to the previous versions will not work

This can be a root cause for unassigned shards.

Elastic Documentation - Rolling Upgrade Process

– Marc Tamsky

I ran into exactly the same issue. This can be prevented by temporarily setting the shard allocation to false before restarting elasticsearch, but this does not fix the unassigned shards if they are already there.

In my case it was caused by a lack of free disk space on the data node. The unassigned shards were still on the data node after the restart, but they were not recognized by the master.

Just cleaning 1 of the nodes from the disk got the replication process started for me. This is a rather slow process because all the data has to be copied from 1 data node to the other.

– Brian van Rooijen

I tried to delete unassigned shards or manually assign them to a particular data node. It didn't work because unassigned shards kept appearing and the health status was "red" over and over. Then I noticed that one of the data nodes was stuck in a "restart" state. I reduced the number of data nodes and killed it. The problem is not reproducible anymore.

– thepolina

I had two indices with unassigned shards that didn't seem to be self-healing. I eventually resolved this by temporarily adding an extra data-node [1]. After the indices became healthy and everything stabilized to green, I removed the extra node and the system was able to rebalance (again) and settle on a healthy state.

It's a good idea to avoid killing multiple data nodes at once (which is how I got into this state). Likely, I had failed to preserve any copies/replicas for at least one of the shards. Luckily, Kubernetes kept the disk storage around, and reused it when I relaunched the data-node.

...Some time has passed...

Well, this time just adding a node didn't seem to be working (after waiting several minutes for something to happen), so I started poking around in the REST API.
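The call itself is missing from this excerpt; judging from the "decision" output quoted below, it was most likely the cluster allocation explain API, along these lines:

    curl -s -XGET "localhost:9200/_cluster/allocation/explain?pretty"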

This showed my new node with "decision": "YES".

By the way, all of the pre-existing nodes had "decision": "NO" due to "the node is above the low watermark cluster setting". So this was probably a different case than the one I had addressed previously.

Then I made the following simple POST [2] with no body, which kicked things into gear...

Other notes:

Very helpful: https://datadoghq.com/blog/elasticsearch-unassigned-shards

Something else that may work: set cluster_concurrent_rebalance to 0, then to null, as I demonstrate here.

[1] Pretty easy to do in Kubernetes if you have enough headroom: just scale out the stateful set via the dashboard.

[2] Using the Kibana "Dev Tools" interface, I didn't have to bother with SSH/exec shells.

– Brent Bradburn

I first increased "index.number_of_replicas" by 1 (waiting until the nodes were synced), then decreased it by 1 afterwards, which effectively removes the unassigned shards, and the cluster is green again without the risk of losing any data.

I believe there are better ways but this is easier for me.

Hope this helps.

– Yusuf Demirag

When dealing with corrupted shards you can set the replication factor to 0 and then set it back to the original value. This should clear up most if not all your corrupted shards and relocate the new replicas in the cluster.

Setting indexes with unassigned replicas to use a replication factor of 0:

Setting them back to 1:
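Both steps might look like this (a sketch; the all-indices form shown here is exactly what the note below warns about):

    curl -XPUT "localhost:9200/_settings" -H 'Content-Type: application/json' -d '{
      "index": { "number_of_replicas": 0 }
    }'

    curl -XPUT "localhost:9200/_settings" -H 'Content-Type: application/json' -d '{
      "index": { "number_of_replicas": 1 }
    }'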

Note: Do not run this if you have different replication factors for different indexes. This would hardcode the replication factor for all indexes to 1.

– bonzofenix

This may be caused by disk space as well. In Elasticsearch 7.5.2, by default, if disk usage is above 85%, then replica shards are not assigned to any other node.

This can be fixed by setting a different threshold or by disabling it, either in the .yml or via Kibana.
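A sketch of the cluster-settings route; the threshold_enabled flag turns the disk-based allocation decider off entirely, which is risky outside of troubleshooting:

    curl -XPUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d '{
      "transient": {
        "cluster.routing.allocation.disk.threshold_enabled": false
      }
    }'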

– M.abdelrhman

First, use the cluster health API to get the current health of the cluster, where RED means one or more primary shards are missing and yellow means one or more replica shards are missing.

After this, use the cluster allocation explain API to understand why a particular shard is missing and why Elasticsearch is not able to allocate it on a data node.

Once you get the exact root cause, try to address the issue, which often requires changing a few cluster settings (mentioned in @wilfred's answer earlier). But in some cases, if it's a replica shard and you have another copy of the same shard (i.e. another replica) available, you can reduce the replica count using the update replica setting and increase it again later, if you need it.

Apart from the above, if your cluster allocation API mentions that it doesn't have any valid data nodes to allocate a shard to, then you need to add new data nodes or change the shard allocation awareness settings.

– BaCaRoZzo

If you have an unassigned shard, usually the first step is to call the allocation explain API and look for the reason. Depending on the reason, you'd do something about it. Here are some that come to mind:

  • node doesn't have enough disk space (check disk-based allocation settings)
  • node can't allocate the shard because of some restrictions like allocation is disabled or allocation filtering or awareness (e.g. node is on the wrong side of the cluster, like the other availability zone or a hot or a warm node)
  • there's some error loading the shard. E.g. a checksum fails on files, there's a missing synonyms file referenced by an analyzer

Sometimes it helps to bump-start it, like using the Cluster Reroute API to allocate the shard manually, or disabling and re-enabling replicas.

If you need more info on operating Elasticsearch, check Sematext's Elasticsearch Operations training (disclaimer: I'm delivering it).

– Radu Gheorghe

If you are using the AWS Elasticsearch service, the above suggestions will not provide a solution. In this case, I backed up the index with the backup structure connected to S3, then deleted the index and restored it. It worked for me. Please make sure the backup completed successfully!

– Bilal Demir

Diagnose unassigned shards

There are multiple reasons why shards might get unassigned, ranging from misconfigured allocation settings to lack of disk space.

To diagnose the unassigned shards in your deployment, follow these steps:

  • Log in to the Elastic Cloud console.

On the Elasticsearch Service panel, click the name of your deployment.

If the name of your deployment is disabled, your Kibana instances might be unhealthy, in which case please contact Elastic Support. If your deployment doesn't include Kibana, all you need to do is enable it first.

Open your deployment’s side navigation menu (placed under the Elastic logo in the upper left corner) and go to Dev Tools > Console .

Kibana Console

View the unassigned shards using the cat shards API.

The response will look like this:

Unassigned shards have a state of UNASSIGNED. The prirep value is p for primary shards and r for replicas.

The index in the example has a primary shard unassigned.

To understand why an unassigned shard is not being assigned and what action you must take to allow Elasticsearch to assign it, use the cluster allocation explanation API.

The explanation in our case indicates the index allocation configurations are not correct. To review your allocation settings, use the get index settings and cluster get settings APIs.

  • Change the settings using the update index settings and cluster update settings APIs to the correct values in order to allow the index to be allocated.

For more guidance on fixing the most common causes of unassigned shards, please follow this guide or contact Elastic Support.



How to Find and Fix Elasticsearch Unassigned Shards

When a data index is created in Elasticsearch, the data is divided into shards for horizontal scaling across multiple nodes. These shards are small pieces of data that make up the index and play a significant role in the performance and stability of Elasticsearch deployments.

A shard can be classified as either a primary shard or a replica shard. A replica is a copy of the primary shard, and whenever Elasticsearch indexes data, it is first indexed to one of the primary shards. The data is then replicated in all the replica shards of that shard so that both the primary and replica shards contain the same data.

Elasticsearch distributes the search load among the primary and replica shards, which enhances search performance. If the primary shard is lost, the replica can take its place as the new primary shard, so the replica provides fault tolerance as well.

The number of primary shards is fixed at index creation time and is defined in the settings. The number of replicas can be changed at any time without blocking search operations. However, this doesn’t apply to the number of primary shards, which should be defined before creating the index.

It is generally recommended to have at least one replica shard for each primary shard to ensure the availability and reliability of the cluster.


Primary shards and replica shards explanation. (Source)

If shards are not appropriately assigned or fail to be assigned, they end up in the unassigned state, which means the data allocated in the shard is no longer available for indexing and searching operations.

Why Shards Are Unassigned in Elasticsearch: Common Causes

Shards in Elasticsearch can go through different states during their lifetime:

  • Initializing – This is the initial state before a shard is ready to be used.
  • Started – This state represents when a shard is active and can receive requests.
  • Relocating – This state occurs when a shard is in the process of being moved to a different node.
  • Unassigned – This state occurs when a shard has failed to be assigned. The reason for this may vary, as explained below.

Ideally, you’d monitor these states over time, so you can see if there are periods of instability. Sematext Cloud can monitor that for you and alert on anomalies:


These are the most common cases for unassigned shards in Elasticsearch:

  • More shards than available nodes
  • Shard allocation being disabled
  • Shard allocation failure
  • A node leaving the cluster
  • Network issues

How to Find Unassigned Shards in Elasticsearch

To find out whether the Elasticsearch cluster has unassigned shards, it is possible to leverage the Elasticsearch API.

Using the _cluster/health API, enter the following command:
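For example (a sketch against a local node):

    curl -s -XGET "localhost:9200/_cluster/health?pretty"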

The output will show useful information about the status of the cluster and the state of the shards:

It is then possible to call the _cluster/allocation/explain endpoint to learn the reason for the current status. In this case the status is yellow, which means that primary shards are allocated but some replica shards are not, and hence there are unassigned shards. For instance:
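A sketch of the call:

    curl -s -XGET "localhost:9200/_cluster/allocation/explain?pretty"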


How to Fix Unassigned Shards

The proper resolution of unassigned shard issues depends on the cause of the problem.

More Shards Than Available Nodes

Whenever the Elasticsearch cluster has more replicas than available nodes to allocate them to, or shards configuration restrictions, shards may become unassigned. The solution is either adding more nodes or reducing the number of replicas. For the latter, the Elasticsearch API can be used. For instance, to reduce the number of replicas in the .kibana index to zero you can use the following call:
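A sketch of that call:

    curl -XPUT "localhost:9200/.kibana/_settings" -H 'Content-Type: application/json' -d '{
      "index": { "number_of_replicas": 0 }
    }'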

To add more nodes to a cluster, follow these steps:

  • Install Elasticsearch on the new nodes and make sure to use the same version of Elasticsearch that is running on the already deployed nodes.
  • Configure the new nodes to join the existing Elasticsearch cluster. To do so, it is necessary to configure the name of the cluster, and the seed hosts option should be configured with the IP address or DNS name of at least one running Elasticsearch node in the elasticsearch.yml configuration file.
  • Start the Elasticsearch process on the new nodes.

More information can be found here: https://www.elastic.co/guide/en/elasticsearch/reference/current/important-settings.html

After adding the new node, the Elasticsearch master node will balance the shards automatically. However, it is possible to manually move the shards across the nodes of the cluster with the Elasticsearch API. Here there is an example of the _cluster/reroute endpoint and the move operation; in this case, Elasticsearch will move the shard “0” with the index “my_index” from “node1” to “node2”:
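A sketch of that call (index, shard, and node names taken from the description above):

    curl -XPOST "localhost:9200/_cluster/reroute" -H 'Content-Type: application/json' -d '{
      "commands": [{
        "move": {
          "index": "my_index", "shard": 0,
          "from_node": "node1", "to_node": "node2"
        }
      }]
    }'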

It is possible to update the settings of the indices to decrease the number of replicas of the already indexed data. This will remove unassigned replicas and recover the cluster’s normal state, but there will not be a backup of the primary shards in case of failure and hence, it is possible to face data loss.

Here is an example of the endpoint call used to reduce the number of replicas to 0.

Shard Allocation Disabled

It is possible that shard allocation may be disabled. When this occurs, it is necessary to re-enable allocation in the cluster by using the _cluster/settings API endpoint. Here is an example of the endpoint call:
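A sketch, here using a persistent setting so it survives a full cluster restart:

    curl -XPUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d '{
      "persistent": { "cluster.routing.allocation.enable": "all" }
    }'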

After enabling the allocation, Elasticsearch will automatically allocate and move the shards as needed to balance the load and ensure that all the shards are allocated to a node. However, if this still fails, the reroute endpoint can be used again to finish the operation:

Shard Allocation Failure

Shard allocation failure is one of the most frequent issues when it comes to unassigned shards. Whenever a shard allocation failure occurs, Elasticsearch will automatically retry the allocation five times before giving up. This setting can be changed so that more retries are attempted, but the issue may persist until the root cause is determined.

The failure may happen for a few reasons.

Shard allocation failed during allocation and rebalancing operations due to a lack of disk space

Elasticsearch indices are stored in the Elasticsearch nodes, which can fill up the server’s disk space if no measures are taken to avoid it. To prevent this from happening, Elasticsearch has established disk watermarks that represent thresholds to ensure that all nodes have enough disk space.

Whenever the low watermark is crossed, Elasticsearch will stop allocating shards to that node. Elasticsearch will aim to move shards from a node whose disk usage is above the high watermark level. Additionally, if this watermark is exceeded, Elasticsearch can reach the “flood-stage” watermark, when it will block writing operations to all the indices that have one shard (primary or replica) in the node affected.

While it is possible to increase the Elasticsearch thresholds, there is a risk that the disk will completely fill up, causing the server to be unable to function properly.

To avoid running out of disk, you can delete old indices, add new nodes to the cluster, or increase disk space on the node.

The _cluster/settings API endpoint can be used to change the thresholds. Set the cluster.routing.allocation.disk.watermark.low and cluster.routing.allocation.disk.watermark.high settings in the request body.

Here is an example of the command to set the low watermark to 80% and the high watermark to 90% using the Elasticsearch API:
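A sketch of that call:

    curl -XPUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d '{
      "transient": {
        "cluster.routing.allocation.disk.watermark.low": "80%",
        "cluster.routing.allocation.disk.watermark.high": "90%"
      }
    }'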

Shard allocation failed because it is not possible due to allocation rules

It is possible to define rules to control which shards can go to which nodes. For instance, Shard allocation awareness or Shard allocation filtering.

The forced awareness setting controls replica shard allocation for every location. This option prevents a location from being overloaded until nodes are available in a different location. More information about this setting can be found here: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-cluster.html#forced-awareness

The shard allocation filter setting controls where Elasticsearch allocates shards from any index. More information about this setting can be found here: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-cluster.html#cluster-shard-allocation-filtering

A Node Leaving The Cluster

Nodes in the cluster can fail or leave the cluster for different reasons. If a node is absent from the cluster, the data might be missing if no replicas are in place and/or the node cannot be recovered.

The allocation of replica shards that become unassigned because a node has left can be delayed via the index.unassigned.node_left.delayed_timeout setting, which defaults to 1 minute. Here is an example:
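A sketch, assuming a five-minute delay applied to all indices:

    curl -XPUT "localhost:9200/_all/_settings" -H 'Content-Type: application/json' -d '{
      "settings": { "index.unassigned.node_left.delayed_timeout": "5m" }
    }'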

However, if the node is not going to return, it is possible to configure Elasticsearch to immediately assign the shards to the other nodes by updating the timeout to zero:
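For example (a sketch, again applied to all indices):

    curl -XPUT "localhost:9200/_all/_settings" -H 'Content-Type: application/json' -d '{
      "settings": { "index.unassigned.node_left.delayed_timeout": "0" }
    }'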

Additionally, there are certain cases where it is not possible to recover the shards, such as if the node hosting the shard fails and cannot be recovered. In those cases, it will be necessary to either delete and reindex the affected index or restore the data from a backup. However, deleting the index will delete the data stored in that index.

The following endpoint can delete a specific index with failures:
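For example (a sketch; <INDEX_NAME> stands for the affected index):

    curl -XDELETE "localhost:9200/<INDEX_NAME>"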

Network Issues

If the nodes don’t have connectivity in the Elasticsearch cluster port range, the shards might not be properly allocated. If there is no connectivity between the node that hosts the primary shard and the node that hosts a new replica, recovery is not possible until connectivity is recovered. In this case, it is necessary to check firewall/SELinux constraints.

The following command can be used to test connectivity:
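The original command did not survive extraction; a common way to check is probing the transport and HTTP ports with netcat (9300 and 9200 are the defaults, so adjust if yours differ):

    nc -zv <NODE_IP> 9300   # transport port (node-to-node traffic)
    nc -zv <NODE_IP> 9200   # HTTP port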

The ports used by Elasticsearch are explained here: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-network.html#common-network-settings

It is also possible that the shards are too big for the network bandwidth to handle. In this case, it is possible to add more primary shards per index for new indices. To accomplish the same for already created indices, it is possible to create a new index and reindex using the Reindex API. For instance:
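A sketch of such a reindex call; my_index_v2, the destination, is a placeholder and should be created first with the desired number of primary shards:

    curl -XPOST "localhost:9200/_reindex" -H 'Content-Type: application/json' -d '{
      "source": { "index": "my_index" },
      "dest": { "index": "my_index_v2" }
    }'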

If you are using an alias, you can point the alias to the new index:
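A sketch of an atomic alias swap; my_alias and my_index_v2 are placeholders:

    curl -XPOST "localhost:9200/_aliases" -H 'Content-Type: application/json' -d '{
      "actions": [
        { "remove": { "index": "my_index",    "alias": "my_alias" } },
        { "add":    { "index": "my_index_v2", "alias": "my_alias" } }
      ]
    }'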

How to Maintain an Optimal Elasticsearch Cluster

You might avoid issues caused by unassigned shards in Elasticsearch by monitoring and detecting potential problems early on and maintaining cluster health.

Monitoring and Detection

Monitoring the Elasticsearch cluster is very important to avoid issues in normal operation and avoid data loss. For this, you might rely on observability tools like Sematext Cloud , which has Elasticsearch monitoring capabilities. Here’s the default Overview dashboard, though you have many others and you can create your own:


There are also various API endpoints to retrieve the status of the cluster, its nodes, usage, and shard allocation:

  • Cluster health API returns the health status of a cluster.
  • Nodes API returns information about the cluster’s nodes.
  • Shards API provides a detailed view of which nodes contain which shards.
  • Indices API returns information about the indices in a cluster.

In addition to this, Elasticsearch logs can be used to get information about the cluster activity and any errors or issues in the cluster. Some of the areas to monitor include:

  • Errors or exceptions. These can indicate an issue with the cluster or operations being performed on it.
  • Resource usage. High resource usage can indicate that the cluster is under a high load.
  • Performance. If the cluster is performing poorly, there could be a problem that needs to be addressed.

Note that Sematext Cloud can, out of the box, parse and centralize your Elasticsearch logs so you can see the above info in default (but customizable) dashboards:


Cluster Health Maintenance

To properly maintain the status of the Elasticsearch cluster, indices should be managed to avoid further issues. For example, if old indices are not removed, the server’s disk space could be affected, and the number of shards may grow to a point where the hardware of the server cannot keep up with the operations.

If the indices are time-based data, they can be managed using their timestamp as a reference. Other conditions may be applied to manage indices.

To manage the indices, there is a feature called index lifecycle management (ILM). It allows the definition of rules for managing the indices in the cluster, including actions such as closing, opening, or deleting indices based on specific criteria. For instance, it is possible to close indices that have not been accessed for a certain period of time or delete indices that have reached a certain age or size. This can help free up resources and improve the overall performance of the cluster.

The figure below shows an example of a policy consisting of four phases. In the Hot phase, there is a rollover action if the index surpasses 50 GB within one minute; otherwise, it will enter the Warm phase, but there is no action for this phase. After two minutes, the index will enter the Cold phase and, finally, will get deleted after three minutes.


Elasticsearch ILM example policy (source).

Still need some help? Sematext provides a full range of services for Elasticsearch and its entire ecosystem.

Unassigned shards can be a significant issue for Elasticsearch users because they can impact the performance and availability of the index. In this guide, tips and recommendations on how to resolve the issue were given.

You can use the Elasticsearch API to find out the root cause of the issue by leveraging the Cluster Allocation Explain and Cluster Health endpoints. It is then possible to take corrective measures to solve the issue, such as adding more nodes, removing data, and changing the cluster settings and index settings.

Additionally, you can leverage Elasticsearch monitoring tools like Sematext Cloud to make sure your cluster is set up to avoid similar issues in the future.

By following these recommendations, you can ensure that your Elasticsearch cluster remains healthy and able to handle the demands of your data. If you want to learn more about Elasticsearch you can check Sematext training . If you need specialized help with production environments, check out the production support subscriptions.


Elasticsearch Shards

By Opster Team

Updated: Jul 2, 2023


Data in an Elasticsearch index can grow to massive proportions. In order to keep it manageable, it is split into a number of shards. Each Elasticsearch shard is an Apache Lucene index, with each individual Lucene index containing a subset of the documents in the Elasticsearch index. Splitting indices in this way keeps resource usage under control. An Apache Lucene index has a limit of 2,147,483,519 documents.

The number of shards is set when an index is created, and this number cannot be changed later without reindexing the data. When creating an index, you can set the number of shards and replicas as properties of the index using:
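For example (a sketch; the index name and counts are only placeholders):

    curl -XPUT "localhost:9200/my-new-index" -H 'Content-Type: application/json' -d '{
      "settings": {
        "number_of_shards": 3,
        "number_of_replicas": 1
      }
    }'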

The ideal number of shards should be determined based on the amount of data in an index. Generally, an optimal shard should hold 30-50GB of data. For example, if you expect to accumulate around 300GB of application logs in a day, having around 10 shards in that index would be reasonable.

During their lifetime, shards can go through a number of states, including:

  • Initializing: An initial state before the shard can be used.
  • Started: A state in which the shard is active and can receive requests.
  • Relocating: A state that occurs when shards are in the process of being moved to a different node. This may be necessary under certain conditions, such as when the node they are on is running out of disk space.
  • Unassigned: The state of a shard that has failed to be assigned. A reason is provided when this happens. For example, if the node hosting the shard is no longer in the cluster (NODE_LEFT) or due to restoring into a closed index (EXISTING_INDEX_RESTORED).

In order to view all shards, their states, and other metadata, use the following request:
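For example (a sketch against a local node; ?v adds the column headers):

    curl -s "localhost:9200/_cat/shards?v"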

To view shards for a specific index, append the name of the index to the URL, for example:
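For instance (my-index is a placeholder):

    curl -s "localhost:9200/_cat/shards/my-index?v"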

This command produces output, such as in the following example. By default, the columns shown include the name of the index, the name (i.e. number) of the shard, whether it is a primary shard or a replica, its state, the number of documents, the size on disk, the IP address, and the node ID.

Notes and good things to know

  • Having shards that are too large is simply inefficient. Moving huge indices across machines is both a time- and labor-intensive process. First, the Lucene merges would take longer to complete and would require greater resources. Moreover, moving the shards across the nodes for rebalancing would also take longer and recovery time would be extended. Thus, by splitting the data and spreading it across a number of machines, it can be kept in manageable chunks, minimizing risks.
  • Having the right number of shards is important for performance. It is thus wise to plan in advance. When queries are run across different shards in parallel, they execute faster than an index composed of a single shard, but only if each shard is located on a different node and there are sufficient nodes in the cluster. At the same time, however, shards consume memory and disk space, both in terms of indexed data and cluster metadata. Having too many shards can slow down queries, indexing requests, and management operations, and so maintaining the right balance is critical.





Controlling Shard Rebalance in Elasticsearch and OpenSearch

Jeovanny Alvarez

Shard rebalancing in Elasticsearch is the redistribution of index shards across nodes within a cluster. It sometimes requires supervision to achieve an ideal state. In this post, we will discuss what shard rebalancing is and how it can be configured.

What is Shard Rebalancing?

An imbalanced Elasticsearch cluster is one in which shards of different sizes or different usage patterns (or both) are grouped on one or a few nodes, creating hotspots that often cause performance issues.

In Elasticsearch and OpenSearch, shard rebalancing is the process of redistributing shards across the cluster in order to rebalance it. Automated shard rebalancing is enabled by default and can be triggered by events such as a node reaching the high disk watermark threshold (i.e. the node is approaching its storage limits), a node joining or leaving the cluster, and a few other cluster and node events.

Elasticsearch is constantly working to maintain a balance amongst the nodes within a cluster. In an ideal world, every node would have the exact same number of shards, every shard would store the same amount of data and have an equally distributed load. During the process of shard rebalancing, Elasticsearch attempts to achieve and maintain this equilibrium amongst nodes.

In practice, however, the rebalancing effort is rarely able to achieve such an equilibrium. Indices can be configured poorly, with too few shards to distribute evenly to each node in the cluster, or with a number of shards that is not a multiple of the number of data nodes. Rebalancing also adheres to the allocation filtering and forced awareness settings configured within the cluster, which can prevent it from achieving a true balance across nodes. Lastly, the shard rebalancing algorithm is not particularly sophisticated: it only tries to balance the number of shards, not the size of the shards or the work they are involved in (indexing and search).

To achieve the best performance, it is not enough to balance the number of shards per node. An Elasticsearch cluster administrator needs to work towards detecting and dissolving cluster hotspots, while keeping shard balance and available disk space in mind.

Shard Allocation in Elasticsearch and OpenSearch

In order to understand how shard rebalancing works, one must understand shard allocation in Elasticsearch. Shard allocation is, simply put, the method by which Elasticsearch decides which node each shard should reside on, and shard rebalancing is just an attempt to optimize that allocation across nodes.

Elasticsearch has over a dozen different allocation deciders that define how shards should be allocated across nodes. Shard rebalancing is subject to the rules defined in these deciders (or, more specifically, a subset of them, such as ClusterRebalanceAllocationDecider, DiskThresholdDecider, and AwarenessAllocationDecider). As such, it is subject to both the cluster-level and index-level shard allocation and routing settings configured within the cluster.

When the high disk watermark threshold is met or exceeded, for example, this triggers a rebalancing event. Elasticsearch will attempt to move shards off that node in order to free up disk space and allow it to get back below the threshold. In this instance, the DiskThresholdDecider decides whether the node Elasticsearch wants to move a shard to is viable. If allocating a shard from the overloaded node to a new node would cause that node to exceed the high disk watermark as well, then the decider will 'decide' not to move the shard there, and Elasticsearch will search for another node to move the shard to instead.
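For reference, the watermark thresholds are exposed as dynamic cluster settings. A minimal sketch of inspecting them, assuming curl against a local node on the default port:

  curl -s "localhost:9200/_cluster/settings?include_defaults=true&pretty" | grep -A 6 '"watermark"'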

For most use cases, the settings for rebalancing and shard allocation should be left at their default values. Improper configuration of these settings can lead to cluster-wide failures (e.g. disabling rebalancing can cause a single node to store a disproportionate amount of data and reach its maximum storage limitation while other nodes are nowhere near their limits).

Shard Rack Awareness

In addition to the shard allocation and routing settings, you can also configure shard allocation awareness at the individual index and cluster levels.

Shard allocation awareness refers to a set of configurations that allow you to fine-tune shard allocation in Elasticsearch and OpenSearch. It is a feature typically reserved for self-hosted clusters and is intended to let you spread shards across 'zones'. This is useful for disaster recovery scenarios: if 'zone1' has a blackout, 'zone2' can still function as needed. For example, if index1 has 1 shard and 1 replica and you use the shard allocation awareness settings to allocate the primary shard to zone1 and the replica to zone2, the index will still be searchable in the event of a failure in either zone (so long as both zones don't fail simultaneously).
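As an illustrative sketch (the attribute name zone and its values are assumptions), each node is tagged with a custom attribute in its elasticsearch.yml, and the cluster is then told to use that attribute when allocating shards:

  # In each node's elasticsearch.yml (the value differs per node), e.g.:
  #   node.attr.zone: zone1
  # Then enable awareness on that attribute via the cluster settings API:
  curl -XPUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
  {
    "persistent": {
      "cluster.routing.allocation.awareness.attributes": "zone"
    }
  }'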

This feature is not useful and often not available on managed versions of Elasticsearch and OpenSearch.

When to Disable Shard Rebalancing

Though rebalancing should normally be enabled to allow the cluster to “breathe” and manage itself, there are instances where you may want to temporarily or permanently disable rebalancing events in order to prevent these resource heavy operations from occurring. If you are in the process of performing server maintenance (e.g. installing security patches) on a node or set of nodes, for example, you may want to temporarily disable rebalancing during the maintenance period. This way Elasticsearch will not unnecessarily reallocate shards from the node(s) that are down for maintenance to others.

One instance in which you may want to disable shard rebalancing, most likely permanently, is when your index lifecycle management (ILM) policy is built such that roughly equal numbers of indices are created and deleted each day. Since the cluster is generally balanced, it can maintain itself well most of the time, and you can periodically trigger a rebalance manually to optimize shard placement. Disabling automatic, continuous shard rebalancing in this case will usually improve performance during normal operation, because shard rebalancing is a resource-intensive operation and, in such cases, it does not really need to be performed on a regular basis.

If you choose to disable rebalancing, you can do so by modifying the cluster settings like so:
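A minimal sketch, assuming curl against localhost:9200 (cluster.routing.rebalance.enable is the dynamic setting that controls automatic rebalancing):

  curl -XPUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
  {
    "persistent": {
      "cluster.routing.rebalance.enable": "none"
    }
  }'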

To re-enable the settings, set the values to “all” or null (setting values to null re-enables the default values for those settings).

Forcing a Manual Shard Rebalance

There are ways to force a shard rebalancing event to occur, but keep in mind that this is a resource intensive operation, so take care when forcing a rebalance. One such method is to use the _cluster/reroute API. This API allows you to manually move a shard to the desired node which then triggers a rebalance event in order for Elasticsearch to maintain the cluster equilibrium. The API works as follows:
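A sketch of such a request, matching the example described below (the node names node1, node2, and node3 and the index name test are placeholders):

  curl -XPOST "localhost:9200/_cluster/reroute" -H 'Content-Type: application/json' -d'
  {
    "commands": [
      {
        "move": {
          "index": "test", "shard": 0,
          "from_node": "node1", "to_node": "node2"
        }
      },
      {
        "allocate_replica": {
          "index": "test", "shard": 1, "node": "node3"
        }
      }
    ]
  }'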

In this example the ‘move’ command is instructing Elasticsearch to move shard 0 from the index named test from node1 to node2. Additionally, the ‘allocate_replica’ command is instructing Elasticsearch to allocate unassigned replica shard 1 from index test to node3.

This API also allows you to perform a "dry run" by adding the dry_run=true parameter to the request. This is recommended before performing a manual allocation, as it lets you see what the end result would be before any shards actually move: the Elasticsearch response contains the calculated state of the cluster after the requested allocations and any resulting rebalancing, without actually performing the actions.

Another feature of this API is that it allows you to retry failed allocations. This can be used in cases where an allocation failure was due to some temporary or transient issue with the cluster. Once the issue is resolved, you can simply retry the allocations using the retry_failed parameter, like so: POST /_cluster/reroute?retry_failed=true

Having a balanced cluster is important and necessary for your cluster to keep performing at its optimal level. By balancing shards across nodes, Elasticsearch / OpenSearch is able to distribute load and disk utilization across nodes. Manual configuration of rebalancing and allocation settings can be done, but it is not recommended for most use cases. Shard rebalancing is a resource-intensive process, so care should be taken when forcing a rebalancing event to occur.

The algorithm Elasticsearch uses to perform shard rebalancing is not perfect. It fails to take into account key metrics such as the size of the shards, and instead only looks at the number of shards. In some situations, it is even recommended to completely disable shard rebalancing and instead monitor shard distribution manually, fixing things when needed.

If you are experiencing issues with shard allocations and/or rebalancing in your cluster and would like some assistance, check out our Pulse solution. It can offer insights into your cluster with actionable recommendations on items such as shard allocations. It also allows you to tap into world-class Elasticsearch experts to help with your needs. If you’re interested in learning more, please reach out to us here .


Primary shard allocation and manual rebalancing

You can use the re-route API if a shard becomes stuck on a particular node or you want to manually move one primary shard to a different data node.

For more information see the following site: https://support.elasticsearch.com/requests/6691

This page contains instructions for the following methods:

  • Invalidating the shard and reassigning its location
  • Temporarily disabling shard allocation

This causes the cluster to go to red momentarily, so stop indexing before performing this action. In the following code, Node_2 stores primary shard two. Use the following command to invalidate the shard and reassign its location:
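The original command is not reproduced here; a plausible sketch uses the reroute API's cancel command with allow_primary, so that the primary copy currently on Node_2 is cancelled and reassigned (the index name my-index is an assumption):

  curl -XPOST "localhost:9200/_cluster/reroute" -H 'Content-Type: application/json' -d'
  {
    "commands": [
      {
        "cancel": {
          "index": "my-index", "shard": 2,
          "node": "Node_2", "allow_primary": true
        }
      }
    ]
  }'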

Use the following command to temporarily disable shard allocation:
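In recent Elasticsearch versions this is done with the dynamic cluster.routing.allocation.enable setting; a minimal sketch:

  curl -XPUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
  {
    "persistent": {
      "cluster.routing.allocation.enable": "none"
    }
  }'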

Canceling replica shards and migrating the primary shard to a new location

In the following code, Node_3 stores the primary shard, while Node_1 and Node_2 store replica shards. Use the following command to cancel the replica shards on Node_1 and Node_2 and migrate the primary shard from Node_3 to Node_2.
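A sketch of such a reroute request (the index name my-index and shard number 0 are assumptions):

  curl -XPOST "localhost:9200/_cluster/reroute" -H 'Content-Type: application/json' -d'
  {
    "commands": [
      { "cancel": { "index": "my-index", "shard": 0, "node": "Node_1" } },
      { "cancel": { "index": "my-index", "shard": 0, "node": "Node_2" } },
      { "move": { "index": "my-index", "shard": 0, "from_node": "Node_3", "to_node": "Node_2" } }
    ]
  }'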

Re-enabling shard allocation

Use the following command to re-enable shard allocation once you complete the migration.
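A minimal sketch that resets the setting to its default (setting a value to null re-enables the default):

  curl -XPUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
  {
    "persistent": {
      "cluster.routing.allocation.enable": null
    }
  }'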


Resolving red cluster or UNASSIGNED shards

The Elasticsearch service can remain in the TENTATIVE state when at least one primary shard and all its replicas are missing.

Before you begin

Confirm that the problem is not related to the following issues:

  • A full disk on one or multiple hosts. For more information about insufficient disk space, see Resolving reports on full disk or watermark reached.
  • Unavailable shards due to the number of hosts in the Elasticsearch resource group exceeding MaxInstances.

About this task

Do not restart the cluster when the Elasticsearch service remains in the TENTATIVE state with UNASSIGNED shards.

Follow this high-level troubleshooting process to isolate shards with an UNASSIGNED error:

  • If the cluster is in the red state, the Elasticsearch service remains in the TENTATIVE state until all primary shards are active.
  • If the cluster recently restarted, or when the Elasticsearch cluster grows or contracts, Elasticsearch might be in the process of migrating shards to rebalance the cluster, and the value of active_shards_percent_as_number continues to increase as shards become active.
  • Repeat the command (for example, the cluster health check shown below) after a few minutes and monitor whether active_shards_percent_as_number continues to grow. The cluster might need some time for the shards to become active. The Elasticsearch service goes into the STARTED state when at least one primary or replica shard is active and active_shards_percent_as_number reaches 50% or greater.
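A minimal health-check sketch, assuming curl against a local node on the default port; the response includes the cluster status and active_shards_percent_as_number:

  curl -s "localhost:9200/_cluster/health?pretty"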

If active_shards_percent_as_number is stuck and does not continue to grow over a significant period of time, there might be an issue with a particular index or shard.

  • Before a shard can be used, it goes through the INITIALIZING state. If a shard cannot be assigned, it remains in the UNASSIGNED state with a reason code. For a list of the reasons that a primary shard might not be started, see Reasons for unassigned shard.
  • If the primary shard and all replica shards for an index are in the UNASSIGNED state, check the Elasticsearch logs for more details. For more information about the default Elastic Stack log locations, see Elastic Stack troubleshooting .
  • If the primary shard and all replica shards for an index cannot be assigned due to hardware failure or related issues, delete the index. Deleting the index will delete shards and Elasticsearch data for that index.
  • Consider scaling Elasticsearch replicas to provide redundant copies of data that protect against future hardware failures.
  • In a command line, run the following command to delete an index, where es_index specifies the Elasticsearch index to delete: curl --cacert $EGO_TOP/wlp/usr/shared/resources/security/cacert.pem -u $CLUSTERADMIN:$CLUSTERADMINPASS -XDELETE $es_protocol://$es_hostname:$es_port/$es_index --tlsv1.2 Repeat the command for each index you need to delete.
