
Fix DiskThresholdDecider average disk usage with huge filesystems #100599


Open

tlrx wants to merge 2 commits into main

Conversation

@tlrx (Member) commented Oct 10, 2023

DiskThresholdDecider may compute an average disk usage when the real disk usage of a node is unknown. This average is computed from the total and free disk space across all nodes, but it can break when nodes use huge filesystems and the sum of total/free bytes exceeds Long.MAX_VALUE (we've seen total bytes come out negative in some clusters).

This change introduces an early check in DiskThresholdDecider to detect such a situation and treat it as if the nodes' disk usages were unknown, returning a YES decision for the allocation. This is the least impactful fix I could find for this bug, but I'm open to other suggestions.
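
For illustration, here is a minimal, self-contained sketch of the failure mode and the guard. The DiskUsage record and the averageComputable helper are illustrative stand-ins, not the actual Elasticsearch classes or the PR's real check:

```java
import java.util.List;

// Self-contained sketch of the overflow described above; names are stand-ins.
public class AverageDiskUsageSketch {
    record DiskUsage(long totalBytes, long freeBytes) {}

    // Report whether the per-node sums can be added without exceeding
    // Long.MAX_VALUE; Math.addExact throws ArithmeticException on overflow.
    static boolean averageComputable(List<DiskUsage> usages) {
        long total = 0, free = 0;
        for (DiskUsage u : usages) {
            try {
                total = Math.addExact(total, u.totalBytes());
                free = Math.addExact(free, u.freeBytes());
            } catch (ArithmeticException overflow) {
                return false; // treat disk usages as unknown -> YES decision
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Two nodes, each reporting a filesystem larger than half of Long.MAX_VALUE:
        long huge = Long.MAX_VALUE / 2 + 1;
        System.out.println(huge + huge); // wraps around: the "negative total bytes" symptom
        var usages = List.of(new DiskUsage(huge, huge), new DiskUsage(huge, huge));
        System.out.println(averageComputable(usages)); // false -> skip the average
    }
}
```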

@tlrx tlrx added >bug :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) v8.11.1 v8.12.0 labels Oct 10, 2023
@elasticsearchmachine elasticsearchmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Oct 10, 2023
@elasticsearchmachine (Collaborator):

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine (Collaborator):

Hi @tlrx, I've created a changelog YAML for you.

@tlrx tlrx requested a review from DaveCTurner October 10, 2023 14:18
@elastic elastic deleted a comment from greicefaustino Oct 18, 2023
@@ -393,6 +399,9 @@ public Decision canRemain(IndexMetadata indexMetadata, ShardRouting shardRouting
         if (indexMetadata.ignoreDiskWatermarks()) {
             return YES_DISK_WATERMARKS_IGNORED;
         }
+        if (useAverageDiskUsage(node, usages) == false) {
+            return YES_AVERAGE_DISK_USAGE_UNAVAILABLE;
+        }
A Contributor commented:

This is added before every call to getDiskUsage that might compute the average.
Is there value in making getDiskUsage return null when the overflow happens, and using that as the signal to return the decision above? That way the addition of total and free would be performed only once (in getDiskUsage) instead of twice (in getDiskUsage and in the check above).
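
To make that suggestion concrete, here is a hypothetical sketch of the alternative, again with a stand-in DiskUsage record rather than the real Elasticsearch types: the averaging code would perform the checked additions once and return null on overflow, which callers would map to the decision above.

```java
import java.util.List;

// Hypothetical shape of the alternative suggested above, not the actual PR
// code: perform the checked additions once and signal overflow with null.
class AverageDiskUsageAlternative {
    record DiskUsage(long totalBytes, long freeBytes) {}

    /** Returns the averaged usage, or null if the sums would overflow a long. */
    static DiskUsage averageOrNull(List<DiskUsage> usages) {
        if (usages.isEmpty()) {
            return null; // no samples -> no average
        }
        long total = 0, free = 0;
        for (DiskUsage u : usages) {
            try {
                total = Math.addExact(total, u.totalBytes());
                free = Math.addExact(free, u.freeBytes());
            } catch (ArithmeticException overflow) {
                return null; // caller maps this to YES_AVERAGE_DISK_USAGE_UNAVAILABLE
            }
        }
        return new DiskUsage(total / usages.size(), free / usages.size());
    }
}
```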

        );
        var decider = new DiskThresholdDecider(
            Settings.EMPTY,
            new ClusterSettings(Settings.EMPTY, ClusterSettings.BUILT_IN_CLUSTER_SETTINGS)
A Contributor commented:

Nit:

Suggested change:
- new ClusterSettings(Settings.EMPTY, ClusterSettings.BUILT_IN_CLUSTER_SETTINGS)
+ ClusterSettings.createBuiltInClusterSettings()

Comment on lines +141 to +143
.add(DiscoveryNodeUtils.builder("node_0").roles(new HashSet<>(DiscoveryNodeRole.roles())).build())
.add(DiscoveryNodeUtils.builder("node_1").roles(new HashSet<>(DiscoveryNodeRole.roles())).build())
.add(DiscoveryNodeUtils.builder("node_2").roles(new HashSet<>(DiscoveryNodeRole.roles())).build())
A Contributor commented:

Nit:

Suggested change:
- .add(DiscoveryNodeUtils.builder("node_0").roles(new HashSet<>(DiscoveryNodeRole.roles())).build())
- .add(DiscoveryNodeUtils.builder("node_1").roles(new HashSet<>(DiscoveryNodeRole.roles())).build())
- .add(DiscoveryNodeUtils.builder("node_2").roles(new HashSet<>(DiscoveryNodeRole.roles())).build())
+ .add(DiscoveryNodeUtils.create("node_0"))
+ .add(DiscoveryNodeUtils.create("node_1"))
+ .add(DiscoveryNodeUtils.create("node_2"))

I believe all roles are added by default, so there's no need to set them manually.

@elasticsearchmachine elasticsearchmachine added v9.1.0 Team:Distributed Coordination Meta label for Distributed Coordination team and removed v9.0.0 labels Jan 30, 2025
@elasticsearchmachine (Collaborator):

Pinging @elastic/es-distributed-obsolete (Team:Distributed (Obsolete))

@elasticsearchmachine (Collaborator):

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

Labels: >bug, :Distributed Coordination/Allocation, Team:Distributed Coordination, Team:Distributed (Obsolete), v8.11.5, v9.1.0
6 participants