
Fix DiskThresholdDecider average disk usage with huge filesystems #100599


Open

tlrx wants to merge 2 commits into main

Conversation

@tlrx (Member) commented Oct 10, 2023

DiskThresholdDecider may compute an average disk usage when the real disk usage of a node is unknown. This average is computed from the total and free disk space across all nodes, but it can break when nodes use huge filesystems and the sum of total/free bytes exceeds Long.MAX_VALUE (we've seen total bytes come out negative in some clusters).

This change introduces an early check in DiskThresholdDecider to detect such a situation and treat it as if the nodes' disk usages were unknown, returning a YES decision for the allocation. This is the least impactful fix I could find for this bug, but I'm open to other suggestions.
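
For illustration, here is a minimal, self-contained sketch of the failure mode and the guard. The DiskUsage record and the averageComputable helper are illustrative stand-ins, not the actual Elasticsearch classes or the PR's real check:

```java
import java.util.List;

// Self-contained sketch of the overflow described above; names are stand-ins.
public class AverageDiskUsageSketch {
    record DiskUsage(long totalBytes, long freeBytes) {}

    // Report whether the per-node sums can be added without exceeding
    // Long.MAX_VALUE; Math.addExact throws ArithmeticException on overflow.
    static boolean averageComputable(List<DiskUsage> usages) {
        long total = 0, free = 0;
        for (DiskUsage u : usages) {
            try {
                total = Math.addExact(total, u.totalBytes());
                free = Math.addExact(free, u.freeBytes());
            } catch (ArithmeticException overflow) {
                return false; // treat disk usages as unknown -> YES decision
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Two nodes, each reporting a filesystem larger than half of Long.MAX_VALUE:
        long huge = Long.MAX_VALUE / 2 + 1;
        System.out.println(huge + huge); // wraps around: the "negative total bytes" symptom
        var usages = List.of(new DiskUsage(huge, huge), new DiskUsage(huge, huge));
        System.out.println(averageComputable(usages)); // false -> skip the average
    }
}
```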

@tlrx tlrx added >bug :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) v8.11.1 v8.12.0 labels Oct 10, 2023
@elasticsearchmachine elasticsearchmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Oct 10, 2023
@elasticsearchmachine (Collaborator):

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine (Collaborator):

Hi @tlrx, I've created a changelog YAML for you.

@tlrx tlrx requested a review from DaveCTurner October 10, 2023 14:18
@elastic elastic deleted a comment from greicefaustino Oct 18, 2023
@@ -393,6 +399,9 @@ public Decision canRemain(IndexMetadata indexMetadata, ShardRouting shardRouting
         if (indexMetadata.ignoreDiskWatermarks()) {
             return YES_DISK_WATERMARKS_IGNORED;
         }
+        if (useAverageDiskUsage(node, usages) == false) {
+            return YES_AVERAGE_DISK_USAGE_UNAVAILABLE;
+        }
A Contributor commented:

This is added before every call to getDiskUsage that might compute the average.
Is there value in making getDiskUsage return null when the overflow happens, and using that as the signal to return the decision above? That way the addition of total and free would be performed only once (in getDiskUsage) instead of twice (in getDiskUsage and in the check above).
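
To make that suggestion concrete, here is a hypothetical sketch of the alternative, again with a stand-in DiskUsage record rather than the real Elasticsearch types: the averaging code would perform the checked additions once and return null on overflow, which callers would map to the decision above.

```java
import java.util.List;

// Hypothetical shape of the alternative suggested above, not the actual PR
// code: perform the checked additions once and signal overflow with null.
class AverageDiskUsageAlternative {
    record DiskUsage(long totalBytes, long freeBytes) {}

    /** Returns the averaged usage, or null if the sums would overflow a long. */
    static DiskUsage averageOrNull(List<DiskUsage> usages) {
        if (usages.isEmpty()) {
            return null; // no samples -> no average
        }
        long total = 0, free = 0;
        for (DiskUsage u : usages) {
            try {
                total = Math.addExact(total, u.totalBytes());
                free = Math.addExact(free, u.freeBytes());
            } catch (ArithmeticException overflow) {
                return null; // caller maps this to YES_AVERAGE_DISK_USAGE_UNAVAILABLE
            }
        }
        return new DiskUsage(total / usages.size(), free / usages.size());
    }
}
```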

        );
        var decider = new DiskThresholdDecider(
            Settings.EMPTY,
            new ClusterSettings(Settings.EMPTY, ClusterSettings.BUILT_IN_CLUSTER_SETTINGS)
A Contributor commented:

Nit:

Suggested change:
- new ClusterSettings(Settings.EMPTY, ClusterSettings.BUILT_IN_CLUSTER_SETTINGS)
+ ClusterSettings.createBuiltInClusterSettings()

Comment on lines +141 to +143
.add(DiscoveryNodeUtils.builder("node_0").roles(new HashSet<>(DiscoveryNodeRole.roles())).build())
.add(DiscoveryNodeUtils.builder("node_1").roles(new HashSet<>(DiscoveryNodeRole.roles())).build())
.add(DiscoveryNodeUtils.builder("node_2").roles(new HashSet<>(DiscoveryNodeRole.roles())).build())
A Contributor commented:

Nit:

Suggested change:
- .add(DiscoveryNodeUtils.builder("node_0").roles(new HashSet<>(DiscoveryNodeRole.roles())).build())
- .add(DiscoveryNodeUtils.builder("node_1").roles(new HashSet<>(DiscoveryNodeRole.roles())).build())
- .add(DiscoveryNodeUtils.builder("node_2").roles(new HashSet<>(DiscoveryNodeRole.roles())).build())
+ .add(DiscoveryNodeUtils.create("node_0"))
+ .add(DiscoveryNodeUtils.create("node_1"))
+ .add(DiscoveryNodeUtils.create("node_2"))

I believe all roles are added by default, so there's no need to set them manually.

@elasticsearchmachine elasticsearchmachine added v9.1.0 Team:Distributed Coordination Meta label for Distributed Coordination team and removed v9.0.0 labels Jan 30, 2025
@elasticsearchmachine (Collaborator):

Pinging @elastic/es-distributed-obsolete (Team:Distributed (Obsolete))

@elasticsearchmachine (Collaborator):

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

Labels: >bug, :Distributed Coordination/Allocation, Team:Distributed Coordination, Team:Distributed (Obsolete), v8.11.5, v9.1.0
6 participants