Fix DiskThresholdDecider average disk usage with huge filesystems #100599
Conversation
Pinging @elastic/es-distributed (Team:Distributed)

Hi @tlrx, I've created a changelog YAML for you.
```diff
@@ -393,6 +399,9 @@ public Decision canRemain(IndexMetadata indexMetadata, ShardRouting shardRouting
     if (indexMetadata.ignoreDiskWatermarks()) {
         return YES_DISK_WATERMARKS_IGNORED;
     }
+    if (useAverageDiskUsage(node, usages) == false) {
+        return YES_AVERAGE_DISK_USAGE_UNAVAILABLE;
+    }
```
This is added before every call to `getDiskUsage` that might compute the average. Is there value in making `getDiskUsage` return `null` when overflow happens, and using that as the signal to return the above decision? That way the addition of total and free bytes would be performed only once (in `getDiskUsage`) instead of twice (in `getDiskUsage` and in the check above).
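For illustration, a minimal standalone sketch of that alternative, assuming a simplified stand-in for Elasticsearch's `DiskUsage` type (the record, class, and method names here are hypothetical, not the actual `getDiskUsage` signature):

```java
import java.util.Collection;

// Hypothetical sketch of the suggestion above, not the actual
// Elasticsearch implementation: sum total/free bytes exactly once,
// and return null when the sum overflows so the caller can
// short-circuit to a YES decision.
record DiskUsage(long totalBytes, long freeBytes) {}

final class AverageDiskUsageSketch {

    static DiskUsage averageDiskUsage(Collection<DiskUsage> usages) {
        if (usages.isEmpty()) {
            return null;
        }
        long total = 0;
        long free = 0;
        for (DiskUsage usage : usages) {
            try {
                // Math.addExact throws instead of silently wrapping around
                total = Math.addExact(total, usage.totalBytes());
                free = Math.addExact(free, usage.freeBytes());
            } catch (ArithmeticException overflow) {
                return null; // signal: average disk usage is unavailable
            }
        }
        return new DiskUsage(total / usages.size(), free / usages.size());
    }
}
```

A caller such as `canRemain` could then map a `null` result straight to `YES_AVERAGE_DISK_USAGE_UNAVAILABLE`, keeping the overflow-prone addition in exactly one place.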
```java
);
var decider = new DiskThresholdDecider(
    Settings.EMPTY,
    new ClusterSettings(Settings.EMPTY, ClusterSettings.BUILT_IN_CLUSTER_SETTINGS)
```
Nit:

```diff
- new ClusterSettings(Settings.EMPTY, ClusterSettings.BUILT_IN_CLUSTER_SETTINGS)
+ ClusterSettings.createBuiltInClusterSettings()
```
.add(DiscoveryNodeUtils.builder("node_0").roles(new HashSet<>(DiscoveryNodeRole.roles())).build()) | ||
.add(DiscoveryNodeUtils.builder("node_1").roles(new HashSet<>(DiscoveryNodeRole.roles())).build()) | ||
.add(DiscoveryNodeUtils.builder("node_2").roles(new HashSet<>(DiscoveryNodeRole.roles())).build()) |
NIT:

```diff
- .add(DiscoveryNodeUtils.builder("node_0").roles(new HashSet<>(DiscoveryNodeRole.roles())).build())
- .add(DiscoveryNodeUtils.builder("node_1").roles(new HashSet<>(DiscoveryNodeRole.roles())).build())
- .add(DiscoveryNodeUtils.builder("node_2").roles(new HashSet<>(DiscoveryNodeRole.roles())).build())
+ .add(DiscoveryNodeUtils.create("node_0"))
+ .add(DiscoveryNodeUtils.create("node_1"))
+ .add(DiscoveryNodeUtils.create("node_2"))
```

I believe all roles are added by default, so there's no need to set them manually.
Pinging @elastic/es-distributed-obsolete (Team:Distributed (Obsolete))

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)
`DiskThresholdDecider` may compute an average disk usage when the real disk usage of a node is unknown. This average is computed from the total and free disk space across all nodes, but it can break when nodes use huge filesystems and the sum of total/free bytes exceeds `Long.MAX_VALUE` (we've seen total bytes become negative in some clusters).

This change introduces an early check in `DiskThresholdDecider` to detect such a situation and treat it as if the nodes' disk usages were unknown, returning a `YES` decision for allocation. This is the least impactful fix I could find for this bug, but I'm open to other suggestions.
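To make the failure mode concrete, here is a small self-contained snippet showing the arithmetic behind the bug; the numbers are illustrative, not taken from a real cluster:

```java
public class LongOverflowDemo {
    public static void main(String[] args) {
        // A node reporting a huge filesystem: 2^62 bytes (~4.6 exabytes).
        long nodeTotalBytes = 1L << 62;

        // Summing just two such nodes exceeds Long.MAX_VALUE (2^63 - 1)
        // and wraps to a negative value, which is the "negative total
        // bytes" symptom mentioned above.
        long naiveSum = nodeTotalBytes + nodeTotalBytes;
        System.out.println(naiveSum); // prints -9223372036854775808

        // Math.addExact turns the silent wrap-around into a detectable error.
        try {
            Math.addExact(nodeTotalBytes, nodeTotalBytes);
        } catch (ArithmeticException e) {
            System.out.println("overflow: treat disk usages as unknown");
        }
    }
}
```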