[DOCS-10646] HA Agent #28928
base: master
Looks great! Noticed a few things:
- Recommend consistency when referencing the preferred active Agent. (Preferred active Agent, preferred active Agent, Preferred Active Agent)
- Left a suggestion to remove nested bullets, but if it changes the intended message, feel free to ignore.
### Installation

1. Install two Agents on like hosts (one on each host). The following setup is for hosts with similar capabilities (CPU, RAM, and networking) and configurations (including `datadog.yaml` and integration settings).
1. Install two Agents on like hosts (one on each host). The following setup is for hosts with similar capabilities (CPU, RAM, and networking) and configurations (including `datadog.yaml` and integration settings).
1. Install the Datadog Agent on two similar hosts (one on each host). The following setup is for hosts with similar capabilities (CPU, RAM, and networking) and configurations (including `datadog.yaml` and integration settings).
For clarity, it's one Agent on one host, right?
1. Install two Agents on like hosts (one on each host). The following setup is for hosts with similar capabilities (CPU, RAM, and networking) and configurations (including `datadog.yaml` and integration settings).

2. For both Agents, on each host, configure your `datadog.yaml` with the following settings:
2. For both Agents, on each host, configure your `datadog.yaml` with the following settings:
2. Configure your `datadog.yaml` on each host, with the following settings:
For example, to set up the SNMP integration, install it on both Agents using the [SNMP Metrics][1] setup guide.
**Note**: Both [individual device monitoring][10] and [Autodiscovery][11] methods are supported for the SNMP integration.
Not sure if you meant for this to be on a new line:
For example, to set up the SNMP integration, install it on both Agents using the [SNMP Metrics][1] setup guide.
**Note**: Both [individual device monitoring][10] and [Autodiscovery][11] methods are supported for the SNMP integration.
For example, to set up the SNMP integration, install it on both Agents using the [SNMP Metrics][1] setup guide. <br>
**Note**: Both [individual device monitoring][10] and [Autodiscovery][11] methods are supported for the SNMP integration.
For example, to set up the SNMP integration, install it on both Agents using the [SNMP Metrics][1] setup guide.
**Note**: Both [individual device monitoring][10] and [Autodiscovery][11] methods are supported for the SNMP integration.

After configured, the two Agents function as an HA pair:
After configured, the two Agents function as an HA pair:
After the Agents are configured, they function as an HA pair:
2. Search for your previously configured Agents using tags or hostname, for example, `config_id:<CONFIG-NAME>`.

{{< img src="/integrations/guide/high_availability/fleet-view-agents.png" alt="Fleet Automation View Agents" style="width:100%;" >}}
Recommend aligning images under their numbered list. If possible, remove Preview labels with pawparazzi.
1. Test that failover works by shutting down the Agent or host that is Active.
2. The standby Agent should start monitoring the configured integration(s) after 1-3 minutes.
1. Test that failover works by shutting down the Agent or host that is Active.
2. The standby Agent should start monitoring the configured integration(s) after 1-3 minutes.
1. Test failover by shutting down the active Agent or its host.
2. The standby Agent should start monitoring the configured integration(s) after 1-3 minutes.
**If no Preferred active Agent is defined**:

- The active Agent is initially chosen randomly.
- Active Agent switching is minimized to avoid unnecessary failover:
  - If the primary Agent is active and it shuts down or crashes, the secondary Agent takes over as the new active Agent.
  - When the primary Agent recovers, the secondary Agent remains active.

**If a Preferred active Agent is defined**:

- The preferred active Agent takes priority:
  - If the primary Agent is the preferred active Agent and is active, a failover occurs if the primary Agent shuts down or crashes, making the secondary Agent active.
  - When the primary Agent recovers, it automatically resumes the active role, and the secondary Agent returns to standby.
I tend towards one level of bulleted lists, but hopefully this still conveys the same points.
**If no Preferred active Agent is defined**:
- The active Agent is initially chosen randomly.
- Active Agent switching is minimized to avoid unnecessary failover:
  - If the primary Agent is active and it shuts down or crashes, the secondary Agent takes over as the new active Agent.
  - When the primary Agent recovers, the secondary Agent remains active.
**If a Preferred active Agent is defined**:
- The preferred active Agent takes priority:
  - If the primary Agent is the preferred active Agent and is active, a failover occurs if the primary Agent shuts down or crashes, making the secondary Agent active.
  - When the primary Agent recovers, it automatically resumes the active role, and the secondary Agent returns to standby.

**Without a preferred active Agent**
- The active Agent is initially selected at random.
- Failover occurs only when the current active Agent shuts down or crashes.
- When the primary Agent recovers, it does not automatically reclaim the active role.

**With a preferred active Agent**
- The preferred Agent always takes priority when available.
- If it fails, the standby Agent becomes active.
- When the preferred Agent recovers, it automatically resumes the active role, and the standby Agent returns to standby.
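
To check that the flattened wording still describes the same behavior, the two rule sets can be modelled as a small state machine. This is only an illustrative sketch, not the Agent's actual election logic: the `HAPair` class, its `elect` method, and the Agent names are hypothetical, and the initial choice picks the first healthy Agent instead of a random one so the result is repeatable.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass
class HAPair:
    """Toy model of an HA Agent pair (hypothetical, for illustration only)."""

    agents: Tuple[str, str]
    preferred: Optional[str] = None  # name of the preferred active Agent, if any
    active: Optional[str] = None     # name of the Agent currently active

    def elect(self, healthy: Set[str]) -> Optional[str]:
        """Return the active Agent given the set of currently healthy Agents."""
        if self.preferred in healthy:
            # With a preferred active Agent: it takes priority whenever available.
            self.active = self.preferred
        elif self.active in healthy:
            # Without a preferred Agent (or while it is down): keep the current
            # active Agent to avoid unnecessary failover.
            pass
        else:
            # Current active Agent is down: fail over to any healthy Agent.
            # The docs describe the initial choice as random; taking the first
            # healthy Agent is an assumption made here for determinism.
            candidates = [a for a in self.agents if a in healthy]
            self.active = candidates[0] if candidates else None
        return self.active


both = {"primary", "secondary"}

no_pref = HAPair(agents=("primary", "secondary"))
no_pref.elect(both)            # initial election
no_pref.elect({"secondary"})   # primary goes down, secondary takes over
no_pref.elect(both)            # primary recovers, secondary stays active

with_pref = HAPair(agents=("primary", "secondary"), preferred="primary")
with_pref.elect(both)
with_pref.elect({"secondary"})  # preferred Agent goes down
with_pref.elect(both)           # preferred Agent recovers and resumes the active role
```

In this model the only difference between the two modes is the first branch, which matches the single-level bullets: without a preferred Agent the active role is sticky, and with one it reverts as soon as the preferred Agent is healthy again.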
### Why does my Agent have an `unknown` HA Agent state?

- Remote Configuration may not be setup correctly. Review the [prerequisites](#prerequisites) and [Remote Configuration setup][12] documentation for more information.
- Remote Configuration may not be setup correctly. Review the [prerequisites](#prerequisites) and [Remote Configuration setup][12] documentation for more information.
- Remote Configuration may not be setup correctly. For more information, review the [prerequisites](#prerequisites) and [Remote Configuration setup][12] documentation.
@@ -41,20 +41,22 @@ Navigate to the [Agent installation page][1], and install the [Datadog Agent][2]

{{< img src="network_device_monitoring/getting_started/ndm_install_agent.png" alt="The Agent configuration page, highlighting the Ubuntu installation." style="width:100%;" >}}

## Setup

#### High Availability
Why does this section jump from H2 to H4?
You can configure active and standby Agents to function as an HA pair in NDM. If the active Agent goes down, the standby Agent takes over within 90 seconds, becoming the new active Agent. Additionally, you can designate a preferred active Agent, allowing NDM to automatically revert to it once it becomes available again. This feature allows for proactive Agent switching ahead of scheduled maintenance.

## Setup
Reference [High Availability support of the Datadog Agent][20] for more information.
Reference [High Availability support of the Datadog Agent][20] for more information.
For more information, see [High Availability support of the Datadog Agent][20].
What does this PR do? What is the motivation?
Merge instructions
Merge readiness:
For Datadog employees:
Merge queue is enabled in this repo. Your branch name MUST follow the `<name>/<description>` convention and include the forward slash (`/`). Without this format, your pull request will not pass in CI, the GitLab pipeline will not run, and you won't get a branch preview. Getting a branch preview makes it easier for us to check any issues with your PR, such as broken links. If your branch doesn't follow this format, rename it or create a new branch and PR.
To have your PR automatically merged after it receives the required reviews, add the following PR comment:
Additional notes