[DOCS-10646] HA Agent #28928

Open

aliciascott wants to merge 12 commits into master from aliciascott/DOCS-10646-HA-Agent

+136 −6

Contributor

aliciascott commented Apr 23, 2025

What does this PR do? What is the motivation?

Merge instructions

Merge readiness:

Ready for merge

For Datadog employees:
Merge queue is enabled in this repo. Your branch name MUST follow the <name>/<description> convention and include the forward slash (/). Without this format, your pull request will not pass in CI, the GitLab pipeline will not run, and you won't get a branch preview. Getting a branch preview makes it easier for us to check any issues with your PR, such as broken links.

If your branch doesn't follow this format, rename it or create a new branch and PR.
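As a rough local pre-check of the naming rule above, the following Python sketch tests whether a branch name has the `<name>/<description>` shape. The function `follows_convention` and the pattern `BRANCH_PATTERN` are hypothetical names for illustration. The pattern only assumes non-empty parts on either side of the first forward slash, so the real CI check may be stricter.

```python
import re

# Assumption: a valid branch is "<name>/<description>" with a non-empty part
# on each side of the first forward slash. The actual CI rule may differ.
BRANCH_PATTERN = re.compile(r"^[^/\s]+/\S+$")


def follows_convention(branch: str) -> bool:
    """Return True if the branch name looks like <name>/<description>."""
    return BRANCH_PATTERN.fullmatch(branch) is not None


print(follows_convention("aliciascott/DOCS-10646-HA-Agent"))  # True
print(follows_convention("DOCS-10646-HA-Agent"))  # False
```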

To have your PR automatically merged after it receives the required reviews, add the following PR comment:

/merge

Additional notes


          initial commit HA Agent setup

ebde607

aliciascott added the WORK IN PROGRESS label

aliciascott requested a review from a team as a code owner

April 23, 2025 17:05

github-actions bot added the Architecture label

Contributor

github-actions bot commented Apr 23, 2025 (edited)

Preview links (active after the `build_preview` check completes)

New or renamed files

https://docs-staging.datadoghq.com/aliciascott/DOCS-10646-HA-Agent/integrations/guide/high_availability

Modified Files


          Draft phase 2

3cd36da

github-actions bot added the Images label

aliciascott and others added 3 commits

April 24, 2025 09:51


          noted future agent version for some integrations

55e3699


          Merge branch 'master' into aliciascott/DOCS-10646-HA-Agent

18e894f


          moving doc to integrations-guides

97d3cbc

github-actions bot added Guide and removed Architecture labels

aliciascott and others added 2 commits

May 1, 2025 15:41


          moving to integrations guides

07f5bb2


          Merge branch 'master' into aliciascott/DOCS-10646-HA-Agent

777073f

aliciascott changed the title ~~[DOCS-10646] initial commit HA Agent setup~~ [DOCS-10646] HA Agent

aliciascott added 3 commits

May 2, 2025 15:43


          fixing screenshots


          small wording changes

292dfef


          remove NDM reference

7c0fead

aliciascott added the okr11 label

aliciascott added 2 commits

May 13, 2025 13:43


          small fixes

b53fce1


          update further reading

439bd24

estherk15 reviewed

Contributor

estherk15 left a comment

Looks great! Noticed a few things:

- Recommend consistent capitalization when referencing the preferred active Agent. The draft currently uses "Preferred active Agent", "preferred active Agent", and "Preferred Active Agent".
- Left a suggestion to remove nested bullets, but if it changes the intended message, feel free to ignore.
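One way to spot the capitalization variants mentioned above is a case-insensitive search that counts each spelling as written. This is a minimal sketch. `casing_variants` is a hypothetical helper, and the sample text is taken from the lines under review.

```python
import re
from collections import Counter

# Match the phrase regardless of capitalization, keeping the original casing.
PHRASE = re.compile(r"preferred active agent", re.IGNORECASE)


def casing_variants(text: str) -> Counter:
    """Count each distinct capitalization of 'preferred active Agent'."""
    return Counter(PHRASE.findall(text))


sample = (
    "**If no Preferred active Agent is defined**:\n"
    "- The preferred active Agent takes priority:\n"
)
for variant, count in casing_variants(sample).items():
    print(f"{count}x {variant}")
```

More than one key in the result means the page mixes spellings.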

content/en/integrations/guide/high_availability.md


		### Installation

		1. Install two Agents on like hosts (one on each host). The following setup is for hosts with similar capabilities (CPU, RAM, and networking) and configurations (including `datadog.yaml` and integration settings).

Contributor

estherk15 May 16, 2025

Suggested change

      
-   1. Install two Agents on like hosts (one on each host). The following setup is for hosts with similar capabilities (CPU, RAM, and networking) and configurations (including `datadog.yaml` and integration settings).
+   1. Install the Datadog Agent on two similar hosts (one on each host). The following setup is for hosts with similar capabilities (CPU, RAM, and networking) and configurations (including `datadog.yaml` and integration settings).

Contributor

estherk15 May 16, 2025

To clarify, it's one Agent per host, right?

content/en/integrations/guide/high_availability.md


		1. Install two Agents on like hosts (one on each host). The following setup is for hosts with similar capabilities (CPU, RAM, and networking) and configurations (including `datadog.yaml` and integration settings).

		2. For both Agents, on each host, configure your `datadog.yaml` with the following settings:

Contributor

estherk15 May 16, 2025

Suggested change

      
-   2. For both Agents, on each host, configure your `datadog.yaml` with the following settings:
+   2. Configure your `datadog.yaml` on each host, with the following settings:

content/en/integrations/guide/high_availability.md

Comment on lines +62 to +63

		For example, to set up the SNMP integration, install it on both Agents using the [SNMP Metrics][1] setup guide.
		Note: Both [individual device monitoring][10] and [Autodiscovery][11] methods are supported for the SNMP integration.

Contributor

estherk15 May 16, 2025

Not sure if you meant for this to be on a new line:

Suggested change

      
-      For example, to set up the SNMP integration, install it on both Agents using the [SNMP Metrics][1] setup guide.
-      **Note**: Both [individual device monitoring][10] and [Autodiscovery][11] methods are supported for the SNMP integration.
+      For example, to set up the SNMP integration, install it on both Agents using the [SNMP Metrics][1] setup guide. <br>
+      **Note**: Both [individual device monitoring][10] and [Autodiscovery][11] methods are supported for the SNMP integration.

content/en/integrations/guide/high_availability.md

+                 For example, to set up the SNMP integration, install it on both Agents using the [SNMP Metrics][1] setup guide.
+                 **Note**: Both [individual device monitoring][10] and [Autodiscovery][11] methods are supported for the SNMP integration.
+                 After configured, the two Agents function as an HA pair:

Contributor

estherk15 May 16, 2025

Suggested change

      
-      After configured, the two Agents function as an HA pair:
+      After the Agents are configured, they function as an HA pair:

content/en/integrations/guide/high_availability.md


		2. Search for your previously configured Agents using tags or hostname, for example, `config_id:<CONFIG-NAME>`.

		{{< img src="/integrations/guide/high_availability/fleet-view-agents.png" alt="Fleet Automation View Agents" style="width:100%;" >}}

Contributor

estherk15 May 16, 2025

Recommend aligning images under their numbered list items. If possible, remove the Preview labels with pawparazzi.

content/en/integrations/guide/high_availability.md

Comment on lines +85 to +86

		1. Test that failover works by shutting down the Agent or host that is Active.
		2. The standby Agent should start monitoring the configured integration(s) after 1-3 minutes.

Contributor

estherk15 May 16, 2025

Suggested change

      
-   1. Test that failover works by shutting down the Agent or host that is Active.
-   2. The standby Agent should start monitoring the configured integration(s) after 1-3 minutes.
+   1. Test failover by shutting down the active Agent or its host.
+   2. The standby Agent should start monitoring the configured integration(s) after 1-3 minutes.

content/en/integrations/guide/high_availability.md

Comment on lines +92 to +103

+              **If no Preferred active Agent is defined**:
+              - The active Agent is initially chosen randomly.
+              - Active Agent switching is minimized to avoid unnecessary failover:
+                - If the primary Agent is active and it shuts down or crashes, the secondary Agent takes over as the new active Agent.
+                - When the primary Agent recovers, the secondary Agent remains active.
+              **If a Preferred active Agent is defined**:
+              - The preferred active Agent takes priority:
+                - If the primary Agent is the preferred active Agent and is active, a failover occurs if the primary Agent shuts down or crashes, making the secondary Agent active.
+                - When the primary Agent recovers, it automatically resumes the active role, and the secondary Agent returns to standby.

Contributor

estherk15 May 16, 2025

I tend towards one level of bulleted lists, but hopefully this still conveys the same points.

Suggested change

      
-   **If no Preferred active Agent is defined**:
-   - The active Agent is initially chosen randomly.
-   - Active Agent switching is minimized to avoid unnecessary failover:
-     - If the primary Agent is active and it shuts down or crashes, the secondary Agent takes over as the new active Agent.
-     - When the primary Agent recovers, the secondary Agent remains active.
-   **If a Preferred active Agent is defined**:
-   - The preferred active Agent takes priority:
-     - If the primary Agent is the preferred active Agent and is active, a failover occurs if the primary Agent shuts down or crashes, making the secondary Agent active.
-     - When the primary Agent recovers, it automatically resumes the active role, and the secondary Agent returns to standby.
+   **Without a preferred active Agent**
+   - The active Agent is initially selected at random.
+   - Failover occurs only when the current active Agent shuts down or crashes.
+   - When the primary Agent recovers, it does not automatically reclaim the active role.
+   **With a preferred active Agent**
+   - The preferred Agent always takes priority when available.
+   - If it fails, the standby Agent becomes active.
+   - When the preferred Agent recovers, it automatically resumes the active role, and the standby Agent returns to standby.
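The two failover behaviors described in this suggestion can be modeled as a small selection function. This is an illustrative sketch only, not the Datadog Agent's implementation. `next_active` and its parameters are hypothetical. Where the doc says the initial active Agent is chosen at random, the sketch picks the alphabetically first healthy Agent so the output is reproducible.

```python
from typing import Optional


def next_active(
    current: Optional[str],
    healthy: set[str],
    preferred: Optional[str] = None,
) -> Optional[str]:
    """Return the Agent that should be active given the healthy Agents."""
    # A preferred active Agent takes priority whenever it is available.
    if preferred is not None and preferred in healthy:
        return preferred
    # Otherwise keep the current active Agent to avoid unnecessary failover.
    if current in healthy:
        return current
    # Fail over to a healthy Agent. The doc describes a random initial choice;
    # min() is an assumption made here for determinism.
    return min(healthy) if healthy else None


# Without a preferred active Agent: "b" takes over and stays active.
active = next_active("a", {"b"})
print(active)  # b
print(next_active(active, {"a", "b"}))  # b

# With "a" as the preferred active Agent: "a" reclaims the role on recovery.
active = next_active("a", {"b"}, preferred="a")
print(active)  # b
print(next_active(active, {"a", "b"}, preferred="a"))  # a
```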

content/en/integrations/guide/high_availability.md


		### Why does my Agent have an `unknown` HA Agent state?

		- Remote Configuration may not be setup correctly. Review the [prerequisites](#prerequisites) and [Remote Configuration setup][12] documentation for more information.

Contributor

estherk15 May 16, 2025

Suggested change

      
-   - Remote Configuration may not be setup correctly. Review the [prerequisites](#prerequisites) and [Remote Configuration setup][12] documentation for more information.
+   - Remote Configuration may not be set up correctly. For more information, review the [prerequisites](#prerequisites) and [Remote Configuration setup][12] documentation.

content/en/network_monitoring/devices/setup.md

               {{< img src="network_device_monitoring/getting_started/ndm_install_agent.png" alt="The Agent configuration page, highlighting the Ubuntu installation." style="width:100%;" >}}
+              ## Setup
               #### High Availability

Contributor

estherk15 May 16, 2025

Why does this section jump from H2 to H4?
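A heading jump like this one can be caught mechanically. The sketch below scans Markdown for ATX headings that skip a level. `heading_jumps` is a hypothetical helper. As a simplification, it does not skip `#` lines inside fenced code blocks and it ignores leading indentation.

```python
import re

# An ATX heading: one to six '#' characters followed by whitespace and text.
HEADING = re.compile(r"^(#{1,6})\s+\S")


def heading_jumps(markdown: str) -> list[tuple[int, int, int]]:
    """Return (line_number, previous_level, level) for each skipped level."""
    jumps = []
    previous = 0  # 0 means no heading has been seen yet
    for number, line in enumerate(markdown.splitlines(), start=1):
        match = HEADING.match(line.strip())
        if not match:
            continue
        level = len(match.group(1))
        # Flag a heading that is more than one level deeper than the last one.
        if previous and level > previous + 1:
            jumps.append((number, previous, level))
        previous = level
    return jumps


sample = "## Setup\n#### High Availability\n"
print(heading_jumps(sample))  # [(2, 2, 4)]
```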

content/en/network_monitoring/devices/setup.md

               You can configure active and standby Agents to function as an HA pair in NDM. If the active Agent goes down, the standby Agent takes over within 90 seconds, becoming the new active Agent. Additionally, you can designate a preferred active Agent, allowing NDM to automatically revert to it once it becomes available again. This feature allows for proactive Agent switching ahead of scheduled maintenance.
-              ## Setup
+              Reference [High Availability support of the Datadog Agent][20] for more information.

Contributor

estherk15 May 16, 2025

Suggested change

      
-   Reference [High Availability support of the Datadog Agent][20] for more information.
+   For more information, see [High Availability support of the Datadog Agent][20].

Labels

Guide Images okr11 WORK IN PROGRESS