
deployment-phase-1 #10

Open: wants to merge 33 commits into `main`.
33 commits
e661da4
uncommented deploy section
DanielleWashington Mar 13, 2025
cbd6d79
added k8s poc doc, prod readiness doc, added k8s landing page, and co…
DanielleWashington Mar 16, 2025
72873cb
Add Danielle to the slack updates
sebawita Mar 17, 2025
b016af2
Merge branch 'main' into deployment-phase-1
sebawita Mar 17, 2025
8ee5f9a
fixed link issue
DanielleWashington Mar 17, 2025
1b623c7
updated per Sebastian's review and feedback
DanielleWashington Mar 17, 2025
5994c6d
updated per JP's review and feedback
DanielleWashington Mar 18, 2025
d809573
updating with edits
DanielleWashington Mar 18, 2025
6e32f70
adding front matter to change sidebar label
DanielleWashington Mar 18, 2025
8ca1f81
updating with edits
DanielleWashington Mar 18, 2025
60d4787
removed unsupported byoc integrations
DanielleWashington Mar 18, 2025
0d1dbef
Merge branch 'main' into deployment-phase-1
DanielleWashington Mar 18, 2025
7471adf
added additional questions
DanielleWashington Mar 20, 2025
ee628de
production section added and k8s prod docs moved
DanielleWashington Mar 22, 2025
a02e757
hiding k8s on secondary navbar
DanielleWashington Mar 22, 2025
57db8e0
migrating docker and k8s installation guides, adding cards to deploym…
DanielleWashington Mar 23, 2025
19ed67d
migrating aws and gcp installation guides
DanielleWashington Mar 24, 2025
a72c8e0
adding config guides
DanielleWashington Mar 24, 2025
5085b2f
adding monitoring, persistence and images
DanielleWashington Mar 24, 2025
226e672
Merge branch 'main' into deployment-phase-1
TheCyberMaven Apr 3, 2025
e2c7f1d
adding k8s cli doc
TheCyberMaven Apr 6, 2025
b2d5e3a
adding placeholders
DanielleWashington Apr 7, 2025
703a04e
Merge branch 'main' into deployment-phase-1
DanielleWashington Apr 7, 2025
4a5b877
making code blocks multi-line and removing duplicate on sidebar
TheCyberMaven Apr 10, 2025
077f7ef
adding the async replication env vars doc
TheCyberMaven Apr 11, 2025
895f987
updated per JP's review
TheCyberMaven Apr 14, 2025
e5c59cb
updating broken links
TheCyberMaven Apr 14, 2025
e409901
updated doc with further instructions
TheCyberMaven Apr 17, 2025
814f779
adding and updating docs
TheCyberMaven Apr 27, 2025
3c30e47
updating EKS doc, adding horizontal scaling doc and migrating docs fr…
TheCyberMaven May 9, 2025
72852f7
fixing broken link
TheCyberMaven May 9, 2025
e13a8ef
Merge branch 'main' into deployment-phase-1
TheCyberMaven May 9, 2025
bb6c046
fixing broken links
DanielleWashington May 9, 2025
1 change: 1 addition & 0 deletions _build_scripts/slack-find-author.sh
@@ -10,6 +10,7 @@ git_slack_map=(
["Charlie Harr"]="<@U044XTHRVFA>"
["Connor Shorten"]="<@U03FRH53SUT>"
["Daniel Madalitso Phiri"]="<@U060UJ41YBC>"
["DanielleWashington"]="<@U088SBVDCET>"
["Dirk Kulawiak"]="<@U03MWHK4KV3>"
["Duda Nogueira"]="<@U05K3K9M82F>"
["dyma solovei"]="<@U07NGR323JR>"
4 changes: 0 additions & 4 deletions docs/deploy/aws/_category_.json

This file was deleted.

52 changes: 0 additions & 52 deletions docs/deploy/aws/index.mdx

This file was deleted.

179 changes: 179 additions & 0 deletions docs/deploy/config-guides/async-rep.md
@@ -0,0 +1,179 @@
---

title: Async Replication

---

Generally available since the 1.29 release, Async Replication is a mechanism that ensures eventual consistency across the nodes of a distributed cluster. It runs as a background process that automatically keeps nodes in sync without requiring user queries. Previously, consistency relied on "read repair", in which nodes compare data during a read request and exchange missing or outdated information. Async replication, by contrast, guarantees eventual consistency without requiring read operations.

:::info

This applies only to data objects; metadata consistency is handled separately, through Raft consensus.
:::

## Under the Hood

- Async replication operates as a background process either per tenant (in a multi-tenant collection) or per shard (in a non-multi-tenant collection).
- It is disabled by default and can be enabled by changing the collection configuration, in the same way as setting the replication factor.
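To illustrate the general idea of the background comparison, here is a minimal Python sketch of hash-tree (Merkle tree) comparison between two replicas of one shard. It is not Weaviate's implementation: the SHA-256 hashing, the bucketing of object IDs into `2**height` leaves, and the helper names `leaf_index`, `build_tree`, and `differing_leaves` are assumptions made for this example. Each node summarizes its objects as a tree of hashes, and the comparison descends only into subtrees whose hashes differ, so out-of-sync ranges can be located without exchanging every object.

```python
import hashlib
from typing import Dict, List

def leaf_index(object_id: str, height: int) -> int:
    """Assign an object ID to one of 2**height leaf buckets (illustrative scheme)."""
    digest = hashlib.sha256(object_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % (2 ** height)

def build_tree(objects: Dict[str, str], height: int) -> List[List[bytes]]:
    """Build a hash tree from {object_id: version}; returns levels, root level first."""
    leaves = [hashlib.sha256() for _ in range(2 ** height)]
    # Sorted insertion keeps each leaf hash deterministic across nodes.
    for object_id in sorted(objects):
        leaves[leaf_index(object_id, height)].update(f"{object_id}:{objects[object_id]}".encode())
    level = [leaf.digest() for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        # Each parent hashes the concatenation of its two children.
        level = [hashlib.sha256(level[i] + level[i + 1]).digest() for i in range(0, len(level), 2)]
        levels.append(level)
    return levels[::-1]

def differing_leaves(a: List[List[bytes]], b: List[List[bytes]]) -> List[int]:
    """Walk two same-height trees top-down and return the indices of leaves that differ."""
    if a[0][0] == b[0][0]:
        return []  # Identical roots: the replicas are in sync.
    candidates = [0]
    for depth in range(1, len(a)):
        # Only descend into children whose hashes disagree.
        candidates = [
            child
            for node in candidates
            for child in (2 * node, 2 * node + 1)
            if a[depth][child] != b[depth][child]
        ]
    return candidates

node_a = {"obj-1": "v1", "obj-2": "v1", "obj-3": "v1"}
node_b = {"obj-1": "v1", "obj-2": "v1"}  # obj-3 has not reached this replica yet
print(differing_leaves(build_tree(node_a, 4), build_tree(node_b, 4)))
```

Only the leaf holding `obj-3` is reported, so only the objects in that range need a closer look. A taller tree narrows each leaf to fewer objects at the cost of more memory, which is the trade-off `ASYNC_REPLICATION_HASHTREE_HEIGHT` controls.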

## Environment Variable Deep Dive

The following environment variables can be used to fine-tune async replication for your specific use case or deployment environment.

:::tip
The optimal values for these variables ultimately depend on factors such as data size, network conditions, write patterns, and the desired level of eventual consistency.
:::
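As a quick reference, the sketch below collects the documented defaults in one Python dataclass and overrides the integer-valued settings from the environment. The class, field, and function names are hypothetical. It assumes `ASYNC_REPLICATION_DISABLED` is turned on by the string `true` (Weaviate may accept other spellings), and it deliberately leaves the duration-valued variables at their defaults because their accepted string format is not covered here.

```python
import os
from dataclasses import dataclass
from datetime import timedelta
from typing import Mapping, Optional

@dataclass
class AsyncReplicationSettings:
    """Documented defaults for the async replication environment variables."""
    disabled: bool = False
    propagation_limit: int = 10_000
    propagation_delay: timedelta = timedelta(seconds=30)
    logging_frequency: timedelta = timedelta(seconds=5)
    hashtree_height: int = 16
    propagation_concurrency: int = 5
    diff_batch_size: int = 1_000
    propagation_batch_size: int = 100
    frequency: timedelta = timedelta(seconds=30)
    frequency_while_propagating: timedelta = timedelta(milliseconds=20)
    alive_nodes_checking_frequency: timedelta = timedelta(seconds=5)
    diff_per_node_timeout: timedelta = timedelta(seconds=10)
    propagation_timeout: timedelta = timedelta(seconds=30)

# Integer-valued variables and the field each one overrides.
INT_VARS = {
    "ASYNC_REPLICATION_PROPAGATION_LIMIT": "propagation_limit",
    "ASYNC_REPLICATION_HASHTREE_HEIGHT": "hashtree_height",
    "ASYNC_REPLICATION_PROPAGATION_CONCURRENCY": "propagation_concurrency",
    "ASYNC_REPLICATION_DIFF_BATCH_SIZE": "diff_batch_size",
    "ASYNC_REPLICATION_PROPAGATION_BATCH_SIZE": "propagation_batch_size",
}

def load_settings(env: Optional[Mapping[str, str]] = None) -> AsyncReplicationSettings:
    """Start from the defaults and apply any overrides found in `env` (default: os.environ)."""
    env = os.environ if env is None else env
    settings = AsyncReplicationSettings()
    # Assumption: only the string "true" (any case) disables the feature in this sketch.
    settings.disabled = env.get("ASYNC_REPLICATION_DISABLED", "false").strip().lower() == "true"
    for var, field_name in INT_VARS.items():
        if var in env:
            setattr(settings, field_name, int(env[var]))
    return settings
```

For example, `load_settings({"ASYNC_REPLICATION_PROPAGATION_LIMIT": "5000"})` returns settings with `propagation_limit` set to 5000 while every other field keeps its documented default.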

## Use Cases

### General

<details>

<summary> Feature Control </summary>

#### `ASYNC_REPLICATION_DISABLED`
Globally disables the entire async replication feature.

- Its default value is `false`.
- **Use case**: Useful when async replication must be disabled temporarily across many tenants or collections at once, for example during debugging or critical maintenance.
- **Special Considerations**:
  - This setting overrides any collection-level configuration.

</details>
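The precedence rule can be expressed as a one-line check. This is a hypothetical helper, not Weaviate code: `collection_async_enabled` stands for the per-collection setting and `global_disabled` for the parsed value of `ASYNC_REPLICATION_DISABLED`.

```python
def async_replication_active(global_disabled: bool, collection_async_enabled: bool) -> bool:
    """Async replication runs for a collection only if it is enabled there
    and not switched off globally; the global switch always wins."""
    return collection_async_enabled and not global_disabled

# During maintenance the global switch overrides every collection.
for collection_enabled in (True, False):
    print(collection_enabled, "->", async_replication_active(True, collection_enabled))
```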

<details>
<summary>Replication Control </summary>

#### `ASYNC_REPLICATION_PROPAGATION_LIMIT`
Defines the maximum number of objects that will be propagated in a single async replication iteration (after one hash tree comparison).
- By default it is set to 10,000.
- **Use Case(s)**: Can be adjusted based on network capacity and the desired rate of convergence.
- **Considerations**: Even if more than this number of differences are detected, only this many objects will be propagated in the current iteration. Subsequent iterations will handle the remaining differences.


#### `ASYNC_REPLICATION_PROPAGATION_DELAY`
Introduces a delay before considering an object for propagation. Only objects older than this delay are considered.
- By default it is set to 30 seconds.
- **Use Case(s)**: If an object is inserted into one node but the insertion is still in progress, the hash comparison might already detect it. This delay prevents async replication from trying to propagate the object before the local write operation has fully completed.
- **Considerations**: This should be set based on the typical write latency of the system.
</details>
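Together, these two settings shape each propagation iteration. The sketch below assumes each differing object carries a last-update timestamp in seconds; the function name and the `(object_id, updated_at)` tuple shape are illustrative only. Objects not yet older than the delay are skipped, and at most `limit` objects are selected, leaving the rest for later iterations.

```python
import time
from typing import List, Optional, Sequence, Tuple

def select_for_propagation(
    differences: Sequence[Tuple[str, float]],
    limit: int = 10_000,      # ASYNC_REPLICATION_PROPAGATION_LIMIT
    delay_s: float = 30.0,    # ASYNC_REPLICATION_PROPAGATION_DELAY
    now: Optional[float] = None,
) -> Tuple[List[str], int]:
    """Return (object IDs to propagate now, number of eligible objects left for later)."""
    now = time.time() if now is None else now
    # Only objects older than the delay qualify; newer writes may still be in progress.
    eligible = [oid for oid, updated_at in differences if now - updated_at > delay_s]
    # Cap the iteration; the remainder is handled by subsequent iterations.
    return eligible[:limit], max(0, len(eligible) - limit)
```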

<details>
<summary> Operational Visibility </summary>

#### `ASYNC_REPLICATION_LOGGING_FREQUENCY`
Controls how often the background async replication process logs its activity.
- By default it is set to 5 seconds.
- **Use Case(s)**: Lowering the value (logging more often) provides more detailed logs, while raising it reduces log verbosity.
</details>

### Performance Tuning

<details>

<summary> Memory Optimization </summary>

#### `ASYNC_REPLICATION_HASHTREE_HEIGHT`
Customizes the height of the hash tree built by each node to represent its locally stored data.
- By default it is set to 16, which uses roughly 2 MB of RAM per shard on each node.
- **Use case(s)**:
  - In multi-tenant setups with a large number of tenants, reducing the hash tree height minimizes the memory footprint.
  - For very large collections, a taller hash tree can be beneficial because it identifies differing data ranges more efficiently.
- **Special Considerations**:
  - Changing the hash tree height requires rebuilding the hash tree on each node, which involves iterating over all existing objects.

</details>
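The 2 MB figure can be reproduced with a back-of-the-envelope estimate, assuming the tree is a complete binary tree with `2**height` leaves and that every node stores a 16-byte (128-bit) digest. Both are assumptions about internals rather than documented facts, so treat the results as approximations.

```python
DIGEST_BYTES = 16  # assumption: one 128-bit digest per tree node

def hashtree_bytes(height: int) -> int:
    """Approximate memory of one hash tree: a complete binary tree has 2**(height+1) - 1 nodes."""
    return (2 ** (height + 1) - 1) * DIGEST_BYTES

def per_node_bytes(height: int, shards_on_node: int) -> int:
    """Each shard (or tenant) replica hosted on a node keeps its own hash tree."""
    return hashtree_bytes(height) * shards_on_node

print(hashtree_bytes(16))         # 2097136 bytes, roughly 2 MB
print(per_node_bytes(16, 5_000))  # 10485680000 bytes, about 10.5 GB
print(per_node_bytes(10, 5_000))  # 163760000 bytes, about 164 MB
```

Under these assumptions, a node hosting 5,000 tenants would spend about 10.5 GB on hash trees at the default height of 16, versus about 164 MB at height 10, which is why lowering the height helps in large multi-tenant setups.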

<details>

<summary> Throughput and Concurrency </summary>

#### `ASYNC_REPLICATION_PROPAGATION_CONCURRENCY`
Controls the number of concurrent goroutines (or threads) used to send batches of objects during the propagation phase.
- By default it is set to 5.
- **Considerations**: Increasing concurrency can improve propagation speed, but needs to be balanced with potential resource contention (CPU, network).

</details>
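As a Python analogy, a bounded thread pool behaves like this setting: at most `concurrency` batches are in flight at once. `send_batch` and `fake_send` are hypothetical stand-ins for the network call to a remote node; the demo tracks the peak number of simultaneous sends.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

def propagate_concurrently(
    batches: Sequence[List[str]],
    send_batch: Callable[[List[str]], int],
    concurrency: int = 5,  # ASYNC_REPLICATION_PROPAGATION_CONCURRENCY
) -> int:
    """Send batches with at most `concurrency` workers; return the total objects sent."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return sum(pool.map(send_batch, batches))

# Demo: track how many simulated sends run at the same time.
in_flight = 0
peak = 0
lock = threading.Lock()

def fake_send(batch: List[str]) -> int:
    global in_flight, peak
    with lock:
        in_flight += 1
        peak = max(peak, in_flight)
    time.sleep(0.01)  # simulated network latency
    with lock:
        in_flight -= 1
    return len(batch)

batches = [[f"obj-{i}-{j}" for j in range(100)] for i in range(20)]
print(propagate_concurrently(batches, fake_send), "objects sent; peak concurrency:", peak)
```

Raising `concurrency` shortens the wall-clock time of a propagation phase only while CPU and network have headroom; beyond that, the extra workers mostly contend for the same resources.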

<details>

<summary> Batch Processing </summary>

#### `ASYNC_REPLICATION_DIFF_BATCH_SIZE`
Sets the number of object metadata entries fetched per request during the comparison phase.
- By default it is set to 1000.
- **Use Case(s)**: Increasing it may improve performance when network latency is low and nodes can handle larger requests.
- **Considerations**: Fetching metadata in batches reduces the number of network round trips.


#### `ASYNC_REPLICATION_PROPAGATION_BATCH_SIZE`
Sets the maximum number of objects included in each batch when propagating data to a remote node.
- By default it is set to 100.
- **Use Case(s)**:
  - For large objects, reducing the batch size can help manage memory usage during propagation. A batch size similar to the one used during initial data insertion is a reasonable starting point.
  - For smaller objects, increasing the batch size might improve propagation efficiency by reducing per-request overhead, but this needs to be balanced against potential memory pressure.
- **Considerations**: This setting is particularly important for large objects, as larger batches can lead to higher memory consumption during transmission. Multiple batches may be sent within a single iteration to reach the `ASYNC_REPLICATION_PROPAGATION_LIMIT`.

</details>
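The interaction between the two batch sizes and `ASYNC_REPLICATION_PROPAGATION_LIMIT` is easiest to see with numbers. This sketch (hypothetical helper names) counts the metadata requests needed to compare a given number of objects and splits one iteration's objects into propagation batches using the default values.

```python
import math
from typing import List, Sequence

def metadata_requests(objects_to_compare: int, diff_batch_size: int = 1_000) -> int:
    """Requests needed to fetch metadata for `objects_to_compare` objects during comparison."""
    return math.ceil(objects_to_compare / diff_batch_size)

def propagation_batches(
    object_ids: Sequence[str],
    limit: int = 10_000,    # ASYNC_REPLICATION_PROPAGATION_LIMIT
    batch_size: int = 100,  # ASYNC_REPLICATION_PROPAGATION_BATCH_SIZE
) -> List[List[str]]:
    """Cap one iteration at `limit` objects, then split them into batches of `batch_size`."""
    capped = list(object_ids[:limit])
    return [capped[i:i + batch_size] for i in range(0, len(capped), batch_size)]

ids = [f"obj-{i}" for i in range(25_000)]
print(metadata_requests(len(ids)))    # 25 requests of up to 1,000 entries each
print(len(propagation_batches(ids)))  # 100 batches of 100 objects in this iteration
```

With the defaults, one iteration sends at most 100 batches; the remaining 15,000 objects in this example wait for later iterations.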

### Consistency Tuning

<details>

<summary> Synchronization Frequency </summary>

#### `ASYNC_REPLICATION_FREQUENCY`
Defines how often each node initiates the process of comparing its local data (via the hash tree) with other nodes storing the same shard. This regularly checks for inconsistencies, even if no changes have been explicitly triggered.
- Its default value is 30 seconds.
- **Use Case(s)**:
  - Lowering the value (comparing more often) benefits applications that require faster convergence to eventual consistency.
  - Raising the value (comparing less often) reduces the load on the system at the cost of slower convergence.

#### `ASYNC_REPLICATION_FREQUENCY_WHILE_PROPAGATING`
Defines a shorter frequency for subsequent comparison and propagation attempts when a previous propagation cycle did not complete (i.e., not all detected differences were synchronized).
- By default it is set to 20 milliseconds.
- **Use Case(s)**: When inconsistencies are known to exist, this expedites the synchronization process.
- **Considerations**: This is activated after a propagation cycle detects differences but does not propagate all of them due to limits.

</details>
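The two frequencies combine into a simple scheduling rule. The sketch below illustrates that reading of the settings and is not Weaviate's scheduler: after a cycle that left differences behind, the next comparison starts after the short interval; otherwise the node waits for the regular interval. The function names are hypothetical.

```python
from datetime import timedelta
from typing import List

def next_comparison_delay(
    last_cycle_complete: bool,
    frequency: timedelta = timedelta(seconds=30),  # ASYNC_REPLICATION_FREQUENCY
    # ASYNC_REPLICATION_FREQUENCY_WHILE_PROPAGATING
    frequency_while_propagating: timedelta = timedelta(milliseconds=20),
) -> timedelta:
    """Wait briefly while differences remain; otherwise use the regular interval."""
    return frequency if last_cycle_complete else frequency_while_propagating

def simulate(differences: int, limit: int = 10_000) -> List[timedelta]:
    """Delays chosen after each cycle when `differences` objects are out of sync."""
    delays = []
    remaining = differences
    while remaining > 0:
        remaining -= min(limit, remaining)  # one cycle propagates at most `limit` objects
        delays.append(next_comparison_delay(last_cycle_complete=(remaining == 0)))
    return delays

print(simulate(25_000))
```

With 25,000 differences and the default propagation limit, the first two cycles are each followed by a 20 ms wait, and only the third, which clears the backlog, returns to the 30-second interval.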

<details>
<summary> Node Status Monitoring </summary>

#### `ASYNC_REPLICATION_ALIVE_NODES_CHECKING_FREQUENCY`
Defines the frequency at which the system checks for changes in the availability of nodes within the cluster.
- By default it is set to 5 seconds.
- **Use Case(s)**: When a node rejoins the cluster after a period of downtime, it is highly likely to be out of sync. This setting ensures that the replication process is initiated promptly.

</details>
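Conceptually, each check compares the current set of alive nodes with the set seen at the previous check and reacts to nodes that have rejoined. The sketch below is hypothetical; `trigger_comparison` stands in for starting a hash tree comparison with that node.

```python
from typing import Callable, Set

def check_alive_nodes(
    previous: Set[str],
    current: Set[str],
    trigger_comparison: Callable[[str], None],
) -> Set[str]:
    """Trigger a comparison for every node that is alive now but was not at the last check."""
    for node in sorted(current - previous):
        trigger_comparison(node)
    return current  # becomes `previous` for the next check

triggered = []
alive = {"node-1", "node-2"}
alive = check_alive_nodes(alive, {"node-1", "node-2", "node-3"}, triggered.append)
print(triggered)  # ['node-3']
```

A shorter checking interval means a rejoined node is noticed, and brought back in sync, sooner.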

<details>
<summary>Timeout Management </summary>

#### `ASYNC_REPLICATION_DIFF_PER_NODE_TIMEOUT`
Defines the maximum time to wait for a response when requesting object metadata from a remote node during the comparison phase. This prevents indefinite blocking if a node is unresponsive.
- By default it is set to 10 seconds.
- **Use Case(s)**: May need to be increased in environments with high network latency or potentially slow-responding nodes.

#### `ASYNC_REPLICATION_PROPAGATION_TIMEOUT`
Sets the maximum time allowed for a single propagation request (sending actual object data) to a remote node.
- By default it is set to 30 seconds.
- **Use Case(s)**: May need to be increased in scenarios with high network latency, large object sizes (e.g., images, vectors), or when sending large batches of objects.
- **Considerations**: Network latency, batch size, and the size of the objects being propagated can all affect timeouts.

</details>
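Both timeouts follow the common pattern of bounding a remote call so that an unresponsive node cannot stall the process. Here is a minimal asyncio sketch; `fetch` is a hypothetical coroutine factory standing in for the request to the remote node, and returning `None` on timeout is an illustrative choice, not Weaviate's behavior.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional

async def call_with_timeout(
    fetch: Callable[[], Awaitable[Any]],
    timeout_s: float,  # e.g. ASYNC_REPLICATION_DIFF_PER_NODE_TIMEOUT (10 s by default)
) -> Optional[Any]:
    """Run one remote request, giving up if it takes longer than `timeout_s` seconds."""
    try:
        return await asyncio.wait_for(fetch(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return None  # treat the node as unresponsive for this round

async def demo() -> None:
    fast = await call_with_timeout(lambda: asyncio.sleep(0.01, result="metadata"), timeout_s=1.0)
    slow = await call_with_timeout(lambda: asyncio.sleep(5, result="metadata"), timeout_s=0.05)
    print(fast, slow)  # metadata None

asyncio.run(demo())
```

In high-latency environments or with large batches, the timeout must exceed the realistic round-trip time, otherwise healthy nodes are treated as unresponsive.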



## Further Resources

[Concepts: Replication](https://weaviate.io/developers/weaviate/concepts/replication-architecture/consistency)

[Replication How-To](https://weaviate.io/developers/weaviate/configuration/replication#async-replication-settings)

[Environment Variables](https://weaviate.io/developers/weaviate/config-refs/env-vars#async-replication)

## Questions and feedback

import DocsFeedback from '/_includes/docs-feedback.mdx';

<DocsFeedback/>