Doc-126 Revise Offset Commit topic #1080

Merged
Feediver1 merged 16 commits into main from Doc-126 on May 1, 2025
Contributor

@Feediver1 Feediver1 commented Apr 16, 2025

Description

Resolves https://redpandadata.atlassian.net/browse/DOC-126
Review deadline: Friday, April 18

Page previews

Consumer Offsets (self-managed)
Consumer Offsets (cloud)
This file is single-sourced between self-managed and cloud. SMEs: please identify any content that is not true for Cloud. Thanks.

Checks

  • New feature
  • Content gap
  • Support Follow-up
  • Small fix (typos, links, copyedits, etc)

Summary by CodeRabbit

  • Documentation
    • Expanded and reorganized the guide on consumer offsets, providing a detailed overview, explanations of offset commit strategies, and best practices for managing offsets in Redpanda.
    • Added sections covering offset fundamentals, commit strategies (auto, manual, external, hybrid), and operational recommendations for optimal performance.

@Feediver1 Feediver1 requested a review from a team as a code owner April 16, 2025 19:43
netlify bot commented Apr 16, 2025

Deploy Preview for redpanda-docs-preview ready!

Name Link
🔨 Latest commit 17e23e5
🔍 Latest deploy log https://app.netlify.com/sites/redpanda-docs-preview/deploys/6813bafa0bd6b1000837c34a
😎 Deploy Preview https://deploy-preview-1080--redpanda-docs-preview.netlify.app

@Feediver1 Feediver1 requested a review from pmw-rp April 16, 2025 19:45
Feediver1 and others added 2 commits April 17, 2025 08:26
Co-authored-by: Michele Cyran <michele@redpanda.com>
@Feediver1 Feediver1 requested a review from c4milo April 17, 2025 15:34
Member

@c4milo c4milo left a comment

LGTM, this all should apply to Redpanda Cloud as well 👍🏻

Co-authored-by: Michele Cyran <michele@redpanda.com>
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (12)
modules/develop/pages/consume-data/consumer-offsets.adoc (12)

9-10: Refine phrasing for clarity.
The current sentence “it would be likely (though not guaranteed)” feels indirect. Consider a more direct phrasing:

- For example, if a topic has five partitions, it would be likely (though not guaranteed) that each partition holds about 20% of the messages.
+ For example, if a topic has five partitions, it is likely (though not guaranteed) that each partition holds about 20% of the messages.
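
As a rough illustration of the 20% figure in the sentence above, here is a minimal sketch that assumes messages are assigned strictly round-robin across five partitions. The `assign_round_robin` helper is hypothetical and does not reflect how any real client partitioner is implemented; key-based or sticky strategies can skew the split, which is why the even distribution is likely but not guaranteed.

```python
from collections import Counter


def assign_round_robin(num_messages: int, num_partitions: int) -> Counter:
    """Count how many messages land on each partition under strict round-robin."""
    counts = Counter()
    for i in range(num_messages):
        counts[i % num_partitions] += 1
    return counts


counts = assign_round_robin(1000, 5)
# Share of all messages held by each partition.
shares = {p: counts[p] / 1000 for p in sorted(counts)}
print(shares)
```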

11-12: Remove redundant wording and improve readability.

  • Change the parenthetical to a comma-delimited clause.
  • Avoid “specify a specific”.
- Within a partition, each message (once accepted and acknowledged by the partition leader) is permanently assigned a unique sequence number called an glossterm:offset[]. Once assigned, offsets are immutable, ensuring that the order of messages within a partition is preserved. You can manually specify a specific start value for offsets if needed.
+ Within a partition, once a message is accepted and acknowledged by the partition leader, it is permanently assigned a unique sequence number called a glossterm:offset[]. Once assigned, offsets are immutable, preserving message order. You can manually specify a start value for offsets when needed (for example, using the consumer `seek()` API).
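
To make the offset semantics in this suggestion concrete, the sketch below models a single partition as an append-only list. `PartitionLog` is a hypothetical in-memory stand-in, not a Redpanda or Kafka API; `read_from` only imitates what a consumer sees after positioning itself with `seek()`.

```python
class PartitionLog:
    """Hypothetical in-memory model of one partition's append-only log."""

    def __init__(self):
        self._records = []

    def append(self, value):
        """Store a record and return its offset, which never changes afterwards."""
        self._records.append(value)
        return len(self._records) - 1

    def read_from(self, offset):
        """Return (offset, value) pairs starting at the requested offset."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        return list(enumerate(self._records))[offset:]


log = PartitionLog()
offsets = [log.append(v) for v in ["a", "b", "c"]]
print(offsets)           # offsets are assigned sequentially in arrival order
print(log.read_from(1))  # start reading from a chosen offset
```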

13-16: Split and clarify the commit overview.
The single long sentence combines multiple ideas. Breaking it into two improves readability and reinforces the glossterm definitions:

- As a consumer reads messages from Redpanda, it can save its progress by “committing the offset” (known as an glossterm:offset commit[]), an action initiated by the consumer, not Redpanda. Kafka client libraries provide an API for committing offsets, which communicates with Redpanda using the glossterm:consumer group[] API. Each committed offset is stored as a message in the `pass:[__consumer_offsets]` topic, which is a private Redpanda topic that stores committed offsets from each Kafka consumer attached to Redpanda, allowing the consumer to resume processing from the last committed point.
+ As a consumer reads messages from Redpanda, it can save its progress by “committing the offset”—an action initiated by the consumer and referred to as an glossterm:offset commit[].  
+ Kafka client libraries expose the commit API (via the glossterm:consumer group[] interface). Each committed offset is recorded as a message in the compacted `pass:[__consumer_offsets]` topic, enabling the consumer to resume processing from its last committed point.
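
One way to picture why a compacted topic is enough for resuming: only the most recent commit per key matters. The sketch below assumes each commit is keyed by (group, topic, partition); the tuple layout and the names `compact`, `orders-app`, and `orders` are illustrative only and do not describe the actual record format of the `__consumer_offsets` topic.

```python
def compact(commit_log):
    """Keep only the most recent committed offset for each (group, topic, partition)."""
    latest = {}
    for group, topic, partition, offset in commit_log:
        latest[(group, topic, partition)] = offset  # later entries overwrite earlier ones
    return latest


# Commits in the order they were written (hypothetical data).
commit_log = [
    ("orders-app", "orders", 0, 10),
    ("orders-app", "orders", 1, 7),
    ("orders-app", "orders", 0, 25),
]
resume_points = compact(commit_log)
print(resume_points[("orders-app", "orders", 0)])  # 25
```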

17-18: Tag and standardize the “group coordinator” term.
Wrap the group coordinator in a glossterm and remove quotes for consistency:

- Kafka consumer tracks the maximum offset it has consumed in each partition ... known as the "group coordinator". All consumers in the group send their offset commits and fetch requests to this group coordinator.
+ The Kafka consumer tracks the maximum offset consumed in each partition and can commit offsets to resume processing after a restart. Offsets for a consumer group are stored on a designated broker, referred to as the glossterm:group coordinator[]. All consumers in the group send their offset commits and fetch requests to this coordinator.

31-36: Standardize subheading and clarify default behavior.
Use title case for consistency and clarify that auto-commit is enabled by default:

- === Auto commit
+ === Auto Commit

- Auto commit is the default commit strategy, where the client automatically commits offsets at regular intervals. You can enable this by setting `enable.auto.commit` to `true`.
+ Auto Commit is the default strategy (enabled by default via `enable.auto.commit=true`), where the client commits offsets at regular intervals. To disable it, set the property to `false`.

37-40: Consistent subheading capitalization.
Align with title case:

- === Manual offset commit
+ === Manual Offset Commit

41-44: Title case for nested headings.
Apply title case to improve scanability:

- ==== Synchronous commit
+ ==== Synchronous Commit

59-62: Title case for nested headings.
Apply title case:

- ==== Asynchronous commit
+ ==== Asynchronous Commit

81-83: Consistent heading style.
Use title case:

- === External offset management
+ === External Offset Management

95-97: Consistent heading style.
Use title case:

- === Hybrid offset management
+ === Hybrid Offset Management

108-113: Merge redundant sentences in best practices.
The two consecutive paragraphs about over-committing overlap. Consider combining for brevity:

- However, committing too frequently can result in adverse consequences. While individually small, each commit still results in a message being written to the `pass:[__consumer_offsets]` topic, because the position of the consumer against every partition must be recorded. At high commit rates, this workload can become a bottleneck for both the client and the server.
- In many Kafka client implementations, offset commits aren't coalesced at the client; so if a backlog of commits forms (when using the asynchronous commit API), the earlier commits still need to be processed, even though they are effectively redundant.
+ However, each commit—while small—writes a message to the `pass:[__consumer_offsets]` topic, recording the consumer’s position for every partition. At high commit rates, this can become a bottleneck for both client and server. Additionally, many Kafka client implementations do not coalesce commits, so even redundant commits must be processed.
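
For a sense of scale, the following back-of-the-envelope sketch compares committing after every record with committing once per batch. The `count_commits` helper and the numbers are hypothetical; the point is only that the number of commit requests shrinks in proportion to the batch size.

```python
def count_commits(num_records: int, commit_every: int) -> int:
    """Number of commit requests when committing after every `commit_every` records."""
    full, remainder = divmod(num_records, commit_every)
    # A trailing partial batch still needs one final commit.
    return full + (1 if remainder else 0)


per_record = count_commits(10_000, 1)
per_batch = count_commits(10_000, 500)
print(per_record, per_batch)  # 10000 20
```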

122-126: Clarify and title-case tuning guidance.
Refine the heading and make explicit which settings to tune:

- === Tune the consumer group
+ === Tune Consumer Group Settings

- In highly parallel applications, frequent consumer group heartbeats can create unnecessary overhead. For example, 3,200 consumers checking every 500 milliseconds generate 6,400 heartbeats per second. You can optimize this behavior by increasing the `heartbeat.interval.ms` (along with `session.timeout.ms`).
+ In highly parallel applications, frequent heartbeats can generate significant overhead (e.g., 3,200 consumers at 500 ms intervals produce 6,400 heartbeats per second). Optimize by tuning `heartbeat.interval.ms` and `session.timeout.ms` together.
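
The arithmetic behind the 6,400 figure can be checked with a short sketch. The function name is hypothetical, and the 3,000 ms interval in the second call is only an illustrative alternative value, not a recommendation.

```python
def heartbeats_per_second(consumers: int, heartbeat_interval_ms: int) -> float:
    """Aggregate heartbeat rate when every consumer heartbeats at the same interval."""
    return consumers * 1000 / heartbeat_interval_ms


print(heartbeats_per_second(3200, 500))   # 6400.0
print(heartbeats_per_second(3200, 3000))  # a longer interval lowers the aggregate rate
```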
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e611c5f and 1bad0d8.

📒 Files selected for processing (1)
  • modules/develop/pages/consume-data/consumer-offsets.adoc (1 hunks)
🔇 Additional comments (3)
modules/develop/pages/consume-data/consumer-offsets.adoc (3)

21-24: Verify external link and footnote syntax.
The compaction link uses a trailing caret ([compacted^]) which may render as a footnote marker. Confirm that this is intentional and points to the correct Kafka documentation section.


27-29: Skip review: Commit strategies intro.
This heading and introductory sentence are clear and follow existing conventions.


116-121: Skip review: Unique consumer groups best practice.
This advice is clear, correctly marked as a best practice, and aligns with past guidance.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (6)
modules/develop/pages/consume-data/consumer-offsets.adoc (6)

7-24: Enhance clarity and terminology in the Overview section
The new Overview section is a solid addition, but here are a few refinement suggestions:

  1. Avoid redundant wording (e.g., “manually specify a specific start value” → “manually specify a start value”).
  2. Align terminology with the page description—use “internal topic” instead of “private Redpanda topic.”
  3. Consider using AsciiDoc’s link: macro for the external tutorial URL to improve readability:
    link:https://www.redpanda.com/guides/kafka-tutorial-kafka-partition-strategy[partition strategy].
  4. For very long paragraphs, consider breaking them into shorter sentences or bullet points to aid scanning.

27-36: Refine the “Commit strategies” introduction and auto‑commit details
Overall the Commit strategies section reads well. A couple of optional improvements:

  • You might mention the default value of auto.commit.interval.ms (5000 ms) to give readers immediate context.
  • Consider linking enable.auto.commit and auto.commit.interval.ms to the configuration reference in our docs for quick navigation.

37-62: Streamline Manual and Asynchronous commit subsections
The Manual offset commit section is comprehensive—great job adding both sync and async variants. A few optional tweaks:

  • In the async subsection, note that unlike commitSync, commitAsync does not retry on failure, so callers may need to handle retries in their callback.
  • You could show a more realistic callback signature with parameters (e.g., (Map<TopicPartition, OffsetAndMetadata> offsets, Exception e)) to mirror the Kafka API.
  • Ensure code examples include the required imports at top (import org.apache.kafka.clients.consumer.ConsumerRecord;, etc.) for copy‑paste use.
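
The retry-in-callback idea from the first bullet can be sketched without a broker. `FakeConsumer` below is a hypothetical in-memory stand-in whose `commit_async` invokes the callback immediately; it is not the Kafka client API, and a real callback would run later on the client's polling thread. The callback takes `(offsets, error)` to mirror the signature mentioned above.

```python
class FakeConsumer:
    """Hypothetical stand-in for a client; the first `fail_first` commits fail."""

    def __init__(self, fail_first=0):
        self.fail_first = fail_first
        self.attempts = 0
        self.committed = None

    def commit_async(self, offsets, callback):
        self.attempts += 1
        if self.attempts <= self.fail_first:
            callback(offsets, RuntimeError("commit failed"))
        else:
            self.committed = dict(offsets)
            callback(offsets, None)


def commit_with_retry(consumer, offsets, max_retries=3):
    """Issue an async commit and retry from the callback, up to max_retries times."""
    retries_left = max_retries

    def on_complete(committed_offsets, error):
        nonlocal retries_left
        if error is not None and retries_left > 0:
            retries_left -= 1
            consumer.commit_async(committed_offsets, on_complete)

    consumer.commit_async(offsets, on_complete)


consumer = FakeConsumer(fail_first=2)
commit_with_retry(consumer, {("orders", 0): 42})
print(consumer.attempts, consumer.committed)
```

In a real application a retry like this should also check that no newer commit has been sent in the meantime, so that a stale retry does not overwrite a later offset.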

81-93: Unify list styling in External offset management
The external offset management bullets are clear, but the mix of hyphenated and “.”‑style lists can be confusing. Consider adopting a single list style, for example:

. enable.auto.commit = false
. assign(Collection)
. Save offsets externally using the record metadata
. On restart, call seek(TopicPartition, offset)

This will improve visual consistency.
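
The flow described by those steps (save the offset externally, then seek to it on restart) can be sketched with an in-memory model. `ExternalStore` and `consume` are hypothetical; the store stands in for a database that writes the result and the next offset in one transaction, and the `range` start stands in for the `seek()` call.

```python
class ExternalStore:
    """Hypothetical external store that saves results and the next offset together."""

    def __init__(self):
        self.results = []
        self.next_offset = 0

    def save(self, result, offset):
        # Both updates happen together, standing in for a single transaction.
        self.results.append(result)
        self.next_offset = offset + 1


def consume(records, store, crash_after=None):
    """Process records starting at the stored offset, as a seek would on restart."""
    processed = 0
    for offset in range(store.next_offset, len(records)):
        if crash_after is not None and processed == crash_after:
            return  # simulate a crash before the remaining records are processed
        store.save(records[offset].upper(), offset)
        processed += 1


records = ["a", "b", "c", "d"]
store = ExternalStore()
consume(records, store, crash_after=2)  # first run stops early
consume(records, store)                 # restart resumes at the stored offset
print(store.results)
```

Because the result and the offset are saved together in this model, the restart neither skips nor repeats a record.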


95-101: Clarify Hybrid offset management bullets
The hybrid approach is an important pattern. Two small suggestions:

  • Use the same bullet marker style as the previous section (dots or hyphens).
  • Expand the second bullet to clarify how offsets get committed in hybrid mode—e.g. “Offsets are still committed to Redpanda via the consumer‑group API.”

102-127: Polish the Best Practices section
The new best practices are on point. For extra polish:

  • Format each practice as a short heading + description (e.g., ==== Avoid over‑committing → heading, then one or two bullet points).
  • In the over‑committing section, consider linking to a monitoring guide for commit‑latency metrics.
  • Ensure “Use unique consumer groups” and “Tune the consumer group” examples reference the exact config keys (group.id, heartbeat.interval.ms, session.timeout.ms).
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1bad0d8 and f2d2668.

📒 Files selected for processing (1)
  • modules/develop/pages/consume-data/consumer-offsets.adoc (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Redirect rules - redpanda-docs-preview
  • GitHub Check: Header rules - redpanda-docs-preview
  • GitHub Check: Pages changed - redpanda-docs-preview

Co-authored-by: Michele Cyran <michele@redpanda.com>
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (5)
modules/develop/pages/consume-data/consumer-offsets.adoc (5)

7-7: Review overview heading consistency
Most Redpanda docs jump straight into the intro without an explicit “Overview” heading. Consider removing or aligning this heading with the pattern used in other topics to maintain consistency.


9-9: Inconsistent hyphenation: Round Robin vs Round‑robin
On line 9 you refer to the “Round Robin strategy,” but elsewhere (and in other Redpanda docs) it’s hyphenated as “Round‑robin strategy.” Please unify the hyphenation.


11-11: Incorrect indefinite article before glossterm
Change “an glossterm:offset[]” to “a glossterm:offset[]” for proper English usage.


11-12: Avoid redundant phrasing
“You can manually specify a specific start value” repeats “manually/specific.” Consider simplifying, e.g.:
You can specify a start value for offsets if needed.


33-34: Clarify default for enable.auto.commit
Instead of “You can enable this by setting enable.auto.commit to true,” consider noting that it’s already true by default (e.g., “(default: true)”) so users know if they must override it.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f2d2668 and 9b4863b.

📒 Files selected for processing (1)
  • modules/develop/pages/consume-data/consumer-offsets.adoc (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Redirect rules - redpanda-docs-preview
  • GitHub Check: Header rules - redpanda-docs-preview
  • GitHub Check: Pages changed - redpanda-docs-preview
🔇 Additional comments (2)
modules/develop/pages/consume-data/consumer-offsets.adoc (2)

27-35: Verify commit strategies applicability to cloud
The new “Commit strategies” section outlines auto, manual, external, and hybrid approaches. Please confirm with the Redpanda Cloud team that each strategy (and its property names/defaults) applies unmodified in the cloud environment as well.


106-110: Confirm best practices for cloud deployments
In the “Avoid over-committing” section you discuss RPO and commit latency monitoring. Please verify that these recommendations hold true in Redpanda Cloud as well (e.g., commit throughput, monitoring tools, default retention/compaction settings).

@Feediver1 Feediver1 requested a review from micheleRP April 23, 2025 19:51
@Feediver1
Contributor Author

Also got the approval from @pmw-rp via Slack.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
modules/develop/pages/consume-data/consumer-offsets.adoc (1)

9-9: Typographic consistency: hyphenate “Round‑robin”
To align with common usage and earlier references to the round‑robin partition strategy, consider changing “Round Robin” to “Round‑robin.”

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9b4863b and c1ac3e8.

📒 Files selected for processing (1)
  • modules/develop/pages/consume-data/consumer-offsets.adoc (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Redirect rules - redpanda-docs-preview
  • GitHub Check: Header rules - redpanda-docs-preview
  • GitHub Check: Pages changed - redpanda-docs-preview
🔇 Additional comments (9)
modules/develop/pages/consume-data/consumer-offsets.adoc (9)

7-7: Overview section framing looks good
Introducing an explicit “Overview” heading helps readers orient themselves before diving into details.


11-11: Clear definition of offsets within a partition
The explanation of immutable, sequential offsets is concise and accurate.


13-13: Accurate description of offset commits via consumer group API
Good use of glossterm macros for “offset commit” and “consumer group” to highlight key concepts.


15-15: Consumer group partition assignment explained clearly
The example using two consumers and a rebalance scenario is illustrative and follows our number‑writing conventions.


17-17: Coordinator role and offset storage summary is accurate
This sentence succinctly captures how Kafka brokers manage committed offsets via the group coordinator.


19-19: Advanced consumer note is useful
Including the note on bypassing consumer groups for specialized stream‑processing frameworks adds valuable context.


24-24: Retention and compaction behavior clearly stated
Good callout on periodic compaction of the __consumer_offsets topic to keep it lean.


27-27: “Commit strategies” section header is well placed
Introducing a dedicated section for commit strategies improves navigability and structure.


29-29: Introduction to commit strategies is concise
The lead‑in sentence clearly sets up the subsequent subsections.

Contributor

@micheleRP micheleRP left a comment

no blockers, but please see my suggestions!

Feediver1 and others added 3 commits April 29, 2025 09:27
Co-authored-by: Michele Cyran <michele@redpanda.com>
Co-authored-by: Michele Cyran <michele@redpanda.com>
Co-authored-by: Michele Cyran <michele@redpanda.com>
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (6)
modules/develop/pages/consume-data/consumer-offsets.adoc (6)

11-11: Remove redundant wording
The phrase "manually specify a specific" is repetitive. Simplify to improve readability:

- You can manually specify a specific start value for offsets if needed.
+ You can specify a start value for offsets if needed.

19-19: Eliminate unnecessary comma
The comma after "consumer group" interrupts the flow. Consider removing it:

- NOTE: More advanced consumers can read data from Redpanda without using a consumer group, by requesting to read a specific topic + partition + offset range.
+ NOTE: More advanced consumers can read data from Redpanda without using a consumer group by requesting to read a specific topic + partition + offset range.

33-33: Clarify default auto commit behavior
Auto commit is enabled by default; rephrase to accurately reflect this and offer the disable option:

- You can enable this by setting `enable.auto.commit` to `true`. The client will then commit offsets every `auto.commit.interval.ms` milliseconds.
+ This strategy is enabled by default (`enable.auto.commit=true`). The client commits offsets every `auto.commit.interval.ms` milliseconds. To disable, set `enable.auto.commit=false`.

35-35: Hyphenate compound adjective
For consistency with style guides, hyphenate "already-processed messages":

- since consumption resumes from the last committed offset, which may include already processed messages
+ since consumption resumes from the last committed offset, which may include already-processed messages
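
The re-read behavior described in this sentence can be simulated in a few lines. The sketch assumes commits happen every `commit_every` records as a stand-in for the time-based `auto.commit.interval.ms`; `run_with_crash` and its parameters are hypothetical.

```python
def run_with_crash(num_records, commit_every, crash_at):
    """Return the offsets processed across one crash and restart."""
    processed = []
    committed = 0  # next offset to read after a restart
    for offset in range(num_records):
        if offset == crash_at:
            break  # crash before this record is processed
        processed.append(offset)
        if (offset + 1) % commit_every == 0:
            committed = offset + 1
    # After the restart, consumption resumes from the last committed offset.
    processed.extend(range(committed, num_records))
    return processed


seen = run_with_crash(num_records=10, commit_every=4, crash_at=6)
duplicates = sorted({o for o in seen if seen.count(o) > 1})
print(duplicates)  # [4, 5]
```

Records 4 and 5 were processed after the last commit and before the crash, so they are processed again on restart: nothing is lost, but some work is repeated.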

110-112: Split long sentence for clarity
This sentence is dense and can be broken up to improve readability:

- The purpose of a commit is to save consumer progress. More frequent commits reduce the amount of data to re-read after an application restart, as the commit interval directly affects the Recovery Point Objective (RPO).
+ The purpose of a commit is to save consumer progress. More frequent commits reduce the amount of data that must be re-read after an application restart. This behavior directly affects the Recovery Point Objective (RPO).

114-115: Remove redundant phrasing
The term "client implementations" appears twice. Simplify to avoid repetition:

- In many Kafka client implementations, offset commits aren't coalesced at the client; so if a backlog of commits forms (when using the asynchronous commit API), the earlier commits still need to be processed, even though they are effectively redundant.
+ In many Kafka implementations, offset commits aren't coalesced; if a backlog forms when using the asynchronous API, earlier commits still need to be processed, despite being redundant.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c1ac3e8 and 8c1cade.

📒 Files selected for processing (1)
  • modules/develop/pages/consume-data/consumer-offsets.adoc (1 hunks)
🔇 Additional comments (1)
modules/develop/pages/consume-data/consumer-offsets.adoc (1)

7-7: Overview heading is intentional
This page intentionally begins with an "Overview" to provide clear framing and context. No change required.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (5)
modules/develop/pages/consume-data/consumer-offsets.adoc (5)

9-9: Consider hyphenating “round robin”
To treat it as a compound adjective, use “round-robin strategy” for consistency and clarity.


27-27: Capitalize section heading for consistency
Change == Commit strategies to == Commit Strategies to match the title-case style of other headings (e.g., “Overview”).


31-31: Standardize sub-section heading casing
Update === Automatic offset commit to === Automatic Offset Commit so it matches title-case formatting used elsewhere.


37-37: Typo in configuration term
Change “consume configuration” to “consumer configuration” for correct terminology:

- NOTE: If your consume configuration is set up to consume and write to another data store...
+ NOTE: If your consumer configuration is set up to consume and write to another data store...

39-39: Standardize sub-section heading casing
Update === Manual offset commit to === Manual Offset Commit for consistency with title-case sub-section headings.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8c1cade and a5cad45.

📒 Files selected for processing (1)
  • modules/develop/pages/consume-data/consumer-offsets.adoc (1 hunks)
🔇 Additional comments (9)
modules/develop/pages/consume-data/consumer-offsets.adoc (9)

7-7: Approve the addition of an Overview heading
The new “Overview” section provides clear context at the very start and aligns with other topics that use a similar framing.


11-11: Well-defined offset explanation
This paragraph clearly defines how offsets are assigned and why they’re immutable, which is accurate and easy to follow.


13-13: Clear description of offset commits
The description of how commits are stored in the __consumer_offsets topic and are initiated by the consumer is comprehensive and accurate.


15-15: Accurate consumer group partition assignment
The explanation of partition assignment and rebalancing behavior is well-written and matches Kafka/Redpanda semantics.


17-17: Correct group coordinator overview
The note on how offsets are tracked and sent to the group coordinator broker is concise and accurate.


19-19: Good advanced consumer note
Highlighting the ability to bypass consumer groups for custom partition/offset ranges is a useful tip for stream-processing use cases.


29-29: Approve introductory sentence
This line smoothly transitions into the list of commit strategies.


33-33: Clear default strategy description
The explanation of enable.auto.commit and auto.commit.interval.ms behavior is precise and in line with Kafka defaults.


35-35: Good coverage of at-least-once semantics
The pros and cons of auto-commit are clearly articulated, including the guarantee of at-least-once delivery.

Co-authored-by: Michele Cyran <michele@redpanda.com>
@Feediver1 Feediver1 merged commit 1ffdc74 into main May 1, 2025
8 checks passed
@Feediver1 Feediver1 deleted the Doc-126 branch May 1, 2025 18:24