RATIS-2388 (Further) Enhancing content for concept in ratis-docs #1338

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Open

jcshepherd wants to merge 1 commit into apache:master from jcshepherd:master

+621 −74

jcshepherd commented Jan 23, 2026 •

edited

Loading

What changes were proposed in this pull request?

This is a rewrite of concepts/index.md in ratis-docs, which goes into substantially more detail about Ratis/Raft architecture and how applications integrate w/Ratis.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/RATIS-2388

How was this patch tested?

Documentation only.

szetszwo reviewed

View reviewed changes

Contributor

szetszwo left a comment

@jcshepherd , thanks a lot for contributing this! I have reviewed down to the Snapshot section. Please see the comments so far. Will continue reviewing this.

ratis-docs/src/site/markdown/concept/index-v2.md Outdated

+              Raft's safety guarantees depend on majority agreement within each group. The leader replicates
+              each operation to the followers in its group, and operations are committed when at least
+              (N/2 + 1) peers in that group acknowledge them. This means a group of 3 peers can tolerate 1

Contributor

szetszwo Jan 26, 2026

Let's use $\lfloor N/2 + 1 \rfloor$ .

ratis-docs/src/site/markdown/concept/index-v2.md Outdated

Comment on lines 76 to 78

+              A single cluster can host multiple independent Raft groups, each with its own leader election,
+              consistency and state replication. Groups typically consist of an odd number of peers (3, 5, or
+are common) to ensure clear majority decisions.

Contributor

szetszwo Jan 26, 2026

Let's combine the first sentence to the paragraph and move the second sentence to the next section, which talks about majority.

diff --git a/ratis-docs/src/site/markdown/concept/index-v2.md b/ratis-docs/src/site/markdown/concept/index-v2.md
index d51146863..bf1463c43 100644
--- a/ratis-docs/src/site/markdown/concept/index-v2.md
+++ b/ratis-docs/src/site/markdown/concept/index-v2.md
@@ -71,18 +71,20 @@ group is a logical consensus domain that runs across a specific subset of peers
-At any given time, one peer in a group acts as the "leader" while the others are "followers" or
+One of the peers in a group acts as the "leader" while the others are "followers" or
 "listeners". The leader handles all write requests and replicates operations to other peers in
 the group. Both leaders and followers can service read requests, with different consistency
-guarantees.
+guarantees. A single cluster can host multiple independent Raft groups, 
+each with its own leader election, consistency and state replication.
 
-A single cluster can host multiple independent Raft groups, each with its own leader election,
-consistency and state replication. Groups typically consist of an odd number of peers (3, 5, or
-7 are common) to ensure clear majority decisions.
 
 ### Majority-Based Decision-Making
 
 Raft's safety guarantees depend on majority agreement within each group. The leader replicates
 each operation to the followers in its group, and operations are committed when at least
-(N/2 + 1) peers in that group acknowledge them. This means a group of 3 peers can tolerate 1
+$\lfloor N/2 + 1 \rfloor$ peers in that group acknowledge them. 
+This means a group of 3 peers can tolerate 1
 failure, a group of five peers can tolerate 2 failures, and so on.
+Since a group of $N$ peers for an even $N$ can tolerate the same number of failures as
+a group of $(N-1)$ peers, groups typically consist of an odd number of peers (3, 5, or
+7 are common) to ensure clear majority decisions.
 
 This majority requirement affects both availability and performance. A group remains available as
 long as a majority of its peers are reachable and functioning. However, every transaction must

ratis-docs/src/site/markdown/concept/index-v2.md Outdated


		### Servers, Clusters, and Groups

		A Raft server (also known as a "peer") is a single running instance of your application with

Contributor

szetszwo Jan 26, 2026

Let's add also "member", i.e.

... (also known as a "peer" or a "member")

ratis-docs/src/site/markdown/concept/index-v2.md Outdated

+              A Raft cluster is a physical collection of servers that can participate in consensus. A Raft
+              group is a logical consensus domain that runs across a specific subset of peers in the cluster.
+              At any given time, one peer in a group acts as the "leader" while the others are "followers" or

Contributor

szetszwo Jan 26, 2026

Since a group could temporarily have no leader or more than one leaders (old leader not yet timed out), let remove "At any given time, ", i.e.

One of the peers in a group acts as the "leader" ...

ratis-docs/src/site/markdown/concept/index-v2.md Outdated

Comment on lines 127 to 128

		The state machine is not a finite state machine with states and transitions. Instead, it's a
		deterministic computation engine that processes a sequence of operations and maintains some

Contributor

szetszwo Jan 26, 2026

Let's remove the first sentence since an application can implement its Raft state machine as a finite state machine.

A state machine is a deterministic computation engine ...

ratis-docs/src/site/markdown/concept/index-v2.md Outdated

+              after the snapshot, bringing the peer up to the current state of the group.
+              During normal operation, the state machine continuously processes transactions as they're
+              committed by the Raft group, responds to leadership changes, and handles read-only queries. For

Contributor

szetszwo Jan 26, 2026

We could remove "responds to leadership changes" since the state machine does not need to do anything.

Author

jcshepherd Jan 27, 2026

If the state machine implements LeaderEventApi (or FollowerEventApi) then it is able to respond to leadership changes. I will de-emphasize it though. If you feel it's not a complication that is worth introducing in this doc, then I'll remove it: let me know either way.

Contributor

szetszwo Jan 27, 2026

... the state machine implements LeaderEventApi (or FollowerEventApi) ...

Yes, you are right that a state machine can implement those APIs. I mean that supporting those APIs is optional.

ratis-docs/src/site/markdown/concept/index-v2.md Outdated

+              ### Designing Your State Machine
+              When designing your state machine, ensure your operations are deterministic and can be
+              efficiently serialized for replication. Operations must be idempotent, as Raft may occasionally

Contributor

szetszwo Jan 26, 2026

Idempotent is not a requirement of state machine -- e.g. x = x+1 is a valid non-idempotent operation. Each transaction is applied exactly one time.

ratis-docs/src/site/markdown/concept/index-v2.md Outdated

+              ### Read Consistency Options
+              **Linearizable reads** provide the strongest consistency by going through the Raft protocol to
+              ensure you're reading the most up-to-date committed data. Use the client's `sendReadOnly` method,

Contributor

szetszwo Jan 26, 2026

sendReadOnly may use Linearizable read or Leader read, depending on the conf raft.server.read.option.

ratis-docs/src/site/markdown/concept/index-v2.md Outdated

+              data if the leader has been partitioned from the majority.
+              **Follower reads** provide eventual consistency by serving reads directly from followers using
+              their local state machine. Call `sendReadOnly(message, serverId)` with a specific follower's

Contributor

szetszwo Jan 26, 2026

When linearizable read is enabled, follower read using sendReadOnly(message, serverId) is also linearizable.

Author

jcshepherd Jan 27, 2026

Ah, this is more nuanced than I thought. I've attempted to rewrite the section on "Read Consistency Options". Let me know what you think.

ratis-docs/src/site/markdown/concept/index-v2.md Outdated

+              the snapshot data is loaded replacing any existing state, and the state machine resumes normal
+              operation by replaying any log entries that occurred after the snapshot.
+              Your state machine's `initialize` method is responsible for loading snapshots during startup by

Contributor

szetszwo Jan 26, 2026

The method is reinitialize.

Author

jcshepherd commented Jan 27, 2026

@szetszwo - Thanks very much for taking the time to review: I know it's a longish document. I've addressed your feedback. I struggled a bit with reworking the section on read consistency. If you have any further thoughts on how to make that section clearer or improve its flow, please let me know.

Contributor

szetszwo commented Jan 28, 2026

@jcshepherd , thanks for the update!

... I know it's a longish document. ...

How about breaking it to several md files? We can merge the files separately. It would also be easier to read.

... I struggled a bit with reworking the section on read consistency. If you have any further thoughts on how to make that section clearer or improve its flow, please let me know.

Sure, will continue reviewing it tomorrow.

Author

jcshepherd commented Jan 28, 2026

Good idea about breaking it apart: thanks. I've broken it into five sections/files with navigation between and within sections. You've reviewed the content in sections 1, 2 and 4 so far. The discussion on read consistency is now in section 2. Sections 3 and 5 probably haven't been looked at.
It feels like a largish number of sections, but it seems to flow pretty well and offers lots of room for future expansion. Let me know what you think.

Contributor

szetszwo commented Jan 30, 2026

... I struggled a bit with reworking the section on read consistency. If you have any further thoughts on how to make that section clearer or improve its flow, please let me know.

I have the following suggestion for the read section:

#### Read Consistency Options

Ratis provides several read patterns with different consistency and performance characteristics.
Read requests query the state machine of a server directly without going through the Raft consensus protocol. 
The `sendReadOnly()` API sends the request to the leader.
(When a non-leader server receives such request, it throws a `NotLeaderException`
and then the client will retry other servers.)
In contrast, the `sendReadOnly(message, serverId)` API sends the request to a particular server,
which may be a leader or a follower.

The server's `raft.server.read.option` configuration affects read behavior:

* **DEFAULT (default setting)**: `sendReadOnly()` performs leader reads for efficiency.
  It provides strong consistency under normal conditions.
  *  Split-brain Problem: In case that an old leader has been partitioned from the majority
  and a new leader has been elected, reading from the old leader can return stale data
  since the old leader does not have the new transactions committed by the new leader.

* **LINEARIZABLE**: both `sendReadOnly()` and `sendReadOnly(message, serverId)`
  use the ReadIndex protocol to provide linearizable consistency, ensuring you always read the most
  up-to-date committed data and won't read stale data as described in the "Split-brain Problem" above.  
  * Non-linearizable API: Clients may use `sendReadOnlyNonLinearizable()` to read from leader's state machine
    directly without a linearizable guarantee.

Other than the `sendReadOnly(..)` methods mentioned above,
we have the following read APIs:

**Stale reads with minimum index** let you specify a minimum log index that the peer must have applied
before serving the read.  Call `sendStaleRead()`: if the peer hasn't caught up to your minimum index,
it will throw a `StaleReadException`.

**Asynchronous reads**:
Ratis supports both blocking reads and asynchronous reads.
All the blocking read methods are supported by asynchronous reads
-- the blocking reads return a reply directly while asynchronous reads return a future of the reply.
Asynchronous reads additionally support the following APIs

* **Read-after-write consistency** ensures reads reflect the latest successful write by the same client.
Since write requests go through the Raft consensus protocol but read requests do not,
a read request $R$ may be completed before a write request $W$
even if $R$ is sent (asynchronously) after $W$.
Therefore, the reply returned by $R$ may not include the result updated by $W$.
Use `sendReadAfterWrite()` when you need to read your own writes immediately.

* **Unorder reads** let the asynchronous read requests complete in any order.
The fastest reply is completed first.
Use `sendReadOnlyUnordered()` when you don't care about the ordering.

Author

jcshepherd commented Jan 30, 2026

Thank you again. :-) I restructured the read consistency section along the lines you suggested and also added a mention and forward reference to the discussion about blocking and async APIs. Thanks!

szetszwo reviewed

View reviewed changes

Contributor

szetszwo left a comment

@jcshepherd , Thanks for the update! Please see the comments inlined.

I think we have gone through all the docs. We probably could merge it after the comments are addressed.

BTW, have you used any AI generative tool to create the doc?

ratis-docs/src/site/markdown/concept/operations.md Outdated

Comment on lines 82 to 84

+              Leadership in Ratis is both simpler and more complex than it might initially appear. Ratis
+              handles all the mechanics of leader election and failover automatically, but your application
+              needs to handle leadership changes robustly.

Contributor

szetszwo Jan 31, 2026 •

edited

Loading

Why "your application needs to handle leadership changes robustly" ? Only the applications want to know who is the leader need to handle leadership changes. If an application does not care about who is the leader, then it does not need to do anything.

The first sentence is a kind of a personal comment. Let's remove it.

I suggest to rewrite this paragraph as below:

Ratis handles all the mechanics of leader election and failover automatically.
If your application does not care about who is the leader, then it does not need to do anything.
Otherwise, your application can optionally observe leadership change and react accordingly;
see [State Machine Leadership Events](#State-Machine-Leadership-Events).

Author

jcshepherd Feb 2, 2026

Thanks: I think that's a better phrasing. Before I read your full comment just now I was going to raise the scenario of client that strongly prefers follower reads needed some kind of application logic to decide what to do if a follower is elected leader, but agree the phrasing as written over-states the need.

ratis-docs/src/site/markdown/concept/operations.md Outdated

+              #### Handling Network Partitions
+              When a network partition occurs, the Raft group may split into multiple subgroups that cannot
+              communicate with each other. Raft's majority-based approach ensures that only one subgroup (the

Contributor

szetszwo Jan 31, 2026

only one -> at most one

ratis-docs/src/site/markdown/concept/operations.md Outdated

Comment on lines 126 to 129

		decisions. However, it also means that your application may become unavailable for writes if a
		majority of servers are unreachable.

Contributor

szetszwo Jan 31, 2026

"if a majority of servers are unreachable" -> "if no subgroups have a majority of servers."

Author

jcshepherd Feb 2, 2026

These are both true, correct? E.g., if the client is in DC A, DC A is network partitioned from DC B and DC B contains the majority, then the client in DC A will be unable to write until the partition heals. How about "However, it also means that your application may become unable to write if either the majority subgroup is unreachable or no subgroup has the majority of servers." ?

Contributor

szetszwo Feb 2, 2026 •

edited

Loading

If a client can connect to the leader (not necessarily a majority of server), it can write. Also, the clients in DC A cannot write does not mean "the application may become unavailable" since the other clients in DC B can write.

For simplicity, let's just talk about servers in this section.

Author

jcshepherd Feb 2, 2026

True, true. Went with your suggestion.

ratis-docs/src/site/markdown/concept/advanced.md Outdated

+              #### What is a Raft Group in Ratis?
+              In Ratis terminology, a "Raft Group" is a collection of servers that participate in a single
+              Raft cluster. Each group has a unique RaftGroupId (typically a UUID) that distinguishes it from

Contributor

szetszwo Jan 31, 2026

RaftGroupId must be a UUID. So, let's remove "typically".

Author

jcshepherd commented Feb 2, 2026

BTW, have you used any AI generative tool to create the doc?

Yes, definitely: I've been using Amazon Q CLI to analyze the Ratis and Ozone codebases, as well as public documentation, and generate the initial document drafts, as well as helping with organization. I've personally reviewed every line of documentation and made corrections/adjustments where I felt they were needed, so I own any errors.

Reviewing the rest of your feedback now: thanks.

jcshepherd force-pushed the master branch from 6bc7130 to 1873226 Compare

February 2, 2026 18:43

Author

jcshepherd commented Feb 2, 2026

Thanks. With the exception of my question above (#1338 (comment)), I've incorporated your feedback. I also moved my working copy of index.md to index.md and squashed the commits thus far.

jcshepherd force-pushed the master branch from e5071a0 to 1a922f9 Compare

February 2, 2026 18:55


          RATIS-2388 (Further) Enhancing content for concept in ratis-docs: exp…

581c33b

…anding content, breaking document into 5 sections/files for readability and future expansion, with navigation between and within sections.

jcshepherd force-pushed the master branch from abc1270 to 581c33b Compare

February 2, 2026 19:04

Author

jcshepherd commented Feb 2, 2026

"Split-brain" behavior revised as suggested, and squashed all commits to this point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet