Skip to content

Conversation

@cjen1-msft
Copy link
Contributor

@cjen1-msft cjen1-msft commented Dec 2, 2025

All raft scenarios now have a declaration of whether or not they use pre-vote.

This PR also resolves three issues in ccfraft.tla and traceccfraft.tla.

First and most significantly is that the RcvProposeVoteRequest path was broken when pre-vote was enabled.
Specifically, in adding pre-vote to BecomeCandidate, I had asserted that messages was unchanged which is not true for this path, so it was never enabled.
This is fixed by moving the pre-vote BecomeCandidate out to its own BecomeCandidateFromPreVoteCandidate.

Second IsBecomeFollower asserted that the commit index matched the logline. This was invalid as the spec is out of sync until the subsequent SendAppendEntriesResponse action.
I am not quite sure why this worked previously, but it could be that in the reconfig_01_el0_12 test, this event did not fire as the corresponding node was already a follower.

Finally BecomePreVoteCandidate disallowed doing so from a PreVoteCandidate.
raft.h allows this event to fire multiple times, as each time it resends the corresponding RequestPreVote messages, hence causing trace validation to fail.
This also requires strong fairness for BecomeCandidateFromPreVoteCandidate.

Finally we have four tests which explicitly don't use pre-vote, primarily to make them easier.

  • soft_rollback
    • This tests rollback behaviour and its easier to get the nodes into the right states without pre-vote
  • retire_backup
    • This is the test that has a liveness issue. The retire_backup_pre_vote demonstrates that pre-vote fixes the liveness problem.
  • candidate_viability
    • Similarly to soft_rollback this tests the candidate viability condition, and so requires nodes in specific states which are just easier to get without pre-vote
  • bad_network
    • This is a reasonably complex test, and relies on term increments from multiple elections which can't happen with pre-vote. So enabling pre-vote would semantically change this test.

@cjen1-msft cjen1-msft added the run-long-verification Run Long Verification jobs label Dec 2, 2025
@cjen1-msft cjen1-msft marked this pull request as ready for review December 2, 2025 15:17
@cjen1-msft cjen1-msft requested a review from a team as a code owner December 2, 2025 15:17
Copilot AI review requested due to automatic review settings December 2, 2025 15:17
Copilot finished reviewing on behalf of cjen1-msft December 2, 2025 15:21
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR enables pre-vote trace validation for raft scenarios and fixes three critical issues in the TLA+ specifications:

  • Separates pre-vote candidate to candidate transition into a dedicated action (BecomeCandidateFromPreVoteCandidate)
  • Allows PreVoteCandidate to retry becoming a PreVoteCandidate (multiple pre-vote attempts)
  • Removes invalid commit index assertion in IsBecomeFollower that was out of sync with the spec
  • Adds explicit pre-vote enabled/disabled declarations to all test scenarios

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
tla/consensus/ccfraft.tla Refactors pre-vote logic by extracting BecomeCandidateFromPreVoteCandidate action and fixing BecomePreVoteCandidate to allow re-attempts; removes VotesGrantedMonotonicProp property
tla/consensus/Traceccfraft.tla Updates trace validation to use new pre-vote action; splits IsRcvProposeVoteRequest into two steps to properly handle state transitions; removes invalid commit index assertion from IsBecomeFollower
tla/consensus/SIMccfraft.tla Updates fairness condition to reference new BecomeCandidateFromPreVoteCandidate action
tla/consensus/SIMccfraft.cfg Removes VotesGrantedMonotonicProp from checked properties
tests/raft_scenarios/* (10 files) Adds pre_vote_enabled,true declarations to scenarios that use pre-vote; updates assertions to expect PreVoteCandidate states; adjusts test sequencing for pre-vote behavior
tests/raft_scenarios/candidate_viability Explicitly disables pre-vote to test candidate viability condition
tests/raft_scenarios/bad_network Explicitly disables pre-vote as test relies on term increments from multiple elections

@achamayou
Copy link
Member

@cjen1-msft it's ok to keep tests that don't use prevote for a transition period, but they need to eventually either be converted, or retired if we think that's the right outcome.

@cjen1-msft
Copy link
Contributor Author

Allowing pre-vote candidates to become pre-vote candidates via timeout has highlighted a weakness of the existing spec.
Primarily this has caused the liveness properties to fail (LogMatchingProp and LeaderProp) as effectively a node can transition to PreVoteCandidate, and then keep on transitioning to PreVoteCandidate indefinitely (modulo some fanagling to get the fairness properties to also be satisified).

The problem is that these properties have always been false, its just that with finite model checking we could not verify this.
Specifically they are false in the face of constant timeout/BecomeCandidate s.
The only reason our simulation checking did not flag this was that every BecomeCandidate increments the term making the resultant state technically new.
If we had a symmetry reduction for terms, then we would see these counterexamples.
As it is the strongest statement about these properties is that they held when there were no timeouts.

Hence after some discussion with @achamayou we came to conclusion of removing these properties as they are currently effectively unchecked, but dangerously provide the appearance of ensuring liveness.

@cjen1-msft cjen1-msft merged commit 7cc65e3 into microsoft:main Dec 5, 2025
23 checks passed
@cjen1-msft cjen1-msft deleted the partial-enable-pre-vote-tracing-2 branch December 5, 2025 15:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

run-long-verification Run Long Verification jobs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants