
rotation of outbound peers#3037

Draft
pompon0 wants to merge 44 commits into main from gprusak-stablehash

Conversation

pompon0 (Contributor) commented Mar 6, 2026

To make the p2p network uniformly random, nodes need to be able to change their peers periodically (otherwise the network would stay centralized around the bootstrap peers). This PR implements an algorithm based on stable hashing, in which each node assigns a random priority to every node ID and connects to the peers with the highest priority.
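As a rough sketch of the idea (all names here are hypothetical, not the PR's actual API): each node ranks candidate peer IDs by a stable hash of the ID mixed with a node-local seed, keeps connections to the top-ranked candidates, and replaces its lowest-priority connection when a higher-priority candidate appears.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// priority derives a stable pseudo-random rank for a peer ID by hashing it
// together with a node-local seed, so each node ranks peers differently but
// consistently. sha256 is a stand-in for whatever hash the PR actually uses.
func priority(seed []byte, id string) uint64 {
	h := sha256.New()
	h.Write(seed)
	h.Write([]byte(id))
	return binary.LittleEndian.Uint64(h.Sum(nil)[:8])
}

// maybeReplace decides whether candidate should displace the current
// lowest-priority connection, keeping the degree bounded while letting the
// topology drift toward a random graph.
func maybeReplace(seed []byte, conns map[string]bool, candidate string, maxConns int) (evict string, ok bool) {
	if len(conns) < maxConns {
		return "", true // room left: just connect
	}
	lowest, lowestP, first := "", uint64(0), true
	for id := range conns {
		if p := priority(seed, id); first || p < lowestP {
			lowest, lowestP, first = id, p, false
		}
	}
	if priority(seed, candidate) > lowestP {
		return lowest, true // candidate outranks our worst connection
	}
	return "", false
}

func main() {
	seed := []byte("node-local-seed")
	conns := map[string]bool{"peerA": true, "peerB": true}
	evict, ok := maybeReplace(seed, conns, "peerC", 2)
	fmt.Println(evict, ok) // whether peerC outranks the worst of A/B depends on the hash
}
```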

This PR also separates the inbound and outbound connection pools to simplify the logic. In particular, it is possible for 2 concurrent connections to exist between 2 peers (one inbound + one outbound). Avoiding duplicate connections is best-effort: nodes try not to dial peers they are already connected to, but if 2 peers dial each other at the same time, they let both connections be.

Basic requirements:

  • peermanager maintains a pex table: addresses of peers of our peers. This bounds the size of the network view that our node needs to maintain (even though the network is unauthenticated) and provides enough exposure to select new peers. Peers periodically report their connections, so peermanager keeps only fresh addresses
  • on startup a spike of dials is expected - the node will try to connect to the peers it was connected to before the restart
  • during stable operation the node will dial peers at a low rate, like 1/s or 0.1/s, and candidates should be selected from a fresh set of addresses - i.e. we cannot snapshot a list of currently available addresses and try to dial them all (that would take hours)
  • despite the low dial rate, the node should attempt to round-robin over the ever-changing set of peer candidates - i.e. it should not get stuck dialing the same bad address over and over
  • in the stable-hash-based approach, each peer ID obtains a priority for dialing - it should be taken into account
  • the implementation should support replacing low-priority peers with higher-priority peers (to support convergence to a random graph). Connection churn should be low, though, so that connection efficiency is not affected. My initial guesstimate is that we should allow replacing a connection every ~1 min.
  • we need to support a pex table with ~100 * 100 = 10k addresses in total (100 connections per peer is a safe estimate with the current implementation). Whether we can afford to simply rank all the addresses on every dial attempt is borderline IMO.
  • the addresses inserted into the pex table should be made available for dialing ASAP, without any active polling, if possible

github-actions bot commented Mar 6, 2026

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

Build | Format | Lint | Breaking | Updated (UTC)
✅ passed | ✅ passed | ✅ passed | ✅ passed | Mar 11, 2026, 5:50 PM

Comment on lines +28 to +37
for id := range s.last {
	if _, ok := GetAny(conns, id); !ok {
		delete(s.last, id)
		update = PeerUpdate{
			NodeID: id,
			Status: PeerStatusDown,
		}
		return true
	}
}

Check warning (Code scanning / CodeQL): Iteration over map

Iteration over map may be a possible source of non-determinism.
codecov bot commented Mar 7, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 58.24%. Comparing base (31c1592) to head (b1c0b8b).
⚠️ Report is 39 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3037      +/-   ##
==========================================
- Coverage   58.30%   58.24%   -0.07%     
==========================================
  Files        2108     2108              
  Lines      173683   173318     -365     
==========================================
- Hits       101268   100949     -319     
+ Misses      63397    63366      -31     
+ Partials     9018     9003      -15     
Flag | Coverage Δ
sei-db | 70.41% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines | Coverage Δ
sei-tendermint/config/config.go | 69.88% <ø> (-0.35%) ⬇️
sei-tendermint/internal/p2p/channel.go | 77.14% <ø> (-0.64%) ⬇️
sei-tendermint/internal/p2p/peermanager.go | 87.50% <100.00%> (+2.92%) ⬆️
sei-tendermint/internal/p2p/peermanager_pool.go | 100.00% <100.00%> (ø)
sei-tendermint/internal/p2p/pex/reactor.go | 90.10% <ø> (+0.97%) ⬆️
sei-tendermint/internal/p2p/router.go | 85.47% <ø> (+0.67%) ⬆️
sei-tendermint/internal/p2p/routeroptions.go | 83.78% <ø> (-2.58%) ⬇️
sei-tendermint/internal/p2p/testonly.go | 80.47% <ø> (+0.56%) ⬆️
sei-tendermint/internal/p2p/transport.go | 88.46% <ø> (+2.09%) ⬆️
sei-tendermint/libs/utils/option.go | 93.54% <ø> (+2.63%) ⬆️
... and 3 more

return o.value
}
panic("value missing")
panic(msg)

Check warning (Code scanning / CodeQL): Panic in BeginBlock or EndBlock consensus methods

Possible panics in BeginBlock- or EndBlock-related consensus methods could cause a chain halt.
type connSet[C peerConn] = im.Map[connID, C]

func GetAny[C peerConn](cs connSet[C], id types.NodeID) (C, bool) {
	if c, ok := cs.Get(connID{id, true}); ok {
		return c, true
	}
Contributor:

So we will always prefer the outbound connection over the inbound one? Would we expect to have two connections between most stable peers?

Contributor (Author):

Users of GetAny do not distinguish inbound from outbound, so the fact that we allow both to coexist in some corner cases is an implementation detail.

Contributor (Author):

In the current implementation we do NOT expect to have 2 connections between stable peers. We try to avoid it best-effort (via a dial rule which disallows dialing if an inbound connection already exists), but to keep the internal peermanager logic relatively simple, we allow 2 connections to coexist in case of a race condition during dialing.
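The dial rule described here can be sketched as follows (a simplification with hypothetical names; the real code keys connections by `connID` into an immutable map):

```go
package main

import "fmt"

// connID identifies a connection by peer and direction, mirroring the
// (NodeID, outbound) pair used in the PR's connSet.
type connID struct {
	nodeID   string
	outbound bool
}

// shouldDial applies the best-effort duplicate-avoidance rule: skip dialing a
// peer we already have any connection to, in either direction. A race where
// two peers dial each other simultaneously can still create two connections;
// that case is tolerated rather than prevented.
func shouldDial(conns map[connID]struct{}, id string) bool {
	_, in := conns[connID{id, false}]
	_, out := conns[connID{id, true}]
	return !in && !out
}

func main() {
	conns := map[connID]struct{}{
		{"peerA", false}: {}, // existing inbound connection to peerA
	}
	fmt.Println(shouldDial(conns, "peerA")) // false
	fmt.Println(shouldDial(conns, "peerB")) // true
}
```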

for inner, ctrl := range m.inner.Lock() {
	// Start with the pool which has NOT dialed previously (for fairness).
	pools := utils.Slice(inner.persistent, inner.regular)
	if pools[0] == inner.lastDialPool {
Contributor:

I don't understand the design choice here. I assume persistent peers are configured "friends", so they should always be dialed ASAP, and since the user configured them, it's unlikely there are many invalid entries anyway. Am I missing anything?

Contributor (Author):

Persistent peers are very likely to have bad addresses: people add persistent peers but don't keep the list healthy. Also, in case any of the persistent peers goes down, we don't want it to block dialing regular peers.
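The fairness rule from the quoted snippet can be sketched like this (hypothetical names; the real code swaps actual pool objects, not strings): alternate which pool gets the first shot at the next dial, so a pool full of dead addresses cannot starve the other.

```go
package main

import "fmt"

// dialer remembers which pool dialed last, mirroring inner.lastDialPool.
type dialer struct {
	lastDialPool string
}

// nextPoolOrder returns the pools in the order they should be tried for the
// next dial: whichever pool dialed last goes second.
func (d *dialer) nextPoolOrder() []string {
	pools := []string{"persistent", "regular"}
	if pools[0] == d.lastDialPool {
		pools[0], pools[1] = pools[1], pools[0]
	}
	d.lastDialPool = pools[0]
	return pools
}

func main() {
	d := &dialer{}
	fmt.Println(d.nextPoolOrder()) // [persistent regular]
	fmt.Println(d.nextPoolOrder()) // [regular persistent]
	fmt.Println(d.nextPoolOrder()) // [persistent regular]
}
```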

for inner, ctrl := range p.inner.Lock() {
inner.DialFailed(addr)
// DialFailed notifies the peer manager that dialing addresses of id has failed.
func (m *peerManager[C]) DialFailed(id types.NodeID) error {
Contributor:

Would we mark the node failed on temporary failure as well? Should we have more complicated model to distinguish between temporary failure and long term failure (like NAT addresses)?

Contributor (Author):

we could, what do you propose?

Contributor:

Not in this PR, but in the future we could give a rating to how flaky some address is and pass the information along to peers maybe.

if len(addrs) == 0 {
return nil
}
func (m *peerManager[C]) PushPex(sender types.NodeID, addrs []NodeAddress) error {
Contributor:

How much verification does PushPex give us?

Assuming we don't have many malicious peers who send bad addresses, do we trust incoming address blindly?

Contributor (Author):

PushPex keeps a separate capacity for each sender (to avoid an unbounded flood of bad addresses). Other than that, we trust them blindly. I was thinking about gossiping signed, versioned addresses (so that gossiping fake addresses of honest nodes is not feasible), but I'm not sure atm how much value it would bring or whether it would be computationally feasible.

r.peerManager.DialFailed(id)
return err
}
dialAddrRaw := hConn.conn.RemoteAddr()
Contributor:

Do we save IP:port instead of hostname now? What if the hostname later resolves to a different IP:port?

Contributor (Author):

If it resolves to different IP:port, then the TCP connection will terminate and we will stop advertising the IP:port, since we are no longer connected to the peer. I would like to eventually disallow gossiping dns addresses, in favor of resolving them locally first, but for backward compatibility and to limit the scope of changes, we do the resolution only when dialing the peer (i.e. SelfAddress stays unresolved for now).

	inner utils.Watch[*peerManagerInner[C]]
	conns utils.AtomicRecv[connSet[C]]
}

func (p *peerManager[C]) LogState() {
for inner := range p.inner.Lock() {
p.logger.Info("p2p connections",
Contributor:

Do we have metrics showing roughly how many peers are oscillating? Could it be that a few bad peers contributed to the most of the disconnections? Are those permanent bad addresses like NAT addresses or just having network issues?

Contributor (Author):

It is very likely that the preconfigured bad persistent addrs are responsible for most of the dial failures. We have metrics counting TCP dials/accepts, but they are not attributed to specific node IDs. Do you have a specific metric in mind that you would like to have here? NodeID/IP have too large a cardinality to use as a prometheus metric label, but perhaps we can collect it in some smarter way. We also have the net_info http endpoint that we could expose such stats through in a dashboard. Or we can include them in log error messages (depending on the verbosity needed).

pompon0 (Contributor, Author) commented Mar 9, 2026:

NAT is not a problem, because we do not support gossiping unidirectional NAT addresses - we only gossip outbound dial addresses (i.e. addresses that we successfully dialed), or preconfigured ExternalAddresses (self-declared static addresses). About private network addresses (including private ips, kubernetes internal dns, etc.) - I don't know the scale of the problem.

Comment on lines +66 to +70
for _, e := range t.bySender {
if !yield(e) {
return
}
}

Check warning (Code scanning / CodeQL): Iteration over map

Iteration over map may be a possible source of non-determinism.
Comment on lines +142 to +146
for old, priority := range p.out {
if id, ok := low.Get(); !ok || priority < id.priority {
low = utils.Some(pNodeID{priority, old})
}
}

Check warning (Code scanning / CodeQL): Iteration over map

Iteration over map may be a possible source of non-determinism.
ID types.NodeID
Channels ChannelIDSet
DialAddr utils.Option[NodeAddress]
SelfAddr utils.Option[NodeAddress]
Contributor:

nit: DialAddr here means the outbound address, while SelfAddr is the self-declared address (so it's used for inbound connections)?

The names are a bit confusing.

regular: newPoolManager(&poolConfig{
MaxIn: options.maxInbound(),
MaxOut: options.maxOutbound(),
FixedAddrs: bootstrapAddrs,
Contributor:

Is it possible that someone puts all bootstrap addresses in the persistent list, so FixedAddrs ends up empty?

// It makes the global topology converge to an uniformly random graph
// of a bounded degree.
priority := func(id types.NodeID) uint64 {
return binary.LittleEndian.Uint64(h.Sum([]byte(id)[:8]))
Contributor:

I think this is wrong. h.Sum(b) appends the hash digest to b instead of hashing b, and Uint64 just takes the first 8 bytes of its argument, so we end up using the first 8 bytes of id, not the hash.

}
return nil
})
return nil
Contributor:

Should the return nil be here? We don't want to exit the loop, right?

// We cannot keep it unbounded to avoid OOM.
const maxHistory = 10000
if len(p.dialHistory) >= maxHistory {
p.dialHistory = map[types.NodeID]struct{}{}
Contributor:

Should we do a total reset here or should we just clear part of the history?
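One way to clear only part of the history, along the lines this question suggests (a sketch with hypothetical names, not what the PR does - it currently resets the whole map):

```go
package main

import "fmt"

// evictHalf drops roughly half of the entries instead of resetting the whole
// history, so recent dial-avoidance information is not lost all at once.
// Which half is dropped is arbitrary (Go map iteration order), which is
// acceptable for a best-effort dedup set.
func evictHalf(history map[string]struct{}) {
	target := len(history) / 2
	for id := range history {
		if len(history) <= target {
			break
		}
		delete(history, id)
	}
}

func main() {
	h := map[string]struct{}{}
	for i := 0; i < 10; i++ {
		h[fmt.Sprintf("peer%d", i)] = struct{}{}
	}
	evictHalf(h)
	fmt.Println(len(h)) // 5
}
```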

@@ -157,77 +133,6 @@ func TestReactorErrorsOnReceivingTooManyPeers(t *testing.T) {
testNet.listenForPeerDown(t, 1, 0)
}

func TestReactorSmallPeerStoreInALargeNetwork(t *testing.T) {
Contributor:

Are these tests replaced by new tests somewhere?

}

// Connected adds conn to the connections pool.
// Connected peer won't be available for dialing until disconnect (we don't need duplicate connections).
// May close and drop a duplicate connection already present in the pool.
// Returns an error if the connection should be rejected.
func (m *peerManager[C]) Connected(conn C) error {
id := conn.Info().connID()
if id.NodeID == m.selfID {
Contributor:

when would this ever happen?

pool := inner.poolByID(id.NodeID)
toDisconnect, err := pool.Connect(id)
if err != nil {
conn.Close()
Contributor:

Hmm, we are closing a connection just established, when would this be necessary?

}
return ids
}

// DEPRECATED, currently returns an address iff we are connected to id.
Contributor:

nit: this one is still used somewhere right?

selfID := privKey.Public().NodeID()
peerManager := newPeerManager[*ConnV2](logger, selfID, options)
peerDB, err := newPeerDB(db, options.maxPeers())
peerDB, err := newPeerDB(db, min(options.maxOutbound(), 100))
Contributor:

nit: why 100?
