Skip to content

Routing API#70

Merged
tombentley merged 30 commits into
kroxylicious:mainfrom
tombentley:routing-api
Jun 18, 2026
Merged

Routing API#70
tombentley merged 30 commits into
kroxylicious:mainfrom
tombentley:routing-api

Conversation

@tombentley

@tombentley tombentley commented Jun 7, 2025

Copy link
Copy Markdown
Member

Proposes a Router plugin API for Kroxylicious that enables routing requests to multiple upstream Kafka clusters, unlocking use cases like union clusters, principal-aware routing, and topic splicing that the Filter API alone cannot address.

Key points:

  • Router / RouterFactory: new top-level plugin type (per-connection instances, ServiceLoader-discovered), analogous to Filter / FilterFactory
  • Routes: named pathways from a router to a target (cluster or another router), each with optional filter chains. Routes and routers form a validated DAG
  • Virtual node IDs: runtime-managed mapping that resolves the node ID collision problem when multiple clusters are presented as one. Routers work exclusively with virtual IDs
  • RouterContext: provides bootstrapNodeId() and sendRequestToNode() for issuing requests
  • RouterResult: sealed type (Completed, CompletedNoResponse, Disconnect) encoding outcomes as values rather than side-effects
  • Static routes: performance optimisation allowing opaque frame forwarding for API keys that don't need inspection
  • Configuration: new top-level clusterDefinitions and routerDefinitions lists; virtual clusters gain a target property (discriminated union of cluster/router); existing targetCluster deprecated but still supported
  • Fully backwards compatible — no impact on the Filter API or existing configurations

edit: here's an annotated version of the final proposal that was merged, with commentary and links to relevant discussions.

@gunnarmorling gunnarmorling left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great overall! A few comments and suggestions inline.

Comment thread proposals/004-routing-api.md Outdated
Comment thread proposals/004-routing-api.md Outdated
Comment thread proposals/004-routing-api.md Outdated
Comment thread proposals/070-routing-api.md Outdated
Comment thread proposals/004-routing-api.md Outdated
Comment thread proposals/004-routing-api.md Outdated
Comment thread proposals/070-routing-api.md Outdated
Comment thread proposals/070-routing-api.md Outdated
@tombentley tombentley mentioned this pull request Jul 4, 2025
Comment thread proposals/070-routing-api.md Outdated

@prabhamanepalli prabhamanepalli left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this proposal and ability to present multiple backing Kafka clusters as one single cluster to clients.

Can you please look into different types of routing other than identity/principal based routing.

Routing based on topic name, some embedded attribute in the header or payload itself.

I would like to know if the API being proposed is extensible for other types of routing other than principal based routing.

@tombentley

Copy link
Copy Markdown
Member Author

@prabhamanepalli thanks for taking a look!

Can you please look into different types of routing other than identity/principal based routing.
Routing based on topic name, some embedded attribute in the header or payload itself.

Those should absolutely be possible, and I consider them in-scope for this proposal. The "Union cluster" idea alluded to in the text is really a special case of this, where the mapping between topics (or more generally named broker-side entities like consumer groups) is defined statically in the configuration file. More generally, that mapping could be supplied over the network (e.g. from a database, metadata store or whatever).

It's up to the router how each request gets routed. Most common would be routing a request to a single target cluster or next router. But I don't think the API should enforce that. If a router implementation wanted to route a produce request, for example, to multiple clusters then that should be possible at the API level, even if dual writes are not usually a good idea.

What's not in scope for this proposal currently is making the set of routers, or the set of target clusters, dynamic during the lifetime of the proxy process. Dynamic configuration update is a separate item on the backlog.

Comment thread proposals/070-routing-api.md Outdated
k-wall added a commit to k-wall/design that referenced this pull request Apr 9, 2026
This introduces a proposal index (README.md) for the proposals directory
to solve the problem of colliding proposal numbers.

Currently, multiple open PRs have chosen proposal numbers that conflict
with already-merged proposals. This creates confusion and makes it
difficult to maintain stable identifiers for proposals - which is
essential for productive discussion on the mailing list.

The new process:
1. Authors create proposals with temporary filenames (nnn-*)
2. A separate PR to this README allocates the next sequential number
3. Authors rename their proposal once the number is allocated
4. The index is updated again when proposals are accepted

This ensures:
- Proposal numbers are allocated chronologically
- No number collisions occur
- Each proposal gets a stable identifier before mailing list discussion
- Clear visibility of open vs. accepted proposals

Current collisions resolved by this index:
- PR kroxylicious#70, kroxylicious#82, kroxylicious#83, kroxylicious#88, kroxylicious#89 all have numbers conflicting with accepted
  proposals and have been allocated new sequential numbers (016-024)

Assisted-By: Claude Sonnet 4.5 <noreply@anthropic.com>
k-wall added a commit to k-wall/design that referenced this pull request Apr 9, 2026
This introduces a proposal index (README.md) for the proposals directory
to solve the problem of colliding proposal numbers.

Currently, multiple open PRs have chosen proposal numbers that conflict
with already-merged proposals. This creates confusion and makes it
difficult to maintain stable identifiers for proposals - which is
essential for productive discussion on the mailing list.

The new process:
1. Authors create proposals with temporary filenames (nnn-*)
2. A separate PR to this README allocates the next sequential number
3. Authors rename their proposal once the number is allocated
4. The index is updated again when proposals are accepted

This ensures:
- Proposal numbers are allocated chronologically
- No number collisions occur
- Each proposal gets a stable identifier before mailing list discussion
- Clear visibility of open vs. accepted proposals

Current collisions resolved by this index:
- PR kroxylicious#70, kroxylicious#82, kroxylicious#83, kroxylicious#88, kroxylicious#89 all have numbers conflicting with accepted
  proposals and have been allocated new sequential numbers (016-024)

Assisted-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Keith Wall <kwall@apache.org>
k-wall added a commit to k-wall/design that referenced this pull request Apr 9, 2026
This introduces a proposal index (README.md) for the proposals directory
to solve the problem of colliding proposal numbers.

Currently, multiple open PRs have chosen proposal numbers that conflict
with already-merged proposals. This creates confusion and makes it
difficult to maintain stable identifiers for proposals - which is
essential for productive discussion on the mailing list.

The new process:
1. Authors create proposals with temporary filenames (nnn-*)
2. A separate PR to this README allocates the next sequential number
3. Authors rename their proposal once the number is allocated
4. The index is updated again when proposals are accepted

This ensures:
- Proposal numbers are allocated chronologically
- No number collisions occur
- Each proposal gets a stable identifier before mailing list discussion
- Clear visibility of open vs. accepted proposals

Current collisions resolved by this index:
- PR kroxylicious#70, kroxylicious#82, kroxylicious#83, kroxylicious#88, kroxylicious#89 all have numbers conflicting with accepted
  proposals and have been allocated new sequential numbers (016-024)

Assisted-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Keith Wall <kwall@apache.org>
k-wall added a commit to k-wall/design that referenced this pull request Apr 24, 2026
This introduces a proposal index (README.md) for the proposals directory
to solve the problem of colliding proposal numbers.

Currently, multiple open PRs have chosen proposal numbers that conflict
with already-merged proposals. This creates confusion and makes it
difficult to maintain stable identifiers for proposals - which is
essential for productive discussion on the mailing list.

The new process:
1. Authors create proposals with temporary filenames (nnn-*)
2. A separate PR to this README allocates the next sequential number
3. Authors rename their proposal once the number is allocated
4. The index is updated again when proposals are accepted

This ensures:
- Proposal numbers are allocated chronologically
- No number collisions occur
- Each proposal gets a stable identifier before mailing list discussion
- Clear visibility of open vs. accepted proposals

Current collisions resolved by this index:
- PR kroxylicious#70, kroxylicious#82, kroxylicious#83, kroxylicious#88, kroxylicious#89 all have numbers conflicting with accepted
  proposals and have been allocated new sequential numbers (016-024)

Assisted-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Keith Wall <kwall@apache.org>
k-wall added a commit to k-wall/design that referenced this pull request Apr 28, 2026
This change simplifies the proposal numbering system by using PR numbers
as proposal identifiers, eliminating number collisions and removing the
need for a separate allocation process.

Changes:
- Simplified proposals/README.md to focus on author workflow
  - Removed index tables (directory listing serves as the index)
  - Streamlined instructions for creating and renaming proposals
- Updated proposal template with workflow instructions
  - Require PR number in title format: # <PR#> - <Title>
  - Moved workflow instructions into comment block
- Added GitHub workflow to automatically check proposal numbering
  - Validates both filename and title format
  - Updates PR description when proposal files don't match PR number
  - Provides exact commands to fix naming issues
  - Removes warning once corrected
  - Handles both added and renamed files
  - Runs on all PRs (ready for mandatory status check)
- Added notification script for existing open PRs
  - After merge, run notify-open-prs.sh to ask authors to rebase
  - Workflow will automatically guide them through renaming
  - Updated with all current open proposal PRs (kroxylicious#70, kroxylicious#82, kroxylicious#83, kroxylicious#85,
    kroxylicious#88, kroxylicious#93, kroxylicious#94, kroxylicious#96, kroxylicious#98, kroxylicious#99, kroxylicious#100, kroxylicious#101, kroxylicious#103)

Proposals 001-019 retain their original numbers.

Assisted-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Keith Wall <kwall@apache.org>
k-wall added a commit that referenced this pull request Apr 28, 2026
This change simplifies the proposal numbering system by using PR numbers
as proposal identifiers, eliminating number collisions and removing the
need for a separate allocation process.

Changes:
- Simplified proposals/README.md to focus on author workflow
  - Removed index tables (directory listing serves as the index)
  - Streamlined instructions for creating and renaming proposals
- Updated proposal template with workflow instructions
  - Require PR number in title format: # <PR#> - <Title>
  - Moved workflow instructions into comment block
- Added GitHub workflow to automatically check proposal numbering
  - Validates both filename and title format
  - Updates PR description when proposal files don't match PR number
  - Provides exact commands to fix naming issues
  - Removes warning once corrected
  - Handles both added and renamed files
  - Runs on all PRs (ready for mandatory status check)
- Added notification script for existing open PRs
  - After merge, run notify-open-prs.sh to ask authors to rebase
  - Workflow will automatically guide them through renaming
  - Updated with all current open proposal PRs (#70, #82, #83, #85,
    #88, #93, #94, #96, #98, #99, #100, #101, #103)

Proposals 001-019 retain their original numbers.

Assisted-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Signed-off-by: Keith Wall <kwall@apache.org>
@kidpollo

Copy link
Copy Markdown

This feature is looking attractive for us so we can simplify our efforts around splitting clusters. Has work in this gone beyond this stage?

Comment thread proposals/070-routing-api.md Outdated
Comment thread proposals/070-routing-api.md Outdated
Comment thread proposals/070-routing-api.md Outdated
Comment thread proposals/070-routing-api.md
Comment thread proposals/070-routing-api.md Outdated
Comment thread proposals/070-routing-api.md Outdated
@k-wall

k-wall commented May 18, 2026

Copy link
Copy Markdown
Member

I think this proposal is heading in the right direction.

Comment thread proposals/070-routing-api.md Outdated
Comment thread proposals/070-routing-api.md Outdated
Comment thread proposals/070-routing-api.md Outdated
Comment thread proposals/070-routing-api.md
Comment thread proposals/070-routing-api.md
tombentley and others added 28 commits June 18, 2026 15:50
Signed-off-by: Tom Bentley <tbentley@redhat.com>
Signed-off-by: Tom Bentley <tbentley@redhat.com>
Signed-off-by: Tom Bentley <tbentley@redhat.com>
Signed-off-by: Tom Bentley <tbentley@redhat.com>
Signed-off-by: Tom Bentley <tbentley@redhat.com>
Signed-off-by: Tom Bentley <tbentley@redhat.com>
…Mapping` assumptions

Signed-off-by: Tom Bentley <tbentley@redhat.com>
Signed-off-by: Tom Bentley <tbentley@redhat.com>
Signed-off-by: Tom Bentley <tbentley@redhat.com>
Signed-off-by: Tom Bentley <tbentley@redhat.com>
Signed-off-by: Tom Bentley <tbentley@redhat.com>
Signed-off-by: Tom Bentley <tbentley@redhat.com>
Signed-off-by: Tom Bentley <tbentley@redhat.com>
Sync the proposal with implementation changes made in commits
4d40312bb and 7a1d9e777:

- Replace bootstrapNodeId(route) with virtualNodeId() and anyNodeId(route)
  to distinguish broker-specific vs bootstrap connections
- Remove route parameter from sendRequestToNode() — runtime derives it
  from the virtual node ID via NodeIdMapping.fromVirtual()
- Add routeNames() to RouterFactoryContext for validation
- Update NodeIdMapping description to reflect fromVirtual() now returns
  both route and target node ID
- Update API_VERSIONS section to use virtualNodeId()
- Add note that current implementation uses dedicated mapping only
- Update design choices to reflect new API rationale

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Tom Bentley <tbentley@redhat.com>
Sync the proposal with implementation changes made since commit
5b6545d72:

- Replace RouterResult sealed type with RouterResponse and builder pattern
- Add respondWith(), respondWithError(), respondWithoutReply() builder
  methods to RouterContext following the Filter API pattern
- Add CloseOrTerminalStage, TerminalStage, CloseStage interfaces for
  fluent response construction with optional connection closure
- Add topicName(Uuid) method to RouterContext for synchronous topic ID
  resolution (necessary for APIs like SHARE_FETCH that lack topic names)
- Update all RouterResult references to RouterResponse
- Update Response interface references (now returns ApiMessage directly)
- Update design choices section to reflect builder pattern and topicName()

The builder pattern allows routers to construct responses via
context.respondWith(body).build() instead of new RouterResult.Completed(),
consistent with how filters work and allowing the API to evolve without
breaking changes.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Tom Bentley <tbentley@redhat.com>
Updates the proposal to reflect the implemented API:

- VirtualNode replaces int virtual node IDs as an opaque reference
  type, enabling alternative networking models (proxy-as-broker)
- nodeForId(int) bridges the Kafka wire protocol to VirtualNode
- RouterContext methods renamed: virtualNode(), anyNode(),
  sendRequest()
- TopologyService: opt-in topology cache with side-effect population
  from METADATA responses, coarse invalidation, router-driven
  staleness handling
- PartitionInfo and BrokerInfo for follower-fetch / AZ-aware routing
- Shared node address map (per-router-level, not per-connection)
- RequestSender binding for TopologyService async methods

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Tom Bentley <tbentley@redhat.com>
Coordinators are now cached via the same side-effect path as
METADATA — PendingResponse carries request-side context (keyType,
key) that the runtime uses alongside the FIND_COORDINATOR response
to populate the TopologyCache. No explicit putCoordinator call
needed from the router.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Tom Bentley <tbentley@redhat.com>
TopologyService now has three symmetric discovery methods:
- leaders() -> CompletionStage<PartitionLeaders>
- coordinators() -> CompletionStage<Coordinators>
- topicNames() -> CompletionStage<Map<Uuid, String>>

Removes leaderOf(), coordinatorOf(), ensureLeadersCached(), and
discoverCoordinator(). Documents the rationale: sync cache queries
are unsafe because the shared cache can be invalidated between
discovery and query. Self-contained snapshots eliminate this race.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Tom Bentley <tbentley@redhat.com>
Topic ID resolution moves from the synchronous RouterContext.topicName()
to the async TopologyService.topicNames(Set<Uuid>), consistent with
leaders() and coordinators(). Documents the rationale: the synchronous
method silently swallowed resolution errors and was inconsistent with
the async discovery pattern.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Tom Bentley <tbentley@redhat.com>
Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Tom Bentley <tbentley@redhat.com>
The shared topology cache is not scoped by authenticated subject.
Document why this is safe (additive/union semantics, routing-only
use, backend ACL enforcement) and clarify that router authors must
not use the cache for authorization decisions.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Tom Bentley <tbentley@redhat.com>
Mark references to internal runtime classes (VirtualNodeImpl,
RoutingDecisionHandler, PendingResponse, CoordinatorRequestContext,
TopologyCache, RouterContextImpl, TopologyServiceImpl, RouterChainFactory,
MetadataAddressCacher, NodeIdResponseTranslator, RouterGraphValidator)
so readers can distinguish them from public API types.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Tom Bentley <tbentley@redhat.com>
Signed-off-by: Tom Bentley <tbentley@redhat.com>
Signed-off-by: Tom Bentley <tbentley@redhat.com>
Update cache scope and request sending sections to reflect that
TopologyServiceImpl is per-connection (wrapping a shared
TopologyCache), eliminating the RequestSender race condition.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Tom Bentley <tbentley@redhat.com>
Clarify that SaslInitiator on per-route filter chains authenticates
the proxy to the upstream cluster and does not affect
authenticatedSubject(). Document the four placement rules that
ensure composability and avoid pathological configurations.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Tom Bentley <tbentley@redhat.com>
Per-route filter chains can transform topic names differently, so
the same cluster-level topic ID can map to different router-visible
names on different routes. When allowSharedClusterTargets is enabled,
omitting the route parameter causes the cache to return whichever
name was written last.

Change TopologyService.topicNames(Set<Uuid>) to
topicNames(String route, Set<Uuid>) and update invalidation semantics
to clear topic name entries on invalidateRoute().

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Tom Bentley <tbentley@redhat.com>
The dedicated mapping formula V = id + S × t uses the route count S as
a multiplier. Adding or removing routes changes S, shifting all virtual
node IDs for existing routes. Acknowledge this explicitly in both the
mapping strategy section and operational considerations, noting that it
requires draining all connections and reconfiguring all proxy instances
together.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Tom Bentley <tbentley@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants