Routing API by tombentley · Pull Request #70 · kroxylicious/design

tombentley · 2025-06-07T00:34:41Z

Proposes a Router plugin API for Kroxylicious that enables routing requests to multiple upstream Kafka clusters, unlocking use cases like union clusters, principal-aware routing, and topic splicing that the Filter API alone cannot address.

Key points:

Router / RouterFactory: new top-level plugin type (per-connection instances, ServiceLoader-discovered), analogous to Filter / FilterFactory
Routes: named pathways from a router to a target (cluster or another router), each with optional filter chains. Routes and routers form a validated DAG
Virtual node IDs: runtime-managed mapping that resolves the node ID collision problem when multiple clusters are presented as one. Routers work exclusively with virtual IDs
RouterContext: provides bootstrapNodeId() and sendRequestToNode() for issuing requests
RouterResult: sealed type (Completed, CompletedNoResponse, Disconnect) encoding outcomes as values rather than side-effects
Static routes: performance optimisation allowing opaque frame forwarding for API keys that don't need inspection
Configuration: new top-level clusterDefinitions and routerDefinitions lists; virtual clusters gain a target property (discriminated union of cluster/router); existing targetCluster deprecated but still supported
Fully backwards compatible — no impact on the Filter API or existing configurations

edit: here's an annotated version of the final proposal that was merged, with commentary and links to relevant discussions.

gunnarmorling

This looks great overall! A few comments and suggestions inline.

prabhamanepalli

I really like this proposal and ability to present multiple backing Kafka clusters as one single cluster to clients.

Can you please look into different types of routing other than identity/principal based routing.

Routing based on topic name, some embedded attribute in the header or payload itself.

I would like to know if the API being proposed is extensible for other types of routing other than principal based routing.

tombentley · 2025-07-16T05:01:26Z

@prabhamanepalli thanks for taking a look!

Can you please look into different types of routing other than identity/principal based routing.
Routing based on topic name, some embedded attribute in the header or payload itself.

Those should absolutely be possible, and I consider them in-scope for this proposal. The "Union cluster" idea alluded to in the text is really a special case of this, where the mapping between topics (or more generally named broker-side entities like consumer groups) is defined statically in the configuration file. More generally, that mapping could be supplied over the network (e.g. from a database, metadata store or whatever).

It's up to the router how each request gets routed. Most common would be routing a request to a single target cluster or next router. But I don't think the API should enforce that. If a router implementation wanted to route a produce request, for example, to multiple clusters then that should be possible at the API level, even if dual writes are not usually a good idea.

What's not in scope for this proposal currently is making the set of routers, or the set of target clusters, dynamic during the lifetime of the proxy process. Dynamic configuration update is a separate item on the backlog.

This introduces a proposal index (README.md) for the proposals directory to solve the problem of colliding proposal numbers. Currently, multiple open PRs have chosen proposal numbers that conflict with already-merged proposals. This creates confusion and makes it difficult to maintain stable identifiers for proposals - which is essential for productive discussion on the mailing list. The new process: 1. Authors create proposals with temporary filenames (nnn-*) 2. A separate PR to this README allocates the next sequential number 3. Authors rename their proposal once the number is allocated 4. The index is updated again when proposals are accepted This ensures: - Proposal numbers are allocated chronologically - No number collisions occur - Each proposal gets a stable identifier before mailing list discussion - Clear visibility of open vs. accepted proposals Current collisions resolved by this index: - PR kroxylicious#70, kroxylicious#82, kroxylicious#83, kroxylicious#88, kroxylicious#89 all have numbers conflicting with accepted proposals and have been allocated new sequential numbers (016-024) Assisted-By: Claude Sonnet 4.5 <noreply@anthropic.com>

This introduces a proposal index (README.md) for the proposals directory to solve the problem of colliding proposal numbers. Currently, multiple open PRs have chosen proposal numbers that conflict with already-merged proposals. This creates confusion and makes it difficult to maintain stable identifiers for proposals - which is essential for productive discussion on the mailing list. The new process: 1. Authors create proposals with temporary filenames (nnn-*) 2. A separate PR to this README allocates the next sequential number 3. Authors rename their proposal once the number is allocated 4. The index is updated again when proposals are accepted This ensures: - Proposal numbers are allocated chronologically - No number collisions occur - Each proposal gets a stable identifier before mailing list discussion - Clear visibility of open vs. accepted proposals Current collisions resolved by this index: - PR kroxylicious#70, kroxylicious#82, kroxylicious#83, kroxylicious#88, kroxylicious#89 all have numbers conflicting with accepted proposals and have been allocated new sequential numbers (016-024) Assisted-By: Claude Sonnet 4.5 <noreply@anthropic.com> Signed-off-by: Keith Wall <kwall@apache.org>

This change simplifies the proposal numbering system by using PR numbers as proposal identifiers, eliminating number collisions and removing the need for a separate allocation process. Changes: - Simplified proposals/README.md to focus on author workflow - Removed index tables (directory listing serves as the index) - Streamlined instructions for creating and renaming proposals - Updated proposal template with workflow instructions - Require PR number in title format: # <PR#> - <Title> - Moved workflow instructions into comment block - Added GitHub workflow to automatically check proposal numbering - Validates both filename and title format - Updates PR description when proposal files don't match PR number - Provides exact commands to fix naming issues - Removes warning once corrected - Handles both added and renamed files - Runs on all PRs (ready for mandatory status check) - Added notification script for existing open PRs - After merge, run notify-open-prs.sh to ask authors to rebase - Workflow will automatically guide them through renaming - Updated with all current open proposal PRs (kroxylicious#70, kroxylicious#82, kroxylicious#83, kroxylicious#85, kroxylicious#88, kroxylicious#93, kroxylicious#94, kroxylicious#96, kroxylicious#98, kroxylicious#99, kroxylicious#100, kroxylicious#101, kroxylicious#103) Proposals 001-019 retain their original numbers. Assisted-By: Claude Sonnet 4.5 <noreply@anthropic.com> Signed-off-by: Keith Wall <kwall@apache.org>

This change simplifies the proposal numbering system by using PR numbers as proposal identifiers, eliminating number collisions and removing the need for a separate allocation process. Changes: - Simplified proposals/README.md to focus on author workflow - Removed index tables (directory listing serves as the index) - Streamlined instructions for creating and renaming proposals - Updated proposal template with workflow instructions - Require PR number in title format: # <PR#> - <Title> - Moved workflow instructions into comment block - Added GitHub workflow to automatically check proposal numbering - Validates both filename and title format - Updates PR description when proposal files don't match PR number - Provides exact commands to fix naming issues - Removes warning once corrected - Handles both added and renamed files - Runs on all PRs (ready for mandatory status check) - Added notification script for existing open PRs - After merge, run notify-open-prs.sh to ask authors to rebase - Workflow will automatically guide them through renaming - Updated with all current open proposal PRs (#70, #82, #83, #85, #88, #93, #94, #96, #98, #99, #100, #101, #103) Proposals 001-019 retain their original numbers. Assisted-By: Claude Sonnet 4.5 <noreply@anthropic.com> Signed-off-by: Keith Wall <kwall@apache.org>

kidpollo · 2026-04-30T07:06:23Z

This feature is looking attractive for us so we can simplify our efforts around splitting clusters. Has work in this gone beyond this stage?

k-wall · 2026-05-18T16:44:20Z

I think this proposal is heading in the right direction.

Signed-off-by: Tom Bentley <tbentley@redhat.com>

…Mapping` assumptions Signed-off-by: Tom Bentley <tbentley@redhat.com>

Signed-off-by: Tom Bentley <tbentley@redhat.com>

Sync the proposal with implementation changes made in commits 4d40312bb and 7a1d9e777: - Replace bootstrapNodeId(route) with virtualNodeId() and anyNodeId(route) to distinguish broker-specific vs bootstrap connections - Remove route parameter from sendRequestToNode() — runtime derives it from the virtual node ID via NodeIdMapping.fromVirtual() - Add routeNames() to RouterFactoryContext for validation - Update NodeIdMapping description to reflect fromVirtual() now returns both route and target node ID - Update API_VERSIONS section to use virtualNodeId() - Add note that current implementation uses dedicated mapping only - Update design choices to reflect new API rationale Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Signed-off-by: Tom Bentley <tbentley@redhat.com>

Sync the proposal with implementation changes made since commit 5b6545d72: - Replace RouterResult sealed type with RouterResponse and builder pattern - Add respondWith(), respondWithError(), respondWithoutReply() builder methods to RouterContext following the Filter API pattern - Add CloseOrTerminalStage, TerminalStage, CloseStage interfaces for fluent response construction with optional connection closure - Add topicName(Uuid) method to RouterContext for synchronous topic ID resolution (necessary for APIs like SHARE_FETCH that lack topic names) - Update all RouterResult references to RouterResponse - Update Response interface references (now returns ApiMessage directly) - Update design choices section to reflect builder pattern and topicName() The builder pattern allows routers to construct responses via context.respondWith(body).build() instead of new RouterResult.Completed(), consistent with how filters work and allowing the API to evolve without breaking changes. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Signed-off-by: Tom Bentley <tbentley@redhat.com>

Updates the proposal to reflect the implemented API: - VirtualNode replaces int virtual node IDs as an opaque reference type, enabling alternative networking models (proxy-as-broker) - nodeForId(int) bridges the Kafka wire protocol to VirtualNode - RouterContext methods renamed: virtualNode(), anyNode(), sendRequest() - TopologyService: opt-in topology cache with side-effect population from METADATA responses, coarse invalidation, router-driven staleness handling - PartitionInfo and BrokerInfo for follower-fetch / AZ-aware routing - Shared node address map (per-router-level, not per-connection) - RequestSender binding for TopologyService async methods Assisted-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Tom Bentley <tbentley@redhat.com>

Coordinators are now cached via the same side-effect path as METADATA — PendingResponse carries request-side context (keyType, key) that the runtime uses alongside the FIND_COORDINATOR response to populate the TopologyCache. No explicit putCoordinator call needed from the router. Assisted-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Tom Bentley <tbentley@redhat.com>

TopologyService now has three symmetric discovery methods: - leaders() -> CompletionStage<PartitionLeaders> - coordinators() -> CompletionStage<Coordinators> - topicNames() -> CompletionStage<Map<Uuid, String>> Removes leaderOf(), coordinatorOf(), ensureLeadersCached(), and discoverCoordinator(). Documents the rationale: sync cache queries are unsafe because the shared cache can be invalidated between discovery and query. Self-contained snapshots eliminate this race. Assisted-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Tom Bentley <tbentley@redhat.com>

Topic ID resolution moves from the synchronous RouterContext.topicName() to the async TopologyService.topicNames(Set<Uuid>), consistent with leaders() and coordinators(). Documents the rationale: the synchronous method silently swallowed resolution errors and was inconsistent with the async discovery pattern. Assisted-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Tom Bentley <tbentley@redhat.com>

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Tom Bentley <tbentley@redhat.com>

The shared topology cache is not scoped by authenticated subject. Document why this is safe (additive/union semantics, routing-only use, backend ACL enforcement) and clarify that router authors must not use the cache for authorization decisions. Assisted-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Tom Bentley <tbentley@redhat.com>

Mark references to internal runtime classes (VirtualNodeImpl, RoutingDecisionHandler, PendingResponse, CoordinatorRequestContext, TopologyCache, RouterContextImpl, TopologyServiceImpl, RouterChainFactory, MetadataAddressCacher, NodeIdResponseTranslator, RouterGraphValidator) so readers can distinguish them from public API types. Assisted-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Tom Bentley <tbentley@redhat.com>

Signed-off-by: Tom Bentley <tbentley@redhat.com>

Update cache scope and request sending sections to reflect that TopologyServiceImpl is per-connection (wrapping a shared TopologyCache), eliminating the RequestSender race condition. Assisted-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Tom Bentley <tbentley@redhat.com>

Clarify that SaslInitiator on per-route filter chains authenticates the proxy to the upstream cluster and does not affect authenticatedSubject(). Document the four placement rules that ensure composability and avoid pathological configurations. Assisted-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Tom Bentley <tbentley@redhat.com>

Per-route filter chains can transform topic names differently, so the same cluster-level topic ID can map to different router-visible names on different routes. When allowSharedClusterTargets is enabled, omitting the route parameter causes the cache to return whichever name was written last. Change TopologyService.topicNames(Set<Uuid>) to topicNames(String route, Set<Uuid>) and update invalidation semantics to clear topic name entries on invalidateRoute(). Assisted-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Tom Bentley <tbentley@redhat.com>

The dedicated mapping formula V = id + S × t uses the route count S as a multiplier. Adding or removing routes changes S, shifting all virtual node IDs for existing routes. Acknowledge this explicitly in both the mapping strategy section and operational considerations, noting that it requires draining all connections and reconfiguring all proxy instances together. Assisted-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Tom Bentley <tbentley@redhat.com>

This was referenced Jun 7, 2025

Client certificate principal propagation to filter context. kroxylicious/kroxylicious#2270

Closed

Client ssl principal propagation kroxylicious/kroxylicious#2294

Closed

gunnarmorling reviewed Jun 14, 2025

View reviewed changes

tombentley mentioned this pull request Jul 4, 2025

Authentication API #71

Closed

prabhamanepalli reviewed Jul 15, 2025

View reviewed changes