Francisco Silva.
This project implements a high-throughput distributed rate limiter designed to handle extreme request volumes in horizontally scaled systems.
It was built to explore a common real-world problem: how to enforce rate limits without turning the distributed datastore into a bottleneck.
Instead of performing a datastore operation on every request, this design uses:
- local aggregation
- batching
- asynchronous flush
Because a single flush can carry the increments of many requests, this can reduce distributed operations by orders of magnitude while keeping request-path latency low.
The implementation reflects patterns used in large-scale systems where:
- throughput is critical
- datastore operations are expensive
- latency must remain predictable
Naive rate limiting strategies often fail at scale because they perform one distributed operation per request.
Under high load, that per-request round trip becomes the bottleneck.
This project explores an alternative design that trades strict accuracy for scalability and throughput.
This design is not suitable when:
- strict per-request accuracy is required
- strong consistency is mandatory
- rate limits must be globally synchronized in real-time
In such cases, simpler centralized approaches may be more appropriate.
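The accuracy trade-off can be made concrete with a small simulation. The sketch below is a hypothetical model, not the repository's code: `SharedCounter`, `Node`, and the decision rule (last-known global total plus unflushed local count) are assumptions chosen for illustration. It shows that two nodes that have not yet flushed can each admit up to the full limit.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical model: two limiter nodes sharing one global counter. */
final class OvershootDemo {
    /** Stand-in for the distributed datastore: a single shared counter. */
    static final class SharedCounter {
        private final AtomicLong total = new AtomicLong();

        long incrementBy(long n) {
            return total.addAndGet(n);
        }
    }

    /** Decides from its last-known global total plus its own unflushed count. */
    static final class Node {
        private final SharedCounter store;
        private final long limit;
        private long lastKnownGlobal = 0; // refreshed only when this node flushes
        private long pending = 0;         // admitted locally, not yet flushed

        Node(SharedCounter store, long limit) {
            this.store = store;
            this.limit = limit;
        }

        boolean tryAcquire() {
            if (lastKnownGlobal + pending >= limit) {
                return false;
            }
            pending++;
            return true;
        }

        void flush() {
            lastKnownGlobal = store.incrementBy(pending);
            pending = 0;
        }
    }

    /** Sends perNode requests to each of two nodes before any flush; returns the total admitted. */
    static long admittedBeforeFirstFlush(long limit, int perNode) {
        SharedCounter store = new SharedCounter();
        Node a = new Node(store, limit);
        Node b = new Node(store, limit);
        long admitted = 0;
        for (int i = 0; i < perNode; i++) {
            if (a.tryAcquire()) admitted++;
            if (b.tryAcquire()) admitted++;
        }
        a.flush();
        b.flush();
        return admitted;
    }

    public static void main(String[] args) {
        // Limit of 100, two nodes, 100 requests each, no flush in between.
        System.out.println(admittedBeforeFirstFlush(100, 100)); // prints 200
    }
}
```

Under this model the worst-case overshoot grows with the number of nodes and with how much each node accumulates between flushes, which is why the design does not fit the cases listed above.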
The project focuses on demonstrating the following engineering principles:
- designing rate limiters for high-throughput environments
- reducing datastore pressure through batching
- minimizing request-path latency
- supporting high levels of concurrency
- exploring trade-offs between accuracy and scalability
The repository includes both the implementation and detailed technical documentation explaining the design decisions behind the system.
The following diagram illustrates the high-level architecture of the rate limiter.
+----------------------+
|      Client API      |
+----------+-----------+
           |
           v
+----------------------+
|  Rate Limiter Node   |
|----------------------|
| Local Request Count  |
| Batching Mechanism   |
| Async Flush Worker   |
+----------+-----------+
           |
           v
+-------------------------------+
|  Distributed Key-Value Store  |
|-------------------------------|
| incrementByAndExpire(key, n)  |
+-------------------------------+
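The diagram names a single datastore operation, `incrementByAndExpire(key, n)`. A minimal sketch of what such an operation could look like follows. Only the operation name and its two arguments come from the diagram; the `CounterStore` interface, the return value, the in-memory implementation, and the fixed-window expiry policy (the TTL is set when a counter is first created) are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical interface around the operation named in the diagram. */
interface CounterStore {
    /** Adds n to the counter for key and returns the new total (return value is an assumption). */
    long incrementByAndExpire(String key, long n);
}

/** In-memory stand-in for a distributed key-value store, for illustration and tests only. */
final class InMemoryCounterStore implements CounterStore {
    private static final class Entry {
        long value;
        long expiresAtMillis;
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injectable so expiry can be exercised without sleeping

    InMemoryCounterStore(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    @Override
    public synchronized long incrementByAndExpire(String key, long n) {
        long now = clock.getAsLong();
        Entry e = entries.get(key);
        if (e == null || now >= e.expiresAtMillis) {
            // First write in a window: start a fresh counter and set its expiry once.
            e = new Entry();
            e.expiresAtMillis = now + ttlMillis;
            entries.put(key, e);
        }
        e.value += n;
        return e.value;
    }
}
```

Passing `System::currentTimeMillis` as the clock gives wall-clock behavior; a real distributed store would need to perform the increment and the expiry atomically on the server side.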
Request flow:
- A client sends a request to the rate limiter.
- The limiter increments a local counter for the client key.
- Requests are aggregated locally instead of immediately updating the datastore.
- A background worker flushes batched increments to the distributed datastore, off the request path.
- The rate limiting decision is returned to the client without waiting for the flush.
This architecture significantly reduces the number of distributed datastore operations, allowing the system to support very high request throughput.
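The request flow above can be sketched as a single node class. This is a hedged illustration rather than the repository's implementation: `BatchingLimiterNode`, its nested `Store` interface, the time-based flush interval, and the choice to count rejected requests are all assumptions.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of one limiter node; names and policy are illustrative. */
final class BatchingLimiterNode implements AutoCloseable {
    /** Minimal view of the datastore: add n to key, return the new global total. */
    interface Store {
        long incrementByAndExpire(String key, long n);
    }

    private final Store store;
    private final long limit;
    private final Map<String, AtomicLong> pending = new ConcurrentHashMap<>();
    private final Map<String, Long> lastKnownGlobal = new ConcurrentHashMap<>();
    private final ScheduledExecutorService worker =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "flush-worker");
                t.setDaemon(true); // do not keep the JVM alive just for flushing
                return t;
            });

    BatchingLimiterNode(Store store, long limit, long flushIntervalMillis) {
        this.store = store;
        this.limit = limit;
        worker.scheduleWithFixedDelay(
                this::flush, flushIntervalMillis, flushIntervalMillis, TimeUnit.MILLISECONDS);
    }

    /** Request path: touches local memory only, never the datastore. */
    boolean tryAcquire(String key) {
        AtomicLong local = pending.computeIfAbsent(key, k -> new AtomicLong());
        // Every request is counted, including ones that end up rejected (an assumption).
        long seen = lastKnownGlobal.getOrDefault(key, 0L) + local.incrementAndGet();
        return seen <= limit;
    }

    /** Flush path: at most one datastore call per key that has pending increments. */
    void flush() {
        for (Map.Entry<String, AtomicLong> e : pending.entrySet()) {
            long n = e.getValue().getAndSet(0);
            if (n > 0) {
                lastKnownGlobal.put(e.getKey(), store.incrementByAndExpire(e.getKey(), n));
            }
        }
    }

    @Override
    public void close() {
        worker.shutdownNow();
    }
}
```

A caller would typically hold the node in a try-with-resources block or close it on shutdown. Between a flush draining a key and the refreshed total being stored, a concurrent request may briefly see a lower count; that window is part of the accuracy the design gives up.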
Traditional rate limiters often follow a pattern like this:
Request → Datastore Counter Increment → Decision
While simple, this approach does not scale well under heavy load because every request requires a distributed datastore operation.
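For comparison, the per-request pattern can be written in a few lines. This is a minimal sketch with hypothetical names (`PerRequestLimiter`, `Store`), intended only to show that the number of datastore calls equals the number of requests.

```java
/** Hypothetical baseline: one datastore round trip on every request. */
final class PerRequestLimiter {
    /** Minimal view of the datastore: add n to key, return the new global total. */
    interface Store {
        long incrementByAndExpire(String key, long n);
    }

    private final Store store;
    private final long limit;

    PerRequestLimiter(Store store, long limit) {
        this.store = store;
        this.limit = limit;
    }

    /** The decision is exact, but it waits on the datastore every time. */
    boolean tryAcquire(String key) {
        return store.incrementByAndExpire(key, 1) <= limit;
    }
}
```

The decision here is always based on the current global count, which is the accuracy the batched design gives up in exchange for fewer distributed operations.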
This project instead uses a different strategy:
Request
↓
Local Counter Increment
↓
Batch Update to Datastore (asynchronous)
↓
Rate Limit Decision
By aggregating updates locally and flushing them in batches, the system drastically reduces the number of distributed datastore operations.
This allows the rate limiter to support very high request throughput while maintaining low request latency.
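The size of the reduction is simple arithmetic. The sketch below assumes a size-triggered flush (one datastore call every `batchSize` local increments), which is an illustrative simplification; the text above does not say whether flushes are triggered by size, by time, or both.

```java
/** Hypothetical counting model for datastore calls under size-triggered batching. */
final class BatchCountDemo {
    /** Returns how many datastore calls are made for the given number of requests. */
    static long datastoreCalls(long requests, long batchSize) {
        long calls = 0;
        long pendingCount = 0;
        for (long i = 0; i < requests; i++) {
            pendingCount++;               // local counter increment
            if (pendingCount == batchSize) {
                calls++;                  // one batched update to the datastore
                pendingCount = 0;
            }
        }
        if (pendingCount > 0) {
            calls++;                      // final partial batch
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(datastoreCalls(10_000, 1));   // prints 10000 (per-request pattern)
        System.out.println(datastoreCalls(10_000, 100)); // prints 100
    }
}
```

In this model the call count is the ceiling of requests divided by batch size, so a batch size of 100 cuts datastore traffic by two orders of magnitude.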
To make the architecture easier to understand, the documentation is organized into a sequence of focused technical documents.
Readers are encouraged to explore the repository in the following order.
High-level explanation of the architecture and how the rate limiter works internally.
→ 01-system-overview.md
Explains the problem space and the challenges of implementing rate limiting in distributed systems.
→ 02-context.md
Discusses the key architectural decisions and the trade-offs between scalability, accuracy, and consistency.
→ 03-tradeoffs.md
Describes how the system is validated using unit tests, concurrency tests, and functional verification.
→ 04-testing.md
Presents the benchmarking approach used to measure throughput and system performance under sustained load.
→ 05-benchmarking.md
Explains how the rate limiter handles concurrent requests and minimizes contention.
→ 06-concurrency.md
Explores potential optimizations and architectural extensions for large-scale production deployments.
→ 07-future-improvements.md
The project uses Maven for building and running tests.
Run the test suite with:
mvn test
The tests include:
- functional rate limiting validation
- batching efficiency verification
- concurrency stress tests
- throughput benchmarking
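As an illustration of what a concurrency stress check might look like, the sketch below hammers a local counter from many threads and verifies that no increment is lost. It is written in plain Java with hypothetical names (`ConcurrencyStressSketch`, `hammer`) and a `LongAdder` as the local counter; the repository's actual tests and test framework may differ.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical stress check: concurrent increments must all be counted. */
final class ConcurrencyStressSketch {
    /** Runs the given number of workers, each recording perThread requests; returns the observed total. */
    static long hammer(int threads, int perThread) {
        LongAdder counter = new LongAdder();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                try {
                    start.await(); // release all workers at once to maximize contention
                    for (int i = 0; i < perThread; i++) {
                        counter.increment();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }
        start.countDown();
        try {
            if (!done.await(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
        return counter.sum(); // safe to read: all workers have finished
    }
}
```

A test would assert that `hammer(8, 100_000)` returns `800_000`; the same harness shape can be pointed at a limiter's request path instead of a bare counter.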
This project intentionally focuses on architecture and system design rather than production-ready infrastructure integration.
The goal is to demonstrate how distributed rate limiters can be designed to scale in environments where:
- request volume is extremely high
- distributed datastore operations are expensive
- minimizing latency is critical
The implementation therefore emphasizes:
- batching
- local aggregation
- asynchronous distributed updates
- concurrency-friendly design
This repository serves as a technical exploration of scalable rate limiting strategies in distributed systems.
By combining:
- local counters
- batched datastore updates
- asynchronous processing
- concurrency-aware design
the system achieves high throughput while minimizing pressure on the distributed datastore.
The accompanying documentation walks through the architectural decisions, trade-offs, testing strategy, and performance characteristics of the system.
Engineering study and implementation by Francisco Silva.
This project demonstrates the design and implementation of a high-throughput distributed rate limiter optimized for horizontally scaled systems.
The primary goal of this implementation is to explore architectural strategies that allow rate limiting to scale under extremely high request volumes while minimizing pressure on the distributed datastore.
Instead of performing a datastore update for every request, the system uses local aggregation, batching, and asynchronous updates to significantly reduce the number of distributed operations.
This design reflects patterns commonly used in large-scale distributed infrastructure.