Distributed High Throughput Rate Limiter

Author

Francisco Silva.

This project implements a high-throughput distributed rate limiter designed to handle extreme request volumes in horizontally scaled systems.

It was built to explore a common real-world problem: how to enforce rate limits without turning the distributed datastore into a bottleneck.

Instead of performing a datastore operation on every request, this design uses:

local aggregation
batching
asynchronous flush

This replaces one datastore operation per request with batched updates, which can reduce distributed operations by orders of magnitude while keeping request-path latency low.

The implementation reflects patterns used in large-scale systems where:

throughput is critical
datastore operations are expensive
latency must remain predictable

Why this matters

Naive rate limiting strategies often fail at scale because they rely on a distributed operation per request.

Under high load, that per-request round trip to the datastore becomes the bottleneck.
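As a rough illustration of the naive pattern, the sketch below issues one store call per request. `CountingStore`, `NaiveRateLimiter`, and `NaiveDemo` are hypothetical names introduced for this example and are not taken from the repository; the store is an in-memory stand-in that only counts how often it is called.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-memory stand-in for a remote key-value store.
final class CountingStore {
    private final ConcurrentHashMap<String, AtomicLong> counters = new ConcurrentHashMap<>();
    final AtomicLong calls = new AtomicLong(); // number of store operations issued

    // Adds n to the counter for key and returns the new total (expiry omitted).
    long incrementBy(String key, long n) {
        calls.incrementAndGet();
        return counters.computeIfAbsent(key, k -> new AtomicLong()).addAndGet(n);
    }
}

// Naive limiter: every request performs one store operation on the request path.
final class NaiveRateLimiter {
    private final CountingStore store;
    private final long limit;

    NaiveRateLimiter(CountingStore store, long limit) {
        this.store = store;
        this.limit = limit;
    }

    boolean allow(String key) {
        return store.incrementBy(key, 1) <= limit;
    }
}

final class NaiveDemo {
    public static void main(String[] args) {
        CountingStore store = new CountingStore();
        NaiveRateLimiter limiter = new NaiveRateLimiter(store, 100);
        int allowed = 0;
        for (int i = 0; i < 1_000; i++) {
            if (limiter.allow("client-42")) allowed++;
        }
        // prints: allowed=100 storeCalls=1000
        System.out.println("allowed=" + allowed + " storeCalls=" + store.calls.get());
    }
}
```

The decision is exact, but the number of store calls grows one-for-one with request volume, which is the cost the rest of this document is concerned with.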

This project explores an alternative design that trades strict accuracy for scalability and throughput.

When NOT to use this approach

This design is not suitable when:

strict per-request accuracy is required
strong consistency is mandatory
rate limits must be globally synchronized in real-time

In such cases, simpler centralized approaches may be more appropriate.

Project Goals

The project focuses on demonstrating the following engineering principles:

designing rate limiters for high-throughput environments
reducing datastore pressure through batching
minimizing request-path latency
supporting high levels of concurrency
exploring trade-offs between accuracy and scalability

The repository includes both the implementation and detailed technical documentation explaining the design decisions behind the system.

High-Level Architecture

The following diagram illustrates the high-level architecture of the rate limiter.

                +----------------------+
                |      Client API      |
                +----------+-----------+
                           |
                           v
                +----------------------+
                |  Rate Limiter Node   |
                |----------------------|
                | Local Request Count  |
                | Batching Mechanism   |
                | Async Flush Worker   |
                +----------+-----------+
                           |
                           v
             +-------------------------------+
             | Distributed Key-Value Store   |
             |-------------------------------|
             | incrementByAndExpire(key, n)  |
             +-------------------------------+

Request flow:

1. A client sends a request to the rate limiter.
2. The limiter increments a local counter for the client key.
3. Requests are aggregated locally instead of immediately updating the datastore.
4. A background worker flushes the batched increments to the distributed datastore asynchronously, off the request path.
5. The rate limiting decision is returned to the client without waiting for the flush.

This architecture significantly reduces the number of distributed datastore operations, allowing the system to support very high request throughput.
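The request flow above can be sketched roughly as follows. This is not the repository's implementation: `InMemoryStore`, `BatchingRateLimiter`, and `BatchingDemo` are hypothetical names, the store is an in-memory stand-in with expiry omitted, and the decision rule (last known global total plus unflushed local count) is an assumption made only for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-memory stand-in for the distributed key-value store.
final class InMemoryStore {
    private final Map<String, AtomicLong> totals = new ConcurrentHashMap<>();
    final AtomicLong calls = new AtomicLong(); // number of store operations issued

    // Named after the operation in the architecture diagram; expiry is omitted here.
    long incrementByAndExpire(String key, long n) {
        calls.incrementAndGet();
        return totals.computeIfAbsent(key, k -> new AtomicLong()).addAndGet(n);
    }
}

final class BatchingRateLimiter {
    private final InMemoryStore store;
    private final long limit;
    // Requests seen locally since the last flush, per key.
    private final Map<String, AtomicLong> pending = new ConcurrentHashMap<>();
    // Global total returned by the most recent flush, per key (may be stale).
    private final Map<String, AtomicLong> lastKnownTotal = new ConcurrentHashMap<>();

    BatchingRateLimiter(InMemoryStore store, long limit) {
        this.store = store;
        this.limit = limit;
    }

    // Request path: touches only local memory, never the store.
    boolean allow(String key) {
        long local = pending.computeIfAbsent(key, k -> new AtomicLong()).incrementAndGet();
        AtomicLong known = lastKnownTotal.get(key);
        long global = (known == null) ? 0 : known.get();
        // Assumed decision rule: stale global total plus unflushed local count.
        return global + local <= limit;
    }

    // Flush path: one store operation per key that has pending requests.
    // A background worker (for example a ScheduledExecutorService) would call this.
    void flush() {
        pending.forEach((key, counter) -> {
            long n = counter.getAndSet(0); // atomically take the batch
            if (n > 0) {
                long total = store.incrementByAndExpire(key, n);
                lastKnownTotal.computeIfAbsent(key, k -> new AtomicLong()).set(total);
            }
        });
    }
}

final class BatchingDemo {
    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        BatchingRateLimiter limiter = new BatchingRateLimiter(store, 100);
        int allowed = 0;
        for (int i = 1; i <= 1_000; i++) {
            if (limiter.allow("client-42")) allowed++;
            if (i % 100 == 0) limiter.flush(); // stands in for the async worker
        }
        // prints: allowed=100 storeCalls=10
        System.out.println("allowed=" + allowed + " storeCalls=" + store.calls.get());
    }
}
```

In this single-node run, 1,000 requests produce 10 store calls instead of 1,000. With several nodes, each node's view of the global total lags by up to one flush, which is where the accuracy trade-off comes from.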

Quick Overview

Traditional rate limiters often follow a pattern like this:

Request → Datastore Counter Increment → Decision

While simple, this approach does not scale well under heavy load because every request requires a distributed datastore operation.

This project instead uses a different strategy:

Request
   ↓
Local Counter Increment
   ↓
Batch Update to Datastore (asynchronous)
   ↓
Rate Limit Decision

By aggregating updates locally and flushing them in batches, the system drastically reduces the number of distributed datastore operations.

This allows the rate limiter to support very high request throughput while maintaining low request latency.
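A back-of-the-envelope estimate shows where the reduction comes from. The symbols and numbers here are illustrative assumptions, not measurements from this project: let R be the requests per second arriving at one node, K the number of distinct keys active on that node during a flush interval, and T the flush interval in seconds, with at most one datastore operation per key per flush.

```latex
\begin{aligned}
\text{per-request design:}\quad & \text{ops/s} = R \\
\text{batched design:}\quad & \text{ops/s} \le \frac{K}{T} \\
\text{reduction factor:}\quad & \frac{R}{K/T} = \frac{R\,T}{K}
\end{aligned}
```

For example, with R = 100,000 requests per second, K = 100 active keys, and T = 1 second, the batched design issues at most 100 operations per second, a factor of 1,000 fewer. The gain shrinks as the number of active keys grows or the flush interval shortens.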

Repository Guide

To make the architecture easier to understand, the documentation is organized into a sequence of focused technical documents.

Readers are encouraged to explore the repository in the following order.

1. System Overview

High-level explanation of the architecture and how the rate limiter works internally.

→ 01-system-overview.md

2. System Context

Explains the problem space and the challenges of implementing rate limiting in distributed systems.

→ 02-context.md

3. Engineering Trade-offs

Discusses the key architectural decisions and the trade-offs between scalability, accuracy, and consistency.

→ 03-tradeoffs.md

4. Testing Strategy

Describes how the system is validated using unit tests, concurrency tests, and functional verification.

→ 04-testing.md

5. Performance Benchmarking

Presents the benchmarking approach used to measure throughput and system performance under sustained load.

→ 05-benchmarking.md

6. Concurrency Model

Explains how the rate limiter handles concurrent requests and minimizes contention.

→ 06-concurrency.md

7. Future Improvements

Explores potential optimizations and architectural extensions for large-scale production deployments.

→ 07-future-improvements.md

Running the Project

The project uses Maven for building and running tests.

Run the test suite with:

mvn test

The tests include:

functional rate limiting validation
batching efficiency verification
concurrency stress tests
throughput benchmarking
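The repository's tests are not reproduced here, but a concurrency stress check of the kind listed above might look roughly like this sketch. `LocalCounters` and `StressCheck` are hypothetical names. The idea is that several writer threads record requests while the main thread drains the counter concurrently, and the total drained must equal the total recorded.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical local aggregation structure: one pending counter per key.
final class LocalCounters {
    private final ConcurrentHashMap<String, AtomicLong> pending = new ConcurrentHashMap<>();

    void record(String key) {
        pending.computeIfAbsent(key, k -> new AtomicLong()).incrementAndGet();
    }

    // Atomically takes and zeroes the pending count for one key.
    long drain(String key) {
        AtomicLong counter = pending.get(key);
        return counter == null ? 0 : counter.getAndSet(0);
    }
}

final class StressCheck {
    private static final String KEY = "client-42";

    // Starts `threads` writers that each record `perThread` requests while the
    // calling thread drains concurrently; returns the total number drained.
    static long run(int threads, int perThread) {
        LocalCounters counters = new LocalCounters();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch done = new CountDownLatch(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < perThread; i++) counters.record(KEY);
                done.countDown();
            });
        }
        long drained = 0;
        try {
            // Drain while writers are still running, as a flush worker would.
            while (!done.await(1, TimeUnit.MILLISECONDS)) {
                drained += counters.drain(KEY);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for writers", e);
        } finally {
            pool.shutdown();
        }
        drained += counters.drain(KEY); // final drain after all writers have finished
        return drained;
    }

    public static void main(String[] args) {
        // prints: drained=80000
        System.out.println("drained=" + run(8, 10_000));
    }
}
```

The check passes only if no increment is lost or double-counted when draining races with recording, which is why the sketch uses `AtomicLong.getAndSet(0)` rather than a separate read followed by a reset.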

Design Philosophy

This project intentionally focuses on architecture and system design rather than production-ready infrastructure integration.

The goal is to demonstrate how distributed rate limiters can be designed to scale in environments where:

request volume is extremely high
distributed datastore operations are expensive
minimizing latency is critical

The implementation therefore emphasizes:

batching
local aggregation
asynchronous distributed updates
concurrency-friendly design

Summary

This repository serves as a technical exploration of scalable rate limiting strategies in distributed systems.

By combining:

local counters
batched datastore updates
asynchronous processing
concurrency-aware design

the system achieves high throughput while minimizing pressure on the distributed datastore.

The accompanying documentation walks through the architectural decisions, trade-offs, testing strategy, and performance characteristics of the system.
