System Design

Disclaimer

This document covers what I take into consideration when I build something. It does not follow the format of a System Design interview, but the considerations are the same. I'll focus on building a web application.

Where to start?

When creating a web application, the logic layer is unique to the use case, but a few core aspects remain the same across the board. Let's look at what those are and what you need to consider for each. They are listed in no particular order, as each is equally important.

Scale

What scale are you targeting?

Is this something you want to build for yourself?
Is this something you are building from scratch?
Is this a new feature for something that already has users?
Are you targeting users in a specific country? Or is this a product accessible globally?

The answers to these questions shape everything that follows: they determine not only the technologies you need, but also how much freedom you have in choosing the systems to build on.

The scale needs to be considered with the following:

Size of all the data that will be ingested
The frequency with which the data is expected to be ingested
Are there security constraints which need to be taken into account?
Is this sensitive to time? Or can we take our time in processing the data being ingested?
Is this a live service? A live service requires an entirely different approach, not only in how the system is designed but even at the protocol level for data transfer.
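
The first two points lend themselves to a quick back-of-the-envelope calculation. The sketch below is only an illustration: the function name and the workload numbers (200 events per second at about 1 KiB each) are hypothetical, so plug in your own estimates.

```python
def estimate_daily_ingest_bytes(events_per_second: float, avg_event_bytes: int) -> int:
    """Return the approximate number of bytes ingested per day."""
    seconds_per_day = 24 * 60 * 60
    return int(events_per_second * avg_event_bytes * seconds_per_day)


# Hypothetical workload: 200 events/s at about 1 KiB each.
daily = estimate_daily_ingest_bytes(200, 1024)
yearly = daily * 365
print(f"{daily / 1e9:.1f} GB/day, {yearly / 1e12:.2f} TB/year")
# prints "17.7 GB/day, 6.46 TB/year"
```

Even a rough number like this tells you early whether a single database instance is plausible or whether storage cost and ingestion throughput need real design attention.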

Database

The choice here is not simply SQL versus NoSQL based on whether the data is structured or unstructured.

Why you choose a database:

Nature of the Data
Security
Speed & Scaling
Cost (Not something a lot of people consider but this is very important to consider)

If your data is structured and you require ACID, please choose SQL. Not understanding the nature of your data is not a good reason to choose NoSQL. NoSQL is not just about storing data in any format you like: the underlying priorities change with the choice, and many NoSQL stores relax ACID guarantees in exchange for other properties. In short, please don't choose MongoDB because your data has a few optional parameters. Mongo costs A LOT OF MONEY to run at scale. Also, SQL scales REALLY WELL! Please don't use NoSQL because it "scales easier". PostgreSQL is probably all you need for most things! Also take into account how cost scales with the volume of stored data and with the ingestion rate you need to handle.

So when do I need NoSQL?

If you need to represent relationships and your data is inherently a graph, use a graph DB.
You want to create a knowledge base
You don't care about ACID, and the volume of data is more than you can handle while keeping ACID guarantees, like comments and likes in a huge social network.
You are storing data like documents

Please read up on the underlying workings of the databases, not just how they represent data, before choosing between them. It is also important to understand what you are storing, because that determines which database to store it in. The core task is to understand the needs of the use case and THEN choose the database that best fits that need. There is no one-size-fits-all solution.

Also consider replication and mirroring. Databases have geographical constraints as well, since some jurisdictions, like the EU, restrict sending their residents' data to other places.

Transaction Blocks:

You can use transaction blocks to ensure a series of write operations all succeed or none succeed.
You will need this when a feature needs two or three entries in the DB and, if any one of them fails, you want all of them to fail.
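
Here is a minimal sketch of that all-or-nothing behavior, using Python's built-in `sqlite3` module as a stand-in for whatever SQL database you run. The `orders` and `order_items` tables and the `create_order` helper are hypothetical names made up for the example.

```python
import sqlite3


def create_order(conn, user_id, items):
    """Insert an order and its items atomically: all rows land or none do."""
    # Using the connection as a context manager commits on success
    # and rolls back if an exception escapes the block.
    with conn:
        cur = conn.execute("INSERT INTO orders (user_id) VALUES (?)", (user_id,))
        order_id = cur.lastrowid
        for sku, qty in items:
            conn.execute(
                "INSERT INTO order_items (order_id, sku, qty) VALUES (?, ?, ?)",
                (order_id, sku, qty),
            )
    return order_id


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL)")
conn.execute(
    "CREATE TABLE order_items ("
    "order_id INTEGER NOT NULL, sku TEXT NOT NULL, "
    "qty INTEGER NOT NULL CHECK (qty > 0))"
)

create_order(conn, 1, [("apple", 2), ("pear", 1)])  # succeeds: 1 order, 2 items
try:
    # The second item violates the CHECK constraint.
    create_order(conn, 2, [("apple", 1), ("pear", 0)])
except sqlite3.IntegrityError:
    pass  # the order row and the first item were rolled back with it

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
item_count = conn.execute("SELECT COUNT(*) FROM order_items").fetchone()[0]
```

After the failed call, `order_count` is still 1 and `item_count` is still 2: the half-written second order never becomes visible.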

Backend

Things to consider while choosing a backend:

Speed of execution
Performance requirements for the application
If it's an existing system you are building for, then there might be more restrictions because the expertise of the team might need to be taken into consideration before choosing a backend.

Python-based frameworks like Django and Flask offer really good tools for quick prototyping and a vast array of libraries to do pretty much anything you want. However, this comes with the inherent disadvantage of an interpreted language: performance.

Go has Gin, and JavaScript has runtimes like Node.js and Bun with a bunch of frameworks built on them. There is great support for Rust and C++ as well.

Choosing a backend is not just about choosing the fastest or the easiest framework you can think of. It needs a bit more thought about what you are building and what the expertise of your team is. This choice will also affect hiring and how expensive your talent will be.

Frontend

I can talk about the frontend only from the limited experience I have with it and from what I have seen in the teams I have worked with. Performance is everything, and speed of execution truly matters for teams that are iterating on ideas quickly. The bundle size matters and so does how fast your webpage loads. SEO is a big contributor here as well. Event tracking on the frontend is very important: it helps your product team understand user behavior and figure out where to put effort next. Understand how the DOM loads and when the JavaScript executes. CORS is a common source of problems, so know which origins your backend allows. Every operation needs to be non-blocking. Understanding how async works, and what needs to be done when an async task succeeds, will ensure that the user has a great experience on your frontend. Although the frontend is seldom thought about from the POV of system design, it is still very important, as this is what faces your customers.

Connectivity

How do you connect your services together?

REST (Representational State Transfer): An architecture style for transferring data over HTTP using methods like GET, POST, PUT, DELETE.
- This is how most services communicate with each other.
- The data can still be sent in many ways: JSON, URL-encoded, as query params, binary, form data
- It is important to consider what type of data you want to accept and why you are using it
- The methods themselves are defined on resources and the expectation is to have an endpoint like POST /v1//, GET /v1//
- However, in practice, as REST does not provide a good way to describe actions, you will need to take liberties in how you define the endpoint
gRPC: Uses Protocol Buffers (Protobuf) over HTTP/2 to cut the serialization overhead of text formats like JSON and increase API performance. Mostly found in communication between backend services.
WebSockets: For when you need bi-directional communication between the server and the client. Do note that this comes with a lot of technical overhead: managing the connection, handling retries, and establishing long-polling fallbacks.
WebHooks: Opening up an endpoint on your own service so that someone else can send you data whenever it is available to them. If I want something from Meta, I call a Meta API (pull). When I want Meta to send me something when they can, I give them a webhook endpoint I have made (push).
GraphQL: An alternate way of representing how you request data. It lets you define exactly the shape of the data you want from the server, so you save on bandwidth because you don't get properties you don't need.
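
To make the REST point about accepted data types concrete, here is a small sketch of an endpoint deciding explicitly which body formats it accepts. It uses only the standard library; `parse_body` is a hypothetical helper, not part of any framework, and real frameworks usually do this for you.

```python
import json
from urllib.parse import parse_qs


def parse_body(content_type: str, raw: bytes) -> dict:
    """Decode a request body according to its declared Content-Type."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json":
        data = json.loads(raw.decode("utf-8"))
        if not isinstance(data, dict):
            raise ValueError("expected a JSON object")
        return data
    if media_type == "application/x-www-form-urlencoded":
        # parse_qs returns a list per key; keep the last value for each.
        return {k: v[-1] for k, v in parse_qs(raw.decode("utf-8")).items()}
    # Anything else is rejected instead of being guessed at.
    raise ValueError(f"unsupported content type: {media_type}")


from_json = parse_body("application/json; charset=utf-8", b'{"name": "a"}')
from_form = parse_body("application/x-www-form-urlencoded", b"name=a&age=3")
```

Note that the form-encoded variant yields strings only (`"3"`, not `3`), which is one reason to pick the accepted formats deliberately.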

Data Processing

This is how you derive analytics from all the OLTP data you have received. An example would be setting up replication from your Postgres OLTP database to a Redshift OLAP database, so your data analytics team can use that data to derive metrics and understand user behavior. It is really important to have the replication pipeline set up and to ensure that all the data required to derive analytics for business decisions is available. Please do not run long-running analytics queries on live production databases.

Authentication & Authorization

Use signed JWTs for users. You can add custom parameters (claims) to identify a user. But please ensure that your JWT is signed.
Use Hashed Keys for SDKs. Argon2 provides pretty much everything you need.

Deployment

How do you deploy?

I'm going to go out on a limb here and say you need to dockerize your applications. Are there situations where you don't need to, and skipping it is beneficial? Yes. But will you encounter those situations often? I highly doubt that.
Creating containers of your applications and having a deploy script with docker-compose helps immensely in setting up a CD pipeline.
GitHub Actions is one way to trigger deployments
There are full fledged applications like Jenkins that help you with deployment as well.

Where should you deploy?

You can go bare metal and set up everything on your own if the application demands it.
But you can always choose simple EC2 or EC2-like solutions on any cloud provider
There are auto scaling options available as well, and these must be chosen with care. Ensure that scaling is triggered by what your application actually uses: is it CPU heavy? Network heavy? Memory heavy?
Your scaling must be based on metrics that affect you and not just something that works for everyone.
This also comes with its associated costs

Infrastructure

What you need

A database
Backend Application Servers
Frontend Application Servers
Build Servers
Analytics Databases
A reverse proxy for your services
A VPC where your services are
Subnets are important if you are concerned with security of certain services
Some form of DNS service, like Route53, that routes traffic from the public internet to your services.
An event queue (If your application needs it)
A Redis cache

Speed

Caching:

Caching is an incredible tool to improve the performance of your system.
Enabling caching on your database will allow subsequent read queries for the same data to be orders of magnitude faster.
Enabling caching on your API will do the same.
The factors you need to consider are:
- How big can you afford your cache to be?
- Can you handle the complexity of a distributed cache?
- How long should the cache live? This will be entirely based on how fast the data changes and how much space your cache has.
- How can you track cache misses and try and make your caching policy better?
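
The factors above map fairly directly onto code. Below is a minimal in-process sketch with a size cap, a time-to-live, and hit/miss counters. `TTLCache` and its parameters are hypothetical names. A shared cache such as Redis would replace the dictionary in a real deployment.

```python
import time


class TTLCache:
    """Tiny read-through cache with a size limit, expiry, and hit/miss counters."""

    def __init__(self, ttl_seconds, max_entries, clock=time.monotonic):
        self.ttl = ttl_seconds          # how long an entry may live
        self.max_entries = max_entries  # how big the cache is allowed to get
        self.clock = clock              # injectable so tests can control time
        self._store = {}                # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key, loader):
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Miss (absent or expired): fall through to the slow source.
        self.misses += 1
        value = loader(key)
        self._store.pop(key, None)  # drop any expired copy of this key
        if len(self._store) >= self.max_entries:
            # Evict the oldest inserted entry (dicts preserve insertion order).
            self._store.pop(next(iter(self._store)))
        self._store[key] = (now + self.ttl, value)
        return value

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def slow_lookup(key):
    # Stand-in for a database query or a downstream API call.
    return key.upper()


cache = TTLCache(ttl_seconds=30, max_entries=1000)
first = cache.get_or_load("user:1", slow_lookup)   # miss, calls slow_lookup
second = cache.get_or_load("user:1", slow_lookup)  # hit, served from memory
```

Watching `hit_rate()` over time is the simplest way to tell whether the TTL and size you picked actually match how the data is read.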

Event queues:

Helps decouple your caller from the end consumer of the data.
If your caller just wants to dump data and it needs to be FAST, use a queue to get the data dumped and process it later.
This reduces the load on your server during times of heavy traffic, as you can accept the data now and process it later.
The usage of this needs to be understood really well! Because certain operations need confirmation before you proceed further.
Direct processing versus event queue processing can be thought of as a blocking versus non-blocking type of operation.
Event queues are really powerful systems but the setup takes time and it is important to configure them right.
Such a system consists of message brokers and task queues
The choice of how much control you want on your pipeline will determine if you want RabbitMQ or Apache Kafka.
RabbitMQ gives you a general purpose complete broker. Higher complexity but gives more features.
Apache Kafka is for streaming data.
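
The "accept now, process later" idea can be sketched in a single process with the standard library's `queue.Queue`. This is only a stand-in for a real broker such as RabbitMQ or Kafka. The `accept` and `worker` names are hypothetical, and nothing here survives a process restart.

```python
import queue
import threading

events = queue.Queue(maxsize=1000)  # bounded, so producers feel backpressure
processed = []


def accept(event: dict) -> bool:
    """Fast path: enqueue and return immediately. False means 'try again later'."""
    try:
        events.put_nowait(event)
        return True
    except queue.Full:
        return False


def worker() -> None:
    """Slow path: drain the queue in the background."""
    while True:
        event = events.get()
        try:
            if event is None:  # sentinel: stop the worker
                return
            # Stand-in for the expensive processing step.
            processed.append({**event, "status": "done"})
        finally:
            events.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()

accepted = [accept({"id": i}) for i in range(5)]  # callers return right away
events.join()     # wait until everything queued so far has been processed
events.put(None)  # ask the worker to stop
thread.join()
```

The caller only learns that the event was accepted, not that it was processed, which is why operations that need confirmation before proceeding should not go through the queue.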

Security

Where can you focus on security as a developer who might not be too keen on digging deeper into it?

Encrypt your data. Use symmetric encryption on properties you want to query on and asymmetric encryption on the rest
Ensure the JWT validity period is not too long; you can use refresh tokens to keep longer sessions active
Ensure that the login has 2-Factor or ideally Multi Factor Authentication. Email OTP, Phone Number Message OTP, Authenticator Apps, Okta and a lot more services can be used for this
Rotate keys
Use a secrets manager and inject the secrets into your application servers when needed
DO NOT ADD PASSWORDS OR API KEYS IN THE ENV FILE
DO NOT ADD THESE SENSITIVE KEYS IN YOUR CHATS WITH LLMS
Keep your services hidden from the public internet
Use a reverse proxy and route the incoming requests as needed
Ensure your Database cannot be accessed by the public internet
Log EVERYTHING on your Server Shells
Create VPCs and ensure connectivity with them is restricted
Have well maintained, restricted CORS policies on your Web servers
Have quick ways for your users to notify you of breached accounts and have kill switches to lock the perpetrators out of your users' accounts quickly
If you have an API that lists data, ensure it is always paginated and has strict RPS/RPM limits. Do not allow even authenticated users to spam-call their way through your whole database one entry at a time
Have IP Limits and User Limits on all your APIs
NEVER TAKE IN USER DATA DIRECTLY INTO A SQL QUERY. ALWAYS USE PARAMETERIZED QUERIES!
Do not expose User Email IDs when listing them in other people's queries. Only send data that is required.
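
The parameterized-query rule is easiest to appreciate side by side with the unsafe version. The sketch below uses `sqlite3` with a hypothetical `users` table. The placeholder syntax differs between database drivers, but the idea is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users (name, email) VALUES (?, ?)",
    [("alice", "alice@example.com"), ("bob", "bob@example.com")],
)


def find_users_unsafe(name):
    # DON'T: the user input becomes part of the SQL text itself.
    return conn.execute(
        f"SELECT id, name FROM users WHERE name = '{name}'"
    ).fetchall()


def find_users(name, limit=50, offset=0):
    # DO: placeholders keep the input as data, never as SQL.
    # LIMIT/OFFSET keep the listing paginated (capped at 100 per page here),
    # and only id and name are selected, so emails are never returned.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ? ORDER BY id LIMIT ? OFFSET ?",
        (name, min(limit, 100), offset),
    ).fetchall()


payload = "x' OR '1'='1"
leaked = find_users_unsafe(payload)  # the injected OR matches every row
safe = find_users(payload)           # treated as a literal name: no rows
```

With the injected payload, the unsafe version returns both users while the parameterized version returns an empty list.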

Analytics

Why do you need it?

It is essential for making both business and engineering decisions.
You need to know what's happening with your system from both the engineering front and the user behavior front.

Where do you need it?

API tracking
Frontend user behavior tracking
Infrastructure usage tracking

How do you add it?

You can definitely track it manually. But there are easier ways to do it.
Amplitude for frontend events.
New Relic for backend APIs and infrastructure.
The cloud provider should provide various services to track usage of the services you are using.

Observability

Why do you need it?

If something blows up, you really need to know why it blew up.

Where do you need it?

On your entire infrastructure.
You need to understand if ANY part of your infrastructure breaks. You need to know when it happened and more importantly, why it happened

How do you add it?

Sentry

Low Level Design

The above topics were all high-level design decisions you will be making about your product. Once these considerations are taken into account, you will need to focus on the low-level design.

Where does this come into play?

Once you understand the nature of your data, you will need to describe every single entity and its parameters.
Take PII (Personally Identifiable Information) into consideration, along with how you will encrypt that data
Each API needs a set contract, and changes to it will result in a new version. You will need to decide whether the new API will be backwards compatible.
The data in the database will need to be indexed. How do you choose the index? Figure out what you query that data by in your APIs; the most frequent way you query the data needs to be indexed. (Again, this might not lead to an increase in performance, depending on the nature of that index.)
As your data gets bigger, you will have to worry about sharding. How do you shard the data? If you are retrieving your data in bulk, what is that bulk retrieval based on? What is the data inherently divided by? Use that to shard the table.
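
As a sketch of the sharding point, here is one simple way to map a shard key to a shard: hash the key and take it modulo the number of shards. The `customer_id` key, the shard count, and the helper names are all hypothetical. Also note that plain modulo sharding moves most keys when the shard count changes, so treat this as an illustration rather than a recommendation.

```python
import hashlib
from collections import defaultdict


def shard_for(shard_key: str, shard_count: int) -> int:
    """Map a shard key to a shard index with a hash that is stable across processes."""
    # Python's built-in hash() is randomized per process for strings,
    # so a cryptographic digest is used to get a repeatable value.
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def group_by_shard(rows, key_field, shard_count):
    """Bucket rows so that everything sharing a key lands on the same shard."""
    shards = defaultdict(list)
    for row in rows:
        shards[shard_for(str(row[key_field]), shard_count)].append(row)
    return dict(shards)


# Hypothetical rows: orders are usually fetched in bulk per customer,
# so customer_id is the shard key.
orders = [
    {"order_id": 1, "customer_id": "c-17"},
    {"order_id": 2, "customer_id": "c-42"},
    {"order_id": 3, "customer_id": "c-17"},
]
placement = group_by_shard(orders, "customer_id", shard_count=4)
```

Because both orders for `c-17` hash to the same shard, a bulk fetch for that customer touches a single shard instead of fanning out to all of them.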
