Overview

system-design-pass-interview: a no-nonsense guide covering the essentials needed to pass a system design interview.

System Design Fundamentals

1. Load Balancer

A device or software that distributes incoming network traffic across multiple servers to ensure no single server becomes overwhelmed, improving availability and reliability.
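One of the simplest distribution strategies is round-robin, where each request goes to the next server in a fixed rotation. The sketch below is illustrative only: the class name `RoundRobinBalancer` and the backend names are hypothetical, and a real load balancer would also track server health and load.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out backends in a fixed rotation (hypothetical example class)."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        # cycle() repeats the list of backends forever, in order.
        self._rotation = cycle(list(backends))

    def pick(self):
        """Return the backend that should receive the next request."""
        return next(self._rotation)


# Backend names are placeholders, not real hosts.
balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
picks = [balancer.pick() for _ in range(6)]
print(picks)  # each backend appears twice, in rotation order
```

Round-robin assumes the servers are roughly equal in capacity; weighted or least-connections strategies are common alternatives when they are not.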

2. API Gateway

A server that acts as an entry point for all client requests, managing and routing these requests to appropriate services in a microservices architecture.

3. Key Characteristics of Distributed Systems

A distributed system is a collection of independent computers that appears to its users as a single coherent system. Its key characteristics are scalability, fault tolerance, consistency, and availability.

4. Network Essentials

Fundamental concepts such as IP addresses, TCP/IP, routing, switching, and protocols that are crucial for designing and understanding networked systems.

5. DNS - Domain Name System

The system that translates human-readable domain names (like www.example.com) into IP addresses that computers use to identify each other on the network.

6. Caching

Storing copies of data in a temporary storage location (cache) to reduce the time and resources needed to fetch data from the original source.
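Because a cache has limited space, it needs a policy for what to evict. A minimal sketch of one common policy, least recently used (LRU), is shown below. The `LRUCache` class is a hypothetical illustration built on the standard library's `OrderedDict`; it is not tied to any particular caching product.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-size cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        """Return the cached value, or None on a cache miss."""
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        """Store a value, evicting the oldest entry if the cache is full."""
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")      # "a" is now the most recently used entry
cache.put("c", 3)   # cache is full, so "b" is evicted
```

On a miss, the caller would fetch the value from the original source and `put` it into the cache for later requests.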

7. CDN - Content Delivery Network

A network of distributed servers that deliver content to users based on their geographic location, improving access speed and reliability.

8. Data Partitioning

The process of dividing a database into smaller, manageable pieces (partitions) to improve performance, scalability, and manageability.
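One common way to decide which partition a record belongs to is to hash its key. The sketch below assumes string keys and a fixed partition count; the function name `partition_for` and the key format are hypothetical.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition number in the range [0, num_partitions)."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    # A stable hash is used instead of Python's built-in hash(), which is
    # randomized per process for strings.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


for user_key in ["user:1", "user:2", "user:3"]:
    print(user_key, "->", partition_for(user_key, num_partitions=4))
```

Plain modulo hashing remaps most keys when the partition count changes, which is why schemes such as consistent hashing are often preferred for clusters that grow or shrink.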

9. Proxies

An intermediary server that sits between a client and a destination server, often used for anonymity, security, or load balancing.

10. Redundancy and Replication

Redundancy: The duplication of critical components or functions to increase reliability and availability.

Replication: The process of copying and maintaining database or system data across multiple servers or locations for redundancy and fault tolerance.

11. CAP and PACELC Theorems

CAP Theorem: A distributed data store cannot simultaneously guarantee Consistency, Availability, and Partition tolerance. Because network partitions cannot be ruled out in practice, the real choice is between consistency and availability while a partition is in progress.

PACELC Theorem: A refinement of CAP. If there is a Partition (P), a system must choose between Availability (A) and Consistency (C); Else (E), during normal operation, it must choose between Latency (L) and Consistency (C).

12. Database

An organized collection of structured information or data, typically managed by a database management system (DBMS).

13. Indexes

Data structures that improve the speed of data retrieval operations on a database at the cost of additional writes and storage space.
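The trade-off can be illustrated with a toy in-memory table. This is only a sketch: the `UserTable` class is hypothetical, it assumes emails are unique, and real databases typically use structures such as B-trees rather than a plain dictionary.

```python
class UserTable:
    """Toy table with a secondary index on the email column."""

    def __init__(self):
        self.rows = []
        self.email_index = {}  # email -> position of the row in self.rows

    def insert(self, name, email):
        self.rows.append({"name": name, "email": email})
        # The extra write and extra memory are the cost of the index.
        self.email_index[email] = len(self.rows) - 1

    def find_by_email_scan(self, email):
        """Without an index: check every row until a match is found."""
        for row in self.rows:
            if row["email"] == email:
                return row
        return None

    def find_by_email_indexed(self, email):
        """With an index: a single dictionary lookup, then one row access."""
        position = self.email_index.get(email)
        return None if position is None else self.rows[position]


table = UserTable()
table.insert("Ada", "ada@example.com")
table.insert("Linus", "linus@example.com")
print(table.find_by_email_indexed("linus@example.com"))
```

Both lookups return the same row; the indexed version avoids scanning the whole table at the price of maintaining `email_index` on every insert.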

14. Bloom Filters

A probabilistic data structure that efficiently tests whether an element is a member of a set, with a possibility of false positives but no false negatives.
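A minimal sketch of the idea follows. The class name, the default sizes, and the choice of salted SHA-256 as the hash family are all assumptions made for illustration; production implementations usually pack bits tightly and use faster non-cryptographic hashes.

```python
import hashlib


class BloomFilter:
    """Set-membership sketch: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit slot, for readability rather than compactness.
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        # Derive several hash positions by salting the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for position in self._positions(item):
            self.bits[position] = 1

    def might_contain(self, item):
        # If any slot is 0, the item was definitely never added.
        return all(self.bits[position] for position in self._positions(item))


seen = BloomFilter()
seen.add("alice")
print(seen.might_contain("alice"))  # True: added items are always found
```

A `False` answer from `might_contain` is definitive, while a `True` answer means "possibly present" and normally triggers a check against the real data store.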

15. Long Polling, WebSockets, and Server-Sent Events

Long Polling: A technique where the client requests information from the server and the server holds the request open until new information is available.

WebSockets: A protocol providing full-duplex communication channels over a single TCP connection, allowing real-time data transfer.

Server-Sent Events (SSE): A server push technology where the server sends updates to the client over a single, long-lived HTTP connection.

16. Quorum

The minimum number of votes or members required to make a decision or commit a transaction in distributed systems, ensuring consistency and fault tolerance.
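Two quorum rules come up often in interviews: a simple majority, and the read/write overlap condition used by many replicated stores, where reads and writes are guaranteed to share at least one replica when R + W > N. The helper names below are hypothetical; the sketch only shows the arithmetic.

```python
def majority(num_nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return num_nodes // 2 + 1


def quorums_overlap(num_replicas: int, read_quorum: int, write_quorum: int) -> bool:
    """True when every read quorum shares at least one replica with every write quorum."""
    return read_quorum + write_quorum > num_replicas


print(majority(5))                # 3
print(quorums_overlap(3, 2, 2))   # True: 2 + 2 > 3
print(quorums_overlap(3, 1, 1))   # False: a read may miss the latest write
```

With N = 3, choosing R = 2 and W = 2 tolerates one unavailable replica for both reads and writes while still keeping the quorums overlapping.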

17. Heartbeat

A signal sent at regular intervals between systems or components to indicate active status and verify connectivity.
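On the receiving side, a monitor typically records when each node was last heard from and flags any node whose heartbeat is overdue. The sketch below is a simplified illustration: the class name and node names are hypothetical, and timestamps are passed in explicitly so the example runs deterministically instead of reading a real clock.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat per node and reports nodes that have gone quiet."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self._last_seen = {}  # node name -> timestamp of its latest heartbeat

    def record(self, node, now):
        """Register a heartbeat received from `node` at time `now` (seconds)."""
        self._last_seen[node] = now

    def suspected_down(self, now):
        """Return the nodes whose last heartbeat is older than the timeout."""
        return sorted(
            node
            for node, last_seen in self._last_seen.items()
            if now - last_seen > self.timeout_seconds
        )


monitor = HeartbeatMonitor(timeout_seconds=10)
monitor.record("node-a", now=0)
monitor.record("node-b", now=8)
print(monitor.suspected_down(now=12))  # ['node-a']: silent for 12s, over the 10s timeout
```

A missed heartbeat only means the node is suspected to be down; a slow network can produce the same symptom, so the timeout is a trade-off between fast detection and false alarms.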

18. Checksum

A value used to verify the integrity of data by detecting errors or corruption during transmission or storage.
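As a small illustration, the sender computes a checksum and transmits it with the data, and the receiver recomputes it and compares. The sketch uses CRC-32 from Python's standard `zlib` module; the helper names `checksum` and `verify` and the sample payload are hypothetical.

```python
import zlib


def checksum(data: bytes) -> int:
    """Compute a CRC-32 checksum of the payload."""
    return zlib.crc32(data)


def verify(data: bytes, expected: int) -> bool:
    """Recompute the checksum and compare it with the value sent alongside the data."""
    return checksum(data) == expected


payload = b"transfer 100 units to account 42"
sent_checksum = checksum(payload)

# Simulate corruption in transit by changing a single byte.
corrupted = b"transfer 900 units to account 42"

print(verify(payload, sent_checksum))    # True: data arrived intact
print(verify(corrupted, sent_checksum))  # False: corruption detected
```

CRC-32 is designed to catch accidental corruption, not deliberate tampering; when an attacker may modify the data, a cryptographic hash or a message authentication code is the usual choice.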

19. Leader and Follower

A model in distributed systems where a designated leader node manages and coordinates the tasks while follower nodes replicate and execute the leader’s decisions.

20. Security

Measures and protocols implemented to protect systems and data from unauthorized access, attacks, and breaches, ensuring confidentiality, integrity, and availability.

21. Distributed Messaging System

A system that allows asynchronous communication between different parts of a distributed application, often using message queues.

22. Distributed File System

A file system that allows access to files across multiple servers or locations as if they were on the same local system, providing redundancy and scalability.

23. Miscellaneous Concepts

Other topics such as microservices, sharding, eventual consistency, load shedding, and failover, along with system design principles that don’t fall neatly into the categories above.

System Design Problems

  1. Ad Click Aggregator
  2. Dropbox
  3. LeetCode
  4. Ticketmaster
  5. TinyURL
  6. Top-K YouTube Videos
  7. Web Crawler
  8. Bigtable
  9. Cassandra
  10. Chubby
  11. Dynamo
  12. GFS
  13. HDFS
  14. Kafka

System Design Trade-offs

  1. ACID vs BASE Properties in Databases
  2. API Gateway vs Direct Service Exposure
  3. API Gateway vs Reverse Proxy
  4. Batch Processing vs Stream Processing
  5. CDN Usage vs Direct Server Serving
  6. Data Compression vs Data Deduplication
  7. Hybrid Cloud Storage vs All Cloud Storage
  8. Latency vs Throughput
  9. Load Balancer vs API Gateway
  10. Polling vs Long Polling vs Webhooks
  11. Primary-Replica vs Peer-to-Peer Replication
  12. Proxy vs Reverse Proxy
  13. Read-Heavy vs Write-Heavy System
  14. Read-Through vs Write-Through Cache
  15. REST vs RPC
  16. Server-Side Caching vs Client-Side Caching
  17. Serverless Architecture vs Traditional Server-Based
  18. SQL vs NoSQL
  19. Stateful vs Stateless Architecture
  20. Strong vs Eventual Consistency
  21. Token Bucket vs Leaky Bucket