
Top 50 Distributed Systems Interview Questions with Answers (2026): Backend Dev to Systems Architect


PerfectNotes Team · Updated: March 2026 · ~25 min read · 50 Questions · 5 Categories · Free


These 50 Distributed Systems interview questions span every high-frequency topic — from CAP Theorem, PACELC, consistent hashing, and database sharding to advanced concepts like Raft consensus, Saga patterns, Byzantine fault tolerance, Bloom filters, and CRDTs — with “Why Interviewers Ask This” insight for every answer.

Contents

1. Fundamentals & Scaling (Q1–Q10): CAP · PACELC · Monolith vs Microservices · API Gateway · Idempotency · Statelessness
2. Communication & Networking (Q11–Q20): gRPC · Load Balancer L4/L7 · Message Queues · Service Mesh · WebSockets · Rate Limiting
3. Data, State & Partitioning (Q21–Q30): Sharding · Consistent Hashing · Replication · Lamport Clocks · Quorum · Split-Brain
4. Fault Tolerance & Consensus (Q31–Q40): 2PC · Saga · Raft vs Paxos · Circuit Breaker · BFT · Gossip Protocol · Bulkhead
5. System Design & Advanced Mechanics (Q41–Q50): Bloom Filter · Merkle Trees · Snowflake ID · MapReduce · CDN · CRDTs · etcd
6. Common Interview Mistakes: CAP theorem misuse · Ignoring network partitions · Consistency model confusion · Clock synchronization
7. Expert Interview Strategy: Trade-off framing · Concrete system examples · Failure scenarios · Consistency models
8. Real-World Job Applications: Backend Engineer · SRE / Platform Engineer · Distributed Systems Engineer

Fundamentals & Scaling Interview Questions (Q1–Q10)

What is a Distributed System?

A distributed system is a collection of independent, physically separated computational nodes that communicate over a network to appear to the end-user as a single, cohesive system. They work together to achieve a common goal, offering higher availability and scalability than single-machine systems.

💡 Why Interviewers Ask This: The baseline question. Emphasize the "illusion of a single system" to show you understand the end-user perspective.

What is the difference between Vertical and Horizontal Scaling?

Vertical Scaling (Scaling Up): upgrading a single server with more CPU/RAM. It is bounded by the largest machine you can buy, and the server remains a single point of failure. Horizontal Scaling (Scaling Out): adding more servers to distribute load, which raises capacity roughly in proportion to node count and adds fault tolerance, at the cost of coordination overhead.

💡 Why Interviewers Ask This: Tests your fundamental architectural approach. Large-scale web systems rely primarily on horizontal scaling to handle massive traffic.

What is the CAP Theorem?

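The CAP Theorem states that a distributed data store can guarantee at most two of three properties simultaneously: Consistency (every read sees the most recent write, as if there were a single copy of the data), Availability (every request to a non-failing node receives a non-error response), and Partition Tolerance (the system keeps operating when the network drops or delays messages between nodes). Since partitions are unavoidable, the real choice during a partition is between C and A.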

💡 Why Interviewers Ask This: The most famous theorem in system design. Because network partitions (P) are unavoidable, you must prove you understand the C vs A trade-off.

What is the PACELC Theorem?

PACELC extends CAP: during a Partition (P), choose between Availability (A) and Consistency (C). Else (E), in normal operation, choose between Latency (L) and Consistency (C). DynamoDB, for example, serves eventually consistent reads by default, trading consistency for lower latency even when there is no network failure.

💡 Why Interviewers Ask This: Shows you have modern knowledge beyond just CAP. It explains real-world database trade-offs more accurately.

What are the Fallacies of Distributed Computing?

False assumptions novice developers make. The classic list has eight; the most cited are: the network is reliable, latency is zero, bandwidth is infinite, and the topology does not change.

💡 Why Interviewers Ask This: Proves a mature engineering mindset. Distributed systems fail constantly; good engineers design expecting failure.

Stateful vs. Stateless Architecture

A Stateful architecture requires the server to remember client session data between requests. A Stateless architecture treats every request as completely independent — any required state (like a JWT) is passed by the client with each request.

💡 Why Interviewers Ask This: Statelessness is a prerequisite for horizontal scalability. If a server is stateless, any node can handle any request safely.

What is Transparency in Distributed Systems?

Transparency conceals component separation so the system appears as a single entity. Types: Location Transparency (users do not know where the resource is), Failure Transparency (users do not see crashes), and Replication Transparency (users are unaware of copies).

💡 Why Interviewers Ask This: Tests understanding of the ultimate goal of distributed design: hiding operational complexity from the client.

What is a Monolith vs. Microservices Architecture?

A Monolith tightly couples all business logic, UI, and database access into one deployable codebase. Microservices decouple domains into independently deployable services communicating over APIs — offering isolated scaling but immense operational complexity.

💡 Why Interviewers Ask This: You must articulate the trade-offs. Microservices solve organizational scaling but create distributed system headaches.

Explain Idempotency

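An operation is idempotent if executing it multiple times yields the same final state as executing it once. By HTTP semantics GET, PUT, and DELETE are idempotent; POST is not, unless the server deduplicates retries (for example with a client-supplied idempotency key).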

💡 Why Interviewers Ask This: Network requests fail frequently. If a client retries a payment due to a timeout, idempotency ensures they are not charged twice.
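The retry scenario above can be sketched as follows. This is a minimal illustration, assuming an in-memory store; `PaymentService`, `charge`, and the key `"order-42"` are hypothetical names, not part of any real payment API.

```python
class PaymentService:
    """Toy payment handler that deduplicates retries by idempotency key."""

    def __init__(self):
        self.total_charged = 0
        self._responses = {}  # idempotency key -> stored response

    def charge(self, idempotency_key: str, amount: int) -> dict:
        # A retried request carries the same key, so return the stored
        # response instead of charging a second time.
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]
        self.total_charged += amount
        response = {"status": "charged", "amount": amount}
        self._responses[idempotency_key] = response
        return response


service = PaymentService()
first = service.charge("order-42", 100)
retry = service.charge("order-42", 100)  # client retried after a timeout
```

A production version would persist the key-to-response mapping and expire old keys; the in-memory dict only shows the shape of the idea.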

What is an API Gateway?

An API Gateway acts as a single entry point into a microservices architecture, handling cross-cutting concerns: request routing, authentication, rate limiting, and payload aggregation.

💡 Why Interviewers Ask This: API Gateways are near-universal in microservice architectures because they spare clients from tracking dozens of different service addresses.

Communication & Networking Interview Questions (Q11–Q20)

What is RPC (Remote Procedure Call)?

RPC allows a program to cause a procedure to execute on a remote server as if it were a local function call, completely hiding the network interaction from the developer.

💡 Why Interviewers Ask This: The historical foundation of inter-service communication and the basis of modern gRPC.

Compare gRPC to REST

REST is an architectural style, most often implemented over HTTP/1.1 with JSON payloads: human-readable but verbose. gRPC uses HTTP/2 and Protocol Buffers (Protobuf), which encode messages in a compact binary format, and it supports bi-directional streaming. This usually makes it faster for internal microservice communication.

💡 Why Interviewers Ask This: Differentiates candidates who keep up with modern enterprise tech stacks. gRPC is widely used for backend-to-backend communication.

What is the difference between a Forward Proxy and a Reverse Proxy?

A Forward Proxy sits in front of clients, intercepting outbound requests (protecting clients). A Reverse Proxy (like Nginx) sits in front of servers, intercepting incoming requests (protecting and load balancing servers).

💡 Why Interviewers Ask This: A fundamental networking test. Load balancers and API Gateways are built entirely on reverse proxy architecture.

What is a Load Balancer and what are L4 vs. L7?

A Load Balancer distributes incoming traffic across multiple servers. Layer 4 (L4) routes based on network data (IPs/TCP ports) — fast and dumb. Layer 7 (L7) inspects application data (HTTP headers, URLs, cookies) — smart, granular routing for microservices.

💡 Why Interviewers Ask This: System design interviews require placing load balancers correctly. Know when to use fast dumb routing (L4) vs. smart routing (L7).

Message Queues vs. Pub/Sub Systems

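In a Message Queue (like RabbitMQ), each message is consumed by exactly one worker, which is ideal for task distribution. In a Pub/Sub system (like Kafka), a message is published to a topic and delivered to every subscriber. In Kafka specifically, each consumer group receives every message, while the consumers inside one group split the topic's partitions between them.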

💡 Why Interviewers Ask This: Tests your ability to architect asynchronous, event-driven systems correctly based on business requirements.

What is a Service Mesh?

A Service Mesh (like Istio) deploys Sidecar Proxies alongside every microservice to handle observability, mutual TLS (mTLS) encryption, retries, and circuit breaking — without altering application code.

💡 Why Interviewers Ask This: Service mesh is a trending enterprise topic. It solves the operational nightmares of managing hundreds of microservices.

Long Polling vs. WebSockets

Long Polling holds an HTTP request open until new data arrives, then closes it (resource-heavy). WebSockets establish a persistent, full-duplex TCP connection for continuous, low-latency bi-directional data streaming.

💡 Why Interviewers Ask This: Essential for real-time applications — chat, live sports scores, trading dashboards.

What is the Thundering Herd Problem (Cache Stampede)?

When a popular cached item expires, thousands of concurrent requests simultaneously query the backend database to regenerate it, potentially crashing it. Solutions: request coalescing (one request regenerates the value while the rest wait for it) and probabilistic early expiration, also known as probabilistic early recomputation (PER).

💡 Why Interviewers Ask This: Proves you know how distributed caching fails under extreme scale.
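Probabilistic early expiration can be sketched in a few lines. The version below follows the commonly described rule in which each request pretends the clock is slightly ahead by a random, cost-scaled amount; the function name, its parameters, and the injectable `rand` argument are assumptions made for illustration.

```python
import math
import random


def should_recompute(now, expiry, recompute_cost, beta=1.0, rand=None):
    """Return True if this request should refresh the cached value early.

    recompute_cost is how long the last regeneration took (seconds);
    beta > 1 makes early refreshes more likely.
    """
    if rand is None:
        rand = 1.0 - random.random()  # uniform in (0, 1], avoids log(0)
    # -log(rand) is >= 0, so each request behaves as if 'now' were
    # slightly later; the closer to expiry, the likelier a refresh.
    return now - recompute_cost * beta * math.log(rand) >= expiry
```

Because only an occasional request crosses the threshold before the real expiry, the value is usually regenerated by one caller while everyone else keeps reading the still-valid cached copy.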

What is Rate Limiting and name a common algorithm?

Rate Limiting restricts requests per user per timeframe to prevent abuse or DDoS. The Token Bucket algorithm: a bucket holds tokens replenished at a constant rate; every request costs a token; if empty, the request is dropped or queued.

💡 Why Interviewers Ask This: Tests API security and system defense capabilities.
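The Token Bucket described above can be sketched like this. It is a minimal single-process illustration; the class name and the explicit `now` argument (used instead of a real clock so the behaviour is easy to follow) are assumptions.

```python
class TokenBucket:
    """Token bucket: refills at a constant rate, each request spends tokens."""

    def __init__(self, capacity: float, refill_rate: float, now: float = 0.0):
        self.capacity = capacity
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity          # start full
        self.last_refill = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Add the tokens earned since the last call, capped at capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # bucket empty: drop or queue the request


bucket = TokenBucket(capacity=2, refill_rate=1.0)
burst = [bucket.allow(now=0.0) for _ in range(3)]  # third request is rejected
later = bucket.allow(now=1.0)                      # one token has refilled
```

The capacity sets the largest burst that is tolerated, while the refill rate sets the sustained request rate.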

What is Distributed Tracing?

A single user request may touch 15 services. Distributed tracing attaches a unique Correlation ID to the request, passing it to every downstream service. Tools like Jaeger or Zipkin aggregate these IDs to visualize bottlenecks across the entire flow.

💡 Why Interviewers Ask This: You cannot debug an error in a distributed system by looking at isolated server logs.

Data, State & Partitioning Interview Questions (Q21–Q30)

What is Database Sharding?

Sharding breaks a massive database into smaller shards distributed across multiple servers. Each shard holds a specific subset of data determined by a Shard Key. Enables horizontal scaling of databases beyond single-machine limits.

💡 Why Interviewers Ask This: Mandatory for systems holding petabytes of data like Twitter or Facebook.

What is Consistent Hashing?

Standard hashing (hash(key) % N) breaks badly when N changes: almost every key maps to a different server. Consistent Hashing places servers and keys on a hash ring, so when a server is added or removed only about K/N keys (K = total keys, N = servers) move to or from the adjacent node.

💡 Why Interviewers Ask This: The most frequently asked load-balancing algorithm in FAANG system design interviews.
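A hash ring with virtual nodes can be sketched as below. This is an illustrative toy, assuming SHA-256 for ring positions and 100 virtual nodes per server; `HashRing` and the node names are hypothetical.

```python
import bisect
import hashlib


def _position(value: str) -> int:
    """Map a string to a point on the ring."""
    return int(hashlib.sha256(value.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Each server appears at many points to even out the load.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_position(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def lookup(self, key: str) -> str:
        # Owner is the first virtual node clockwise from the key.
        idx = bisect.bisect_left(self._ring, (_position(key), ""))
        return self._ring[idx % len(self._ring)][1]


keys = [f"user:{i}" for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.lookup(k) for k in keys}
ring.add("node-d")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Every key in `moved` now belongs to the new server; keys owned by the other three servers stay where they were, which is the property plain modulo hashing lacks.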

Synchronous vs. Asynchronous Replication

Synchronous: primary waits for replica acknowledgment before confirming to client — high durability, high latency. Asynchronous: primary confirms immediately and updates replica in background — low latency, risk of data loss if primary crashes.

💡 Why Interviewers Ask This: Tests your ability to balance data safety against application performance.

Strong Consistency vs. Eventual Consistency

Strong Consistency: after a write, any subsequent read from any node returns the updated value — sacrifices speed. Eventual Consistency: nodes may be temporarily out-of-sync but converge to the same value if no new writes occur — highly available.

💡 Why Interviewers Ask This: The core CAP trade-off. Relational databases lean Strong; Cassandra leans Eventual.

What is Read-After-Write Consistency?

A guarantee that once a user submits an update (e.g., posting a tweet), their very next read will immediately reflect that update. Other users may experience eventual consistency, but the creator never sees stale data.

💡 Why Interviewers Ask This: Focuses on UX within a highly distributed backend — users must see their own writes immediately.

Why is Time Synchronization difficult in Distributed Systems?

Physical server clocks experience Clock Drift over time, making it impossible to rely on physical timestamps to determine the exact global order of events across different machines.

💡 Why Interviewers Ask This: If you do not understand clock drift, you will corrupt distributed databases during write conflicts.

What are Logical Clocks (Lamport Clocks)?

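Logical Clocks use integer counters to track causality. Every node increments its counter on each local event and attaches the counter to outgoing messages; on receipt a node sets counter = max(local, received) + 1. This guarantees that if event A happened before event B, A's timestamp is smaller. The converse does not hold: a smaller timestamp alone does not prove causality.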

💡 Why Interviewers Ask This: Proves deep theoretical understanding of distributed event ordering without relying on physical time.
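The Lamport update rule can be traced with a small sketch. The class and method names are assumptions chosen for readability.

```python
class LamportClock:
    """Integer logical clock following the max(local, received) + 1 rule."""

    def __init__(self):
        self.time = 0

    def tick(self) -> int:
        # Local event: advance the counter.
        self.time += 1
        return self.time

    def send(self) -> int:
        # Sending is an event; the returned value travels with the message.
        return self.tick()

    def receive(self, received: int) -> int:
        # Jump past both our own history and the sender's.
        self.time = max(self.time, received) + 1
        return self.time


a, b = LamportClock(), LamportClock()
a.tick()                 # a: 1
a.tick()                 # a: 2
stamp = a.send()         # a: 3, message carries 3
b.tick()                 # b: 1
recv = b.receive(stamp)  # b: max(1, 3) + 1 = 4
```

The receive event on `b` gets a larger timestamp than the send event on `a`, preserving the happened-before order across the two nodes.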

What are Vector Clocks?

Vector Clocks maintain an array of counters, one per node. Comparing two vectors element by element shows whether one update causally precedes the other or whether they are concurrent (a conflict that needs resolution). Amazon's original Dynamo design used them for object versioning.

💡 Why Interviewers Ask This: Crucial for understanding how Dynamo-style distributed databases handle versioning and conflict resolution.
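Comparing two vector clocks can be sketched as follows. Representing each clock as a dict from node name to counter is an assumption for illustration; `compare` and `merge` are hypothetical helper names.

```python
def compare(a: dict, b: dict) -> str:
    """Classify the causal relationship between two vector clocks."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # a causally precedes b
    if b_le_a:
        return "after"       # b causally precedes a
    return "concurrent"      # neither dominates: a conflict


def merge(a: dict, b: dict) -> dict:
    """Element-wise maximum, used after a conflict has been resolved."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


v1 = {"A": 2, "B": 1}
v2 = {"A": 2, "B": 2}  # extends v1 with one more event on B
v3 = {"A": 3, "B": 1}  # extends v1 with one more event on A
```

Here `v2` and `v3` both descend from `v1` but neither has seen the other's update, so they compare as concurrent and need application-level resolution.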

What is a Distributed Quorum?

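A Quorum is the minimum number of nodes that must agree for an operation to succeed. For strong consistency: R + W > N, where R is the read quorum, W is the write quorum, and N is the number of replicas. The inequality forces every read quorum to overlap every write quorum in at least one replica. This is the mathematical basis of Cassandra's consistency levels.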

💡 Why Interviewers Ask This: The mathematical formula enforcing consistency guarantees in peer-to-peer databases.
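The quorum inequality is easy to check numerically. The helper names below are assumptions; the example uses three replicas with majority reads and writes.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True when every read quorum must share a replica with every write quorum."""
    return r + w > n


def majority(n: int) -> int:
    """Smallest group that is more than half of n replicas."""
    return n // 2 + 1


replicas = 3
strong = quorums_overlap(replicas, r=majority(replicas), w=majority(replicas))  # 2 + 2 > 3
weak = quorums_overlap(replicas, r=1, w=1)                                      # 1 + 1 > 3 fails
```

With R = W = 2 out of 3 replicas a read always touches at least one replica holding the latest acknowledged write; with R = W = 1 a read can land on a replica the write never reached.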

Explain the Split-Brain Problem

Split-brain occurs when a network partition cuts a cluster into groups that lose contact and each elects its own leader, leading to conflicting writes and data corruption. It is prevented by requiring a strict majority quorum to elect a leader or accept writes, so at most one side of a partition can proceed; an odd number of nodes avoids an even split.

💡 Why Interviewers Ask This: Split-brain is one of the most dangerous distributed system failure modes — testing your ability to prevent it demonstrates engineering maturity.

Fault Tolerance & Consensus Interview Questions (Q31–Q40)

What is the Two-Phase Commit (2PC) Protocol?

2PC coordinates distributed transactions. Phase 1 (Prepare): the coordinator asks all participants to prepare to commit. Phase 2 (Commit/Rollback): if all agree, it commits; otherwise it rolls back. Critical flaw: it is a blocking protocol. If the coordinator crashes after participants have voted to commit, they must hold their locks until the coordinator recovers.

💡 Why Interviewers Ask This: You must point out its major flaw: a dead coordinator leaves prepared participants blocked, holding locks, until it comes back.

What is the Saga Pattern?

A Saga breaks a distributed transaction into a sequence of local transactions. If any step fails, the saga executes Compensating Transactions to undo preceding steps. The industry standard for distributed data integrity (e.g., booking a flight + hotel simultaneously).

💡 Why Interviewers Ask This: You cannot do SQL transactions across microservices. Sagas are the industry standard.
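The flight-plus-hotel example can be sketched as a list of (action, compensation) pairs. This is a simplified in-process illustration; `run_saga` and the booking functions are hypothetical, and real sagas run each step in a different service.

```python
def run_saga(steps) -> bool:
    """Run (action, compensation) pairs; undo completed steps on failure."""
    completed = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            # Compensate in reverse order of completion.
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True


log = []

def book_flight():
    log.append("flight booked")

def cancel_flight():
    log.append("flight cancelled")

def book_hotel():
    raise RuntimeError("no rooms available")

def cancel_hotel():
    log.append("hotel cancelled")


ok = run_saga([(book_flight, cancel_flight), (book_hotel, cancel_hotel)])
```

Only the flight is compensated, because the hotel step never completed. Note that a compensation is a new forward transaction (a cancellation), not a database rollback.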

Orchestration vs. Choreography in Sagas

Choreography: services publish events to a message broker and react independently; this is decoupled, but the overall flow is hard to track. Orchestration: a centralized Saga Execution Coordinator explicitly commands participating services. This is easier to monitor, but the coordinator becomes a potential single point of failure unless it is itself replicated.

💡 Why Interviewers Ask This: Tests ability to architect complex state machines based on operational business constraints.

What is a Consensus Algorithm?

An algorithm allowing distributed, unreliable machines to agree on a single, universal state or value, even if some machines crash. The foundation of Kafka, Zookeeper, and Kubernetes leader election.

💡 Why Interviewers Ask This: Consensus is the beating heart of distributed coordination.

Compare Paxos and Raft

Both achieve distributed consensus with fault tolerance. Paxos is mathematically proven but notoriously hard to implement correctly. Raft was designed explicitly for understandability, cleanly dividing consensus into Leader Election, Log Replication, and Safety.

💡 Why Interviewers Ask This: Raft powers etcd (Kubernetes) and Consul. Know why many newer systems choose Raft over Paxos, while Paxos variants remain in production (for example Multi-Paxos in Spanner).

Explain the Circuit Breaker Pattern

Circuit Breaker wraps remote calls. If a downstream service repeatedly fails, the circuit trips and fails fast without further attempts. This prevents cascading failures and gives the broken service time to recover.

💡 Why Interviewers Ask This: Critical resiliency pattern — shows you understand how to protect networks from retry storms on a dead server.
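A circuit breaker's closed, open, and half-open behaviour can be sketched as below. The class, its thresholds, and the explicit `now` argument are assumptions for illustration, not the API of any particular resilience library.

```python
class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying it."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, now: float):
        if self.opened_at is not None and now - self.opened_at < self.reset_timeout:
            raise CircuitOpenError("failing fast")
        # Closed, or the timeout has elapsed (half-open): try the call.
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = now  # trip, or re-open after a failed trial
            raise
        self.failures = 0
        self.opened_at = None
        return result


def flaky():
    raise ConnectionError("downstream unavailable")


breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0)
outcomes = []
for t in (0.0, 1.0, 2.0):
    try:
        breaker.call(flaky, now=t)
    except CircuitOpenError:
        outcomes.append("rejected")
    except ConnectionError:
        outcomes.append("failed")
recovered = breaker.call(lambda: "ok", now=20.0)  # trial call after the timeout
```

The third call never reaches the downstream service, which is the fail-fast behaviour; after the reset timeout one trial call is allowed through and a success closes the circuit again.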

What is Byzantine Fault Tolerance (BFT)?

Standard fault tolerance assumes nodes simply crash. BFT addresses scenarios where nodes act maliciously or send conflicting information to different network parts. Crucial for blockchain and aerospace systems operating in trustless environments.

💡 Why Interviewers Ask This: Critical for blockchain technologies where trustless nodes must achieve consensus safely.

What is the Gossip Protocol?

A decentralized protocol where each node periodically shares cluster state with a few randomly chosen peers. Like a real-world rumor, information spreads exponentially without a central coordinator. Used by Amazon Dynamo and Cassandra for cluster health detection.

💡 Why Interviewers Ask This: Shows advanced knowledge of cluster management and epidemic information spreading algorithms.

What is the Bulkhead Pattern?

Inspired by a ship's watertight compartments, Bulkhead partitions system resources (connection pools, CPU threads) so that if one microservice fails and exhausts its resources, it cannot consume the entire system's resources.

💡 Why Interviewers Ask This: Proves you know how to build "blast-radius" contained systems that fail gracefully.

What is Exponential Backoff and Jitter?

Exponential Backoff exponentially increases wait time between retries (1s → 2s → 4s). Jitter adds randomized variance to prevent all waiting clients from retrying at the exact same millisecond — otherwise retries become a self-inflicted DDoS attack.

💡 Why Interviewers Ask This: An absolute must-know for stable cloud applications.
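The retry schedule can be sketched as follows. This shows one common variant, often called full jitter, where each delay is drawn uniformly between zero and the exponential ceiling; the function name, the 30-second cap, and the injectable `rng` are assumptions.

```python
import random


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0,
                   rng=random.random) -> list:
    """Delays for successive retries: random in [0, min(cap, base * 2**attempt))."""
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * 2 ** attempt)  # 1s, 2s, 4s, ... capped
        delays.append(rng() * ceiling)           # jitter spreads clients out
    return delays


ceilings = backoff_delays(6, rng=lambda: 1.0)  # upper bound for each attempt
jittered = backoff_delays(6)                   # what a real client would sleep
```

Capping the ceiling keeps a long outage from producing multi-minute waits, and the randomization means clients that failed together do not retry together.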

System Design & Advanced Mechanics Interview Questions (Q41–Q50)

What is a Bloom Filter?

A space-efficient probabilistic data structure testing set membership. Returns "Possibly in set" (allows false positives) or "Definitely not in set" (zero false negatives). Used to avoid expensive database lookups for items known not to exist.

💡 Why Interviewers Ask This: A top-tier algorithm question proving you understand memory optimization for high-speed cache checks.
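A Bloom filter fits in a few lines. The sketch below assumes SHA-256 with a seed prefix as the hash family and stores one byte per bit for clarity rather than compactness; the class and method names are hypothetical.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for readability

    def _positions(self, item: str):
        # Derive num_hashes positions by salting the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
for user in ("alice", "bob"):
    bf.add(user)
```

An added item always reports as possibly present because its bits were set, while an item that was never added can still collide on all of its positions, which is where false positives come from.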

What are Merkle Trees (Hash Trees)?

A tree where every leaf holds the cryptographic hash of a data block; non-leaf nodes hold hashes of their children. Used in Cassandra and blockchain for Anti-Entropy — efficiently comparing massive datasets and syncing only the broken chunks.

💡 Why Interviewers Ask This: Tests deep knowledge of how distributed databases repair themselves in the background seamlessly.

How do you generate globally Unique IDs in a Distributed System?

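Auto-incrementing DB IDs create a bottleneck because every insert must go through one sequence. A widely used alternative is the Twitter Snowflake ID: a 64-bit integer that packs a millisecond timestamp, a machine ID, and a per-machine sequence number (41, 10, and 12 bits in the original layout). It generates roughly time-sortable, collision-free IDs at massive scale without central coordination, provided machine IDs are unique and clocks do not run backwards.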

💡 Why Interviewers Ask This: Guaranteed to appear in system design interviews for chat and social media applications.
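A Snowflake-style generator can be sketched as below. The 41/10/12 bit split follows the commonly cited layout; the custom epoch value, the class name, and passing `now_ms` explicitly are assumptions for illustration.

```python
MACHINE_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeGenerator:
    def __init__(self, machine_id: int, epoch_ms: int = 1_600_000_000_000):
        if not 0 <= machine_id < (1 << MACHINE_BITS):
            raise ValueError("machine_id out of range")
        self.machine_id = machine_id
        self.epoch_ms = epoch_ms  # hypothetical custom epoch
        self.last_ms = -1
        self.sequence = 0

    def next_id(self, now_ms: int) -> int:
        if now_ms < self.last_ms:
            raise RuntimeError("clock moved backwards")
        if now_ms == self.last_ms:
            # Same millisecond: bump the per-machine sequence.
            self.sequence += 1
            if self.sequence > MAX_SEQUENCE:
                raise RuntimeError("sequence exhausted for this millisecond")
        else:
            self.sequence = 0
        self.last_ms = now_ms
        timestamp = now_ms - self.epoch_ms
        return ((timestamp << (MACHINE_BITS + SEQUENCE_BITS))
                | (self.machine_id << SEQUENCE_BITS)
                | self.sequence)


gen = SnowflakeGenerator(machine_id=7)
t = 1_700_000_000_000
ids = [gen.next_id(t), gen.next_id(t), gen.next_id(t + 1)]
```

Because the timestamp occupies the high bits, IDs from one machine sort by creation time, and the machine ID field keeps two machines from colliding in the same millisecond.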

What is the MapReduce Framework?

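A Google-designed programming model for processing massive datasets across distributed clusters. Map: transforms each input record into intermediate key-value pairs. The framework then shuffles and sorts those pairs so that all values for a key reach the same reducer. Reduce: aggregates the values for each key. The model abstracts away the complex networking of parallel processing.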

💡 Why Interviewers Ask This: Tests foundational knowledge of Big Data processing and Apache Hadoop architecture.
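The map, shuffle, and reduce phases can be shown with the classic word count. This single-process sketch only mimics the data flow; the function names are assumptions and no Hadoop API is involved.

```python
from collections import defaultdict


def map_phase(document: str):
    # Emit one (word, 1) pair per word.
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    # Group all values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Aggregate the values for one key.
    return key, sum(values)


documents = ["the quick fox", "the lazy dog", "The fox"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

In a real cluster each document would be mapped on a different worker and each key reduced on whichever worker the shuffle assigns it to.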

What is a CDN (Content Delivery Network)?

A CDN is a geographically distributed network of proxy servers that cache static assets at Edge Locations close to the end user, drastically reducing latency, decreasing bandwidth costs, and shielding the origin server from traffic spikes.

💡 Why Interviewers Ask This: You cannot design a scalable global application like Netflix or YouTube without a CDN.

Explain Leader Election in Distributed Systems

Nodes elect a Leader to make unilateral decisions (assigning tasks), while others act as Followers. If the leader dies, a consensus algorithm (like Raft) triggers a new election to promote a follower with the most up-to-date log.

💡 Why Interviewers Ask This: Critical for cluster orchestration and avoiding race conditions on shared distributed resources.

What is Apache Zookeeper / etcd?

Highly reliable distributed coordination services that maintain cluster configuration, provide distributed locking, and manage leader election. ZooKeeper has long played this role for Kafka (newer Kafka versions can run without it, using the built-in KRaft mode), and etcd is the source of truth for Kubernetes cluster state.

💡 Why Interviewers Ask This: Proves you understand the infrastructure required to manage large-scale distributed deployments.

What are CRDTs (Conflict-free Replicated Data Types)?

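Advanced data structures that can be updated independently and concurrently across nodes without locks or central coordination, while guaranteeing that all replicas converge to the same state once they have seen the same updates. They are used in collaborative-editing libraries such as Yjs and Automerge. Google Docs, often cited as an example, is built on the related but different technique of Operational Transformation.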

💡 Why Interviewers Ask This: A cutting-edge expert topic showing you understand mathematics beyond simple leader-based replication.
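One of the simplest CRDTs, a grow-only counter (G-Counter), shows how convergence works. The class below is an illustrative sketch with hypothetical names; each replica only ever increments its own slot, and merging takes the per-node maximum.

```python
class GCounter:
    """Grow-only counter CRDT: one slot per node, merged by element-wise max."""

    def __init__(self, node_id: str):
        self.node_id = node_id
        self.counts = {}

    def increment(self, amount: int = 1) -> None:
        # A replica only ever writes to its own slot.
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Max is commutative, associative, and idempotent, so merge
        # order and repeated merges do not change the result.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)


a, b = GCounter("a"), GCounter("b")
a.increment()
a.increment()
b.increment()
a.merge(b)
b.merge(a)
a.merge(b)  # merging again is harmless
```

Both replicas end at the same total without any lock or coordinator, which is the convergence guarantee the definition above describes.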

Cache Aside vs. Write-Through Caching

Cache Aside: the application checks the cache; on a miss it queries the DB and populates the cache itself. Write-Through: the application writes to the cache, and the cache synchronously writes to the DB. This keeps cache and DB in step at the cost of higher write latency.

💡 Why Interviewers Ask This: Evaluates ability to design efficient, consistent caching layers based on read/write ratios.
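The two strategies can be contrasted in a small sketch. Plain dicts stand in for the cache and the database; `Store` and its method names are assumptions for illustration.

```python
class Store:
    def __init__(self):
        self.db = {}       # stands in for the database
        self.cache = {}    # stands in for the cache
        self.db_reads = 0

    def read_cache_aside(self, key):
        # Cache aside: the application handles the miss itself.
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def write_through(self, key, value) -> None:
        # Write-through: cache and DB are updated in the same operation.
        self.cache[key] = value
        self.db[key] = value


store = Store()
store.db["user:1"] = "Ada"
first = store.read_cache_aside("user:1")   # miss: goes to the DB
second = store.read_cache_aside("user:1")  # hit: served from the cache
store.write_through("user:1", "Grace")
third = store.read_cache_aside("user:1")   # sees the new value without a DB read
```

Cache aside suits read-heavy data that tolerates a brief stale window, while write-through pays extra latency on every write to keep reads fresh.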

What is Distributed Shared Memory (DSM)?

An architecture where multiple physically separate machines share a virtual memory space. To the application it appears as local RAM, but the OS handles complex networking to fetch memory pages from remote nodes over the network.

💡 Why Interviewers Ask This: A highly advanced OS/systems concept testing knowledge of hardware-level virtualization and memory paging.

Common Mistakes in Distributed Systems Interviews

Treating CAP theorem as "pick any 2" (Q3): Network partitions are inevitable, not optional. The real choice is CP vs AP during a partition. Saying "we pick CA" is an immediate red flag that signals textbook-only knowledge.
Confusing eventual consistency with "data loss" (Q24): Eventual consistency guarantees convergence: all replicas will agree, just not instantly. Saying it means "data might be lost" shows you have not understood the consistency spectrum from strong to eventual.
Saying "just add more servers" for scaling (Q2): Horizontal scaling introduces coordination overhead, data partitioning challenges, and consistency trade-offs. Interviewers want to hear about consistent hashing (Q22), sharding strategies (Q21), and replication (Q23), not a one-liner.
Mixing up Raft and Paxos (Q34, Q35): Raft was designed to be understandable; it uses a strong leader and sequential log replication. Basic Paxos has no fixed leader (any proposer can drive a round) and is more general. Confusing them signals you memorised names without understanding the algorithms.
Ignoring network partitions in failure scenarios (Q30, Q37): Byzantine failures, split-brain, and message loss are not edge cases; they are the default in distributed systems. Answering fault tolerance questions without mentioning partition handling is incomplete.
Oversimplifying consistent hashing as "just a hash ring" (Q22): Without virtual nodes, consistent hashing leads to severe load imbalance. Interviewers expect you to mention vnodes, replication factor placement, and how systems like Cassandra and DynamoDB implement it.

Expert Interview Strategy for Distributed Systems Roles

Always lead with trade-offs, not absolutes. Every distributed systems answer has trade-offs. "Consistent hashing reduces rebalancing from O(K) to O(K/N), but requires virtual nodes for load balance." Showing nuance separates you from candidates who memorise definitions.
Draw the architecture before you speak. Distributed systems are inherently visual. Sketch nodes, arrows, and failure boundaries. Interviewers at FAANG expect whiteboard diagrams for questions on replication topologies, consensus flows, and partitioning strategies.
Name real systems, not just concepts. "Kafka for event streaming", "Cassandra for AP storage", "ZooKeeper or etcd for coordination", "Raft in etcd, Multi-Paxos in Spanner." Concrete system knowledge differentiates you from theory-only candidates.
Connect every answer to CAP or PACELC. When asked about any database or system, immediately classify it: "Cassandra is AP under CAP, PA/EL under PACELC: it favours availability and low latency over consistency." This framing shows systematic thinking.
Discuss failure modes first, happy paths second. In distributed systems, the interesting behaviour happens during failures. Start with "when a node crashes…" or "during a network partition…", because this is what interviewers actually want to hear.

How These Concepts Apply in Real Distributed Systems Jobs

Backend Engineer

Implements database sharding strategies (Q21), designs event-driven architectures with Kafka (Q15), handles distributed transactions with the Saga pattern (Q32), and configures consistent hashing (Q22) for cache clusters. CAP trade-offs (Q3) drive every storage design decision.

Site Reliability Engineer

Monitors replication lag (Q23), implements circuit breakers (Q36) and bulkheads (Q39) to contain blast radius, manages consensus clusters (Q34–Q35), and runs chaos engineering experiments to validate fault tolerance. Gossip protocols (Q38) power cluster health checks.

Platform / Infra Engineer

Builds service meshes (Q16) with load balancing (Q14), designs leader election systems (Q46), implements distributed ID generators like Snowflake (Q43), and manages cross-region replication. Vector clocks (Q28) and CRDTs (Q48) solve conflict resolution at scale.

Conclusion: Master Distributed Systems Interviews

These 50 distributed systems interview questions cover the essential concepts you will encounter in backend engineer, site reliability engineer, platform engineer, and systems architect roles. Mastering these topics demonstrates a solid understanding of scalability patterns, fault tolerance mechanisms, consensus algorithms, and data consistency models.

The key to interview success is not just knowing the answers, but understanding the "why" behind each question. Each answer includes insights into what interviewers are testing — from foundational knowledge like CAP theorem to practical decision-making around sharding, replication, and failure handling.

After reviewing these answers, reinforce your learning by exploring System Design and Computer Networks interview questions. The combination of distributed systems theory + system design practice + networking fundamentals creates the strongest foundation for senior engineering interviews.

Topics covered in this guide

Topics in this guide: CAP theorem, PACELC theorem, horizontal vs vertical scaling, stateless architecture, API gateways, idempotency, monolith vs microservices, RPC and gRPC vs REST, forward vs reverse proxy, L4 vs L7 load balancing, message queues vs pub/sub (Kafka vs RabbitMQ), service mesh and mTLS, WebSockets vs long polling, Thundering Herd problem, rate limiting (Token Bucket), distributed tracing, database sharding and shard keys, consistent hashing, synchronous vs asynchronous replication, strong vs eventual consistency, read-after-write consistency, Lamport clocks, vector clocks, quorum voting (R+W>N), split-brain problem, Two-Phase Commit (2PC), Saga pattern (orchestration vs choreography), consensus algorithms (Raft vs Paxos), circuit breaker, Byzantine fault tolerance, Gossip protocol, Bloom filters, Merkle trees, Snowflake ID, MapReduce, CDNs, CRDTs, exponential backoff with jitter.

For freshers: CAP theorem (C, A, P definitions), horizontal vs vertical scaling, API gateway concept, stateless vs stateful, load balancer types (L4/L7), message queue vs pub/sub, circuit breaker pattern basics, consistent hashing ring concept.

For experienced professionals: PACELC theorem analysis, Raft leader election and log replication, 2PC blocking problem and Saga compensating transactions, CRDT mathematical convergence guarantees, Byzantine fault tolerance for blockchain, Merkle tree anti-entropy for database repair, Bloom filter false positive probability calculations, vector clock conflict detection in Dynamo-style stores.

Interview preparation tips: Every FAANG system design round tests CAP theorem — always add that network partitions are unavoidable so the real choice is C vs A. Consistent Hashing is mandatory for any sharding question. Know Saga vs 2PC trade-offs for distributed transactions. Practice drawing the Raft leader election flow on a whiteboard.

Frequently Asked Questions

Q. What Distributed Systems topics are most asked in FAANG interviews?

CAP Theorem, Consistent Hashing, Database Sharding, Saga Pattern vs 2PC, Raft consensus, Circuit Breaker, Load Balancing (L4 vs L7), Bloom Filters, Distributed Tracing, and the Thundering Herd problem.

Q. What is the difference between 2PC and Saga?

2PC uses a blocking coordinator protocol that locks nodes if the coordinator crashes — unacceptable in microservices. Sagas use compensating transactions and are non-blocking, making them the industry standard for distributed data integrity.

Q. How do you handle failures in distributed systems?

Key patterns: Circuit Breaker (fail fast), Exponential Backoff with Jitter (retry safely), Bulkhead (contain blast radius), Gossip Protocol (detect failures), and Quorum voting (maintain consensus despite node failures).

Q. Is CAP Theorem still relevant in 2026?

Yes — but the PACELC theorem extends it more accurately. In practice, databases choose Partition Tolerance (a given) and then trade off between Consistency and Availability. PACELC adds the Latency vs Consistency trade-off for the non-partition case.

Q. How many weeks to prepare for distributed systems interviews?

4–6 weeks: Week 1: scaling, CAP/PACELC, load balancing. Week 2: message queues, gRPC, proxies, service mesh. Week 3: sharding, consistent hashing, replication, clocks. Week 4: 2PC, Sagas, Raft, fault tolerance patterns. Weeks 5–6: Bloom filters, Merkle trees, CRDTs, system design practice.


Common Interview Mistakes

Errors that eliminate candidates

Giving textbook definitions without showing a concrete distributed systems use case.
Skipping trade-offs and answering as if there is only one correct engineering decision.
Over-answering for 2-3 minutes without structure, metrics, or outcomes.

Expert Interview Strategy

30-second answer rule

Start with a one-line definition, then explain one real distributed systems scenario.
Use a 3-step structure: concept, practical example, and interviewer intent.
Close with one trade-off (performance, scale, security, or maintainability).

Real-World Job Applications

These distributed systems patterns are directly tested for production roles where interviewers expect clear debugging steps, architecture trade-offs, and communication under time pressure.

Conclusion

Mastering these distributed systems interview questions means explaining concepts quickly, connecting them to real systems, and justifying decisions with practical trade-offs.

Frequently Asked Questions

How should I prepare this topic in 7 days? Focus on high-frequency patterns, rehearse 30-second answers, and revise one practical example per category.

What do interviewers score most? Clarity, structured thinking, and your ability to reason through constraints and trade-offs.



Fundamentals & Scaling Interview Questions (Q1–Q10)

What is a Distributed System?

💡 Why Interviewers Ask This: The baseline question. Emphasize the "illusion of a single system" to show you understand the end-user perspective.

What is the difference between Vertical and Horizontal Scaling?

💡 Why Interviewers Ask This: Tests your fundamental architectural approach. Modern distributed systems rely primarily on horizontal scaling to handle massive web traffic.

What is the CAP Theorem?

💡 Why Interviewers Ask This: The most famous theorem in system design. Because network partitions (P) are unavoidable, you must prove you understand the C vs A trade-off.

What is the PACELC Theorem?

💡 Why Interviewers Ask This: Shows you have modern knowledge beyond just CAP. It explains real-world database trade-offs more accurately.

What are the Fallacies of Distributed Computing?

💡 Why Interviewers Ask This: Proves a mature engineering mindset. Distributed systems fail constantly; good engineers design expecting failure.

Stateful vs. Stateless Architecture

💡 Why Interviewers Ask This: Statelessness is a prerequisite for horizontal scalability. If a server is stateless, any node can handle any request safely.

What is Transparency in Distributed Systems?

💡 Why Interviewers Ask This: Tests understanding of the ultimate goal of distributed design: hiding operational complexity from the client.

What is a Monolith vs. Microservices Architecture?

💡 Why Interviewers Ask This: You must articulate the trade-offs. Microservices solve organizational scaling but create distributed system headaches.

Explain Idempotency

An operation is idempotent if executing it multiple times yields the exact same final state as once. HTTP PUT and DELETE are idempotent; POST is not.

💡 Why Interviewers Ask This: Network requests fail frequently. If a client retries a payment due to a timeout, idempotency ensures they are not charged twice.
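
The payment-retry scenario is commonly handled with an idempotency key chosen by the client. The sketch below is a minimal in-memory illustration; `PaymentService`, `charge`, and the receipt format are hypothetical names, and a real service would store keys durably and handle concurrent requests.

```python
from typing import Dict


class PaymentService:
    """Toy service that remembers the result of each idempotency key."""

    def __init__(self) -> None:
        self.total_charged = 0
        self._receipts: Dict[str, str] = {}

    def charge(self, idempotency_key: str, amount: int) -> str:
        # A retry with a key we have already processed returns the stored
        # receipt instead of charging again.
        if idempotency_key in self._receipts:
            return self._receipts[idempotency_key]
        self.total_charged += amount
        receipt = f"receipt-{len(self._receipts) + 1}"
        self._receipts[idempotency_key] = receipt
        return receipt


service = PaymentService()
first = service.charge("order-42", 50)
retry = service.charge("order-42", 50)  # client timed out and retried
```

Here `first` and `retry` are the same receipt and `service.total_charged` stays at 50, so the retry is safe even though the underlying operation (a charge) is not naturally idempotent.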

What is an API Gateway?

An API Gateway acts as a single entry point into a microservices architecture, handling cross-cutting concerns: request routing, authentication, rate limiting, and payload aggregation.

💡 Why Interviewers Ask This: API Gateways are common in microservice architectures because they spare clients from tracking dozens of different service addresses.

Communication & Networking Interview Questions (Q11–Q20)

What is RPC (Remote Procedure Call)?

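RPC allows a program to cause a procedure to execute on a remote server as if it were a local function call, hiding the network interaction behind an ordinary function-call interface. The abstraction is leaky: latency, timeouts, and partial failures still have to be handled by the caller.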

💡 Why Interviewers Ask This: The historical foundation of inter-service communication and the basis of modern gRPC.

Compare gRPC to REST

💡 Why Interviewers Ask This: Differentiates candidates who keep up with modern enterprise tech stacks. gRPC is widely used for backend-to-backend communication.

What is the difference between a Forward Proxy and a Reverse Proxy?

💡 Why Interviewers Ask This: A fundamental networking test. Load balancers and API Gateways are built entirely on reverse proxy architecture.

What is a Load Balancer and what are L4 vs. L7?

💡 Why Interviewers Ask This: System design interviews require placing load balancers correctly. Know when to use fast dumb routing (L4) vs. smart routing (L7).

Message Queues vs. Pub/Sub Systems

💡 Why Interviewers Ask This: Tests your ability to architect asynchronous, event-driven systems correctly based on business requirements.

What is a Service Mesh?

💡 Why Interviewers Ask This: Service mesh is a trending enterprise topic. It solves the operational nightmares of managing hundreds of microservices.

Long Polling vs. WebSockets

💡 Why Interviewers Ask This: Essential for real-time applications — chat, live sports scores, trading dashboards.

What is the Thundering Herd Problem (Cache Stampede)?

💡 Why Interviewers Ask This: Proves you know how distributed caching fails under extreme scale.

What is Rate Limiting and name a common algorithm?

💡 Why Interviewers Ask This: Tests API security and system defense capabilities.
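
One commonly named algorithm is the token bucket. The following is a minimal single-process sketch; the class name `TokenBucket`, its parameters, and the injectable `clock` are assumptions for illustration, and a distributed limiter would need shared state.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(
        self,
        rate: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)  # start with a full bucket
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

A request is admitted only if enough tokens remain, so short bursts pass while the sustained rate stays bounded by `rate`.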

What is Distributed Tracing?

💡 Why Interviewers Ask This: You cannot debug an error in a distributed system by looking at isolated server logs.

Data, State & Partitioning Interview Questions (Q21–Q30)

What is Database Sharding?

💡 Why Interviewers Ask This: Mandatory for systems holding petabytes of data like Twitter or Facebook.

What is Consistent Hashing?

💡 Why Interviewers Ask This: The most frequently asked load-balancing algorithm in FAANG system design interviews.

Synchronous vs. Asynchronous Replication

💡 Why Interviewers Ask This: Tests your ability to balance data safety against application performance.

Strong Consistency vs. Eventual Consistency

💡 Why Interviewers Ask This: The core CAP trade-off. Relational databases lean Strong; Cassandra leans Eventual.

What is Read-After-Write Consistency?

💡 Why Interviewers Ask This: Focuses on UX within a highly distributed backend — users must see their own writes immediately.

Why is Time Synchronization difficult in Distributed Systems?

Physical server clocks experience Clock Drift over time, making it impossible to rely on physical timestamps to determine the exact global order of events across different machines.

💡 Why Interviewers Ask This: If you do not understand clock drift, you will corrupt distributed databases during write conflicts.

What are Logical Clocks (Lamport Clocks)?

💡 Why Interviewers Ask This: Proves deep theoretical understanding of distributed event ordering without relying on physical time.
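
A Lamport clock fits in a few lines, which makes it a good whiteboard example. This sketch assumes messages carry the sender's timestamp; the class and method names are illustrative.

```python
class LamportClock:
    """Logical clock: a counter that orders events without physical time."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        # Local event: advance the counter.
        self.time += 1
        return self.time

    def send(self) -> int:
        # Sending is an event; the returned timestamp travels with the message.
        return self.tick()

    def receive(self, message_time: int) -> int:
        # Jump past both our own history and the sender's timestamp.
        self.time = max(self.time, message_time) + 1
        return self.time


node_a, node_b = LamportClock(), LamportClock()
node_a.tick()                         # local event on A
stamp = node_a.send()                 # A sends a message to B
received_at = node_b.receive(stamp)   # B's clock moves ahead of A's send
```

The rule guarantees that if one event causally precedes another, its timestamp is smaller. The converse does not hold, which is the gap Vector Clocks close.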

What are Vector Clocks?

💡 Why Interviewers Ask This: Crucial for understanding how distributed databases like DynamoDB handle complex versioning and conflict resolution.

What is a Distributed Quorum?

💡 Why Interviewers Ask This: The mathematical formula enforcing consistency guarantees in peer-to-peer databases.
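
The usual form of that rule is R + W > N: with N replicas, a write quorum of W and a read quorum of R must overlap in at least one replica. The sketch below checks the inequality and confirms it by brute force for small clusters; both function names are hypothetical.

```python
from itertools import combinations


def quorums_overlap(n: int, w: int, r: int) -> bool:
    """The quorum condition: every read set shares a replica with every write set."""
    return r + w > n


def always_intersect(n: int, w: int, r: int) -> bool:
    """Brute-force check over all possible write and read replica sets."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


strong = quorums_overlap(n=3, w=2, r=2)  # overlapping quorums
weak = quorums_overlap(n=3, w=1, r=1)    # a read may miss the latest write
```

With N = 3, choosing W = 2 and R = 2 guarantees every read contacts at least one replica that saw the latest write, while W = 1 and R = 1 does not.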

Explain the Split-Brain Problem

💡 Why Interviewers Ask This: Split-brain is one of the most dangerous distributed system failure modes — testing your ability to prevent it demonstrates engineering maturity.

Fault Tolerance & Consensus Interview Questions (Q31–Q40)

What is the Two-Phase Commit (2PC) Protocol?

💡 Why Interviewers Ask This: You must point out its major flaw: if the coordinator dies after participants have voted to commit, they stay blocked, holding their locks, until the coordinator recovers.

What is the Saga Pattern?

💡 Why Interviewers Ask This: A single ACID transaction cannot span microservices that each own their database. Sagas are the widely adopted alternative.

Orchestration vs. Choreography in Sagas

💡 Why Interviewers Ask This: Tests ability to architect complex state machines based on operational business constraints.

What is a Consensus Algorithm?

💡 Why Interviewers Ask This: Consensus is the beating heart of distributed coordination.

Compare Paxos and Raft

💡 Why Interviewers Ask This: Raft powers etcd (Kubernetes) and Consul. Know why many newer production systems choose Raft over Paxos: it was designed to be easier to understand and implement correctly.

Explain the Circuit Breaker Pattern

💡 Why Interviewers Ask This: Critical resiliency pattern — shows you understand how to protect networks from retry storms on a dead server.
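
A minimal circuit breaker can be sketched as a small state machine. The names here (`CircuitBreaker`, `CircuitOpenError`, `failure_threshold`, `reset_timeout`) are assumptions for illustration; production libraries add half-open trial limits, metrics, and thread safety.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means closed

    def call(self, operation: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                # Open: fail fast without touching the downstream service.
                raise CircuitOpenError("circuit open; call skipped")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = operation()
        except Exception:
            self._failures += 1
            tripped = self._failures >= self.failure_threshold
            if self._opened_at is not None or tripped:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

While the circuit is open, callers get an immediate error instead of piling retries onto a service that is already failing.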

What is Byzantine Fault Tolerance (BFT)?

💡 Why Interviewers Ask This: Critical for blockchain technologies where trustless nodes must achieve consensus safely.

What is the Gossip Protocol?

💡 Why Interviewers Ask This: Shows advanced knowledge of cluster management and epidemic information spreading algorithms.

What is the Bulkhead Pattern?

💡 Why Interviewers Ask This: Proves you know how to build "blast-radius" contained systems that fail gracefully.

What is Exponential Backoff and Jitter?

💡 Why Interviewers Ask This: An absolute must-know for stable cloud applications.

System Design & Advanced Mechanics Interview Questions (Q41–Q50)

What is a Bloom Filter?

💡 Why Interviewers Ask This: A top-tier algorithm question proving you understand memory optimization for high-speed cache checks.

What are Merkle Trees (Hash Trees)?

💡 Why Interviewers Ask This: Tests deep knowledge of how distributed databases repair themselves in the background seamlessly.

How do you generate globally Unique IDs in a Distributed System?

💡 Why Interviewers Ask This: Guaranteed to appear in system design interviews for chat and social media applications.

What is the MapReduce Framework?

💡 Why Interviewers Ask This: Tests foundational knowledge of Big Data processing and Apache Hadoop architecture.

What is a CDN (Content Delivery Network)?

💡 Why Interviewers Ask This: You cannot design a scalable global application like Netflix or YouTube without a CDN.

Explain Leader Election in Distributed Systems

💡 Why Interviewers Ask This: Critical for cluster orchestration and avoiding race conditions on shared distributed resources.

What is Apache Zookeeper / etcd?

💡 Why Interviewers Ask This: Proves you understand the infrastructure required to manage large-scale distributed deployments.

What are CRDTs (Conflict-free Replicated Data Types)?

💡 Why Interviewers Ask This: A cutting-edge expert topic showing you understand mathematics beyond simple leader-based replication.

Cache Aside vs. Write-Through Caching

💡 Why Interviewers Ask This: Evaluates ability to design efficient, consistent caching layers based on read/write ratios.

What is Distributed Shared Memory (DSM)?

💡 Why Interviewers Ask This: A highly advanced OS/systems concept testing knowledge of hardware-level virtualization and memory paging.

Common Mistakes in Distributed Systems Interviews

Treating CAP theorem as "pick any 2": Network partitions are inevitable, not optional. The real choice is CP vs AP during a partition (Q3). Saying "we pick CA" is an immediate red flag that signals textbook-only knowledge.
Confusing eventual consistency with "data loss" (Q24): Eventual consistency guarantees convergence: once writes stop, all replicas will agree, just not instantly. Saying it means "data might be lost" shows you have not understood the consistency spectrum from strong to eventual.
Saying "just add more servers" for scaling (Q2): Horizontal scaling introduces coordination overhead, data partitioning challenges, and consistency trade-offs. Interviewers want to hear about consistent hashing (Q22), sharding strategies (Q21), and replication (Q23), not a one-liner.
Mixing up Raft and Paxos (Q35): Raft was designed to be understandable; it uses a strong leader and sequential log replication. Basic Paxos has no fixed leader and is more general, although production variants such as Multi-Paxos usually elect a stable leader. Confusing them signals you memorised names without understanding the algorithms.
Ignoring network partitions in failure scenarios (Q30): Split-brain, message loss, and (in untrusted environments) Byzantine failures are not edge cases; they must be designed for from the start. Answering fault tolerance questions without mentioning partition handling is incomplete.
Oversimplifying consistent hashing as "just a hash ring" (Q22): Without virtual nodes, consistent hashing leads to severe load imbalance. Interviewers expect you to mention vnodes, replication factor placement, and how systems like Cassandra and DynamoDB implement it.
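
To make the last point concrete, here is a hedged sketch of a hash ring with virtual nodes. It is an illustration under simple assumptions (SHA-256 truncated to 64 bits, a fixed vnode count per server, names such as `HashRing` and `node-a` invented for the example), not how Cassandra or DynamoDB actually implement placement.

```python
import bisect
import hashlib
from typing import Dict, List


def ring_hash(value: str) -> int:
    """Map a string to a position on the ring (first 64 bits of SHA-256)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._points: List[int] = []      # sorted ring positions
        self._owner: Dict[int, str] = {}  # ring position -> server name

    def add_node(self, node: str) -> None:
        # Each server appears at many positions, which evens out the load.
        for i in range(self.vnodes):
            point = ring_hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove_node(self, node: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring is empty")
        # The first position clockwise from the key's hash owns the key.
        idx = bisect.bisect_right(self._points, ring_hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(vnodes=100)
for name in ("node-a", "node-b", "node-c"):
    ring.add_node(name)
owner = ring.lookup("user:1001")
```

Adding a fourth server moves only the keys that now fall on its positions; every other key keeps its owner, which is the property that plain modulo hashing lacks.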

Expert Interview Strategy for Distributed Systems Roles

Always lead with trade-offs, not absolutes. Every distributed systems answer has trade-offs. "Consistent hashing reduces rebalancing from O(K) to O(K/N), but requires virtual nodes for load balance." Showing nuance separates you from candidates who memorise definitions.
Draw the architecture before you speak. Distributed systems are inherently visual. Sketch nodes, arrows, and failure boundaries. Interviewers at FAANG expect whiteboard diagrams for questions on replication topologies, consensus flows, and partitioning strategies.
Name real systems, not just concepts. "Kafka for event streaming", "Cassandra for AP storage", "ZooKeeper or etcd for coordination", "Raft in etcd, Multi-Paxos in Spanner." Concrete system knowledge differentiates you from theory-only candidates.
Connect every answer to CAP or PACELC. When asked about any database or system, immediately classify it: "Cassandra is AP under CAP, PA/EL under PACELC: it favours availability and low latency over consistency." This framing shows systematic thinking.
Discuss failure modes first, happy paths second. In distributed systems, the interesting behaviour happens during failures. Start with "when a node crashes…" or "during a network partition…", because this is what interviewers actually want to hear.

How These Concepts Apply in Real Distributed Systems Jobs

Backend Engineer

Site Reliability Engineer

Platform / Infra Engineer

Conclusion: Master Distributed Systems Interviews

Topics covered in this guide


