System Design: WhatsApp — Real-time Chat with Online/Offline
February 11, 2026 · 8 min read
How WhatsApp handles 100 billion messages per day — WebSocket connections, message delivery guarantees, offline queuing, and presence.
WhatsApp sends 100 billion messages per day to 2 billion users. The core problems: maintaining persistent connections to billions of mobile clients, guaranteeing delivery even when recipients are offline, and showing presence (online/typing) without melting your servers.
Connection Layer: WebSockets at Scale
Each client maintains a persistent WebSocket connection to a chat server. With 2B users and perhaps 500M concurrently connected, and each server holding connections for ~50K-100K users in memory, you need roughly 5,000 to 10,000 chat server instances (500M / 100K = 5,000; 500M / 50K = 10,000). A connection registry (backed by Redis or another distributed key-value store) maps user_id → the chat server that holds that user's socket.
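A minimal sketch of the connection registry's contract, assuming an in-memory `Map` stands in for Redis. The class and method names are hypothetical; `getServer` mirrors the lookup used in the message flow below, but is synchronous here. The detail worth showing is reconnect handling: a user can reconnect to a new server before the old server notices the dropped socket, so a disconnect should only clear the entry if that server still owns it.

```javascript
// Hypothetical in-memory stand-in for the connection registry (user_id → chat server).
// Assumption: in production this map lives in Redis or a similar shared store.
class ConnectionRegistry {
  constructor() {
    this.owners = new Map(); // userId -> serverId currently holding the socket
  }

  // Called by a chat server when a user's WebSocket connects.
  // A newer connection overwrites the previous owner (reconnect case).
  register(userId, serverId) {
    this.owners.set(userId, serverId);
  }

  // Called when a socket closes. Only the current owner may remove the entry,
  // so a late close from an old server cannot erase a fresh reconnect.
  unregister(userId, serverId) {
    if (this.owners.get(userId) === serverId) {
      this.owners.delete(userId);
      return true;
    }
    return false;
  }

  // Returns the server holding the user's socket, or null if the user is offline.
  getServer(userId) {
    return this.owners.get(userId) ?? null;
  }
}

// Usage: user-42 reconnects to chat-b before chat-a notices the old socket died.
const registry = new ConnectionRegistry();
registry.register("user-42", "chat-a");
registry.register("user-42", "chat-b");     // reconnect lands on chat-b
registry.unregister("user-42", "chat-a");   // stale close from chat-a is ignored
console.log(registry.getServer("user-42")); // prints "chat-b"
registry.unregister("user-42", "chat-b");
console.log(registry.getServer("user-42")); // prints "null"
```

Against a real Redis backend, the "delete only if I still own it" check would have to be atomic, for example via a Lua script, otherwise the race reappears between the read and the delete.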
Message Flow: Online Recipient
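```javascript
// Sender → Chat Server A. Runs on Chat Server A; db, registry, inbox and
// rpcClientFor are illustrative helpers, not a real API.
async function handleSend(message, recipientId) {
  // 1. Persist message to DB (Cassandra): durability first
  await db.saveMessage(message)
  // 2. Lookup: which server holds the recipient's connection?
  const recipientServer = await registry.getServer(recipientId)
  // No live connection: park the message in the recipient's inbox (offline flow)
  if (!recipientServer) return inbox.append(recipientId, message)
  // 3. Forward via internal gRPC to that server (Chat Server B)
  await rpcClientFor(recipientServer).deliver(recipientId, message)
  // 4. Server B pushes the message to the recipient's WebSocket
  // 5. Recipient ACKs → Server B sends a delivery receipt back to the sender
}
```
Message Flow: Offline Recipient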
When the recipient has no active connection, the message is written to a persistent message queue (per-user inbox in Cassandra). When the user comes online, their client sends the last received message ID, and the server replays everything since then. This is why WhatsApp messages arrive in a burst when you reconnect after being offline — it's replaying the inbox queue.
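The replay-on-reconnect behaviour can be sketched as a per-user inbox keyed by a cursor. This is a sketch under stated assumptions: message IDs are per-inbox sequence numbers that only increase, the store is an in-memory `Map` rather than Cassandra, and `OfflineInbox`, `replaySince` and `ackUpTo` are hypothetical names.

```javascript
// Hypothetical per-user inbox with cursor-based replay.
// Assumption: IDs are monotonically increasing per inbox; a real system persists this.
class OfflineInbox {
  constructor() {
    this.inboxes = new Map(); // userId -> array of { id, body }, in id order
    this.nextId = new Map();  // userId -> next sequence number to assign
  }

  // Store a message for a user who has no active connection.
  append(userId, body) {
    const id = this.nextId.get(userId) ?? 1;
    this.nextId.set(userId, id + 1);
    if (!this.inboxes.has(userId)) this.inboxes.set(userId, []);
    this.inboxes.get(userId).push({ id, body });
    return id;
  }

  // On reconnect the client reports the last ID it received; replay everything after it.
  replaySince(userId, lastReceivedId) {
    const inbox = this.inboxes.get(userId) ?? [];
    return inbox.filter((m) => m.id > lastReceivedId);
  }

  // Once the client ACKs up to an ID, those messages no longer need to be kept.
  ackUpTo(userId, ackedId) {
    const inbox = this.inboxes.get(userId) ?? [];
    this.inboxes.set(userId, inbox.filter((m) => m.id > ackedId));
  }
}

const inbox = new OfflineInbox();
inbox.append("bob", "hi");         // id 1
inbox.append("bob", "you there?"); // id 2
inbox.append("bob", "call me");    // id 3
// Bob's client reports that it already has message 1.
console.log(inbox.replaySince("bob", 1).map((m) => m.body)); // [ 'you there?', 'call me' ]
```

Because the client names its own cursor, a reconnect that races with a half-finished replay is harmless: the next reconnect simply asks again from the last ID it actually stored.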
Message Delivery Guarantees
- One tick (✓): message received by server and persisted
- Two ticks (✓✓): message delivered to recipient's device
- Blue ticks: message read by recipient
- Messages stored in Cassandra with (user_id, conversation_id, timestamp) composite key
- At-least-once delivery with client-side deduplication via message_id
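At-least-once delivery means the recipient can see the same message twice (for example when its ACK is lost and the server retries), so the client has to deduplicate. A sketch of both ends, assuming each message carries a sender-generated `message_id`; `ReceivingClient`, `sendAck` and `upgradeStatus` are hypothetical names, not WhatsApp's client code.

```javascript
// Receiving client: ACK every copy, but show each message_id only once.
class ReceivingClient {
  constructor(sendAck) {
    this.seen = new Set(); // message_ids already shown to the user
    this.messages = [];    // what the chat UI would render
    this.sendAck = sendAck;
  }

  onMessage(msg) {
    // Always ACK, even duplicates: a redelivery usually means the earlier
    // ACK was lost, and the server keeps retrying until it gets one.
    this.sendAck(msg.message_id);
    if (this.seen.has(msg.message_id)) return false; // duplicate: drop silently
    this.seen.add(msg.message_id);
    this.messages.push(msg.text);
    return true;
  }
}

// Sender side: tick status only moves forward, sent (✓) → delivered (✓✓) → read (blue),
// so a receipt that arrives late cannot downgrade it.
const TICK_ORDER = ["sent", "delivered", "read"];
function upgradeStatus(current, incoming) {
  return TICK_ORDER.indexOf(incoming) > TICK_ORDER.indexOf(current) ? incoming : current;
}

const acks = [];
const client = new ReceivingClient((id) => acks.push(id));
client.onMessage({ message_id: "m1", text: "hello" });
client.onMessage({ message_id: "m1", text: "hello" }); // retry after a lost ACK
console.log(client.messages.length, acks.length);      // 1 2
console.log(upgradeStatus("read", "delivered"));       // read
```

The `seen` set grows without bound here; a real client would keep it to a recent window per conversation.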
Presence: Online/Offline/Typing
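Presence is expensive at scale. Naive approach: broadcast every online/offline event to all contacts. With 500M online users each having 200 contacts, a single status change by every online user already means 500M × 200 = 100 billion fan-out events, and users connect, disconnect, and start typing many times a day. That is not feasible.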
WhatsApp's approach: presence is lazy and subscription-based. You only receive presence updates for contacts whose chat is currently open in your UI. The client subscribes to a user's presence when you open their chat and unsubscribes when you navigate away, so each status change fans out only to the few people actively viewing that conversation rather than to every contact.
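The subscribe-on-open, unsubscribe-on-leave model can be sketched as a small hub that tracks watchers per user. Assumptions: a single in-memory hub, whereas a real deployment would track subscriptions on each chat server; `PresenceHub` and its methods are hypothetical names.

```javascript
// Hypothetical subscription-based presence hub.
class PresenceHub {
  constructor() {
    this.watchers = new Map(); // watched userId -> Set of subscriber userIds
    this.status = new Map();   // userId -> "online" | "offline" | "typing"
  }

  // Client opened a chat with `target`: start watching and return the current state.
  subscribe(subscriber, target) {
    if (!this.watchers.has(target)) this.watchers.set(target, new Set());
    this.watchers.get(target).add(subscriber);
    return this.status.get(target) ?? "offline";
  }

  // Client navigated away from the chat: stop watching.
  unsubscribe(subscriber, target) {
    const set = this.watchers.get(target);
    if (!set) return;
    set.delete(subscriber);
    if (set.size === 0) this.watchers.delete(target);
  }

  // A state change fans out only to current watchers, not to all contacts.
  // Returns the notifications that would be pushed over WebSockets.
  setStatus(userId, state) {
    this.status.set(userId, state);
    const subs = this.watchers.get(userId) ?? new Set();
    return [...subs].map((to) => ({ to, userId, state }));
  }
}

const hub = new PresenceHub();
hub.subscribe("alice", "bob");                       // Alice opens her chat with Bob
console.log(hub.setStatus("bob", "typing").length);  // 1 (only Alice is notified)
hub.unsubscribe("alice", "bob");                     // Alice leaves the chat
console.log(hub.setStatus("bob", "offline").length); // 0
```

The cost of a status change now scales with the number of open chat windows pointing at that user, which is usually zero or a handful, instead of the full contact list.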
End-to-End Encryption
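- Signal Protocol: each message is encrypted with a unique key derived via the Double Ratchet, which keeps advancing keys as messages are exchanged
- Server never has access to message plaintext; it only routes ciphertext
- Key agreement is computed on the clients (X3DH); the server stores and hands out public keys only
- Group chats: WhatsApp uses Sender Keys. Each member distributes its sender key to the group once over the pairwise sessions; after that, the sender encrypts each group message once and the server fans the single ciphertext out to all members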
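How "a unique key per message" falls out of a ratchet can be shown with the symmetric-key (chain) step of the Double Ratchet. This sketch follows the KDF recommended in Signal's Double Ratchet specification (HMAC-SHA256 keyed with the chain key, input byte 0x01 for the message key and 0x02 for the next chain key). The starting chain key is a made-up placeholder; in the real protocol it comes out of X3DH and the Diffie-Hellman ratchet, both omitted here, and no actual encryption is performed.

```javascript
// Symmetric-key ratchet step (KDF_CK), using Node's built-in crypto module.
const { createHmac } = require("node:crypto");

function kdfChain(chainKey) {
  // Same HMAC key, different constant inputs → independent-looking outputs.
  const messageKey = createHmac("sha256", chainKey).update(Buffer.from([0x01])).digest();
  const nextChainKey = createHmac("sha256", chainKey).update(Buffer.from([0x02])).digest();
  return { messageKey, nextChainKey };
}

// Derive keys for three consecutive messages on one sending chain.
let chainKey = Buffer.alloc(32, 7); // placeholder 32-byte chain key, NOT from a real handshake
const messageKeys = [];
for (let i = 0; i < 3; i++) {
  const { messageKey, nextChainKey } = kdfChain(chainKey);
  messageKeys.push(messageKey.toString("hex"));
  chainKey = nextChainKey; // old chain key is discarded, so earlier keys can't be re-derived
}
console.log(new Set(messageKeys).size); // 3: every message gets its own key
```

Because HMAC is one-way, a device compromised today cannot walk the chain backwards to recover keys for messages it already deleted; the Diffie-Hellman half of the Double Ratchet (not shown) is what also heals the chain going forward.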