System Design: Photo Sharing — CDN, Image Processing, and Feed Ranking
January 8, 2026
How Instagram-style photo sharing works under the hood — upload pipeline, image processing, CDN distribution, and ML-based feed ranking.
Instagram reportedly handles on the order of 100 million photo uploads per day. Each upload triggers a processing pipeline that generates multiple resized variants, stores them in object storage behind a CDN, and eventually surfaces the photo in followers' feeds, ranked by a machine learning model.
Upload Pipeline
The upload flow is async by design. The client uploads the raw image directly to object storage (S3) via a pre-signed URL, so the heavy byte transfer bypasses your API servers entirely. The API server receives only metadata (user_id, caption, tags) and publishes a photo.uploaded event to Kafka. The client shows a 'processing' state until the pipeline completes.
// 1. Client requests upload URL
const { uploadUrl, photoId } = await api.post('/photos/initiate')
// 2. Client uploads directly to S3 (API servers never touch the bytes)
await fetch(uploadUrl, { method: 'PUT', body: imageFile })
// 3. Client notifies API that upload is complete
await api.post(`/photos/${photoId}/complete`, { caption })
// 4. Server side, in the /complete handler: the API publishes an event and processing happens asynchronously
kafka.publish('photo.uploaded', { photoId, s3Key, userId })
Image Processing Pipeline
Worker services consume from the photo.uploaded Kafka topic. For each photo they run: format normalization (HEIC → JPEG/WebP), resize to standard variants (thumbnail 150px, feed 640px, full 1080px, 4K for zoom), content moderation (ML model flags NSFW content), and metadata extraction (EXIF stripping, dominant color extraction for placeholder blur). Each step is idempotent and can be retried independently.
- Thumbnail (150×150): profile grids, search results
- Feed (640px wide): main feed display
- Standard (1080px): full-screen view
- WebP variants: roughly 25-35% smaller than JPEG at comparable quality, served to browsers that support the format
- Blur hash: a placeholder of about 30 bytes, decoded on the client into a blurred preview for instant perceived loading
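The variant list above can be turned into a set of resize jobs. The following is a minimal sketch, not Instagram's actual pipeline: the VARIANT_SPECS table, the planVariants function, and the photos/<id>/<width>.<format> key layout are all assumptions made for illustration. It shows two ideas from this section: dimensions are computed without upscaling or distorting the source, and each output key depends only on the photo ID, variant, and format, so a retried job overwrites the same object rather than creating a duplicate.

```javascript
// Hypothetical variant table, based on the sizes listed in this article.
const VARIANT_SPECS = [
  { width: 150, square: true },   // thumbnail
  { width: 640, square: false },  // feed
  { width: 1080, square: false }, // standard
];
const OUTPUT_FORMATS = ['jpeg', 'webp'];

// Compute output dimensions for one variant, preserving aspect ratio
// and never upscaling past the source size.
function targetSize(srcWidth, srcHeight, spec) {
  if (spec.square) {
    const side = Math.min(spec.width, srcWidth, srcHeight);
    return { width: side, height: side };
  }
  const width = Math.min(spec.width, srcWidth);
  const height = Math.round((srcHeight * width) / srcWidth);
  return { width, height };
}

// Build the list of resize jobs for a photo. The key is deterministic
// (assumed layout), which is what makes a retry safe to run twice.
function planVariants(photoId, srcWidth, srcHeight) {
  const jobs = [];
  for (const spec of VARIANT_SPECS) {
    const size = targetSize(srcWidth, srcHeight, spec);
    for (const format of OUTPUT_FORMATS) {
      jobs.push({
        key: `photos/${photoId}/${spec.width}.${format}`,
        format,
        width: size.width,
        height: size.height,
      });
    }
  }
  return jobs;
}

// Example: a 4032x3024 source yields six jobs (three sizes, two formats).
console.log(planVariants('abc123', 4032, 3024));
```

A worker would hand each job to an image library of its choice; the planning step is kept separate here so it can be tested without touching any image bytes.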
CDN Distribution
Processed images are stored in S3 and served through a CDN such as CloudFront or Fastly. Edge nodes cache images geographically close to users. Cache keys include the variant size, so /photos/abc123/640 and /photos/abc123/1080 are separate cache entries. For popular photos, CDN hit rates can exceed 99%, so the origin (S3) is rarely touched.
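Because every distinct URL is a separate cache entry, clients should request only the fixed set of variant widths instead of arbitrary sizes. Here is a rough sketch of how a client might choose a URL, assuming the /photos/<id>/<width> path scheme from the paragraph above; the host cdn.example.com and the function names are placeholders, not a real API.

```javascript
// Widths named in this article; anything else would fragment the cache.
const CDN_WIDTHS = [150, 640, 1080];

// Pick the smallest variant that still covers the rendered size,
// falling back to the largest one when nothing is big enough.
function pickVariantWidth(cssWidth, devicePixelRatio = 1) {
  const needed = cssWidth * devicePixelRatio;
  const match = CDN_WIDTHS.find((w) => w >= needed);
  return match ?? CDN_WIDTHS[CDN_WIDTHS.length - 1];
}

// Build the CDN URL. The host is a placeholder for illustration.
function photoUrl(photoId, cssWidth, devicePixelRatio = 1) {
  const width = pickVariantWidth(cssWidth, devicePixelRatio);
  return `https://cdn.example.com/photos/${photoId}/${width}`;
}

// A 320px-wide slot on a 2x display needs 640 physical pixels.
console.log(photoUrl('abc123', 320, 2));
// → https://cdn.example.com/photos/abc123/640
```

Snapping requests to a few widths means millions of devices with slightly different screen sizes still share the same handful of cache entries per photo.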
Feed Ranking
Instagram replaced its purely chronological feed with a ranked feed in 2016. The ranking pipeline generates a candidate set of ~500 posts from accounts you follow, then scores each with an ML model using features such as recency, past engagement with this creator, content type affinity (do you engage more with Reels or photos?), post velocity (how fast is this post accumulating likes?), and relationship strength.
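To make the scoring step concrete, here is a toy sketch that stands in for the ML model with a hand-weighted linear score passed through a sigmoid. This is a deliberate simplification: the weights, the field names on the post objects, and the 24-hour decay constant are all invented for illustration, and a production system would learn the model from engagement data instead.

```javascript
// Made-up weights for illustration only; a real model would be trained.
const FEATURE_WEIGHTS = {
  recency: 2.0,
  creatorEngagement: 1.5,
  contentAffinity: 1.0,
  velocity: 0.8,
  relationship: 1.2,
};

// Turn a candidate post into a feature vector.
// Assumes the engagement-style inputs are already normalized to [0, 1].
function toFeatures(post, nowMs) {
  const ageHours = (nowMs - post.createdAtMs) / 3600000;
  return {
    recency: Math.exp(-ageHours / 24),             // decays as the post ages
    creatorEngagement: post.creatorEngagement,
    contentAffinity: post.contentAffinity,
    velocity: Math.log1p(post.likesPerHour) / 10,  // dampen viral outliers
    relationship: post.relationship,
  };
}

// Weighted sum of features, squashed into (0, 1).
function scorePost(post, nowMs) {
  const features = toFeatures(post, nowMs);
  let z = 0;
  for (const name of Object.keys(FEATURE_WEIGHTS)) {
    z += FEATURE_WEIGHTS[name] * features[name];
  }
  return 1 / (1 + Math.exp(-z));
}

// Score every candidate and sort from highest to lowest score.
function rankCandidates(candidates, nowMs) {
  return candidates
    .map((post) => ({ post, score: scorePost(post, nowMs) }))
    .sort((a, b) => b.score - a.score);
}
```

With all other features equal, a post from an hour ago outranks one from three days ago, which is the behavior the recency feature is meant to capture.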
- Candidate generation: fanout-on-write for normal accounts, pull for celebrities
- Scoring: lightweight GBDT or neural net inference on feature vectors
- Feature store: precomputed user and post features in Redis for low-latency lookup
- Exploration: ~10% of feed slots are 'explore' content from outside your follow graph
- Re-ranking: after ML score, apply diversity rules (no three consecutive posts from the same user)
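The diversity rule in the last bullet can be applied as a greedy pass over the scored list. The sketch below is one possible approach, not a description of Instagram's re-ranker; the function name and the userId field are assumptions. At each position it takes the highest-ranked remaining post that would not extend a run from the same user past the limit.

```javascript
// Greedy diversity re-rank: allow at most `maxRun` consecutive posts
// from the same user. `ranked` must already be sorted by score.
function rerankWithDiversity(ranked, maxRun = 2) {
  const remaining = ranked.slice();
  const result = [];
  while (remaining.length > 0) {
    // If the last `maxRun` picks share one user, that user is blocked
    // for the next slot.
    const tail = result.slice(-maxRun);
    const blockedUser =
      tail.length === maxRun && tail.every((p) => p.userId === tail[0].userId)
        ? tail[0].userId
        : null;
    // Take the best remaining post from any non-blocked user.
    let index = remaining.findIndex((p) => p.userId !== blockedUser);
    // Only the blocked user is left, so relax the rule rather than drop posts.
    if (index === -1) index = 0;
    result.push(remaining.splice(index, 1)[0]);
  }
  return result;
}

// Example input: user "a" holds the top three slots by score.
const exampleFeed = [
  { id: 'a1', userId: 'a' },
  { id: 'a2', userId: 'a' },
  { id: 'a3', userId: 'a' },
  { id: 'b1', userId: 'b' },
  { id: 'a4', userId: 'a' },
];
console.log(rerankWithDiversity(exampleFeed).map((p) => p.id).join(','));
// → a1,a2,b1,a3,a4
```

The pass is quadratic in the worst case, which is acceptable for a candidate set of a few hundred posts.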