<strong>Sharding</strong> is splitting a large database into smaller pieces (shards) distributed across multiple servers. Each shard contains a subset of the data.
The Library Branches
Like splitting a huge library into branches by alphabet, sharding splits data across servers.
Shard A
Books A-M
Shard B
Books N-Z
Router
Knows which branch
Find user data
Directs to right shard
Has this user's data
Choose Shard Key
Decide how to split data (by user ID, location, etc.)
Distribute Data
Split data across multiple database servers
Route Requests
Application knows which shard has the data
Query Shard
Only query the relevant shard, not all data
Return Result
Much faster since searching smaller dataset
Wrong
Sharding is the same as replication
Correct
Replication copies the same data to multiple servers (for backup/speed). Sharding splits different data across servers (for scale). They solve different problems.
Instagram shards by user ID:
Shard 1: Users with ID 1-1,000,000
Shard 2: Users with ID 1,000,001-2,000,000
Shard 3: Users with ID 2,000,001-3,000,000
Your ID is 1,500,000 → Always goes to Shard 2
Billions of users, but each shard handles manageable size