Partitioning Approaches in SQL and NoSQL: Horizontal, Vertical, Range, Hash, and List Partitioning
Horizontal, vertical, SQL, and NoSQL partitioning strategies — a map of every approach and when to use each
Abstract AlgorithmsMore actions⌄
Reading progress
12 min left
Metadata and pacing⌄
Total read
12 min
Sections
28
◴ On this page⌄
✣ Need another angle?⌄
Switch the article companion into a lower-complexity framing, then quiz yourself when you are ready.
1. Overview
Horizontal, vertical, SQL, and NoSQL partitioning strategies — a map of every approach and when to use each
Why it matters
TLDR: Partitioning splits one logical table into smaller physical pieces.
Expandable deep dives
📖 The 2-Billion-Row Problem — Why Indexes Stop Being Enough⌄
Your orders table has 2 billion rows. Every query scans gigabytes of data even with indexes. An index on 2 billion rows is still a 2-billion-entry B-Tree — and the leaf pages are scattered across the entire 400 GB tablespace. Partitioning changes the problem fundamentally: instead of one 400 GB table, you have 48 separate 8 GB tables — one per month over four years. A query for last month's orders reads only 8 GB. The other 392 GB are never opened. This is the core promise of partitioning, and it is why every major database — PostgreSQL, MySQL, Cassandra, DynamoDB, MongoDB — builds it into the data model.
- An index on 2 billion rows is still a 2 billion entry B Tree — and the leaf pages are scattered across the entire 400 GB tablespace.
- 📖 The 2 Billion Row Problem — Why Indexes Stop Being Enough Your orders table has 2 billion rows.
Partitioning vs. Sharding — The Distinction That Matters in Every System Design Interview⌄
The diagram below traces the decision path from a large table access to the correct partitioning strategy, then shows the query routing outcome under each strategy. Use it to orient a system design discussion around the right first question: what does the access pattern look like?
- The critical fork is the diamond labelled "Partition key in WHERE clause?" — every partitioning benefit depends on reaching the left branch.
- If critical queries cannot include the partition key, the strategy is wrong for the access pattern.
- TLDR: Partitioning splits one logical table into smaller physical pieces.
🔍 A Map of Every Partitioning Strategy — What Each One Splits and When to Use It⌄
The diagram below traces the decision path from a large table access to the correct partitioning strategy, then shows the query routing outcome under each strategy. Use it to orient a system design discussion around the right first question: what does the access pattern look like?
- The critical fork is the diamond labelled "Partition key in WHERE clause?" — every partitioning benefit depends on reaching the left branch.
- If critical queries cannot include the partition key, the strategy is wrong for the access pattern.
- TLDR: Partitioning splits one logical table into smaller physical pieces.
⚙️ How Range, List, Hash, and Composite Partitioning Actually Work⌄
SQL databases — PostgreSQL, MySQL, CockroachDB — expose partitioning through explicit DDL. The partition type controls how rows are assigned to partitions.
- A table range partitioned by delivers zero pruning benefit for a query that filters only by .
- TLDR: Partitioning splits one logical table into smaller physical pieces.
- NoSQL databases make partitioning emergent from your data model choices — partition key, shard key, or chunk boundary.
Key takeaways
- ✓TLDR: Partitioning splits one logical table into smaller physical pieces.
- ✓The database skips irrelevant pieces entirely — turning a 30 second full table scan into a sub second single partition read.
- ✓SQL databases declare partitions with DDL statements and prune at query plan time.
- ✓NoSQL databases make partitioning emergent from your data model choices — partition key, shard key, or chunk boundary.
Deep technical expansionOpen full authored reference⌄
TLDR: Partitioning splits one logical table into smaller physical pieces. The database skips irrelevant pieces entirely — turning a 30-second full-table scan into a sub-second single-partition read. SQL databases declare partitions with DDL statements and prune at query-plan time. NoSQL databases make partitioning emergent from your data model choices — partition key, shard key, or chunk boundary. The wrong partition key destroys these gains in both worlds. This post is the map. For SQL detail see the SQL partitioning deep-dive; for NoSQL detail see the Cassandra, DynamoDB, and MongoDB deep-dive.
📖 The 2-Billion-Row Problem — Why Indexes Stop Being Enough
Your orders table has 2 billion rows. Every query scans gigabytes of data even with indexes. An index on 2 billion rows is still a 2-billion-entry B-Tree — and the leaf pages are scattered across the entire 400 GB tablespace. Partitioning changes the problem fundamentally: instead of one 400 GB table, you have 48 separate 8 GB tables — one per month over four years. A query for last month's orders reads only 8 GB. The other 392 GB are never opened. This is the core promise of partitioning, and it is why every major database — PostgreSQL, MySQL, Cassandra, DynamoDB, MongoDB — builds it into the data model.
Partitioning vs. Sharding: Partitioning keeps data on one node (or a transparently coordinated cluster) and prunes at query time. Sharding distributes data across fully independent database nodes with shard-aware routing. The rule of thumb: if your application connects to one connection string, you are partitioning. If it connects to multiple independent databases, you are sharding. Cassandra and DynamoDB blur this — they use "partition key" vocabulary but physically distribute across nodes. In a system design interview, acknowledge the overlap explicitly.
🔍 Horizontal vs. Vertical: The Two Fundamental Ways to Split a Table
Every partitioning strategy is a variant of one of two conceptual cuts.
Horizontal partitioning divides rows into subsets based on a partition key value. The January 2024 partition holds all rows where created_at falls in that month. Each partition has identical columns but a different slice of the row space. This is what every SQL DDL PARTITION BY clause and every NoSQL partition key implements.
Vertical partitioning divides columns. A 50-column users table becomes a narrow user_core table (4 hot columns read on every login) and a user_profile table (40 cold columns loaded only on profile pages). Vertical partitioning is a table design decision, not a database DDL feature. The benefit: the authentication hot path never loads the cold profile columns, reducing I/O and improving buffer pool efficiency.
These two strategies are orthogonal and frequently combined. A wide, high-traffic table is often horizontally partitioned by time AND vertically split into hot and cold column sets.
⚙️ Range, Hash, List, and Composite: The SQL Strategy Set
SQL databases — PostgreSQL, MySQL, CockroachDB — expose partitioning through explicit DDL. The partition type controls how rows are assigned to partitions.
| Strategy | Splits by | Best for | Trade-off |
| Range | Ordered value ranges (dates, IDs) | Time-series archival, rolling-window queries | Write hotspot: all current inserts go to the latest partition |
| Hash | hash(key) % N | Even I/O distribution, preventing write hotspots | No range pruning — range predicates scan all partitions |
| List | Explicit enumerated values | Geographic regions, categorical data, compliance isolation | All partition values must be known at design time |
| Composite | Two levels: range then hash (or list) | Very large tables needing both time pruning and write balance | Higher DDL complexity; more partitions to manage |
| Horizontal | Row subsets by partition key | Any large operational table | Partition key must appear in critical WHERE clauses |
| Vertical | Column subsets into separate tables | Wide tables with hot/cold access patterns | Application must handle split-table joins explicitly |
The partition key must appear in your most frequent query predicates. A table range-partitioned by created_at delivers zero pruning benefit for a query that filters only by customer_id. Validate with EXPLAIN ANALYZE and confirm Partitions excluded > 0 before shipping.
🧠 Deep Dive: How SQL and NoSQL Partition Philosophies Differ
The Internals
SQL databases make partitioning explicit and declarative. PostgreSQL stores partition bounds in the system catalog (pg_class, pg_inherits). At query planning time, the planner evaluates each partition's bounds against the query predicate and eliminates non-matching partitions — this is constraint exclusion (pre-v10) or native partition pruning (v10+). The planner never opens a pruned partition's data files. For static predicates (literal values), pruning happens at parse time. For parameterised queries (WHERE created_at = $1), PostgreSQL v12+ supports dynamic pruning at execution time.
NoSQL databases make partitioning implicit and emergent. You do not declare partition boundaries with DDL — you choose a partition key, and the database's internal mechanism produces the partition assignment. Cassandra applies Murmur3 to your partition key value and maps the token to a node on its consistent hash ring. DynamoDB applies its own internal hash function and routes to a physical partition in its managed partition map. MongoDB uses your shard key to assign documents to chunks, then routes reads and writes via the mongos proxy. In all three, the partition key decision is made at schema creation time and is permanent.
Performance Analysis
SQL pruning gains and limits: Partition pruning delivers the most dramatic speedups when queries touch ≤10% of partitions — a 48-partition table with a month-scoped query prunes 47 of 48 partitions, reducing I/O by 98%. The gains disappear entirely when the partition key is absent from the query predicate. With more than ~500 partitions in PostgreSQL, per-partition planner overhead adds measurable latency even after aggressive pruning.
NoSQL hot partition failure mode: Every NoSQL system has the same failure mode — a partition key with insufficient cardinality concentrates disproportionate traffic on one physical partition. In Cassandra, this triggers compaction stalls and GC pauses on the owning node that bleed into adjacent partitions. In DynamoDB, it triggers ProvisionedThroughputExceededException on specific keys even with ample table-level capacity. In MongoDB, it creates a jumbo chunk the balancer cannot migrate. The mitigation is always some form of higher-cardinality key design: a composite key with a time-bucket component in Cassandra, write sharding in DynamoDB, or a compound shard key in MongoDB.
📊 Visualizing the Partitioning Decision: Strategy Selection to Query Routing
The diagram below traces the decision path from a large table access to the correct partitioning strategy, then shows the query routing outcome under each strategy. Use it to orient a system design discussion around the right first question: what does the access pattern look like?
flowchart TD
A["Large Table — Need to Partition"] --> B{"Primary access pattern?"}
B -->|"Date-range queries dominate"| C["Range Partitioning by date"]
B -->|"Point lookups by ID"| D["Hash Partitioning by entity ID"]
B -->|"Region or category filters"| E["List Partitioning by category/region"]
B -->|"Both date-range and ID lookup"| F["Composite: Range by date + Hash by ID"]
C --> G{"Partition key in WHERE clause?"}
D --> G
E --> G
F --> G
G -->|"Yes — partition key present"| H["Partition Pruning Active — skip N-1 partitions"]
G -->|"No — partition key absent"| I["Full Scan — all partitions opened (avoid this)"]
H --> J["Fast targeted read — sub-second response"]
I --> K["Slow scan — worse than unpartitioned table"]
The critical fork is the diamond labelled "Partition key in WHERE clause?" — every partitioning benefit depends on reaching the left branch. If critical queries cannot include the partition key, the strategy is wrong for the access pattern.
🌍 Real-World Applications: How Scale-Out Companies Use Partitioning
Stripe — time-series ledger (PostgreSQL range partitioning). The payments platform partitions its transaction ledger by settlement date. Monthly queries touch exactly one partition. Dropping partitions past the regulatory retention window is a millisecond DDL statement — no multi-hour DELETE with vacuuming.
Discord — Cassandra message history (NoSQL consistent hashing). Discord stores messages with a composite partition key of (channel_id, week_bucket). Each week of messages for a channel is its own partition, capping size and ensuring every "load last 50 messages" request hits exactly one Cassandra node.
Airbnb — geographic search (list partitioning). The listings table is partitioned by geographic region. A Paris apartment search queries only the EU partition, reducing the scan from 7 million rows to roughly 800,000 before any index is consulted.
Amazon retail — DynamoDB single-table design. Related entities (users, orders, order items) share one table differentiated by PK: USER#<id> and SK: ORDER#<date> conventions. All orders for a user are co-located on one partition, enabling single-query retrieval with consistent single-digit millisecond latency at any scale.
⚖️ SQL vs. NoSQL Partitioning Philosophy: Side-by-Side
| Dimension | SQL (PostgreSQL, MySQL) | NoSQL (Cassandra, DynamoDB, MongoDB) |
| Partition declaration | Explicit DDL — PARTITION BY RANGE/HASH/LIST | Implicit — emergent from data model / partition key choice |
| Partition boundaries | Declared and visible in system catalog | Managed internally; may be opaque (DynamoDB) |
| Pruning mechanism | Query planner constraint exclusion at plan time | Hash ring / partition map lookup at routing time |
| Cross-partition queries | Full support — planner merges result sets | ALLOW FILTERING / Scan / scatter-gather — use with caution |
| Schema change cost | ALTER TABLE to add/drop partitions — minutes, no data movement | Partition key change requires full table rebuild — weeks of migration |
| Rebalancing | Attach/detach partitions; data stays on same node | Chunk migration (MongoDB), internal reshard (DynamoDB), vnode rebalance (Cassandra) |
| Hot partition mitigation | Hash sub-partitioning; composite range+hash | Composite key with time-bucket; write sharding; compound shard key |
| Operational model | You manage the database server | Managed (DynamoDB) or self-hosted with operational overhead (Cassandra, MongoDB) |
🧭 Decision Guide: Which Partitioning Approach to Use
| Situation | Recommendation |
| Time-series with rolling archival (SQL) | Range partitioning by date — instant partition drop for archival |
| Even write distribution, high-cardinality key (SQL) | Hash partitioning — eliminates write hotspot |
| Geo-compliance or categorical isolation (SQL) | List partitioning by region/category + DEFAULT partition |
| Table too large for single-level strategy | Composite: range by time + hash by ID within each range |
| Distributed write throughput at massive scale | Cassandra with composite partition key; DynamoDB with well-chosen hash key |
| Geo-data residency enforcement | MongoDB zone sharding — pins shard key ranges to tagged nodes |
| Serverless, unpredictable traffic pattern | DynamoDB — managed partitioning + adaptive capacity handles spikes |
| Avoid partitioning when | Table < 50M rows; existing indexes sufficient; no natural high-cardinality partition key |
🧪 Practical Assessment: Does Your Table Actually Need Partitioning?
Before choosing a partitioning strategy, run this three-question check on your table:
1. Size test. Is the table approaching or exceeding 50–100 million rows, or 50 GB? Below this threshold, a well-chosen index almost always outperforms partitioning, with none of the added DDL complexity.
2. Partition key alignment test. Does a high-cardinality column appear in the WHERE clause of 80%+ of critical queries? If no such column exists, partitioning will add overhead without pruning benefit. Find the column first, then design the partition strategy around it.
3. Access pattern test. Are queries primarily time-bounded (range partitioning wins), randomly distributed by ID (hash partitioning wins), or filtered by a fixed category set (list partitioning wins)? A mismatch between partition type and access pattern eliminates all pruning benefit and may degrade performance below an equivalent unpartitioned table.
If all three questions return positive answers, partitioning is appropriate. If any returns negative, improve indexes first and revisit partitioning when data volume demands it.
📚 Lessons Learned: What Production Teaches You That Documentation Skips
The partition key is the most load-bearing architectural decision in your data model. Both SQL and NoSQL systems make it expensive to change: SQL requires rebuilding a partitioned table with live data migration; NoSQL requires a full schema rebuild. Commit to the partition key with the same deliberation you give to a public API contract.
Monitor partition skew from the start. Set up weekly queries reporting row counts and size per partition (SQL) or per-shard chunk counts (MongoDB) or partition sizes via JMX (Cassandra). A partition growing 10× faster than its siblings is a signal to act at 20 GB — not at 200 GB when it requires a live migration under load.
Validate pruning explicitly in CI. A refactor that wraps the partition key in a function call — WHERE DATE(created_at) = '2024-01-15' instead of WHERE created_at >= '2024-01-15' AND created_at < '2024-01-16' — silently disables pruning. Add EXPLAIN ANALYZE assertions that check Partitions excluded > 0 for critical queries.
Partitioning and sharding are sequential steps, not competing choices. Start with a single database. Add partitioning when queries get slow on large tables. Move to sharding when data volume or write throughput genuinely exceeds a single node. Applying sharding prematurely adds operational complexity that partitioning would have solved for years at a fraction of the cost.
📌 Summary and Key Takeaways
Horizontal partitioning divides rows; vertical partitioning divides columns. Both solve real problems and are often combined in the same data model.
SQL partitioning is explicit and declarative. Range, hash, and list strategies are declared with DDL. The query planner prunes non-matching partitions automatically — but only when the partition key appears in the WHERE clause.
NoSQL partitioning is implicit and emergent. The partition key choice at schema creation time determines all future routing, rebalancing, and hot-spot risk. There is no ALTER TABLE escape hatch.
All systems share the same failure mode: a partition key that concentrates traffic on a small number of physical partitions. The diagnostic is universal. The fix is always higher cardinality — composite keys, time buckets, write sharding.
This post is the map. For the complete SQL deep-dive (range, hash, list, composite partitioning, pg_partman, pruning internals), see the SQL partitioning post. For the complete NoSQL deep-dive (Cassandra token rings, DynamoDB adaptive capacity, MongoDB chunk balancer), see the NoSQL partitioning post.
Test Your Knowledge
Ready to test what you just learned?
AI will generate 4 questions based on this article's content.
Article metadata

Written by
Abstract Algorithms
@abstractalgorithms
Reader feedback
Was this article useful?
Rate it before you leave, then follow or subscribe for the next deep dive.
Related deep dives
Continue topic learning

