B-Tree & B+ Tree: Multiway Balanced Search Trees

B-Trees and B+ Trees are multiway balanced search trees (generalizations of binary search trees) designed for efficient disk-based storage systems (e.g., databases, file systems). Unlike binary trees (which have 2 children per node), B-Trees/B+ Trees have multiple children per node (called “order-k” trees), reducing the number of disk I/O operations—critical for performance in systems where data is stored on slow disks (vs. fast RAM).

Core Background: Why B-Trees/B+ Trees?

Standard binary trees (BSTs, AVL Trees) are inefficient for disk-based data because:

They have a high height (O(log₂n)), leading to many disk reads (each node access = 1 disk I/O).
Data is scattered across nodes, making range queries (e.g., “find all values between 100 and 200”) slow.

B-Trees/B+ Trees solve this by:

High Fanout: Each node has many children (e.g., 100+), reducing tree height to O(logₖn) (k = order, number of children per node).
Disk-Friendly Nodes: Nodes are sized to match disk blocks (e.g., 4KB), minimizing I/O (one block read = one node access).
Balanced Structure: Automatic rebalancing ensures consistent performance for insertions/deletions.

I. B-Tree: Definition & Properties

A B-Tree of order m (sometimes called “minimum degree t”—see note below) is a balanced multiway search tree with the following properties:

Key Properties (Order m)

Node Structure: Each node contains:
- n keys (sorted in ascending order: \(k_1 < k_2 < … < k_n\)).
- n+1 child pointers (for internal nodes; leaf nodes have no children).
Size Constraints:
- Root Node: Can have 1 ≤ n ≤ m-1 keys (and 2 ≤ n+1 ≤ m children, except if root is a leaf: n=0).
- Internal/Leaf Nodes: Must have ⌈m/2⌉ – 1 ≤ n ≤ m-1 keys (and ⌈m/2⌉ ≤ n+1 ≤ m children for internal nodes).(⌈m/2⌉ is the minimum number of children per node—ensures balance.)
Search Property: For a node with keys \(k_1…k_n\) and children \(c_0…c_n\):
- All keys in subtree \(c_i\) are < \(k_{i+1}\) (for i < n).
- All keys in subtree \(c_n\) are > \(k_n\).
Balance Property: All leaf nodes are at the same depth (tree height is consistent).
Leaf Nodes: No children; all leaves are at the same level (no “hanging” leaves).

Note: Order vs. Minimum Degree

Order m: Maximum number of children per node (e.g., m=4 → max 3 keys, 4 children).
Minimum degree t: Minimum number of children per internal node (t = ⌈m/2⌉; e.g., m=4 → t=2).Some sources define B-Trees by t instead of m—ensure consistency when referencing.

Example: B-Tree of Order 4 (m=4)

Max keys per node: 3; min keys per node: 1 (⌈4/2⌉ – 1 = 1).
Max children per node: 4; min children per node: 2.

plaintext

                [20, 50]          (Root node: 2 keys, 3 children)
               /    |    \
      [5, 10]   [30, 40]   [60, 70, 80]  (Internal/leaf nodes)
     /  |  \    /  |  \      /  |  \  \
  [1,3] [7] [15] [25] [35] [55] [65] [75] [85]  (Leaf nodes: all at same depth)

II. B+ Tree: Definition & Properties

A B+ Tree of order m is a variant of B-Trees optimized for range queries and sequential access. It retains the B-Tree’s balance and fanout but reorganizes keys and children for better disk performance.

Key Properties (Order m)

Node Structure:
- Internal Nodes: Contain n keys (sorted) and n+1 child pointers (same as B-Tree), but keys are only indexes (not data—data is stored exclusively in leaves).
- Leaf Nodes: Contain n key-data pairs (sorted) and a linked list pointer to the next leaf node (enables sequential access).
Size Constraints: Same as B-Tree (root: 1≤n≤m-1; internal/leaf: ⌈m/2⌉-1≤n≤m-1).
Search Property:
- Internal nodes guide searches (keys are split points for children).
- All data is stored in leaves; leaves are linked in a sorted doubly linked list.
Balance Property: All leaves are at the same depth.
Key Duplication: Keys in internal nodes are duplicates of keys in leaves (internal keys are “copies” to guide searches).

Example: B+ Tree of Order 4 (m=4)

Internal nodes: keys are indexes; leaves: key-data pairs + next pointer.

plaintext

                [20, 50]          (Internal root: keys are indexes)
               /    |    \
      [5, 10]   [30, 40]   [60, 70]  (Internal nodes)
     /  |  \    /  |  \      /  |  \
  [1,3] [5,7,10] [15,20] [30,35,40] [50,55] [60,65,70] [75,80,85]  (Leaves)
   ←───────→ ←───────→ ←───────→ ←───────→ ←───────→ ←───────→ ←───────→
   (Linked list for sequential access)

III. Core Operations (B-Tree vs. B+ Tree)

1. Search

B-Tree Search

Start at the root node.
For the current node, find the first key \(k_i\) > target key.
Follow the child pointer \(c_{i-1}\) (or \(c_n\) if target > all keys).
Repeat until a leaf node is reached:
- If target is in the leaf, return the data.
- Else, return “not found”.

B+ Tree Search

Same as B-Tree for internal nodes (guide search via keys).
Once a leaf node is reached:
- If target is in the leaf, return the data.
- Else, return “not found”.
Range Queries: After finding the start key in a leaf, traverse the linked list of leaves to collect all matching keys (O(range size) time—no backtracking).

2. Insertion

Both B-Trees and B+ Trees insert keys in sorted order and rebalance via splitting nodes when they exceed the maximum key count (m-1):

Find Insertion Point: Traverse to the appropriate leaf node (B-Tree) or leaf node (B+ Tree).
Insert Key: Add the key to the leaf (B-Tree) or leaf’s key-data pair (B+ Tree), keeping keys sorted.
Split Node (if full):
- If the leaf has m keys, split it into two nodes with ⌊m/2⌋ and ⌈m/2⌉ keys.
- Promote the middle key to the parent node (B-Tree) or parent internal node (B+ Tree).
- Repeat splitting up the tree if the parent becomes full (propagates to root if needed—tree height increases by 1).

3. Deletion

Deletion ensures nodes never drop below the minimum key count (⌈m/2⌉ – 1) via borrowing keys from siblings or merging nodes:

Find Target Node: Traverse to the node containing the key (leaf for B+ Tree, any node for B-Tree).
Delete Key: Remove the key from the node (and data if B+ Tree leaf).
Fix Underflow (if < min keys):
- Borrow: If a sibling node has extra keys, move a key from the sibling to the underflow node (update parent’s split key).
- Merge: If no sibling has extra keys, merge the underflow node with a sibling and delete the parent’s split key.
- Repeat up the tree if the parent underflows.

IV. B-Tree vs. B+ Tree: Key Differences

Feature	B-Tree	B+ Tree
Data Storage	Data is stored in both internal nodes and leaves.	Data is stored exclusively in leaves; internal nodes only store keys (indexes).
Key Duplication	No duplicates—each key appears once.	Internal keys are duplicates of leaf keys (internal keys guide searches).
Leaf Nodes	No linked list; leaves are isolated.	Leaves are linked in a sorted doubly linked list (enables sequential access).
Search Performance	Faster for single-key lookups (may find data in internal nodes, avoiding leaf access).	Slightly slower for single-key lookups (must reach a leaf).
Range Queries	Slow (requires backtracking up the tree to find next keys).	Fast (traverse leaf linked list—no backtracking).
Disk I/O Efficiency	Less efficient (data scattered across internal nodes and leaves).	More efficient (data is concentrated in leaves; internal nodes are smaller, fitting more in memory).
Sequential Access	Not supported natively.	Supported via leaf linked list (critical for databases).
Use Cases	Databases with frequent single-key lookups (e.g., MongoDB’s early index design).	Most modern databases (MySQL, PostgreSQL), file systems (NTFS, ext4), and index structures (due to range query efficiency).

V. Time & Space Complexity

Operation	B-Tree Time Complexity	B+ Tree Time Complexity	Explanation
Single-Key Search	O(logₘn)	O(logₘn)	– m = order (number of children per node).- Tree height = O(logₘn) (low due to high fanout).
Range Query	O(logₘn + k logₘn)	O(logₘn + k)	– k = number of keys in the range.- B+ Tree: O(k) to traverse leaf linked list; B-Tree: O(k logₘn) to backtrack.
Insertion	O(logₘn)	O(logₘn)	Splitting nodes takes O(m) time per split, but splits are rare (O(logₘn) total splits).
Deletion	O(logₘn)	O(logₘn)	Borrowing/merging nodes takes O(m) time per operation, but O(logₘn) total operations.
Space Complexity	O(n)	O(n)	Both store n keys; B+ Tree has duplicate internal keys but negligible overhead.

Why O(logₘn) is Critical

For a B+ Tree with order m=100 and n=1,000,000 keys:

Tree height ≈ log₁₀₀(1e6) = 3 (root → internal → leaf).
Only 3 disk I/O operations for a single-key lookup—far better than binary trees (≈20 I/O operations).

VI. Practical Applications

B-Tree Applications

Databases with Frequent Single-Key Lookups: Early versions of MongoDB used B-Trees for primary indexes (prioritized fast single-key access).
File Systems: Some older file systems (e.g., HFS+) use B-Trees for directory indexing.
Embedded Systems: Where memory is limited (B-Trees have no duplicate keys, saving space).

B+ Tree Applications

Modern Databases: MySQL, PostgreSQL, Oracle use B+ Trees for primary and secondary indexes (optimized for range queries and joins).
File Systems: NTFS, ext4, XFS use B+ Trees for file allocation tables (FAT) and directory structures.
Search Engines: For indexing and ranking search results (range queries over document IDs/scores).
Data Warehouses: Where large-scale range queries (e.g., “sales Q3 2024”) are common.

VII. Summary

Feature	B-Tree	B+ Tree
Core Strength	Fast single-key lookups (data in internal nodes).	Fast range queries and sequential access (linked leaves, data concentrated in leaves).
Core Weakness	Slow range queries (no leaf linking).	Slightly slower single-key lookups (must reach leaves).
Data Location	Internal nodes + leaves.	Exclusively leaves.
Key Duplication	No.	Yes (internal keys = leaf key copies).
Best For	Single-key lookups, memory-constrained systems.	Range queries, sequential access, databases/file systems.