How Garbage Collection Optimizes SSD Performance

GC (Garbage Collection)

Definition

GC (Garbage Collection) is an automatic background process in NAND flash-based storage devices (e.g., SSDs, memory cards, USB flash drives) that reclaims space occupied by invalid (deleted, overwritten, or obsolete) data. Unlike hard disk drives (HDDs), which can overwrite data in place, NAND flash cannot rewrite a page that already holds data: the page must be erased first, and erasure works only on entire blocks. GC identifies blocks with a mix of valid and invalid data, moves the valid data to a new block, erases the old block, and marks it as free for future writes. This keeps a supply of erased blocks available, so the SSD can sustain its write performance and make its full capacity usable.

Core Working Principle

1. SSD Data Structure Basics

SSDs store data in pages (smallest writable units, 4–16 KB) grouped into blocks (smallest erasable units, 128–512 pages). Key constraints:

  • Write: Data can only be written to empty (erased) pages.
  • Erase: Only entire blocks can be erased (not individual pages).
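  • Invalid Data: When data is overwritten, the SSD controller writes the new version to a fresh page and marks the old page as “invalid.” When a file is deleted, the SSD learns that the file’s pages are invalid only if the OS tells it so (via the TRIM command, covered later). In both cases the physical data remains in the flash until the block is erased.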
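
The constraints above can be sketched as a small state model. This is an illustrative toy, not how any particular controller is implemented: the names PageState and Block are made up for the example, and the block has only 4 pages instead of the hundreds found in real flash.

```python
from enum import Enum


class PageState(Enum):
    FREE = "free"        # erased, can be written
    VALID = "valid"      # holds live data
    INVALID = "invalid"  # holds stale data, waiting for a block erase


class Block:
    """Toy erase block: pages are written one by one but erased together."""

    def __init__(self, pages_per_block=4):  # 4 is an illustrative size
        self.pages = [PageState.FREE] * pages_per_block

    def write(self, page_index):
        # Write constraint: only an erased page can be written.
        if self.pages[page_index] is not PageState.FREE:
            raise ValueError("page must be erased before it can be written")
        self.pages[page_index] = PageState.VALID

    def invalidate(self, page_index):
        # Only the bookkeeping changes; the stale data stays in the cells.
        if self.pages[page_index] is PageState.VALID:
            self.pages[page_index] = PageState.INVALID

    def erase(self):
        # Erase constraint: the operation always covers the whole block.
        self.pages = [PageState.FREE] * len(self.pages)


block = Block()
block.write(0)
block.invalidate(0)
try:
    block.write(0)   # rejected: the page is invalid, not erased
except ValueError as err:
    print(err)
block.erase()
block.write(0)       # accepted after the whole block has been erased
```

The rejected second write is the reason GC exists: the only way to reuse page 0 is to erase every page in its block.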

2. GC Process Steps

  1. Identify Target Blocks: The SSD controller looks for blocks containing both valid and invalid pages (commonly called “victim” blocks). Blocks with a high ratio of invalid data are prioritized, because they require the least copying.
  2. Move Valid Data: The controller copies all valid pages from the target block to a new, pre-erased block (or a temporary buffer).
  3. Erase Old Block: The original block (now empty of valid data) is erased, converting it to a “free block” available for new writes.
  4. Update Mapping Table: The SSD’s logical-to-physical address mapping table is updated to point to the new location of the valid data.
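
The four steps can be walked through on a toy model. The sketch below assumes a greedy policy (pick the block with the most invalid pages) and a dictionary-based mapping table; the function name collect_one_block and the data layout are invented for illustration, and real firmware is considerably more involved.

```python
def collect_one_block(blocks, mapping, free_block_ids):
    """Run one GC pass on a toy flash model.

    blocks:         dict block_id -> list of pages; a page is a logical
                    address (int) if valid, "INVALID" if stale, None if erased
    mapping:        dict logical address -> (block_id, page_index)
    free_block_ids: list of ids of fully erased blocks
    Returns the id of the reclaimed block, or None if nothing was done.
    """
    # Step 1: choose the block with the most invalid pages (greedy policy).
    candidates = [b for b in blocks
                  if b not in free_block_ids and "INVALID" in blocks[b]]
    if not candidates or not free_block_ids:
        return None
    victim = max(candidates, key=lambda b: blocks[b].count("INVALID"))
    target = free_block_ids.pop(0)

    # Step 2: copy each valid page into the pre-erased target block.
    next_page = 0
    for page in blocks[victim]:
        if isinstance(page, int):
            blocks[target][next_page] = page
            # Step 4 is done here, before the erase, so the mapping
            # never points at a page that no longer holds the data.
            mapping[page] = (target, next_page)
            next_page += 1

    # Step 3: erase the victim and return it to the free pool.
    blocks[victim] = [None] * len(blocks[victim])
    free_block_ids.append(victim)
    return victim


blocks = {
    0: [10, "INVALID", "INVALID", 11],
    1: [12, 13, "INVALID", 14],
    2: [None, None, None, None],
}
mapping = {10: (0, 0), 11: (0, 3), 12: (1, 0), 13: (1, 1), 14: (1, 3)}
free_block_ids = [2]

print(collect_one_block(blocks, mapping, free_block_ids))  # 0
print(blocks[2])                                           # [10, 11, None, None]
print(mapping[11])                                         # (2, 1)
```

Block 0 is chosen because it has two invalid pages against one in block 1, so only two pages have to be copied to recover a whole block.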

3. Trigger Conditions for GC

GC runs automatically under specific scenarios:

  • Idle Time: The controller runs GC when the SSD is not actively reading/writing (e.g., when a computer is idle).
  • Low Free Space: If the number of free blocks drops below a threshold (e.g., 10–15% of total capacity), GC is triggered to reclaim space.
  • Write Operations: When new data is written and no free blocks are available, GC runs “on-demand” to free up space (this can cause temporary slowdowns).
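
As a rough sketch, the three triggers can be read as a priority check. The function gc_mode, its parameters, and the 10% low-water mark are assumptions chosen to mirror the examples in the list; actual thresholds are vendor-specific and usually not published.

```python
def gc_mode(free_blocks, total_blocks, is_idle, write_pending,
            low_watermark=0.10):
    """Return the kind of GC a controller might start, or None.

    low_watermark is a hypothetical free-block ratio below which
    space reclamation becomes urgent.
    """
    if write_pending and free_blocks == 0:
        return "on-demand"        # a write is blocked until space is freed
    if free_blocks / total_blocks < low_watermark:
        return "low-free-space"   # reclaim before the pool runs dry
    if is_idle:
        return "idle"             # no host traffic, so GC costs nothing visible
    return None


print(gc_mode(0, 1000, is_idle=False, write_pending=True))     # on-demand
print(gc_mode(80, 1000, is_idle=False, write_pending=False))   # low-free-space
print(gc_mode(500, 1000, is_idle=True, write_pending=False))   # idle
print(gc_mode(500, 1000, is_idle=False, write_pending=True))   # None
```

The ordering reflects urgency: the on-demand case is checked first because it is the only one in which a host write is actually waiting.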

Key Benefits of GC

1. Sustained SSD Performance

Without GC, SSDs would quickly run out of free blocks, forcing the controller to relocate valid data and erase blocks on the fly before a write can complete, which adds significant write latency. GC ensures a steady supply of pre-erased free blocks, minimizing write latency and maintaining consistent read/write speeds over time.

2. Maximized Usable Storage

GC reclaims space from invalid data, allowing the SSD to use its full capacity efficiently. For example, if a user deletes a 50GB file, GC ensures the 50GB of space is eventually made available for new data.

3. Extended SSD Lifespan

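Efficient GC keeps “write amplification” low (the ratio of data actually written to the flash to data written by the host). GC cannot remove write amplification, since relocating valid data is itself extra writing, but choosing blocks with few valid pages limits the number of program/erase (P/E) cycles each block undergoes. This slows flash wear and extends the SSD’s total endurance (commonly rated in Terabytes Written, TBW).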
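
To make the link between write amplification and endurance concrete, here is a back-of-the-envelope calculation. The endurance formula (capacity times rated P/E cycles, divided by the write amplification factor) is a common rule of thumb rather than a manufacturer's rating method, and the 1 TB capacity and 1,000 P/E cycles are illustrative numbers only.

```python
def write_amplification(flash_bytes_written, host_bytes_written):
    """Ratio of bytes physically written to flash to bytes the host wrote."""
    if host_bytes_written <= 0:
        raise ValueError("host_bytes_written must be positive")
    return flash_bytes_written / host_bytes_written


def approx_host_write_budget(capacity_bytes, rated_pe_cycles, waf):
    """Rule-of-thumb estimate: total flash write budget divided by the WAF."""
    return capacity_bytes * rated_pe_cycles / waf


TB = 10**12
print(write_amplification(3 * TB, 2 * TB))               # 1.5
print(approx_host_write_budget(1 * TB, 1000, 2.0) / TB)  # 500.0
print(approx_host_write_budget(1 * TB, 1000, 4.0) / TB)  # 250.0
```

Under these assumptions, doubling the write amplification factor from 2 to 4 halves the amount of host data the drive can absorb before the flash reaches its rated cycle count.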

GC vs. TRIM: Complementary Processes

GC and TRIM work together to optimize SSD performance, but they serve distinct roles:

| Feature | GC (Garbage Collection) | TRIM |
|---|---|---|
| Initiator | SSD controller (hardware/firmware) | Operating system (OS) |
| Trigger | Low free space, idle time, or write operations | File deletion/overwriting (user action) |
| Function | Reclaims space by moving valid data and erasing old blocks | Notifies SSD of invalid data locations (no data movement) |
| Timing | Reactive (runs when needed) or proactive (idle time) | Proactive (marks invalid data immediately) |
| Dependency | Works independently, but enhanced by TRIM | Requires SSD support for GC to act on marked data |

How They Collaborate:

  • When a user deletes a file, the OS sends a TRIM command to the SSD, marking the corresponding pages as invalid.
  • During idle time, GC uses the TRIM-marked invalid data to prioritize blocks for erasure, avoiding unnecessary movement of valid data and making GC more efficient.
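
A small example shows why the TRIM hint matters to GC. It assumes a block holding four logical addresses, three of which belonged to a file that has since been deleted; the function name pages_to_copy and the address values are made up for illustration.

```python
def pages_to_copy(block_addresses, addresses_ssd_believes_live):
    """Count the pages GC must relocate before it can erase one block."""
    return sum(1 for addr in block_addresses
               if addr in addresses_ssd_believes_live)


block_addresses = [100, 101, 102, 103]
deleted_file = {100, 101, 102}

# Without TRIM the SSD never hears about the deletion, so every page
# in the block still looks live to it.
live_without_trim = {100, 101, 102, 103}
# With TRIM the OS reports the deleted file's addresses as invalid.
live_with_trim = live_without_trim - deleted_file

print(pages_to_copy(block_addresses, live_without_trim))  # 4
print(pages_to_copy(block_addresses, live_with_trim))     # 1
```

Without the hint, GC would copy three pages of data that nobody needs; with it, a single page is moved and the block can be erased.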

Types of GC

1. Idle GC

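Runs when the SSD is powered on but not servicing host requests (e.g., a laptop left idle at the desktop, or a camera that is switched on but not recording). The drive needs power to run GC, so nothing is reclaimed while the device is turned off or in a sleep state that cuts power to the storage. Idle GC is the least disruptive form of GC, as it does not interfere with active user operations.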

2. Foreground GC (On-Demand GC)

Triggers when the SSD has no free blocks available for new writes. This is necessary but can cause temporary slowdowns (increased write latency) because GC competes with user-initiated write operations for the SSD’s resources.

3. Background GC

Runs continuously in the background while the SSD is active (e.g., during file transfers or app usage). It balances performance and space reclamation, but may slightly impact SSD throughput during heavy workloads.

Factors Affecting GC Efficiency

1. Free Space

GC works best when the SSD has ≥10–20% free space. A nearly full SSD (≥90% used) forces frequent foreground GC, leading to slower write speeds and higher write amplification.

2. TRIM Support

SSDs with TRIM support (paired with a TRIM-enabled OS) learn about deleted data promptly, so GC sees more invalid pages in each block. Blocks that hold only invalid data can be erased without moving anything, and partially invalid blocks need fewer pages copied, making GC far more efficient.
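
On Linux, one way to check whether a block device advertises discard support (the kernel's generic term, which covers ATA TRIM) is to read its discard_max_bytes attribute in sysfs, where a value of 0 means discard is not supported. The sketch below is Linux-only, the device name sda is just an example, and a positive result says nothing about whether the filesystem is actually issuing TRIM.

```python
from pathlib import Path


def supports_discard(device, sysfs_root="/sys/block"):
    """Return True or False, or None if the attribute cannot be read.

    device is a kernel block device name such as "sda" or "nvme0n1".
    """
    attr = Path(sysfs_root) / device / "queue" / "discard_max_bytes"
    try:
        return int(attr.read_text().strip()) > 0
    except (OSError, ValueError):
        return None   # not Linux, no such device, or unexpected content


print(supports_discard("sda"))   # "sda" is a placeholder device name
```

The sysfs_root parameter exists only so the function can be pointed at a test directory; on a real system the default is the path to use.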

3. Over-Provisioning (OP)

Most SSDs reserve a small portion of their capacity (5–10%) as over-provisioning—a dedicated pool of free blocks for GC and write operations. Larger over-provisioning improves GC efficiency and reduces write amplification.
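
Over-provisioning is usually quoted as the spare capacity relative to the user-visible capacity. The calculation below assumes that convention and uses a typical illustration: a drive built from 128 GiB of raw flash and sold as 128 GB. The function name is made up for the example.

```python
def over_provisioning_percent(physical_bytes, user_bytes):
    """Spare capacity as a percentage of the user-visible capacity."""
    if user_bytes <= 0 or physical_bytes < user_bytes:
        raise ValueError("expected 0 < user_bytes <= physical_bytes")
    return (physical_bytes - user_bytes) / user_bytes * 100


raw_flash = 128 * 2**30     # 128 GiB of NAND
advertised = 128 * 10**9    # sold as 128 GB

print(round(over_provisioning_percent(raw_flash, advertised), 2))  # 7.37
```

In this example the gap between binary and decimal gigabytes alone yields roughly 7% of spare area, which falls inside the 5–10% range mentioned above.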

4. Workload Type

  • Read-Heavy Workloads (e.g., web browsing, document editing): GC has ample idle time to run, maintaining optimal performance.
  • Write-Heavy Workloads (e.g., video editing, database servers): Frequent writes leave little idle time for GC, increasing reliance on foreground GC and potentially slowing performance.

Limitations of GC

1. Performance Impact During Foreground GC

When GC runs on-demand (foreground GC), it uses the SSD’s controller and flash resources, leading to temporary write slowdowns (e.g., longer file transfer times or app loading delays).

2. Write Amplification

GC itself contributes to write amplification (e.g., moving valid data from an old block to a new one counts as additional writes). While TRIM and over-provisioning reduce this, it cannot be eliminated entirely.
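
A simplified model shows how strongly the amount of valid data in a collected block drives write amplification. Assume every block GC picks has the same fraction u of valid pages: reclaiming it costs u of a block in copies and frees 1 - u of a block for host data, so flash writes per host write come to 1 / (1 - u). This uniform-u assumption is a textbook simplification, not a description of real workloads.

```python
def gc_write_amplification(valid_fraction):
    """Write amplification when every collected block has this valid share."""
    if not 0 <= valid_fraction < 1:
        raise ValueError("valid_fraction must be in [0, 1)")
    return 1 / (1 - valid_fraction)


print(gc_write_amplification(0.0))    # 1.0  (block fully invalid, no copying)
print(gc_write_amplification(0.5))    # 2.0
print(gc_write_amplification(0.75))   # 4.0
```

The first case is what TRIM and generous over-provisioning push toward: the fewer valid pages a victim block holds, the closer the ratio gets to 1.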

3. No Data Recovery

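Once GC erases a block with invalid data, the data is permanently unrecoverable (even with data recovery software). This is a risk if files are deleted accidentally. It can also look like a security benefit, but the timing of GC is up to the controller, so a deleted file is not guaranteed to be erased at any particular moment.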

4. Inefficiency on Older/Cheaper SSDs

Low-cost SSDs or older models may use less sophisticated GC algorithms, leading to poor performance under heavy workloads or when the drive is nearly full.

Practical Tips to Optimize GC Performance

Avoid Constant Write-Heavy Workloads: If possible, offload large, continuous write tasks (e.g., video rendering) to an HDD to reduce strain on the SSD’s GC process.

Leave Free Space: Keep at least 10–20% of the SSD free to ensure GC has enough free blocks to work with and to reduce foreground GC triggers.

Enable TRIM: Ensure TRIM is enabled in your OS (default on modern systems) to help GC target invalid data efficiently.

Avoid Running the Drive Nearly Full: Regularly delete unnecessary files to prevent the SSD from becoming nearly full (≥90% used).

Use Over-Provisioning: Some SSDs allow manual expansion of over-provisioning (e.g., via manufacturer tools), further improving GC efficiency.
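
The free-space tip is easy to automate. The sketch below uses Python's standard shutil.disk_usage to read filesystem usage and compares it with a 10% floor taken from the guidance above; the helper names are invented for the example. Note that this measures free space as the filesystem reports it, which is only a proxy for the free blocks the SSD controller sees internally.

```python
import shutil


def free_fraction(path="/"):
    """Fraction of the filesystem containing `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def needs_cleanup(fraction_free, minimum_free=0.10):
    """True when free space has dropped below the chosen floor."""
    return fraction_free < minimum_free


fraction = free_fraction(".")
print(f"{fraction:.0%} free, cleanup needed: {needs_cleanup(fraction)}")
```

A check like this could run from a scheduled task and warn before the drive gets full enough to push GC into the foreground.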


