QPI vs FSB: Key Differences in Processor Connectivity

QPI (QuickPath Interconnect) is a high-speed, point-to-point interconnect technology developed by Intel to replace the older Front Side Bus (FSB) architecture. Introduced in 2008 with the Nehalem microarchitecture, QPI provides fast communication between CPUs (each with its own integrated memory controller) and between CPUs and chipsets or I/O hubs in multi-processor (MP) servers, workstations, and high-end desktops. It operates at high bandwidth and low latency, and supports both cache-coherent and non-coherent data transfers.

Core Architecture & Key Features

1. Point-to-Point Topology

Unlike the shared-bus FSB (where all components compete for a single bus), QPI uses a direct point-to-point link between two components (e.g., CPU-to-CPU, CPU-to-chipset). This eliminates bus contention and allows parallel data transfers across multiple links, significantly improving system scalability.
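As a rough, back-of-the-envelope sketch (the helper names are illustrative, not Intel's), the snippet below contrasts the single contended path of a shared bus with the number of dedicated links a fully connected point-to-point fabric would need:

```python
def shared_bus_paths(n_devices: int) -> int:
    """A shared bus is always one path, contended by every attached device."""
    return 1

def full_mesh_links(n_devices: int) -> int:
    """Dedicated links needed to connect every pair of devices directly."""
    return n_devices * (n_devices - 1) // 2

for n in (2, 4, 8):
    print(f"{n} devices: bus paths = {shared_bus_paths(n)}, "
          f"point-to-point links = {full_mesh_links(n)}")
```

The same arithmetic also hints at why point-to-point fabrics get complex at very high socket counts: the number of links grows quadratically with the number of devices.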

2. Serial Transmission

QPI moves data over narrow, high-speed differential signal pairs (lanes), as opposed to the wide parallel transmission of the FSB. Each QPI link consists of:

  • Lanes: A full-width QPI link has 20 lanes (differential pairs) in each direction; links can also operate at half or quarter width. Each lane carries one bit per transfer.
  • Directionality: Separate transmit (Tx) and receive (Rx) lanes provide full-duplex communication (simultaneous send and receive).
  • Clock Signaling: Each direction carries a forwarded clock on its own lane rather than relying on a single bus-wide clock, which preserves signal integrity at high transfer rates; the pin cost of a link is tallied in the sketch after this list.
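That pin cost can be sanity-checked with simple arithmetic. The sketch below assumes a full-width QPI link (20 data lanes plus one forwarded-clock lane per direction, each lane a differential pair); the helper function is illustrative, not part of any Intel tool:

```python
def qpi_link_signal_count(data_lanes: int = 20, clock_lanes: int = 1) -> int:
    """Physical signals needed for one full-duplex, full-width QPI link.

    Each lane is a differential pair (2 wires); Tx and Rx are separate.
    """
    lanes_per_direction = data_lanes + clock_lanes
    wires_per_lane = 2   # differential signaling
    directions = 2       # separate transmit and receive paths
    return lanes_per_direction * wires_per_lane * directions

print(qpi_link_signal_count())  # 84 signals for a full-width link
```

A wide parallel bus such as the FSB needs far more signals (64 data pins plus a separate address bus and control lines), which is where QPI's pin-count advantage comes from.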

3. Bandwidth Calculation

QPI bandwidth is determined by three factors:

  1. Link Speed: Measured in gigatransfers per second (GT/s; 1 GT/s = one billion transfers per second). Early QPI parts ran at 4.8 and 6.4 GT/s, with later generations reaching 8.0 GT/s and 9.6 GT/s.
  2. Link Width: A full-width link is 20 lanes per direction, of which 16 carry payload data (2 bytes per transfer); links can also train down to half or quarter width.
  3. Duplex: Transmit and receive lanes are separate, so the commonly quoted figure is the sum of both directions.

Bandwidth Formula:

Per-direction bandwidth (GB/s) = Link speed (GT/s) × 2 bytes per transfer
Total (full-duplex) bandwidth (GB/s) = per-direction bandwidth × 2

  • The 2 bytes per transfer come from the 16 payload-carrying lanes of a full-width link.
  • The ×2 accounts for full duplex (Tx + Rx).
  • Note that QPI does not use 8b/10b encoding (that is a PCI Express technique); its overhead comes from CRC and link-layer framing inside 80-bit flits, so delivered data bandwidth is somewhat below the headline figure.

Example: a full-width QPI link at 6.4 GT/s:

6.4 GT/s × 2 bytes = 12.8 GB/s per direction, or 25.6 GB/s total (full-duplex), the per-link figure commonly quoted for Nehalem-generation parts.
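A small helper makes the arithmetic explicit. The function below is an illustrative sketch (the name and defaults are ours); it simply applies the 2-bytes-per-transfer rule above and reproduces the commonly quoted per-link figures:

```python
def qpi_link_bandwidth(gt_per_s: float, bytes_per_transfer: int = 2):
    """Return (per-direction, full-duplex) bandwidth in GB/s for one QPI link.

    bytes_per_transfer defaults to 2 because 16 of the 20 lanes carry payload.
    """
    per_direction = gt_per_s * bytes_per_transfer
    return per_direction, per_direction * 2

for speed in (4.8, 6.4, 8.0, 9.6):
    one_way, both_ways = qpi_link_bandwidth(speed)
    print(f"{speed} GT/s -> {one_way:.1f} GB/s per direction, "
          f"{both_ways:.1f} GB/s full-duplex")
```

Running this yields 19.2, 25.6, 32.0, and 38.4 GB/s full-duplex for the four standard QPI speeds.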

4. Cache Coherence

QPI maintains cache coherence in multi-CPU systems (e.g., 2-socket or 4-socket servers), ensuring that all CPU cores see a consistent view of shared memory. It uses the MESIF protocol (Modified, Exclusive, Shared, Invalid, Forward) to track cache line states and coordinate data transfers between CPUs, reducing latency for shared data access.
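The Forward state is what distinguishes MESIF from classic MESI: among multiple sharers of a line, exactly one cache holds it in F and answers remote read requests, so the other sharers stay silent. The toy model below illustrates only that idea; real QPI coherence flows involve home agents, snoop modes, and many more transitions, so treat this as a simplified sketch rather than the actual protocol:

```python
from enum import Enum

class LineState(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"
    FORWARD = "F"   # the single sharer designated to answer snoops

def on_remote_read(state: LineState):
    """Toy response of a cache snooped by a remote read.

    Returns (supplies_data, next_local_state). Greatly simplified: the
    requester that receives the data becomes the new Forward holder.
    """
    if state in (LineState.MODIFIED, LineState.EXCLUSIVE, LineState.FORWARD):
        return True, LineState.SHARED
    return False, state   # caches in S or I do not respond

for s in LineState:
    supplies, nxt = on_remote_read(s)
    print(f"{s.value}: supplies data={supplies}, next state={nxt.value}")
```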

QPI vs. FSB (Front Side Bus)

  • Topology: QPI uses point-to-point links between pairs of devices; the FSB is a shared bus, a single path contended by all components.
  • Scalability: QPI supports glueless multi-socket systems (2 to 8 CPUs); FSB-based platforms scale poorly because every agent on a bus shares its bandwidth.
  • Bandwidth: QPI scales with link speed (25.6 GB/s per link at 6.4 GT/s, up to 38.4 GB/s at 9.6 GT/s); the FSB is fixed at roughly 12.8 GB/s for a 64-bit, 1600 MT/s bus, shared by everything attached to it (see the sketch after this list).
  • Latency: QPI is low latency thanks to direct links; FSB latency suffers from bus arbitration and contention.
  • Cache Coherence: QPI carries coherence natively via the MESIF protocol; FSB systems rely on bus snooping, which scales poorly with core and socket count.
  • Pin Count: QPI needs fewer signals (narrow differential lanes); the FSB needs wide parallel data and address buses.
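The bandwidth row can be reproduced with simple arithmetic: an FSB moves 8 bytes (64 bits) per transfer on one shared bus, while each QPI link moves 2 bytes per transfer in each direction. A quick sketch with illustrative helper names (figures are theoretical peaks):

```python
def fsb_peak_gb_s(mt_per_s: float, bus_width_bytes: int = 8) -> float:
    """Peak bandwidth of a 64-bit front-side bus, shared by all agents."""
    return mt_per_s / 1000 * bus_width_bytes

def qpi_peak_gb_s(gt_per_s: float) -> float:
    """Full-duplex peak of one QPI link (2 bytes per transfer, each way)."""
    return gt_per_s * 2 * 2

print(f"1600 MT/s FSB : {fsb_peak_gb_s(1600):.1f} GB/s (shared by all agents)")
print(f"6.4 GT/s QPI  : {qpi_peak_gb_s(6.4):.1f} GB/s per link")
print(f"9.6 GT/s QPI  : {qpi_peak_gb_s(9.6):.1f} GB/s per link")
```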

QPI Use Cases

1. Multi-Socket Servers

QPI is the primary interconnect for Intel’s Xeon server processors (e.g., Xeon 5500/5600 series, Xeon E7), enabling high-speed communication between CPUs in 2-socket, 4-socket, or 8-socket server configurations. For example:

  • A 2-socket server uses one or two QPI links directly between the two CPUs to share memory and cache data.
  • A 4-socket server interconnects its CPUs either in a ring (two QPI links per CPU) or, when each CPU provides enough links, in a fully connected topology; the hop-count sketch below shows why full connectivity helps latency.
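The latency effect of topology can be illustrated by counting socket-to-socket hops. The sketch below compares a 4-socket ring (two QPI links per CPU) with a fully connected 4-socket system (three links per CPU); the adjacency lists are illustrative rather than a map of any particular Intel platform:

```python
from collections import deque

def max_hops(adjacency: dict) -> int:
    """Worst-case shortest-path length (in link hops) between any two sockets."""
    worst = 0
    for start in adjacency:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for peer in adjacency[node]:
                if peer not in dist:
                    dist[peer] = dist[node] + 1
                    queue.append(peer)
        worst = max(worst, max(dist.values()))
    return worst

ring_4s = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}              # 2 links/CPU
full_4s = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}  # 3 links/CPU

print("4-socket ring      :", max_hops(ring_4s), "hop(s) worst case")   # 2
print("4-socket full mesh :", max_hops(full_4s), "hop(s) worst case")   # 1
```

Remote memory accesses in the ring may need to traverse an intermediate socket, while full connectivity keeps every access to a single hop.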

2. Workstations & High-End Desktops

Intel’s Core i7 Extreme Edition processors (e.g., the Core i7-980X on the X58 platform) used QPI to connect the CPU to the X58 I/O hub. Because the memory controller is integrated on the CPU die, memory traffic no longer crossed an external bus at all, delivering markedly better memory and I/O performance than FSB-based systems.

3. Embedded & Specialized Systems

QPI is used in high-performance embedded systems (e.g., industrial controllers, military hardware) that require low-latency interconnects between processors and peripherals.

Evolution & Successors

1. QPI Generations

  • 1st Gen (2008): 4.8 and 6.4 GT/s, used in Nehalem Xeon 5500 and Core i7 parts.
  • 2nd Gen (2010): up to 6.4 GT/s with improved power efficiency, used in Westmere Xeon 5600-series parts.
  • 3rd Gen (2012–2013): 8.0 GT/s (QPI 1.1), used in Sandy Bridge-EP and Ivy Bridge-EP Xeon E5.
  • 4th Gen (2014): 9.6 GT/s, used in Haswell-EP and later Broadwell-EP Xeon E5/E7, the last QPI generation; per-link bandwidth for each speed is tabulated in the sketch below.
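Putting those generations next to the per-link arithmetic from earlier (the speeds are the headline rates listed above; bandwidth assumes a full-width link and 2 bytes per transfer per direction):

```python
QPI_GENERATIONS = {
    "Nehalem (2008)":       6.4,   # lower-end parts also shipped at 4.8 GT/s
    "Westmere (2010)":      6.4,
    "Sandy/Ivy Bridge-EP":  8.0,
    "Haswell/Broadwell-EP": 9.6,
}

for gen, gt in QPI_GENERATIONS.items():
    total = gt * 2 * 2   # 2 bytes per transfer, both directions
    print(f"{gen:<22} {gt:>4} GT/s -> {total:.1f} GB/s per link")
```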

2. Successor: UPI (Ultra Path Interconnect)

Intel replaced QPI with UPI (Ultra Path Interconnect) in 2017, starting with the Skylake-SP Xeon Scalable processors. UPI offers:

  • Higher link speed: 10.4 GT/s at introduction (about 20.8 GB/s per direction per link), with later server generations pushing the rate higher still.
  • A more efficient link layer and packet format, reducing protocol overhead per transfer.
  • Coherence-protocol refinements, including directory-assisted snooping that scales better with socket count.
  • Support for up to 8-socket glueless systems, with two or three UPI links per CPU depending on the SKU.
  • Better power efficiency per unit of bandwidth.

3. Comparison: QPI vs. UPI

  • Max link speed: QPI tops out at 9.6 GT/s; UPI launched at 10.4 GT/s, with later generations (e.g., Sapphire Rapids) reaching 16 GT/s.
  • Per-link bandwidth: QPI reaches about 38.4 GB/s full-duplex at 9.6 GT/s; UPI provides about 41.6 GB/s full-duplex at 10.4 GT/s, and more on newer parts (the sketch after this list turns these per-link numbers into per-socket aggregates).
  • Overhead: QPI loses some raw bandwidth to CRC and flit framing; UPI uses a more efficient packet format that wastes less of the raw link rate.
  • Topology: both are point-to-point; UPI platforms offer more links per socket and cleaner scaling to 8-socket systems.
  • Power Efficiency: UPI delivers more bandwidth per watt.
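As a rough comparison of inter-socket headroom, the sketch below multiplies per-link bandwidth by the number of coherent links per socket. The link counts (two QPI links on a 2-socket Xeon E5, up to three UPI links on Skylake-SP) are typical configurations used here for illustration; the exact count depends on the SKU:

```python
def link_gb_s(gt_per_s: float) -> float:
    """Full-duplex bandwidth of one QPI/UPI link (2 bytes per transfer, each way)."""
    return gt_per_s * 2 * 2

configs = [
    ("QPI, 2 links @ 8.0 GT/s (e.g. Xeon E5-2690)",    2, 8.0),
    ("QPI, 2 links @ 9.6 GT/s (Haswell/Broadwell-EP)", 2, 9.6),
    ("UPI, 3 links @ 10.4 GT/s (Skylake-SP)",          3, 10.4),
]

for name, links, speed in configs:
    print(f"{name}: {links * link_gb_s(speed):.1f} GB/s aggregate between sockets")
```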

Limitations of QPI

  1. Power Consumption: QPI links draw significant power even when lightly loaded; later generations mitigated this with low-power link states, and UPI improved efficiency further.
  2. Scalability Limits: Glueless QPI topologies top out at 8 sockets; larger systems require third-party node controllers.
  3. Protocol Overhead: CRC and link-layer framing consume part of the raw link rate; UPI's more efficient packetization reduces this overhead.

Real-World Examples

4-Socket Xeon E7-4870 Server: The four CPUs are interconnected with QPI links running at 6.4 GT/s, giving every CPU direct, cache-coherent access to memory attached to the other sockets.

Intel Xeon E5-2690 (2012): 2-socket server CPU with two QPI links running at 8.0 GT/s; each link provides 16 GB/s per direction (32 GB/s full-duplex), for roughly 64 GB/s of aggregate bandwidth between the two sockets.

Intel Core i7-990X (2011): High-end desktop CPU on the X58 platform with a single QPI link at 6.4 GT/s (25.6 GB/s full-duplex) connecting the CPU to the X58 I/O hub; memory is handled by the CPU's integrated memory controller rather than over QPI.


