Comparing InfiniBand and Ethernet: A Performance Analysis

InfiniBand (IB) is a high-speed, low-latency serial interconnect standard designed for high-performance computing (HPC), data centers, and enterprise storage systems. Developed by the InfiniBand Trade Association (IBTA), which published the first specification in 2000, InfiniBand was created to overcome the limitations of the shared-bus interconnects of the era for ultra-fast, parallel data transfer between servers, storage devices, and (today) GPUs. It is a de facto standard in supercomputing, AI/ML training clusters, and high-throughput storage area networks (SANs), offering leading bandwidth and latency for distributed computing workloads.

InfiniBand operates as a switched fabric architecture, meaning all devices (nodes) connect to a central switch fabric rather than a shared bus, enabling non-blocking, concurrent data transfer between thousands of nodes.


Core Technical Specifications and Generations

InfiniBand has evolved through multiple generations, with each iteration doubling or quadrupling bandwidth while reducing latency. The table below outlines the key parameters of major InfiniBand generations:

| InfiniBand Generation | Release | Line Rate (per lane) | Lane Configurations | Total Bandwidth (Full-Duplex) | End-to-End Latency (approx.) | Max Distance (Copper/Fiber) | Key Use Cases |
|---|---|---|---|---|---|---|---|
| SDR (Single Data Rate) | 2001 | 2.5 Gbps | 4X, 12X | 10 Gbps (4X), 30 Gbps (12X) | ~5 µs | 10 m (copper), 10 km (single-mode fiber) | Early HPC clusters, entry-level SANs |
| DDR (Double Data Rate) | 2004 | 5.0 Gbps | 4X, 12X | 20 Gbps (4X), 60 Gbps (12X) | ~2.5 µs | 10 m (copper), 20 km (single-mode fiber) | Mid-range HPC, database clusters |
| QDR (Quad Data Rate) | 2007 | 10.0 Gbps | 4X, 12X | 40 Gbps (4X), 120 Gbps (12X) | ~1.3 µs | 10 m (copper), 40 km (single-mode fiber) | High-performance HPC, virtualized data centers |
| FDR (Fourteen Data Rate) | 2011 | 14.0625 Gbps | 4X, 12X | 56 Gbps (4X), 168 Gbps (12X) | ~0.7 µs | 10 m (copper), 100 km (single-mode fiber) | AI training clusters, supercomputers |
| EDR (Enhanced Data Rate) | 2014 | 25.0 Gbps | 4X, 12X | 100 Gbps (4X), 300 Gbps (12X) | ~0.6 µs | 10 m (copper), 100 km (single-mode fiber) | Exascale HPC, GPU clusters (NVIDIA Tesla) |
| HDR (High Data Rate) | 2017 | 50.0 Gbps | 4X, 12X | 200 Gbps (4X), 600 Gbps (12X) | ~0.6 µs | 10 m (copper), 200 km (single-mode fiber) | AI/ML mega-clusters, cloud hyperscale data centers |
| NDR (Next Data Rate) | 2021 | 100.0 Gbps | 4X, 12X | 400 Gbps (4X), 1.2 Tbps (12X) | sub-0.6 µs | 10 m (copper), 400 km (single-mode fiber) | 800G AI clusters, exascale supercomputers |
| XDR (Extreme Data Rate) | 2024 | 200.0 Gbps | 4X, 12X | 800 Gbps (4X), 2.4 Tbps (12X) | sub-0.6 µs | 10 m (copper), 800 km (single-mode fiber) | Zettascale HPC, advanced AI training |

Notes:

  • 4X/12X: Refers to the number of lanes (4 or 12) in the InfiniBand cable/connector; 12X is typically used for switch-to-switch backbones.
  • Latency: Approximate end-to-end latency for small messages (e.g., an MPI ping-pong between two nodes), a critical metric for HPC and AI workloads requiring real-time data exchange.
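The bandwidth figures above quote nominal rates; usable throughput is lower because of line encoding. SDR through QDR use 8b/10b encoding (80% efficient), while FDR and EDR use 64b/66b. A small sketch using the exact spec signaling rates (slightly different from the rounded per-lane figures in the table):

```python
# Exact raw signaling rates per lane (Gbps) and encoding efficiency, per
# the commonly cited IBTA values. SDR/DDR/QDR: 8b/10b; FDR/EDR: 64b/66b.
ENCODING_EFF = {"SDR": 8 / 10, "DDR": 8 / 10, "QDR": 8 / 10,
                "FDR": 64 / 66, "EDR": 64 / 66}
RAW_GBPS_PER_LANE = {"SDR": 2.5, "DDR": 5.0, "QDR": 10.0,
                     "FDR": 14.0625, "EDR": 25.78125}

def effective_gbps(gen, lanes=4):
    """Usable data rate for a port with the given lane count."""
    return RAW_GBPS_PER_LANE[gen] * lanes * ENCODING_EFF[gen]

print(effective_gbps("QDR"))  # 40 Gbps nominal -> 32.0 Gbps usable
print(effective_gbps("EDR"))  # ~103 Gbps raw  -> 100.0 Gbps usable
```

The encoding change at FDR is why later generations deliver round-number usable rates (100G, 200G, 400G) with far less overhead than the early 8b/10b links.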

Key Architectural Features

1. Switched Fabric Topology

InfiniBand uses a non-blocking switched fabric, in which every node connects to an InfiniBand switch rather than to a shared bus or a chain of point-to-point links. This enables:

  • Concurrent Communication: Thousands of nodes can transfer data simultaneously without bandwidth contention.
  • Scalability: Clusters can scale to tens of thousands of nodes by adding switches to the fabric (e.g., fat-tree, mesh, or torus topologies).
  • Redundancy: Switched fabric supports multiple paths between nodes, ensuring fault tolerance if a switch or cable fails.
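The scalability claim can be made concrete. In a classic three-level fat-tree built from k-port switches, each edge switch serves k/2 hosts, each pod holds k/2 edge switches, and there are k pods, giving k³/4 hosts. A quick sketch (the helper name is ours):

```python
def fat_tree_hosts(k):
    """Max hosts in a 3-level fat-tree of k-port switches.

    (k/2 hosts per edge switch) * (k/2 edge switches per pod) * (k pods)
    = k**3 / 4.
    """
    assert k % 2 == 0, "switch radix must be even"
    return k ** 3 // 4

print(fat_tree_hosts(40))  # 16,000 hosts from 40-port switches
print(fat_tree_hosts(64))  # 65,536 hosts -- "tens of thousands of nodes"
```

This is why fabric scale grows so quickly with switch radix: doubling the port count per switch yields eight times as many hosts.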

2. Remote Direct Memory Access (RDMA)

A foundational feature of InfiniBand, RDMA allows one node to directly access the memory of another node without involving the CPU or operating system of either device. This eliminates:

  • CPU Overhead: RDMA bypasses kernel processing, reducing CPU utilization by up to 90% compared to TCP/IP Ethernet.
  • Latency: RDMA enables sub-microsecond one-way latency for small messages, critical for HPC and AI workloads (e.g., GPU-to-GPU communication).
  • Data Copying: Eliminates redundant data copies between user and kernel space, further boosting throughput.
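Real RDMA requires an HCA and the verbs API, but the "direct memory placement" idea can be loosely illustrated in ordinary Python: two handles to one shared-memory segment stand in for the two nodes, and a write through one handle is visible through the other with no send/receive copy through the kernel socket path. This is an analogy only, not RDMA:

```python
from multiprocessing import shared_memory

# Analogy only (this is NOT RDMA): the segment plays the role of a
# registered memory region; the second handle is the "remote" view.
region = shared_memory.SharedMemory(create=True, size=64)
peer = shared_memory.SharedMemory(name=region.name)

region.buf[:5] = b"hello"        # analogous to an RDMA WRITE
received = bytes(peer.buf[:5])   # visible with no explicit transfer step
print(received)

peer.close()
region.close()
region.unlink()
```

In actual RDMA the NIC performs the placement across the network, which is what removes the CPU and kernel from the data path.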

3. Quality of Service (QoS)

InfiniBand provides granular QoS controls to prioritize traffic types (e.g., HPC compute traffic vs. storage traffic):

  • Virtual Lanes (VLs): Up to 16 virtual lanes per physical link, allowing separate prioritization for critical and non-critical traffic.
  • Traffic Classification: Packets are tagged with service levels (SLs) to ensure low-latency traffic (e.g., AI model parameters) is transmitted before bulk data (e.g., storage backups).
  • Congestion Control: Advanced congestion management algorithms (e.g., Adaptive Routing) prevent packet loss and reduce latency in congested clusters.
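The SL-to-VL idea can be sketched as a toy model: packets are tagged with a service level, the switch maps each SL to a virtual lane, and a strict-priority scheduler drains low-numbered lanes first. The mapping values below are hypothetical; real fabrics configure SL-to-VL tables through the subnet manager (e.g., OpenSM):

```python
from collections import deque

NUM_VLS = 16
# Hypothetical SL->VL map: latency-critical service levels land on
# low-numbered (higher-priority) lanes; unmapped SLs fall through to VL 15.
SL_TO_VL = {0: 0, 1: 0, 2: 1}

vls = [deque() for _ in range(NUM_VLS)]

def enqueue(sl, packet):
    vls[SL_TO_VL.get(sl, NUM_VLS - 1)].append(packet)

def dequeue():
    """Strict priority: drain the lowest-numbered non-empty VL first."""
    for q in vls:
        if q:
            return q.popleft()
    return None

enqueue(7, "storage-backup-chunk")   # bulk traffic, arrives first
enqueue(0, "ai-gradient-update")     # latency-critical, arrives second
first, second = dequeue(), dequeue()
print(first, second)                 # the critical packet leaves first
```

Production schedulers typically use weighted arbitration between lanes rather than pure strict priority, but the separation of traffic classes works the same way.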

4. Unified Communication Protocol

InfiniBand supports multiple transport protocols over a single fabric, making it a versatile interconnect for mixed workloads:

  • InfiniBand Verbs: The low-level API for direct hardware access, used for HPC and RDMA applications.
  • IP over InfiniBand (IPoIB): Enables standard IP networking (TCP/IP, UDP) over InfiniBand, supporting legacy applications.
  • SRP (SCSI RDMA Protocol): Provides RDMA-based block-storage access to SANs; on modern deployments, NVMe over Fabrics (NVMe-oF) fills a similar role over the same RDMA transport.
  • MPI (Message Passing Interface): The de facto standard for HPC, optimized for InfiniBand’s low latency and high bandwidth.
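The point of IPoIB is that it presents the fabric as an ordinary IP interface (typically `ib0`), so unmodified socket code runs over InfiniBand with no changes. The snippet below is standard TCP code demonstrated over loopback; on an IPoIB host you would simply bind to the `ib0` address instead:

```python
import socket

# Plain TCP/IP code -- over IPoIB, exactly this runs on the InfiniBand
# fabric, because ib0 is just another IP interface to the kernel.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))       # on an IPoIB host: the ib0 address
server.listen(1)
port = server.getsockname()[1]

client = socket.create_connection(("127.0.0.1", port))
conn, _ = server.accept()
client.sendall(b"ping")
reply = conn.recv(4)
print(reply)

for s in (client, conn, server):
    s.close()
```

The trade-off is that IPoIB traffic goes through the kernel TCP/IP stack, so it forfeits most of RDMA's latency and CPU advantages; it exists for compatibility, not performance.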

InfiniBand vs. Ethernet (100G/400G/800G)

While Ethernet has evolved to support very high speeds (400GbE and 800GbE), InfiniBand remains preferable for ultra-low-latency, high-scalability workloads. The table below highlights the key differences:

| Characteristic | InfiniBand (NDR/XDR) | Ethernet (400G/800G) |
|---|---|---|
| Latency | Sub-microsecond end-to-end (RDMA, small messages) | Low microseconds (RoCE v2) to tens of microseconds (TCP/IP) |
| RDMA Support | Native (built into the hardware and transport) | Requires RoCE (RDMA over Converged Ethernet); hardware/software dependent |
| Scalability | Tens of thousands of nodes (fat-tree fabric) | Scales broadly, but lossless RoCE fabrics require careful PFC/ECN tuning |
| QoS | Hardware-enforced virtual lanes (up to 16 VLs) | Software-defined QoS (DSCP marking); less granular |
| Congestion Control | Credit-based link-level flow control plus adaptive routing in hardware | ECN/PFC (RoCE) or TCP congestion control in software |
| Cost | Higher (specialized switches/HCAs) | Lower (commodity Ethernet hardware) |
| Legacy Compatibility | Limited (requires InfiniBand HCAs; IP only via IPoIB) | Universal (supports all IP-based applications) |

Note: RoCE v2 (RDMA over Converged Ethernet) narrows the gap between Ethernet and InfiniBand for RDMA, but InfiniBand still offers lower latency and better scalability for large clusters.


Typical Applications

InfiniBand is the interconnect of choice for the most demanding computing workloads:

  1. Supercomputers: Many of the world’s top-ranked supercomputers use InfiniBand for node-to-node communication (e.g., Microsoft’s Eagle and NVIDIA’s Eos), though some leading systems use other interconnects (Frontier uses HPE Slingshot; Fugaku uses Fujitsu’s Tofu-D).
  2. AI/ML Training Clusters: NVIDIA DGX A100/H100 clusters rely on InfiniBand (NDR/XDR) for GPU-to-GPU and server-to-server communication, enabling fast training of large language models (LLMs) and computer vision models.
  3. High-Performance Storage: InfiniBand powers SANs and parallel file systems (e.g., Lustre, GPFS) for low-latency access to petabyte-scale data.
  4. Cloud Hyperscale Data Centers: Microsoft Azure uses InfiniBand for its HPC and AI offerings (e.g., the HB- and ND-series VMs); other clouds, such as AWS with its EC2 P5 instances, use custom Ethernet-based fabrics (EFA) for similar workloads.
  5. Financial Services: Low-latency InfiniBand networks are used for high-frequency trading (HFT), where microsecond-level latency can determine trading success.

Limitations and Adoption Barriers

Despite its performance advantages, InfiniBand has limited adoption in mainstream enterprise networks due to:

Cable Distance: Copper InfiniBand cables are limited to roughly 10 meters, and only a few meters at the newest data rates (though fiber supports long distances), restricting its use in large campus networks.

Cost: InfiniBand host channel adapters (HCAs), switches, and cables are significantly more expensive than commodity Ethernet hardware.

Specialized Expertise: Deploying and managing InfiniBand fabrics requires specialized knowledge, unlike Ethernet (which is widely understood by network engineers).

Legacy Compatibility: InfiniBand does not natively support standard IP applications without IPoIB, making it less flexible than Ethernet for mixed workloads.


