Ray Tracing Cores vs. Regular GPU Cores: Key Differences Explained

How does a Ray Tracing Core differ from a regular GPU core?

Ray Tracing Core (RTC) is a dedicated hardware unit optimized for ray tracing, whereas a regular GPU core (CUDA core/shader core) is a general-purpose parallel processor. Their key differences lie in functiondata layoutexecution model, and performance profile—making RTCs orders of magnitude faster for ray–triangle intersection and BVH traversal, while regular cores excel at general compute and rasterization.


Core Differences at a Glance

AspectRay Tracing CoreRegular GPU Core (CUDA/Shader)
Primary RoleRay tracing acceleration (BVH traversal + ray–triangle intersection)General parallel compute, rasterization, shading, GPGPU
Data LayoutAoS (array of structures) for ray bundlesSoA (structure of arrays) for SIMD/SIMT warp execution
Execution ModelFixed-function pipeline for per-ray traversal + intersection; asynchronous with shadersProgrammable ALU; warp-synchronous, instruction-at-a-time
Key OperationsRay–AABB, ray–triangle intersection; BVH tree walkArithmetic, texture, raster ops, branching, control flow
Throughput MetricGigaRays/second (GR/s)FLOPS/clock or instructions/clock
Workload FitRay tracing (primary/secondary rays, shadows, reflections)Rasterization, AI, HPC, physics, video encode/decode
ProgrammabilityHardware-fixed; controlled via APIs (OptiX, DXR, Vulkan RT)Fully programmable via HLSL/GLSL/CUDA/ROCm
Memory AccessOptimized for BVH node and triangle data; spatial localityOptimized for texture, vertex, and buffer data; random access
Power EfficiencyHigher per-ray efficiency; lower for non-ray workloadsBalanced across diverse workloads; less efficient for ray tracing

Why the Difference Matters

  • Function Specialization: RTCs offload the ray tracing bottleneck—BVH traversal and intersection—from shader cores, which are inefficient at this due to branch divergence and data layout mismatch (SoA vs. AoS). A single RTC can process a ray in ~10 cycles, while a shader core may take 100+ cycles for the same work.
  • Execution Model: RTCs run asynchronously with shaders, allowing the GPU to overlap traversal/intersection with shading, reducing pipeline bubbles. Shader cores are warp-synchronous, which breaks down when rays in a warp take different paths (divergence).
  • Data Flow: Ray tracing works best with AoS (each ray’s origin, direction, t-min/max in one struct), but shader cores are optimized for SoA (separate arrays for origins, directions, etc.), leading to poor memory efficiency and ALU utilization for ray workloads.
  • Hardware Efficiency: RTCs integrate fixed-function logic for intersection (e.g., Möller-Trumbore) and BVH traversal, eliminating instruction overhead and enabling higher throughput per transistor. Shader cores are programmable but pay a transistor and latency cost for flexibility.

How They Work Together in a Frame

  1. Ray Generation: Shader cores emit primary rays (camera) and secondary rays (reflections, shadows).
  2. BVH Traversal + Intersection: RTCs take rays, traverse the BVH, and find the closest intersection—all in hardware, asynchronously.
  3. Shading: Shader cores compute color, lighting, and materials using intersection results from RTCs.
  4. Denoising: Tensor cores or AI accelerators clean up noisy ray-traced outputs (e.g., DLSS, FSR).

This hybrid pipeline combines RTCs for ray tracing, shader cores for general compute, and AI for quality/performance—enabling real-time photorealism in games and content creation.


Performance Impact

  • Ray Tracing: RTCs deliver 10–100× speedups over shader-only ray tracing, turning offline rendering into real-time experiences. For example, an RTX 4090’s 3rd-gen RTCs hit ~191 GR/s, vs. ~1 GR/s on a shader-only RTX 1080Ti.
  • Non-Ray Workloads: RTCs contribute nothing to rasterization, AI, or HPC—they are idle unless ray tracing is active.

Limitations of RTCs

  • Workload Lock: Only accelerates ray tracing; no benefit for non-ray tasks.
  • API Dependence: Requires DXR, Vulkan RT, or OptiX to leverage hardware acceleration.
  • Scene Complexity: Struggles with highly dynamic or volumetric scenes (e.g., dense smoke) where BVH updates are frequent or traversal is costly.

In short, RTCs are ray tracing specialists, while regular GPU cores are generalists. For real-time ray tracing, RTCs are indispensable—they turn a computationally prohibitive task into a feasible one, freeing shader cores to focus on what they do best: shading and general compute.


了解 Ruigu Electronic 的更多信息

订阅后即可通过电子邮件收到最新文章。

Posted in

Leave a comment