AMD RDNA 3 GPU Architecture Deep Dive

AMD RDNA 3 GPU Architecture Deep Dive: The Ryzen Moment for GPUs


On November 3, AMD revealed key details of its upcoming RDNA 3 GPU architecture and the Radeon RX 7900-series graphics cards. It was a public announcement that the whole world was invited to watch. Shortly after the announcement, AMD took press and analysts behind closed doors to dig a little deeper into what makes RDNA 3 tick (or is it tock?). No matter.

We're now allowed to talk about the additional RDNA 3 details and other briefings AMD provided, which almost certainly has nothing to do with Nvidia's impending launch of the RTX 4080 on Wednesday. (That's sarcasm, just in case it wasn't clear. This sort of thing happens all the time with AMD and Nvidia, or AMD and Intel, or even Intel and Nvidia now that Team Blue has joined the GPU race.)

AMD's RDNA 3 architecture fundamentally changes several of the key design elements for GPUs, thanks to the use of chiplets. And that's as good a place to start as any. We've also got separate articles covering AMD's Gaming and ISV Relations, Software and Platform details, and the Radeon RX 7900 Series Graphics Cards.

RDNA 3 and GPU Chiplets

Navi 31 consists of two core pieces: the Graphics Compute Die (GCD) and the Memory Cache Dies (MCDs). There are similarities to what AMD has done with its Zen 2/3/4 CPUs, but everything has been adapted to suit the needs of the graphics world.

For Zen 2 and later CPUs, AMD uses an Input/Output Die (IOD) that connects to system memory and provides all of the necessary functionality for things like the PCI Express interface, USB ports, and more recently (Zen 4) graphics and video functionality. The IOD then connects to one or more Core Compute Dies (CCDs, alternatively "Core Complex Dies," depending on the day of the week) via AMD's Infinity Fabric, and the CCDs contain the CPU cores, cache, and other elements.

A key point in the design is that typical general computing algorithms (the stuff that runs on the CPU cores) will mostly fit within the various L1/L2/L3 caches, so CPUs can get by with relatively modest memory bandwidth. Mainstream desktop CPUs up through Zen 4 only have two 64-bit memory channels for system RAM (though EPYC Genoa server processors can have up to twelve DDR5 channels).

The CCDs are small, and the IOD can range from around 125mm^2 (Ryzen 3000) to as large as 416mm^2 (EPYC xxx2 generation). Most recently, the Zen 4 Ryzen 7000-series CPUs have an IOD made using TSMC N6 that measures just 122mm^2, with one or two 70mm^2 CCDs manufactured on TSMC N5, while the EPYC xxx4 generation uses the same CCDs but with a relatively massive IOD measuring 396mm^2 (still made on TSMC N6).

(Image credit: AMD)

GPUs have very different requirements. Large caches can help, but GPUs also really like having gobs of memory bandwidth to feed all the GPU cores. For example, even the beastly EPYC 9654 with a 12-channel DDR5 configuration 'only' delivers up to 460.8 GB/s of bandwidth. The fastest graphics cards like the RTX 4090 can easily double that.
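To make the bandwidth comparison concrete, here is a rough Python sketch of the usual peak-bandwidth arithmetic (bus width in bytes times transfer rate). The memory speeds are assumptions that the text above does not state: DDR5-4800 for the EPYC 9654, and a 384-bit GDDR6X interface at 21 Gbps per pin for the RTX 4090. The function name is purely illustrative.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfer_rate_mt_s: float) -> float:
    """Theoretical peak bandwidth in GB/s.

    bus_width_bits: total width of the memory interface in bits.
    transfer_rate_mt_s: millions of transfers per second on each data pin.
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mt_s / 1000


# Assumption: EPYC 9654 runs 12 channels of 64 bits each at DDR5-4800.
epyc_9654 = peak_bandwidth_gb_s(12 * 64, 4800)

# Assumption: RTX 4090 uses a 384-bit bus at 21 Gbps per pin (21,000 MT/s).
rtx_4090 = peak_bandwidth_gb_s(384, 21000)

print(f"EPYC 9654: {epyc_9654:.1f} GB/s")          # 460.8 GB/s
print(f"RTX 4090:  {rtx_4090:.1f} GB/s")           # 1008.0 GB/s
print(f"GPU / CPU ratio: {rtx_4090 / epyc_9654:.2f}x")
```

Under those assumptions, the server CPU figure matches the 460.8 GB/s quoted above, and the graphics card lands at a bit more than twice that.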

In other words, AMD needed to do something different for GPU chiplets to work effectively. The solution ends up being almost the reverse of the CPU chiplets, with memory controllers and cache being placed on multiple smaller dies while the main compute functionality resides in the central GCD chiplet.

The GCD houses all the Compute Units (CUs) along with other core functionality like video codec hardware, display interfaces, and the PCIe connection. The Navi 31 GCD has up to 96 CUs, which is where the typical graphics processing occurs. But it also has Infinity Fabric links along the top and bottom edges (linked via some sort of bus to the rest of the chip) that then connect to the MCDs.

The MCDs, as the name implies (Memory Cache Dies), primarily contain the large L3 cache blocks (Infinity Cache), plus the physical GDDR6 memory interface. They also need to contain Infinity Fabric links to connect to the GCD, which you can see in the die shot along the center-facing edge of the MCDs.

The GCD uses TSMC's N5 node and packs 45.7 billion transistors into a 300mm^2 die. The MCDs, meanwhile, are built on TSMC's N6 node, each packing 2.05 billion transistors onto a chip that's only 37mm^2 in size. Cache and external interfaces are some of the elements of modern processors that scale the worst, and we can see that overall the GCD averages 152.3 million transistors per mm^2, while the MCDs only average 55.4 million transistors per mm^2.
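The density figures follow directly from the transistor counts and die sizes quoted above. A minimal Python sketch of that arithmetic (the function name is just illustrative):

```python
def density_mtr_per_mm2(transistors_billions: float, area_mm2: float) -> float:
    """Average density in millions of transistors per square millimeter."""
    return transistors_billions * 1000 / area_mm2


# Figures taken from the paragraph above.
gcd_density = density_mtr_per_mm2(45.7, 300)   # GCD on TSMC N5
mcd_density = density_mtr_per_mm2(2.05, 37)    # one MCD on TSMC N6

print(f"GCD: {gcd_density:.1f} MTr/mm^2")      # 152.3
print(f"MCD: {mcd_density:.1f} MTr/mm^2")      # 55.4
print(f"GCD is {gcd_density / mcd_density:.2f}x denser than an MCD")
```

The gap reflects both the node difference (N5 versus N6) and the fact that the MCDs are dominated by cache and memory interface circuitry, which shrinks poorly.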

AMD's High Performance Fanout Interconnect

Interconnect                  Picojoules per Bit (pJ/b)
On-die                        0.1
Foveros                       0.2
EMIB                          0.3
UCIe                          0.25-0.5
Infinity Fabric (Navi 31)     0.4
TSMC CoWoS                    0.56
Bunch of Wires (BoW)          0.5-0.7
Infinity Fabric (Zen 4)       ???
NVLink-C2C                    1.3
Infinity Fabric (Zen 3)       1.5 (?)
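
To get a feel for what these energy-per-bit figures mean in practice, link power is simply energy per bit multiplied by bits moved per second. The Python sketch below applies that to two rows of the table. The 1 TB/s traffic level is a hypothetical round number chosen for illustration, not a measured or vendor-quoted value, and the function name is likewise made up.

```python
def link_power_watts(pj_per_bit: float, traffic_tb_per_s: float) -> float:
    """Power spent moving data across a link.

    pj_per_bit: energy cost per transferred bit, in picojoules.
    traffic_tb_per_s: sustained traffic in terabytes per second (1 TB = 1e12 bytes).
    """
    bits_per_second = traffic_tb_per_s * 1e12 * 8
    return pj_per_bit * 1e-12 * bits_per_second


# Hypothetical traffic level, for illustration only.
TRAFFIC_TB_S = 1.0

navi31_power = link_power_watts(0.4, TRAFFIC_TB_S)   # Infinity Fabric (Navi 31)
zen3_power = link_power_watts(1.5, TRAFFIC_TB_S)     # Infinity Fabric (Zen 3), uncertain figure

print(f"Navi 31 links at {TRAFFIC_TB_S} TB/s: {navi31_power:.1f} W")
print(f"Zen 3 links at {TRAFFIC_TB_S} TB/s:   {zen3_power:.1f} W")
```

Because power scales linearly with both terms, an interconnect that costs nearly four times as much per bit would burn nearly four times the power at GPU-class traffic levels, which is why the per-bit cost matters so much more for a GPU than for a CPU.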
