AMD’s Ryzen 9 7950X3D is the fastest gaming CPU on the planet because of AMD’s decision to bring its disruptive 3D chip-stacking tech to Zen 4, but curiously, the company didn’t share any details about its new 2nd-Gen 3D V-Cache in its Ryzen 7000X3D briefing materials. We initially found some details at a recent tech conference that we included in our review, and now AMD has finally answered a few of our follow-up questions and shared important new details, including that the chiplet remains on the 7nm process and now has a peak bandwidth of up to 2.5 TB/s, whereas the first-gen 3D V-Cache peaked at 2 TB/s (among several other new details). We also have new pics and diagrams of the new 6nm I/O Die that AMD uses for its Ryzen 7000 processors.
Intel doesn’t have an answer for this tech yet, assuring AMD a win in both gaming performance and certain data center applications. Overall, AMD’s second-gen 3D V-Cache technology is an impressive step forward over the first-gen because it allows the company to leverage the now-mature and less-expensive 7nm process node to boost the performance of its cutting-edge 5nm compute die. The new design represents AMD taking the key advantage of chiplet-based design methodologies (using an older and less-expensive process node in tandem with expensive new process tech) into the third dimension. Now for the nitty-gritty details.
First, a quick high-level refresher. AMD’s 3D V-Cache tech stacks an additional L3 SRAM chiplet directly in the center of the compute die (CCD) chiplet to isolate it from the heat-generating cores. This cache boosts L3 capacity to 96MB for the 3D V-Cache-equipped chiplet, thus boosting performance for latency-sensitive apps, like gaming.
We received new information on the second-gen implementation both directly from AMD and from the 2023 International Solid-State Circuits Conference (ISSCC), where AMD made a presentation on the Zen 4 architecture.
AMD’s previous-gen 3D V-Cache used a 7nm L3 SRAM chiplet stacked atop a 7nm Zen 3 CCD. AMD stuck with the 7nm process for the new L3 SRAM chiplet but now stacks it on top of a smaller 5nm Zen 4 CCD (see the table below). This creates a size mismatch, though, which required a few alterations.
Spec | 2nd-Gen 7nm 3D V-Cache Die | First-Gen 7nm 3D V-Cache Die | 5nm Zen 4 Core Complex Die (CCD) | 7nm Zen 3 Core Complex Die (CCD) |
Size | 36mm^2 | 41mm^2 | 66.3mm^2 | 80.7mm^2 |
Transistor Count | ~4.7 Billion | 4.7 Billion | 6.57 Billion | 4.15 Billion |
MTr/mm^2 (Transistor Density) | ~130.6 Million | ~114.6 Million | ~99 Million | ~51.4 Million |
First, AMD made the 7nm SRAM die smaller, so it now measures 36mm^2 compared to the previous-gen’s 41mm^2. However, the total number of transistors remains the same at ~4.7 billion, so the new die is significantly denser than the first-gen chiplet.
As we saw with the first-gen SRAM chiplet, this is an incredible transistor density for the 7nm chiplet: going by the table, we’re looking at roughly 2.5x the density of the 7nm Zen 3 compute chiplet, and surprisingly, the 7nm SRAM chiplet is significantly denser than the 5nm compute chiplet. That’s because, as before, the chiplet uses a density-optimized version of 7nm that’s specialized for SRAM. It also lacks the typical control circuitry found in the cache. That circuitry resides on the base die, which also helps reduce latency overhead. In contrast, the 5nm die includes several types of transistors along with data paths and other types of structures not present in the simplified L3 SRAM chiplet.
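For readers who want to check the density math, here is a minimal Python sketch that recomputes MTr/mm^2 from the die sizes and transistor counts in the table above. The dictionary keys and the helper name `density_mtr_per_mm2` are our own labels for illustration, and the ~4.7 billion count is treated as exact, which it may not be.

```python
# Die sizes (mm^2) and transistor counts taken from the table above.
# The ~4.7 billion figure is approximate; it is treated as exact here.
DIES = {
    "2nd-gen 3D V-Cache (7nm)": (36.0, 4.7e9),
    "1st-gen 3D V-Cache (7nm)": (41.0, 4.7e9),
    "Zen 4 CCD (5nm)": (66.3, 6.57e9),
    "Zen 3 CCD (7nm)": (80.7, 4.15e9),
}


def density_mtr_per_mm2(area_mm2: float, transistors: float) -> float:
    """Return transistor density in millions of transistors per mm^2."""
    return transistors / area_mm2 / 1e6


densities = {name: density_mtr_per_mm2(*spec) for name, spec in DIES.items()}

for name, value in densities.items():
    print(f"{name}: {value:.1f} MTr/mm^2")

# Ratios of the new SRAM chiplet against the two compute dies.
vcache = densities["2nd-gen 3D V-Cache (7nm)"]
ratio_vs_zen3 = vcache / densities["Zen 3 CCD (7nm)"]
ratio_vs_zen4 = vcache / densities["Zen 4 CCD (5nm)"]
print(f"2nd-gen V-Cache vs Zen 3 CCD: {ratio_vs_zen3:.2f}x")
print(f"2nd-gen V-Cache vs Zen 4 CCD: {ratio_vs_zen4:.2f}x")
```

Because the inputs are rounded, the last digit of each result can differ slightly from a figure computed with AMD’s unrounded numbers.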
As before, the extra latency from the additional L3 SRAM cache weighs in at 4 clocks, but the bandwidth between the L3 chiplet and the base die has increased to 2.5 TB/s, a 25% improvement over the previous 2 TB/s peak.
The stacked L3 SRAM chiplet is connected to the base die with two types of through-silicon vias (TSVs, which are vertical electrical connections). The Power TSVs carry power between the chiplets, while the Signal TSVs carry data between the units.
In the first-gen design, both types of TSVs resided in the L3 region of the base chiplet. However, the L3 cache on the base die is now smaller due to the increased density of the 5nm process, and even though the 7nm L3 SRAM chiplet is smaller, it now overlaps the L2 cache (the prior gen only overlapped the L3 on the base die). As such, AMD had to alter the TSV connections in both the base die and the L3 SRAM chiplet.
Moving the power TSVs from the L3 to the L2 region was necessary due to the increased density of the 5nm L3 cache on the base die. AMD achieved a 0.68x effective area scaling across the L3 cache, data paths, and control logic on the base die compared to the old 7nm base chiplet, so there is physically less room for TSVs in the L3 cache.
The signal TSVs remain inside the L3 cache area on the base die, but moving the power TSVs to the L2 helped shrink the TSV area in the L3 cache by 50%. It’s unclear how much of the L3 TSV density improvement came from removing the power TSVs, though. Routing power and signal TSVs together can create signal integrity issues, which are often combatted by increasing the spacing between TSVs. Separating the two types of TSVs into separate regions could allow AMD to pack the signal TSVs closer together, thus providing an additional benefit.
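To put the space squeeze in rough numbers, here is a back-of-the-envelope Python sketch. Only the 0.68x scaling factor and the 50% TSV area reduction come from AMD. Normalizing the first-gen areas to 1.0 and applying the 0.68x factor uniformly to the L3 region are our own simplifying assumptions, so treat the output as an illustration rather than AMD’s data.

```python
# Back-of-the-envelope sketch. Only the two ratios below come from the article;
# the normalized 1.0 areas and the uniform scaling are illustrative assumptions.
AREA_SCALING = 0.68        # 5nm L3/data path/control area relative to 7nm
TSV_AREA_REDUCTION = 0.50  # reported shrink of the TSV area inside the L3

old_l3_area = 1.0   # normalized first-gen L3 region area (assumption)
old_tsv_area = 1.0  # normalized first-gen TSV area inside the L3 (assumption)

new_l3_area = old_l3_area * AREA_SCALING
new_tsv_area = old_tsv_area * (1.0 - TSV_AREA_REDUCTION)

old_share = old_tsv_area / old_l3_area

# Relative share of the L3 region the TSVs would occupy had they not shrunk.
share_if_unchanged = (old_tsv_area / new_l3_area) / old_share
# Relative share after the reported 50% TSV area reduction.
share_after_move = (new_tsv_area / new_l3_area) / old_share

print(f"TSV share of L3 if TSV area were unchanged: {share_if_unchanged:.2f}x")
print(f"TSV share of L3 after the 50% reduction:    {share_after_move:.2f}x")
```

Under these assumptions, an unchanged TSV footprint would claim a noticeably larger slice of the shrunken L3 region, while the halved footprint ends up below the first-gen share.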
AMD’s 3D chip stacking tech is based on TSMC’s SoIC technology. TSMC’s SoIC is bump-less, meaning it doesn’t use microbumps or solder to connect the two dies. We’ve covered the deep-dive details of this technology in our Ryzen 7 5800X3D review, and you can read much more about the hybrid bonding and manufacturing process here.
AMD tells us it used the same bonding process for the new chiplet, albeit with process and DTCO (design-technology co-optimization) improvements, and the minimum TSV pitch hasn’t changed. AMD also applied learnings from the first-gen design to help reduce control circuitry overhead in the new design.
Tom’s Hardware Measurements | Single-Threaded Peak | Multi-Threaded Sustained | Voltage (peak) | nT Power |
CCD 0 (3D V-Cache) | 5.25 GHz | 4.85 GHz | 1.152V | 86W |
CCD 1 (No extra cache) | 5.75 GHz | 5.3 GHz | 1.384V | 140W |
The L3 SRAM chiplet also remains on the same power domain as the CPU cores, so they can’t be adjusted independently. This contributes to the lower frequency on the cache-equipped chiplet because the voltage can’t exceed ~1.15V. You can see our in-depth testing of the two different types of chiplets here.
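To quantify the gap between the two chiplets, the short Python sketch below computes the percentage differences from our measurements in the table above. The `CcdMeasurement` class and `percent_change` helper are names we made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CcdMeasurement:
    single_thread_ghz: float
    multi_thread_ghz: float
    peak_voltage_v: float
    nt_power_w: float


# Values from the Tom's Hardware measurements table above.
vcache_ccd = CcdMeasurement(5.25, 4.85, 1.152, 86.0)
plain_ccd = CcdMeasurement(5.75, 5.3, 1.384, 140.0)


def percent_change(new: float, old: float) -> float:
    """Percentage change of `new` relative to `old` (negative means lower)."""
    return (new - old) / old * 100.0


# Each entry compares the 3D V-Cache CCD against the standard CCD.
deltas = {
    "single-thread peak clock": percent_change(
        vcache_ccd.single_thread_ghz, plain_ccd.single_thread_ghz
    ),
    "multi-thread sustained clock": percent_change(
        vcache_ccd.multi_thread_ghz, plain_ccd.multi_thread_ghz
    ),
    "peak voltage": percent_change(
        vcache_ccd.peak_voltage_v, plain_ccd.peak_voltage_v
    ),
    "nT power": percent_change(vcache_ccd.nt_power_w, plain_ccd.nt_power_w),
}

for metric, delta in deltas.items():
    print(f"{metric}: {delta:+.1f}% on the 3D V-Cache CCD")
```

The comparison only reflects the single sample we tested, so other chips may land on slightly different numbers.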
Spec | 6nm I/O Die (IOD) - Ryzen 7000 | 12nm I/O Die (IOD) - Ryzen 5000 | 6nm I/O Die (IOD) - EPYC Genoa |
Size | 117.8mm^2 | 125mm^2 | 386.88mm^2 |
Transistor Count | 3.37 Billion | 2.09 Billion | 11 Billion |
MTr/mm^2 (Transistor Density) | ~28.6 Million | ~16.7 Million | ~29.8 Million |
AMD’s ISSCC presentation also included plenty of new details about the 6nm I/O Dies (IOD) used in the Ryzen 7000 and EPYC Genoa processors. In the above album, you can see the zoomed-in images and an annotated die shot from chip detective @Locuza_. You can also expand the tweet below to read Locuza’s excellent analysis of the Ryzen 7000 IOD.
We put the specs in the table for easy comparison, and as you can see, the EPYC Genoa I/O Die is simply massive compared to the Ryzen 7000 variant. That’s because AMD can connect up to 12 compute chiplets (CCDs) to the I/O Die for its EPYC Genoa processors.
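For a sense of scale, this small Python sketch compares the two 6nm I/O dies using the sizes and transistor counts from the table above. The variable and function names are ours. One caveat: recomputing the EPYC Genoa density from the listed 11 billion transistors gives a figure a little under the table’s ~29.8 million, which may simply mean the transistor count is rounded.

```python
# Sizes (mm^2) and transistor counts from the I/O die table above.
RYZEN_7000_IOD = {"area_mm2": 117.8, "transistors": 3.37e9, "max_ccds": 2}
EPYC_GENOA_IOD = {"area_mm2": 386.88, "transistors": 11e9, "max_ccds": 12}


def ratio(epyc: dict, ryzen: dict, key: str) -> float:
    """How many times larger the EPYC Genoa value is than the Ryzen 7000 value."""
    return epyc[key] / ryzen[key]


area_ratio = ratio(EPYC_GENOA_IOD, RYZEN_7000_IOD, "area_mm2")
transistor_ratio = ratio(EPYC_GENOA_IOD, RYZEN_7000_IOD, "transistors")
ccd_ratio = ratio(EPYC_GENOA_IOD, RYZEN_7000_IOD, "max_ccds")

# Density recomputed from the listed (possibly rounded) transistor count.
epyc_density = EPYC_GENOA_IOD["transistors"] / EPYC_GENOA_IOD["area_mm2"] / 1e6

print(f"Die area:         {area_ratio:.2f}x larger")
print(f"Transistor count: {transistor_ratio:.2f}x larger")
print(f"Supported CCDs:   {ccd_ratio:.0f}x as many")
print(f"EPYC Genoa IOD density from listed figures: {epyc_density:.1f} MTr/mm^2")
```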
In contrast, the consumer chips are limited to two chiplets, an immutable limitation because, as you can see in Locuza’s diagram, the Ryzen 7000 I/O Die only has two Global Memory Interconnect 3 (GMI3) links that connect the compute chiplets to the IOD. That’s a bummer, because the lower core-count Genoa models with four CCDs can have dual-GMI3 links (wide mode), a new capability that can offer advantages in some memory throughput-intensive tasks. That would’ve been interesting to add to the consumer chips.
We’ve also added the full ISSCC 2023 deck below for your perusal. It includes a few other interesting tidbits.
Zen 4 Raphael 6 nm client I/O die:
- 128b DDR5 PHY + 32b for ECC (8b per 32b channel)
- 2x GMI3 Ports, 3x CCDs are not possible. :p
- 28x PCIe 5, Zen1/2/3 cIOD had 32x PCIe lanes. So AMD reduced the waste for the client market.
- Really just one RDNA2 WGP, 128 Shader “Cores”
https://t.co/bkqdVvhgrn pic.twitter.com/erYxTw1p8h
@Locuza_, March 4, 2023