The AMD RDNA 3 architecture has finally been officially unwrapped, alongside the new $999 Radeon RX 7900 XTX and $899 Radeon RX 7900 XT graphics cards. These are set to go head-to-head with the best graphics cards, and AMD looks like it might have a legitimate shot at the top of the GPU benchmarks hierarchy. Here’s what we know.
First, most of the details align with what was already expected and covered in our AMD RDNA 3 architecture and RX 7000-series GPUs coverage. RDNA 3 will use chiplets, with a main GCD (Graphics Compute Die) and up to six MCDs (Memory Cache Dies). In addition, there are plenty of under-the-hood changes to the architecture, including more Compute Units and a lot more GPU shaders compared to the previous generation.
Fundamentally, AMD continues to focus on power and energy efficiency and has targeted a 50% improvement in performance per watt with RDNA 3 compared to RDNA 2. We know Nvidia’s RTX 4090 and Ada Lovelace pushed far up the voltage and frequency curve, and as we showed in our RTX 4090 efficiency scaling, power limiting the RTX 4090 to 70% greatly boosted Nvidia’s efficiency. Still, AMD apparently feels no need to dial the power use up to 11 at default.
Let’s start with a quick overview of the core specs, comparing AMD’s upcoming GPUs with the top previous-generation RDNA 2 card and Nvidia’s RTX 4090, RTX 4080, and RTX 3090 Ti.
Graphics Card | RX 7900 XTX | RX 7900 XT | RX 6950 XT | RTX 4090 | RTX 4080 | RTX 3090 Ti |
---|---|---|---|---|---|---|
Architecture | Navi 31 | Navi 31 | Navi 21 | AD102 | AD103 | GA102 |
Process Technology | TSMC N5 + N6 | TSMC N5 + N6 | TSMC N7 | TSMC 4N | TSMC 4N | Samsung 8N |
Transistors (Billion) | 58 | 58 – 1MCD | 26.8 | 76.3 | 45.9 | 28.3 |
Die size (mm^2) | 300 + 222 | 300 + 185 | 519 | 608.4 | 378.6 | 628.4 |
SMs / CUs / Xe-Cores | 96 | 84 | 80 | 128 | 76 | 84 |
GPU Cores (Shaders) | 12288 | 10752 | 5120 | 16384 | 9728 | 10752 |
Tensor Cores | N/A | N/A | N/A | 512 | 304 | 336 |
Ray Tracing “Cores” | 96 | 84 | 80 | 128 | 76 | 84 |
Boost Clock (MHz) | 2300 | 2000 | 2310 | 2520 | 2505 | 1860 |
VRAM Speed (Gbps) | 20? | 20? | 18 | 21 | 22.4 | 21 |
VRAM (GB) | 24 | 20 | 16 | 24 | 16 | 24 |
VRAM Bus Width (bits) | 384 | 320 | 256 | 384 | 256 | 384 |
L2 / Infinity Cache (MB) | 96 | 80 | 128 | 72 | 64 | 6 |
ROPs | 192 | 192 | 128 | 176 | 112 | 112 |
TMUs | 384 | 336 | 320 | 512 | 304 | 336 |
TFLOPS FP32 (Boost) | 56.5 | 43.0 | 23.7 | 82.6 | 48.7 | 40.0 |
TFLOPS FP16 (FP8) | 113 | 86 | 47.4 | 661 (1321) | 390 (780) | 160 (320) |
Bandwidth (GBps) | 960? | 800? | 576 | 1008 | 717 | 1008 |
TDP (watts) | 355 | 300 | 335 | 450 | 320 | 450 |
Launch Date | Dec 2022 | Dec 2022 | May 2022 | Oct 2022 | Nov 2022 | Mar 2022 |
Launch Price | $999 | $899 | $1,099 | $1,599 | $1,199 | $1,999 |
AMD has two variants of the Navi 31 GPU coming out. The higher spec RX 7900 XTX card uses the fully enabled GCD and six MCDs, while the RX 7900 XT has 84 of the 96 Compute Units enabled and only uses five MCDs. The sixth MCD is technically still present on the cards, but it’s either a non-functional die or potentially even a dummy die. Either way, it will be fused off, and it isn’t connected to the extra 4GB of GDDR6 memory, so there will not be a way to re-enable the extra MCD.
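The memory figures for the two cards follow directly from the MCD count. As a rough sketch, assume each MCD contributes a 64-bit memory interface, 16MB of Infinity Cache, and 4GB of attached GDDR6. These per-MCD values are inferred by dividing the RX 7900 XTX figures in the spec table by six and are not numbers AMD has confirmed, and the 20 Gbps memory speed is the table's own unconfirmed estimate. The function name `memory_config` is purely illustrative.

```python
# Per-MCD resources inferred from the spec table (assumptions, not AMD-confirmed).
BUS_BITS_PER_MCD = 64   # 384-bit bus / 6 MCDs
CACHE_MB_PER_MCD = 16   # 96MB Infinity Cache / 6 MCDs
VRAM_GB_PER_MCD = 4     # 24GB GDDR6 / 6 MCDs


def memory_config(active_mcds: int, gbps_per_pin: float = 20.0) -> dict:
    """Derive memory subsystem totals from the number of active MCDs."""
    bus_bits = active_mcds * BUS_BITS_PER_MCD
    return {
        "bus_bits": bus_bits,
        "cache_mb": active_mcds * CACHE_MB_PER_MCD,
        "vram_gb": active_mcds * VRAM_GB_PER_MCD,
        # Each pin moves gbps_per_pin gigabits per second; divide by 8 for gigabytes.
        "bandwidth_gb_per_s": bus_bits * gbps_per_pin / 8,
    }


for name, mcds in (("RX 7900 XTX", 6), ("RX 7900 XT", 5)):
    print(name, memory_config(mcds))
```

Under those assumptions, six MCDs give 384 bits, 96MB, 24GB, and 960 GBps, while five give 320 bits, 80MB, 20GB, and 800 GBps, matching the table.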
Compared to the competition, the RX 7900 XTX still technically comes in behind the RTX 4090 in raw compute, and Nvidia has a lot more AI processing power with its tensor cores. But we also need to remember that the RX 6950 XT managed to keep up with the RTX 3090 Ti at 1080p and 1440p and was only about 5% behind at 4K. That’s despite having theoretically 40% less raw compute. So, when the RX 7900 XTX on paper has 32% less compute than the RTX 4090, we don’t actually know what that will mean in the real world of performance benchmarks.
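For reference, the two "less compute" percentages can be reproduced from the FP32 teraflops row of the spec table. This is a minimal sketch; `compute_deficit` is an illustrative helper name, not anything from AMD or Nvidia.

```python
def compute_deficit(card_tflops: float, rival_tflops: float) -> float:
    """Return how much less raw compute a card has than a rival, in percent."""
    return (1 - card_tflops / rival_tflops) * 100


# FP32 TFLOPS values taken from the spec table.
print(f"RX 6950 XT vs RTX 3090 Ti: {compute_deficit(23.7, 40.0):.1f}% less")
print(f"RX 7900 XTX vs RTX 4090: {compute_deficit(56.5, 82.6):.1f}% less")
```

The results land at roughly 41% and 32%, which is where the rounded figures in the paragraph come from.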
Also, note that AMD’s presentation says 61 teraflops while our figure is 56.5 teraflops. That’s because AMD’s RDNA 3 has a split clock domain for efficiency purposes. The front end (render outputs and texturing units, perhaps) runs at 2.5 GHz, while the shaders run at 2.3 GHz. We used the 2.3 GHz value since the teraflops come from the shaders. Of course, these are “Game Clocks,” which, at least with RDNA 2, were a conservative estimate of real-world clocks while running actual games. (That’s the same for Nvidia’s Ada Lovelace and Intel’s Arc Alchemist, which both tend to run 150–250 MHz higher than the stated boost clock values in our testing.)
AMD also has a higher boost clock relative to the Game Clock, which is where it gets the 61 teraflops figure: the boost clock on the RX 7900 XTX is 2.5 GHz. But, again, we’ll need to test the hardware in a variety of games to see where the actual clocks land. With RDNA 2, we found the boost clocks were pretty consistently what we saw in games, maybe even a bit low, so consider the 56.5 teraflops figure a very conservative estimate.
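The gap between the two teraflops figures is purely a matter of which clock goes into the usual peak-throughput formula: shader count, times two operations per clock (a fused multiply-add counts as two), times clock speed. A minimal sketch, with `fp32_tflops` as an illustrative name and the shader count taken from the spec table:

```python
def fp32_tflops(shaders: int, clock_ghz: float, ops_per_clock: int = 2) -> float:
    """Peak FP32 throughput, counting one fused multiply-add as two operations."""
    return shaders * ops_per_clock * clock_ghz / 1000


XTX_SHADERS = 12288  # RX 7900 XTX, from the spec table

print(f"2.3 GHz: {fp32_tflops(XTX_SHADERS, 2.3):.1f} TFLOPS")  # 56.5
print(f"2.5 GHz: {fp32_tflops(XTX_SHADERS, 2.5):.1f} TFLOPS")  # 61.4
```

The same formula with 10752 shaders at 2.0 GHz gives the 43.0 teraflops listed for the RX 7900 XT.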
Of course, the bigger deal isn’t how the RX 7900 XTX stacks up against the RTX 4090 but rather how it will compete with the RTX 4080. It has more memory and memory bandwidth, plus 16% more compute. So even if the performance per clock on the RDNA 3 shaders dropped a bit (more on this in a moment), AMD looks like it should be very competitive with Nvidia’s penultimate RTX 40-series part, especially since it costs $200 less.
With the high-level overview out of the way, let’s dig into some architectural details. Unfortunately, AMD is keeping some things under wraps, so we’re not entirely sure about the memory clocks right now, and we’ve asked for more information on other parts of the architecture. We’ll fill in the details as we get them, but some things might remain unconfirmed until the RDNA 3 launch date on December 13.
AMD has talked a lot about energy efficiency with the past two generations of RDNA architectures, and RDNA 3 continues that focus. AMD claims up to a 54% performance per watt improvement compared to RDNA 2, which in turn was 54% better PPW than RDNA. Over the past three generations, AMD’s efficiency has skyrocketed, and that’s not just marketing speak.
If you look at the RX 6900 XT for example, it’s basically double the performance of the previous generation RX 5700 XT at 1440p ultra. Meanwhile, it consumes 308W in our testing compared to 214W on the 5700 XT. So that’s a 38% improvement in efficiency, just picking the two fastest RDNA 2 and RDNA offerings at the time of launch.
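The efficiency figure is just the performance ratio divided by the power ratio. Here is a small sketch of that arithmetic; it treats "basically double" as exactly 2.0x, which is an assumption, and `efficiency_gain` is an illustrative name.

```python
def efficiency_gain(perf_ratio: float, new_watts: float, old_watts: float) -> float:
    """Percent change in performance per watt between two cards."""
    power_ratio = new_watts / old_watts
    return (perf_ratio / power_ratio - 1) * 100


# Assumption: exactly 2.0x performance; power draw figures are from the article.
gain = efficiency_gain(perf_ratio=2.0, new_watts=308, old_watts=214)
print(f"Perf/watt gain: {gain:.0f}%")
```

With an exact 2.0x uplift the result is about 39%, so the 38% quoted above implies a measured uplift just under double.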
How does AMD continue to improve efficiency? Of course, a big part of the latest jump comes thanks to the move from TSMC N7 to N5 (7nm to 5nm), but the architectural updates also help.
The new RDNA 3 unified compute unit has 64 dual-issue stream processors (GPU shaders). That’s double the amount of RDNA 2 per compute unit, and AMD can send different workloads to each SIMD unit, or it can have both working on the same type of instruction. It’s interesting to note that the latest AMD, Intel, and Nvidia GPUs are now all using 128 shaders for each major building block: Compute Units (CUs) for AMD, Streaming Multiprocessors (SMs) for Nvidia, and Xe-Cores for Intel.
Along with doubling the GPU shaders per CU, AMD has increased the total number of CUs from 80 to 96. Gen over gen, AMD’s Navi 31 has 2.4 times as many shaders as Navi 21, and the power draw only increased by 18% (355W versus 300W for the RX 6900 XT).
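Those two ratios are easy to check. The sketch below uses the CU and per-CU shader counts described above. The 300W baseline is the RX 6900 XT's board power, which is an assumption about which card the 18% figure refers to, since the spec table lists the 335W RX 6950 XT instead.

```python
# Compute Unit counts and shaders per CU as described in the article.
NAVI_21 = {"cus": 80, "shaders_per_cu": 64}
NAVI_31 = {"cus": 96, "shaders_per_cu": 128}  # 64 dual-issue units counted as 128


def total_shaders(gpu: dict) -> int:
    """Total shader count for a GPU described by CU count and shaders per CU."""
    return gpu["cus"] * gpu["shaders_per_cu"]


shader_ratio = total_shaders(NAVI_31) / total_shaders(NAVI_21)

# Assumed board power: 355W (RX 7900 XTX) against 300W (RX 6900 XT).
power_increase = (355 / 300 - 1) * 100

print(f"Shaders: {total_shaders(NAVI_21)} -> {total_shaders(NAVI_31)} ({shader_ratio:.1f}x)")
print(f"Power increase: {power_increase:.0f}%")
```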
AMD also increased the performance of its AI Accelerators, which it hasn’t really talked about much. We’re not sure about the raw compute power, but we do know that the AI accelerators support both INT8 and BF16 (brain-float 16-bit) operations. So they’re probably at least partially similar to Nvidia’s tensor cores, but the total number of supported instruction sets isn’t the same. Regardless, AMD says the new AI accelerators provide up to a 2.7x improvement: double the number per CU, more CUs, and slightly higher throughput combined would get there.
Finally, AMD says it has optimized its Ray Accelerators and that the RDNA 3 versions can handle 1.5x as many rays, with new dedicated instructions and improved BVH (ray/box) sorting and traversal. What that means in the real world still isn’t entirely clear, but we definitely expect a big leap in ray tracing performance along with improved rasterization performance. Will it be enough to catch Nvidia? We’ll have to wait and see.
Besides the compute units, plenty of other areas have received significant updates with RDNA 3. One big addition is the AMD Radiance Display Engine, or basically the video output support. In addition, AMD has upgraded its RDNA 3 GPUs with support for DisplayPort 2.1 (basically a rebadging and cleanup of DisplayPort 2.0; everything that was DP2.0 is now DP2.1).
That makes AMD the second GPU company to support DP2.x, with Intel’s Arc being the first. Except Intel only supports 10 Gbps per lane, or 40 Gbps total, and DisplayPort 2.1 supports up to 20 Gbps per lane, or 80 Gbps total. AMD doesn’t support 20 Gbps either, apparently opting for the 13.5 Gbps intermediate level of support. That gives AMD up to 54 Gbps total bandwidth, which is basically double what you can get from DP1.4a.
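The totals here are per-lane rates multiplied across DisplayPort's four lanes. The sketch below also estimates usable payload after line coding. The 8.1 Gbps HBR3 rate for DP1.4a and the 8b/10b and 128b/132b coding efficiencies come from the DisplayPort specifications rather than from AMD's presentation, and other protocol overhead is ignored, so treat the payload numbers as approximate.

```python
LANES = 4  # a DisplayPort main link carries four lanes

# Per-lane signalling rate in Gbps and line-code efficiency.
# HBR3 (DP1.4a) uses 8b/10b coding; the UHBR modes use 128b/132b.
LINK_MODES = {
    "HBR3": (8.1, 8 / 10),          # DP1.4a
    "UHBR10": (10.0, 128 / 132),    # Intel Arc
    "UHBR13.5": (13.5, 128 / 132),  # RDNA 3
    "UHBR20": (20.0, 128 / 132),    # DisplayPort 2.1 maximum
}


def link_rates(mode: str) -> tuple:
    """Return (raw Gbps, approximate payload Gbps) for a four-lane link."""
    per_lane, efficiency = LINK_MODES[mode]
    raw = per_lane * LANES
    return raw, raw * efficiency


for mode in LINK_MODES:
    raw, payload = link_rates(mode)
    print(f"{mode}: {raw:.1f} Gbps raw, {payload:.2f} Gbps payload")
```

On raw signalling, 54 Gbps is about 1.7 times the 32.4 Gbps of DP1.4a; it is on estimated payload (roughly 52.4 Gbps against 25.9 Gbps) that the "basically double" description holds.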
With DSC (Display Stream Compression), that means AMD has the potential to support up to 480 Hz refresh rates on a 4K monitor, or 165 Hz on an 8K display using its DisplayPort 2.1 ports. And to go along with that, the first DisplayPort 2.1 monitors and TVs should start arriving in early 2023.
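To see why DSC is needed for those modes, here is a back-of-the-envelope estimate of the uncompressed pixel data rate against the link budget. It rests on several assumptions that are not from AMD: 10-bit-per-channel RGB (30 bits per pixel), no blanking intervals, and a usable payload of roughly 52.4 Gbps on a 54 Gbps link (128b/132b coding). Real display timings and the color format AMD used for its figures may differ.

```python
LINK_PAYLOAD_GBPS = 54 * 128 / 132  # assumed usable rate of a 54 Gbps link


def pixel_rate_gbps(width: int, height: int, refresh_hz: int,
                    bits_per_pixel: int = 30) -> float:
    """Uncompressed active-pixel data rate, ignoring blanking intervals."""
    return width * height * refresh_hz * bits_per_pixel / 1e9


MODES = (("4K 480 Hz", (3840, 2160, 480)), ("8K 165 Hz", (7680, 4320, 165)))

for label, mode in MODES:
    rate = pixel_rate_gbps(*mode)
    print(f"{label}: {rate:.1f} Gbps uncompressed, "
          f"needs about {rate / LINK_PAYLOAD_GBPS:.2f}:1 compression")
```

Under these assumptions both modes need well over the link's uncompressed capacity, which is why the quoted refresh rates depend on DSC.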
AMD has also significantly overhauled the media engine with RDNA 3. This was already more or less revealed, but Navi 31 has dual media engines that are fully capable of supporting two simultaneous 8K60 streams, either encoding or decoding, or they can team up and boost the performance of encoding a single stream.
Another update on the video engine is support for AV1, which now means all three GPU vendors have full hardware encode/decode support for AV1. The uptake of AV1 has been a bit slow up until now, but hopefully we’ll see a lot of software solutions and streaming services that will move to support AV1 over H.264.
The video engines are clocked higher than before (we’re not sure how much higher), and AMD also notes that it has AI-enhanced video encode. We’re interested in seeing what that means in terms of quality and performance and will be looking forward to doing some video encoding tests once the hardware is available.
AMD shared some preliminary performance figures for the Radeon RX 7900 XTX. As with all manufacturer-provided benchmarks, we can’t vouch for the veracity of these performance claims (yet), and there’s a good chance AMD chose the games it showed for a reason. Still, that reason might be giving people a wider cross-sectional view of performance, showing a 50% to 70% improvement relative to the previous generation RX 6950 XT.
If we take those figures at face value, what would that mean for overall performance? At 4K ultra, our GPU benchmarks have the RTX 4090 outperforming the RX 6950 XT by 63%. That’s for traditional rasterization games. In our DXR test suite, the RTX 4090 is … okay, this is going to be painful, but at 4K in DXR, the 4090 is 204% faster. That’s without any form of upscaling. At 1440p, the gap drops a bit to 168% faster, but it’s still a gaping chasm.
50% faster, or even 60% faster, would put the RX 7900 XTX fairly close to the RTX 4090 in rasterization performance. Even with 20% more Ray Accelerators and with 50% higher performance on those Ray Accelerators, that would be, at most, around 80% higher ray tracing performance. Nvidia’s RTX 4090 would still be about 70% faster at 4K, or perhaps only 50% faster at 1440p.
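The projection works by expressing both cards as a speedup over the RX 6950 XT and dividing one by the other. A small sketch of that arithmetic, using only the percentages quoted in this article (`gap_percent` is an illustrative helper name):

```python
def gap_percent(rtx4090_vs_6950xt: float, xtx_vs_6950xt: float) -> float:
    """How much faster the RTX 4090 would be than the RX 7900 XTX, in percent.

    Both arguments are speedups relative to the RX 6950 XT (1.0 = equal).
    """
    return (rtx4090_vs_6950xt / xtx_vs_6950xt - 1) * 100


# Rasterization at 4K: the RTX 4090 is 63% faster than the RX 6950 XT.
for uplift in (1.5, 1.6, 1.7):
    print(f"Raster, XTX at {uplift}x: 4090 ahead by {gap_percent(1.63, uplift):.0f}%")

# Ray tracing upper bound: 20% more Ray Accelerators, each 50% faster.
rt_uplift = 1.2 * 1.5  # roughly 1.8x
print(f"DXR 4K: 4090 ahead by {gap_percent(3.04, rt_uplift):.0f}%")
print(f"DXR 1440p: 4090 ahead by {gap_percent(2.68, rt_uplift):.0f}%")
```

At a 1.7x uplift the rasterization gap turns slightly negative, meaning the RX 7900 XTX would edge ahead, while the ray tracing gaps come out near 69% and 49%, in line with the rounded 70% and 50% above.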
Finally, AMD talked more about its FidelityFX Super Resolution (FSR) technology. Right now, there are multiple versions of FSR available. The original FSR (1.x) uses spatial upscaling and can basically be integrated into drivers for things like Radeon Super Resolution (RSR). The newer FSR 2.0 and 2.1 meanwhile use temporal upscaling and have similar inputs to DLSS and XeSS: motion vectors, depth buffer, and current plus previous frame buffers.
AMD currently has over 216 games and applications that use FSR, but most of those are FSR 1.x implementations; again, it’s easy to integrate and open source, and it has been available for over a year now. FSR 2.0 is much newer, having first arrived in May 2022. FSR 2.1 tunes the algorithm to help eliminate ghosting and further improve image quality, and it’s only in a handful of games right now.
Looking forward, AMD’s FSR continues to gain traction. We’d like to see FSR2 use overtake FSR1, because it offers a higher quality experience, but it’s fine for games to include both. There are use cases (like low-end graphics cards and integrated graphics) where FSR1 might still be preferable for some users. But AMD isn’t done with FSR.
FSR3 is coming sometime next year. It will look to do some form of frame generation or interpolation, somewhat similar to what Nvidia is doing with DLSS 3. AMD hasn’t revealed many details, probably partly because FSR3 isn’t even fully defined or finished yet, but in early testing it’s seeing up to a 2x boost in performance for GPU limited games.
There’s more to cover with RDNA 3 and the Radeon RX 7900-series cards, but we’re in meetings for the rest of the day. AMD’s full slide deck also gets into the AMD Advantage program coming to desktops, further performance gains for all-AMD systems and laptops, and more.
Overall, RDNA 3 and the RX 7900 XTX sound extremely promising. Even if AMD can’t quite match the raw performance of the RTX 4090, the $999 price tag absolutely deserves praise. We’ll see how the hardware stacks up in our own testing once the GPUs launch in December, but AMD looks like it will once again narrow the gap between its cards and what Nvidia has to offer.
The GPU chiplets approach clearly has some advantages as well. Nvidia currently has a 608mm^2 chip made on a custom TSMC 4N node (a tuned N5), plus the 379mm^2 chip in the RTX 4080, and a 295mm^2 AD104 that will presumably go into a future RTX 4070. AMD’s Navi 31 GCD uses basically the same class of leading-edge process and has a die size that’s similar to the AD104, but with MCDs made on the older N6 node. It’s an obvious pricing advantage, and the RX 7900-series cards will give Nvidia something to chew on.
Will it be enough? That’s a bit more difficult to say. We also have to factor in Nvidia’s extras, like DLSS support. FSR and FSR2 work on “everything,” more or less, so AMD’s work benefits AMD, Intel, and Nvidia GPU owners. That’s good, but if a game you want to play supports DLSS, it generally offers better image quality than FSR2 (and definitely better than FSR1).
Perhaps the RTX 4080 12GB was canceled because of how badly it would have done against a similarly priced RX 7900 XT. We’ll still see that GPU, maybe in January, and hopefully at a lower price. If not, the potential to get more performance and more VRAM for the same price will certainly favor AMD.