Nvidia AD102 Die Shot

Why Nvidia’s RTX 4080, 4090 Cost so Damn Much

The launch of the Nvidia RTX 40-series and Ada Lovelace GPUs has been greeted with a range of emotions. Excitement from some, disbelief from others, and outright scorn from many. In building a GPU that promises to outperform the best graphics cards, Nvidia has gone big: big and expensive. AMD, on the other hand, is using a more cost-effective technology that could make its upcoming RDNA cards a more appealing, more affordable choice.

While we’re not privy to Nvidia’s Bill of Materials (BOM), the high price of Nvidia’s GPUs is largely due to the company’s refusal to embrace “Moore’s Law 2.0” and look to things like chiplets. AMD started beating Intel on CPUs when it switched to chiplets, especially on cost, and now it is about to do the same for GPUs. Putting the analog memory interfaces on an older process with RDNA 3 is a brilliant approach, since analog scales very poorly with newer process nodes. The same goes for cache.

Looking at the AD102 die shots Nvidia has posted so far, we know the die size is 608mm^2. That’s only slightly smaller than GA102 at 628mm^2, but now Nvidia is on a cutting-edge TSMC 4N process node instead of Samsung 8N. Pricing on the wafers definitely shot up, and we had multiple stories in the past year about TSMC raising prices. That’s all coming to bear.

The AD102 die pictured below (which is a rendering, but it’s the best we’ve got) shows some clear details of how and why Nvidia’s latest tour de force costs more than the previous generation chips. Flip through the gallery for annotated versions of the image.

The twelve Graphics Processing Clusters (GPCs) are easily distinguishable from the rest of the chip, and each of those has 12 Streaming Multiprocessors (SMs). All of the GPCs and SMs together take up about 45% of the total die area. Where does the rest go?

The twelve 32-bit GDDR6X memory controllers use up most of the outside edge of the die, with the PCIe x16 connector using about a third of the bottom edge. The memory controllers and related circuitry take up a hefty 17% of the die area, give or take. But that’s not the only part of the memory subsystem, as Nvidia has a much larger L2 cache than on previous designs.

You can see the six 16MB chunks of L2 in the center section of the die, with some related routing and other circuitry (ROPs?) around it. The L2 cache blocks are at least 15% of the total die area, while the entire center portion of the die (L2 plus other logic) takes up 25% of the total. The remainder of the die at the bottom is dedicated to things like the dual NVENC encoders, the PCIe interface, and physical display interfaces. It’s about 7% of the total, and then there are a few other miscellaneous bits scattered around that take up the last ~6% of the die.

Nvidia AD102 RTX 4090 block diagram

(Image credit: Nvidia)

The point of discussing these die areas is to help put things in perspective. Nvidia, with its monolithic approach on the AD102 chip, has dedicated roughly 33% of the total die area just to memory interfaces and L2 cache. AMD’s MCD (Memory Chiplet Die) approach used with its Radeon RX 7000-series and RDNA 3 GPUs will apparently move nearly all of that off the main chiplet, and it will reportedly use TSMC N6 instead of TSMC N5, reducing cost and improving yields at the same time.
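As a rough tally of the area estimates quoted above, the short Python sketch below converts each percentage into square millimeters on a 608mm^2 die and adds up the memory-related share. The percentages are the eyeballed figures from the text, not measured values, and the dictionary keys are just descriptive labels chosen for this example.

```python
# Die-area shares for AD102 as estimated in the text (percent of total die).
AD102_AREA_MM2 = 608

area_share_pct = {
    "GPCs and SMs": 45,
    "memory controllers and related circuitry": 17,
    "center block (L2 cache plus other logic)": 25,
    "NVENC, PCIe, and display interfaces": 7,
    "miscellaneous": 6,
}

# The text puts the L2 cache itself at "at least 15%" of the die;
# that 15% is part of the 25% center block above, not an extra region.
L2_SHARE_PCT = 15

total_pct = sum(area_share_pct.values())
area_mm2 = {name: AD102_AREA_MM2 * pct / 100 for name, pct in area_share_pct.items()}

# Memory interfaces plus L2 cache: the part AMD reportedly moves to MCDs.
memory_subsystem_pct = (
    area_share_pct["memory controllers and related circuitry"] + L2_SHARE_PCT
)
memory_subsystem_mm2 = AD102_AREA_MM2 * memory_subsystem_pct / 100

for name, mm2 in area_mm2.items():
    print(f"{name}: ~{mm2:.0f} mm^2")
print(f"total accounted for: {total_pct}%")
print(f"memory interfaces + L2: {memory_subsystem_pct}% (~{memory_subsystem_mm2:.0f} mm^2)")
```

With these inputs the memory controllers and L2 cache come to 32% of the die, or a little under 200mm^2, which is where the "roughly 33%" figure comes from once the "at least" on the L2 estimate is taken into account.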

TSMC doesn’t reveal its contract negotiations with large partners like Apple, AMD, Intel, or Nvidia. However, there are reports that TSMC N5 (and thus 4N, which is more or less just “refined” N5) costs at least twice as much as TSMC N7/N6. With a 608mm^2 die size for AD102, Nvidia can only get around 90 full dies per wafer, and for reference, that’s only about two more chips per wafer than GA102.
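For a sense of where a figure like "around 90 dies" comes from, here is a minimal Python sketch using a common first-order dies-per-wafer approximation on a 300mm wafer. This is an assumption about method, not necessarily how the number in the text was derived: the formula ignores die aspect ratio, scribe lines, and edge exclusion, so the exact gap between AD102 and GA102 can differ from what a layout-aware calculator reports. The wafer cost ratio of 2.0 is a placeholder based on the "at least twice as much" reports, not a known price.

```python
import math

WAFER_DIAMETER_MM = 300  # a "12 inch" wafer


def gross_dies_per_wafer(die_area_mm2, wafer_diameter_mm=WAFER_DIAMETER_MM):
    """First-order estimate: wafer area / die area, minus an edge-loss term.

    Ignores die aspect ratio, scribe lines, and edge exclusion.
    """
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return wafer_area / die_area_mm2 - edge_loss


ad102_dies = gross_dies_per_wafer(608)  # AD102 die size from the text
ga102_dies = gross_dies_per_wafer(628)  # GA102 die size from the text

# Assumption: the newer wafer costs 2.0x the older one (placeholder ratio).
ASSUMED_WAFER_COST_RATIO = 2.0

# Cost per candidate die scales with wafer cost and inversely with die count.
relative_cost_per_die = ASSUMED_WAFER_COST_RATIO * ga102_dies / ad102_dies

print(f"AD102: ~{ad102_dies:.0f} candidate dies per wafer")
print(f"GA102: ~{ga102_dies:.0f} candidate dies per wafer")
print(f"AD102 cost per die vs GA102: ~{relative_cost_per_die:.2f}x (before yield)")
```

Because the two dies are nearly the same size, the die count barely changes between generations, so almost all of any wafer price increase lands directly on the cost of each chip.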

If TSMC 4N costs more than twice as much per wafer as Samsung 8N, that means AD102 costs more than twice as much per chip as the previous generation GA102 used in the RTX 3090. Gordon Mah Ung of PC World asked Nvidia CEO Jensen Huang about pricing during a Q&A session. I’ll go ahead and quote it directly to put things in context.

Gordon: “[RTX] 4000 is finally here, which for you I’m sure feels like a huge launch. The reaction universally I’m seeing out there is, ‘Oh, my God. It costs so much money.’ Is there anything you would like to say to the community regarding pricing on the new generational parts? As well as, can they expect to see better pricing at some point and basically address all the loud screams that I’m seeing everywhere?”

Jensen: “First of all, a 12 inch wafer is a lot more expensive today than it was yesterday. And it’s not a little bit more expensive, it is a ton more expensive. Moore’s law is dead. And the ability for Moore’s law to deliver the same performance, half the cost every year and a half is over. It’s completely over. And so the idea that the chip is going to go down in cost over time, unfortunately is a story of the past.” (Emphasis added.)

Of course, there’s a lot more to building a graphics card than just the GPU. Memory, PCB, VRMs, PMICs, capacitors, and all sorts of other bits are involved. Prices on many of those have increased over the past two years as well. Nvidia also has to put a lot of effort into the research and development of GPUs and related technologies. And the final design has to take all of that into account and then ultimately turn into a successful product.

So how much does a new part need to cost in order to make for a profitable product? That’s more difficult to say.

Nvidia Ada Overview Slides

(Image credit: Nvidia)

One more interesting thing about the RTX 40-series announcement is that Nvidia has revealed three different graphics card models, and each one uses a different GPU. Again, that sort of approach has to increase costs, and it means Nvidia also needs to figure out how to best allocate its wafer orders. The AD102 chip in the RTX 4090 is the new halo part with a large die and all the trimmings. AD103 cuts down the memory interface and the core counts, and then AD104 cuts them down even further.

Nvidia hasn’t released die shots or renderings of AD103 and AD104 just yet, but we do have the full specs. They’re quite a bit smaller, and much of that comes from reducing core counts, memory interfaces, and L2 cache size. The 4080 models will naturally be higher volume products than the 4090, though it’s worth pointing out that, compared with the RTX 4080 16GB, the 4090 potentially has 70% more compute, 50% more memory bandwidth and capacity, and uses 41% more power, all while “only” costing 33% more. In other words, RTX 4080 16GB pricing is proportionately worse than the RTX 4090.

We can do the same for the RTX 4080 12GB. Compared with it, the 4080 16GB offers 21% more compute, 33% more memory capacity, and 42% more memory bandwidth, but only uses 12% more power. It also costs 33% more. Both of the RTX 4080 models look overpriced and underpowered compared to what we’ve seen in previous Nvidia architectures, where the halo cards cost substantially more while only moderately increasing performance.
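To make the "proportionately worse" argument concrete, the following Python sketch divides each quoted spec gain by the quoted price increase. It uses only the percentages given in the text (expressed as multipliers); the function name and dictionary layout are illustrative choices for this example. A result above 1.0 means the more expensive card delivers more of that spec per dollar than the cheaper one.

```python
# Relative gains quoted in the text, as multipliers (1.70 means "70% more").
comparisons = {
    "RTX 4090 vs RTX 4080 16GB": {
        "compute": 1.70,
        "memory bandwidth": 1.50,
        "memory capacity": 1.50,
        "power": 1.41,
        "price": 1.33,
    },
    "RTX 4080 16GB vs RTX 4080 12GB": {
        "compute": 1.21,
        "memory bandwidth": 1.42,
        "memory capacity": 1.33,
        "power": 1.12,
        "price": 1.33,
    },
}


def gain_per_dollar(step):
    """Divide each spec multiplier by the price multiplier for one step up."""
    price = step["price"]
    return {spec: gain / price for spec, gain in step.items() if spec != "price"}


per_dollar = {name: gain_per_dollar(step) for name, step in comparisons.items()}

for name, ratios in per_dollar.items():
    print(name)
    for spec, ratio in ratios.items():
        print(f"  {spec} per dollar: {ratio:.2f}x")
```

On these figures, stepping up from the 4080 16GB to the 4090 buys roughly 1.28x the compute per dollar, while stepping up from the 4080 12GB to the 4080 16GB buys only about 0.91x, which is the unusual part: the halo card ends up as the better value on paper.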

Nvidia RTX 30-series cards

(Image credit: Tom’s Hardware)

When the RTX 30-series launched, Nvidia started with the RTX 3090 and 3080. Both used the GA102 chip, just with fewer cores enabled on the 3080. Next came the RTX 3070 and 3060 Ti, both of which used the GA104 chip. Eventually, Nvidia would add GA106 to the family, used in the RTX 3060 and 3050. There was also GA107 for the mobile RTX 3050 Ti and 3050, but that never came to desktops. Ultimately, just looking at the desktop cards, Nvidia had three different GPUs that powered ten different graphics cards. Now, Nvidia has announced three cards using three GPUs, and it has to figure out how to balance the number of each chip.

I can’t help but wonder what the full RTX 40-series product stack will look like by the end of 2023. Word is that there are still two more Ada GPUs in development (AD106 and AD107). At some point, Nvidia will start getting more binned chips that can function as something other than the three announced SKUs, and then we’ll likely start seeing more graphics card models.

AMD, in contrast, appears set to announce perhaps a single core GPU on November 3, which will use chiplets. Current information says the GCD (GPU Chiplet Die) will measure just 308mm^2, about half the size of AD102, and it will link up with up to six MCDs (Memory Chiplet Dies) that are all relatively small (38mm^2). The GCD is about the same size as AD104 (294.5mm^2), and if rumors are correct, AMD’s Navi 31 will be packing up to 12,288 GPU shader cores, 60% more than Nvidia’s RTX 4080 12GB in roughly the same size chip.

AMD could launch RX 7900 XT, RX 7800 XT, and maybe even RX 7800 using the same GCD, only with different numbers of GPU cores enabled and with 6, 5, or 4 MCDs. Yields would be significantly better than AD102, and costs would also be much lower. AMD might even be able to compete with AD104 on pricing while delivering significantly higher performance, at least in games that don’t leverage DLSS 3 and/or extreme ray tracing effects. Advantage: AMD.
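The yield argument can be illustrated with the simple Poisson yield model, in which the fraction of defect-free dies falls off exponentially with die area. The Python sketch below is only a toy comparison: the defect density is a made-up placeholder (real figures for TSMC N5, 4N, and N6 are not public and differ by node), the Navi 31 die sizes are the rumored values from the text, and the model ignores the fact that partially defective dies are often salvaged by disabling units.

```python
import math

# Assumption: hypothetical defect density, applied equally to every die.
ASSUMED_DEFECTS_PER_CM2 = 0.1


def poisson_yield(die_area_mm2, defects_per_cm2=ASSUMED_DEFECTS_PER_CM2):
    """Fraction of dies with zero defects under the Poisson yield model."""
    die_area_cm2 = die_area_mm2 / 100
    return math.exp(-die_area_cm2 * defects_per_cm2)


# Die sizes in mm^2 as given in the text (Navi 31 figures are rumors).
die_sizes_mm2 = {
    "AD102": 608,
    "AD104": 294.5,
    "Navi 31 GCD": 308,
    "Navi 31 MCD": 38,
}

yields = {name: poisson_yield(area) for name, area in die_sizes_mm2.items()}

# Total silicon for a full Navi 31 package: one GCD plus six MCDs.
navi31_total_mm2 = die_sizes_mm2["Navi 31 GCD"] + 6 * die_sizes_mm2["Navi 31 MCD"]

for name, y in yields.items():
    print(f"{name} ({die_sizes_mm2[name]} mm^2): ~{y:.0%} defect-free")
print(f"Navi 31 total silicon with six MCDs: {navi31_total_mm2} mm^2")
```

Whatever the true defect density is, the shape of the result is the same: a die half the size of AD102 loses far fewer candidates to defects, and the tiny MCDs lose almost none, even though the full package still adds up to 536mm^2 of silicon.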

AMD RDNA 3 demo

(Image credit: AMD)

There’s also the question of why the RTX 4080 12GB isn’t just called the RTX 4070. Talking with Nvidia during a briefing, this exact question came up: What was the thought process behind calling the 12GB chip a 4080 instead of a 4070, especially since it’s a different chip?

Nvidia’s Justin Walker, Senior Director of Product Management, said, “The 4080 12GB is a really high performance GPU. It delivers performance considerably faster than a 3080 12GB… it’s faster than a 3090 Ti, and we really think it’s deserving of an 80-class product.”

Frankly, that’s a crap answer. Of course it’s faster! It’s a new chip and a new architecture; it’s supposed to be faster. Remember when the GTX 1070 came out and it was faster than a 980 Ti? I guess that wasn’t “deserving” of an 80-class product name. Neither was the RTX 2070 when it matched the 1080 Ti, or the 3070 when it matched the 2080 Ti.

But then we get the performance comparisons where Nvidia says the 4080 12GB will be “up to 3x faster than the 3080 12GB.” And that’s where you have to start to wonder, because obviously that’s with DLSS 3, in heavy ray tracing games. What will happen when you’re not playing games that meet those criteria?

GeForce RTX 4090

(Image credit: Nvidia)

Based on Nvidia’s benchmarks, it’s going to be a mixed bag. The first three games in the above chart on the left don’t use DLSS, or DLSS 3. The RTX 4080 12GB is often tied with or slightly slower than an RTX 3090 Ti when DLSS 3 and ray tracing aren’t part of the equation. How often that will be the case in future games is far more difficult to predict.

In a lot of ways, the RTX 40-series launch so far feels very reminiscent of the RTX 20-series launch. Nvidia is once again hyping up ray tracing and DLSS, only we’re now in round three of that story. The RT hardware is far more capable, and DLSS 3 is supposed to be a lot better as well, but will all the big games support both technologies to sufficient levels? Undoubtedly the answer is no; some will, some won’t.

Meanwhile, generational pricing has increased (again), and the specs on some of the models certainly look questionable. The RTX 4080 12GB feels far too much like it really should have been the RTX 4070 right now, and Nvidia could have started tacking on Ti and Super or whatever to create other models.

The RTX 3080 10GB will apparently still linger on with a $699 MSRP for the time being. That certainly can’t last, not when an eventual RTX 4070 will inevitably displace it on performance and features. But Nvidia and its partners need the unwitting to buy up the existing inventory of RTX 30-series cards, for the highest prices they can still get, before they’re ready to move down the stack to the rest of the Ada Lovelace lineup.
