AMD talked about its latest CDNA roadmap at its financial analyst day, and the big news for data center graphics is the upcoming CDNA 3 architecture and the MI300 APU. Yes, you read that right: AMD will be making a full-blown APU that combines CPU and GPU chips in one product.
Starting with CDNA 3, AMD claims it will deliver more than a 5X performance per watt (perf/watt) uplift over the current CDNA 2 products. That might seem like an impossible feat, but digging a bit deeper suggests a straightforward way AMD could get to that figure. Unlike its consumer graphics products, CDNA contains matrix cores (similar to Nvidia’s tensor cores). The Instinct MI250X currently delivers up to 95.7 teraflops of peak FP64 matrix operations, and four times that figure, 383 teraflops, for peak FP16 and bfloat16 throughput. The catch is that MI250X is rated at the same 383 figure (in teraops rather than teraflops) for INT8 and INT4, so the lower-precision integer formats gain nothing over FP16.
It's a safe bet that AMD’s claim of 5X the perf/watt isn't a universal increase in efficiency for FP64, FP32, and other formats. More likely, AMD will boost INT8 throughput to double the FP16/bfloat16 rate, and INT4 could double that again. That would provide a 4X improvement, and architectural enhancements of 25% would get that to 5X. AMD also mentions new math formats, which could go along with that boost in performance. Of course, perf/watt is always a nebulous metric anyway, so file this away as an “up to 5X or more” improvement and we’ll wait to see what the actual performance looks like.
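The arithmetic behind that guess can be sketched in a few lines of Python. The MI250X figures are the ones quoted above; the doubling factors and the 25% architectural gain are this article's speculation, not anything AMD has confirmed, and the function name is purely illustrative.

```python
# Peak MI250X matrix throughput figures quoted in the article.
FP64_MATRIX_TFLOPS = 95.7
FP16_TFLOPS = 383.0
INT8_TOPS = 383.0  # same rate as FP16/bfloat16 on MI250X
INT4_TOPS = 383.0  # same rate again


def speculative_uplift(int8_factor=2.0, int4_factor=2.0, arch_gain=1.25):
    """Hypothetical best-case uplift over MI250X at equal power.

    All three parameters are assumptions taken from the article's guess:
    INT8 at 2x the FP16 rate, INT4 at 2x the INT8 rate, and a further
    25% from unspecified architectural improvements.
    """
    format_gain = int8_factor * int4_factor  # 4x from the number formats alone
    return format_gain * arch_gain


# FP16 is roughly four times the FP64 matrix rate on MI250X.
print(round(FP16_TFLOPS / FP64_MATRIX_TFLOPS, 2))

# Combined speculative gain for the lowest-precision format.
print(speculative_uplift())
```

Under these assumptions the whole gain comes from the lowest-precision format; FP64 and FP32 would see only the architectural portion, if that.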
Besides improvements in performance and efficiency, CDNA 3 will contain a 4th generation Infinity Fabric and a next generation Infinity Cache. As expected, CDNA 3 will use a 5nm process technology, likely TSMC N5 or N5P. That should help with reaching the other goals for the design.
CDNA 3 will also move from the coherent memory architecture used in CDNA 2 to a unified memory architecture. This is a critical improvement, as a lot of the power used in data center workloads goes to moving data around. Reducing the need for redundant copies can greatly improve overall efficiency, which brings up the next point.
AMD’s Instinct MI300 solution will feature both CPU and GPU chiplets in the package. AMD calls this the first data center APU (accelerated processing unit). That's interesting, as AMD hasn’t used the term APU as much in recent years for its Ryzen parts with integrated graphics. Perhaps there will be a resurgence of APU branding, but either way the combination of Zen 4 CPU cores and CDNA 3 GPU cores should prove extremely potent.
MI300 will feature advanced packaging that puts CPUs, GPUs, cache, and HBM together on a single package. The earlier slide shows a package with what appear to be four CPU/GPU chiplets paired with HBM. Given the focus on computational throughput, if that rendering is reasonably accurate, we suspect AMD will use three GPU chiplets with a single CPU chiplet, but AMD hasn’t said one way or the other.
This has some interesting implications for AMD’s upcoming supercomputer endeavors, as MI300 will likely be the main engine behind the El Capitan system. Where Frontier uses Zen 3 EPYC “Trento” processors and links each 64-core CPU with four MI250X GPUs, El Capitan may end up being quite different. Using an external Zen 4 CPU and then combining that with four MI300 APUs that each contain a CPU seems unnecessary. Instead, El Capitan could simply have a 1U blade that packs in as many MI300 APUs as will fit.
AMD says the resulting MI300 will deliver an 8X increase in AI training performance versus MI250X, and again that likely goes back to improvements in INT8 and INT4 throughput, combined with more GPU cores in general. MI250X contains a pair of graphics compute dies (GCDs) in a package, and MI300 looks like it might have three CDNA 3 GCDs alongside a Zen 4 CPU die. That's 50% more graphics potential, plus the architectural enhancements.
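As a rough sanity check on that 8X figure, the sketch below separates the part that would come from extra dies from the part each die would have to supply on its own. It assumes the article's guess of three GCDs in MI300 and perfectly linear scaling with GCD count, neither of which AMD has confirmed; the function names are illustrative.

```python
MI250X_GCDS = 2           # stated in the article
MI300_GCDS_GUESS = 3      # the article's speculation, not an AMD figure
CLAIMED_AI_TRAINING_GAIN = 8.0  # AMD's claim versus MI250X


def gcd_scaling(old_gcds, new_gcds):
    """Gain from die count alone, assuming perfectly linear scaling."""
    return new_gcds / old_gcds


def required_per_gcd_gain(claimed_gain, old_gcds, new_gcds):
    """Per-GCD improvement needed to reach the claimed overall gain."""
    return claimed_gain / gcd_scaling(old_gcds, new_gcds)


# 50% more GCDs under the three-die guess.
print(gcd_scaling(MI250X_GCDS, MI300_GCDS_GUESS))

# Remaining factor each CDNA 3 GCD would need over a CDNA 2 GCD.
print(round(required_per_gcd_gain(
    CLAIMED_AI_TRAINING_GAIN, MI250X_GCDS, MI300_GCDS_GUESS), 2))
```

Under these assumptions each CDNA 3 GCD would need to be a little over five times faster than a CDNA 2 GCD on the workload AMD measured, which is in the same ballpark as the 5X perf/watt claim, though the two claims measure different things.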
AMD didn't reveal anything beyond CDNA 3 in its roadmap, though CDNA 4 will almost certainly follow. The GPU roadmap did mention RDNA 4, with a 2024 launch time frame, and CDNA 4 would likely follow a year later. We’ll let AMD get CDNA 3 out the door first, and we look forward to hearing additional details on the design and architecture in the coming months.