Modern compute GPUs are tailor-made to deliver incredible performance at any cost, so their power consumption and cooling requirements are enormous. Nvidia’s latest H100 compute GPU, based on the Hopper architecture, can consume up to 700W in a bid to deliver up to 60 FP64 Tensor TFLOPS, so it was clear from the start that we were dealing with a rather monstrous SXM5 module design. Yet Nvidia has never shown it up close.
Our colleagues from ServeTheHome, who were lucky enough to visit one of Nvidia’s offices and see an H100 SXM5 module for themselves, published a photo of the compute GPU on Thursday. These SXM5 cards are designed for Nvidia’s own DGX H100 and DGX SuperPod high-performance computing (HPC) systems as well as machines designed by third parties. The modules will not be sold separately at retail, so seeing one is a rare opportunity.
Nvidia’s H100 SXM5 module carries a fully-enabled GH100 compute GPU featuring 80 billion transistors and packing 8448/16896 FP64/FP32 cores as well as 528 Tensor cores (see details about the specifications and performance of H100 in the tables below). The GH100 GPU comes with 96GB of HBM3 memory, though because of ECC support and some other factors, users can access 80GB of ECC-enabled HBM3 memory connected using a 5120-bit bus. The particular GH100 compute GPU pictured is an A1 revision marked U8A603.L06 and packaged in the 53rd week of 2021 (i.e., between December 28 and December 31).
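The memory figures above can be cross-checked with a little arithmetic. The sketch below is only an illustration: the 1024-bit interface per HBM3 stack is an assumption that does not come from the article, and the result shows what the quoted numbers are consistent with, not a confirmed account of how the module is configured.

```python
# Figures quoted in the article.
BUS_WIDTH_BITS = 5120
USABLE_CAPACITY_GB = 80
INSTALLED_CAPACITY_GB = 96

# Assumption (not from the article): each HBM3 stack exposes a 1024-bit interface.
BITS_PER_STACK = 1024

# Number of stacks the 5120-bit bus could be talking to.
active_stacks = BUS_WIDTH_BITS // BITS_PER_STACK

# Capacity each of those stacks would need to supply 80GB in total.
gb_per_stack = USABLE_CAPACITY_GB / active_stacks

# Number of such stacks that would add up to the 96GB installed on the package.
installed_stacks = INSTALLED_CAPACITY_GB / gb_per_stack

print(f"Active stacks implied by the bus width: {active_stacks}")
print(f"Capacity per stack: {gb_per_stack:.0f}GB")
print(f"Stacks implied by installed capacity: {installed_stacks:.0f}")
```

Under that assumption, the 5120-bit bus and the 80GB of usable capacity line up with five 16GB stacks, while 96GB would correspond to six.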
Nvidia’s GH100 measures 814 mm², which makes it one of the largest chips ever made. In fact, the die sizes of Nvidia’s recent compute GPUs have been limited primarily by the reticle size of modern semiconductor production tools, which is around 850 mm². Since the chip, made using a customized TSMC N4 process technology (which belongs to the N5 family of nodes), consists of 80 billion transistors running at around 1.40 to 1.50 GHz, the GPU is extremely power hungry. Nvidia rates its thermal design power at 700W (though this number can change), so it requires an extremely sophisticated voltage regulating module (VRM) that can deliver enough power to feed the beast.
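To put the die size in perspective, here is a quick back-of-the-envelope calculation using only the figures quoted above. The variable names are illustrative, and the roughly 850 mm² reticle limit is the approximate value given in the text, so treat the results as estimates.

```python
# Figures quoted in the article.
DIE_AREA_MM2 = 814
RETICLE_LIMIT_MM2 = 850   # approximate limit, per the article
TRANSISTORS = 80e9

# Share of the reticle limit taken up by the die.
reticle_fraction = DIE_AREA_MM2 / RETICLE_LIMIT_MM2

# Average transistor density in millions of transistors per square millimeter.
density_mtr_per_mm2 = TRANSISTORS / DIE_AREA_MM2 / 1e6

print(f"Share of reticle limit used: {reticle_fraction:.1%}")
print(f"Average density: {density_mtr_per_mm2:.1f} MTr/mm^2")
```

By these numbers the die fills close to 96% of the reticle limit and averages roughly 98 million transistors per square millimeter.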
Indeed, the H100 SXM5 module comes with a VRM that has 29 high-current inductors, each equipped with two power stages, as well as three inductors with one power stage each. The inductors can withstand high temperatures for prolonged periods, and they come in metal shells to make VRM cooling easier.
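The inductor counts translate into a total number of power stages as follows. The per-stage power figure in this sketch is a rough average that assumes, purely for illustration, that the full 700W is shared evenly across every stage; in practice different rails feed the GPU, memory, and other components, so the real distribution will differ.

```python
# Figures quoted in the article.
TWO_STAGE_INDUCTORS = 29
ONE_STAGE_INDUCTORS = 3
TDP_WATTS = 700

total_inductors = TWO_STAGE_INDUCTORS + ONE_STAGE_INDUCTORS
total_power_stages = TWO_STAGE_INDUCTORS * 2 + ONE_STAGE_INDUCTORS * 1

# Illustrative assumption: the whole TDP is split evenly across all stages.
avg_watts_per_stage = TDP_WATTS / total_power_stages

print(f"Inductors: {total_inductors}")
print(f"Power stages: {total_power_stages}")
print(f"Average load per stage: {avg_watts_per_stage:.1f}W")
```

That works out to 32 inductors and 61 power stages, or about 11.5W per stage on average under the even-split assumption.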
The dimensions of the SXM5 module are unknown, but they are unlikely to differ significantly from those of previous-generation Nvidia modules for compute GPUs. Meanwhile, Nvidia has changed the connector layout for SXM5 (check it out at ServeTheHome), probably because of the higher power consumption and the faster PCIe Gen5 and NVLink data rates supported by GH100.
Nvidia will start commercial shipments of its Hopper H100 compute GPUs sometime in the second half of this year, and that is when it will announce the final specifications of these products, including their final TDP.