Intel’s keynote at the International Supercomputing Conference came with a new roadmap as the company works toward its daunting goal of delivering Zettascale-class performance by 2027. As you can see in Intel’s Super Compute Silicon Roadmap above, today’s announcements include the first details of Intel’s Rialto Bridge GPUs, the next generation of its yet-to-be-released Ponte Vecchio GPUs. Rialto Bridge will sport up to 160 cores fabbed on a newer process node, comes with an obviously heavily-reworked architecture, operates at up to 800W, delivers up to 30% more performance in applications, and begins sampling in mid-2023.
Intel also shared more details about the Falcon Shores XPU, a chip that will combine a varying number of compute tiles with x86 cores, GPU cores, and memory in a dizzying number of possible configurations when it comes to market in the 2024 timeframe.
We also now have the first benchmarks of Intel’s HBM-equipped Sapphire Rapids server chips, which are working their way to market to contend with AMD’s Milan-X processors. Intel claims these chips offer up to three times the performance of their Ice Lake Xeon predecessors.
Delivering on Intel’s Zettascale goal will require a series of advances, many of them revolutionary, and today the company shared some of its nearer-term targets while also sketching out the broader long-term plan. Let’s dive into the announcements.
Intel Rialto Bridge GPU and XPU Manager
Intel is sticking with naming its enterprise-class GPUs after Italian bridges, with the current-gen Ponte Vecchio to be followed by Rialto Bridge, Intel’s next-gen GPU that will come to market in 2023. Intel divulged that this chip will feature up to 160 Xe cores, a substantial increase over the 128 cores present on Ponte Vecchio.
As we can see, the Ponte Vecchio design consisted of 16 total compute tiles, with eight cores per tile, arranged in two banks that run down the center of the chip. Rialto Bridge instead has only eight longer tiles with (presumably) 20 Xe cores apiece, signifying a significant design shift. We also see that the Rambo Cache tiles have been removed, though there are still eight HBM tiles of an unknown flavor flanking the cores, while two Xe Link tiles sit at opposing corners of the chip package. (Note: stand by for some comparison photos that we’ll add shortly.)
Rialto Bridge comes with a newer, unspecified process node and architectural enhancements, similar to a ‘tick,’ that confer up to a 30% performance improvement in applications over Ponte Vecchio. Intel hasn’t provided any benchmarks to back up these claims yet.
Rialto Bridge will also have increased peak power consumption of up to 800W, a jump over Ponte Vecchio’s 600W peak, and will be available in the OAM form factor. Intel says it will adopt the OAM 2.0 spec, though it will also continue to offer its GPUs in other form factors. The company will soon release its XPU Manager, open-source monitoring and management software for its data center GPUs that can be used both locally and remotely.
Otherwise, Intel is only sharing hazy details about this new GPU, using claims like ‘more FLOPs,’ ‘increased I/O bandwidth,’ and ‘more GT/s’ that don’t really give us any insight into the new design. The company did include an IDM 2.0 listing in the slide, indicating that it will continue to use foundry partners for some of the tiles. We’re sure to learn more soon, though: Intel says Rialto Bridge will arrive in 2023.
Intel Falcon Shores XPU
Intel’s Falcon Shores XPU represents the continuation of the company’s heterogeneous architecture design arc, with the end goal of delivering 5X the performance per watt, 5X the compute density in an x86 socket, and 5X the memory capacity and bandwidth of current server chips.
This disaggregated chip design will have separate tiles of x86 compute cores and GPU cores, but those tiles can be used to create any mixture of the two elements, like an all-CPU model, an all-GPU model, or a blended ratio of the two. Intel hasn’t specified, but you can also expect that the x86 core tiles will have their own mixture of Performance cores (P-cores) and Efficiency cores (E-cores), or we could see clusters of P- and E-cores deployed as full tiles of their own. Intel notes that these tiles will be fabbed on an unspecified Angstrom-era process node, though Intel’s 20A seems to fit the bill for the tiles it could fab itself.
Intel will also have smaller tiles for various flavors of HBM memory and networking elements. The flexible ratio of CPU, GPU, memory, and networking functionality will allow Intel to quickly adjust its Falcon Shores SKUs late in the design process for specific or emerging workloads, an important consideration given the rapidly-shifting landscape in the AI/ML space. Intel hasn’t specified whether or not it will allow customers to mix and match to create their own preferred combination of tiles, but that would fit well with the company’s Intel Foundry Services (IFS) approach that will see it fabricate chips for other companies.
The second slide in the above album shows various combinations of a four-tile design that includes x86 compute cores and Xe GPU cores, along with four smaller tiles that presumably hold memory and networking chips.
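To get a feel for the configuration space those slides imply, here is a minimal Python sketch that enumerates the possible CPU/GPU splits of a hypothetical four-slot compute package. The slot count and tile names are our assumptions based on the slide, not Intel specifications:

```python
from itertools import product

# Hypothetical four-slot compute package: each slot holds an x86 tile or an Xe GPU tile.
TILE_TYPES = ("x86", "xe_gpu")

# Collapse slot orderings down to the distinct CPU:GPU splits.
splits = sorted({combo.count("x86") for combo in product(TILE_TYPES, repeat=4)})
for n_cpu in splits:
    label = {0: "all-GPU", 4: "all-CPU"}.get(n_cpu, "mixed")
    print(f"{n_cpu} x86 tiles / {4 - n_cpu} GPU tiles ({label})")
```

With four compute slots, that yields five distinct ratios, from all-GPU to all-CPU; add in swappable memory and networking tiles and the SKU matrix grows quickly.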
Naturally, this design will allow Intel to leverage its IDM 2.0 model, producing some of its own tiles for certain functions while also contracting with third-party fabs and IP providers for other tiles in a mix-and-match fashion that could sidestep any potential fabrication issues with either its own Angstrom-class process node tech or that of its suppliers. Intel will leverage next-gen advanced packaging to deliver ‘extreme’ bandwidth between the tiles that it will fuse into one cohesive unit. It’s unclear if these chips will have an (active?) interposer beneath, much as we see with the 3D-stacked Foveros chips, or which flavors of Intel’s vast palette of interconnect tech it will use to connect the tiles.
Speaking of which, Falcon Shores will have a simplified programming model that Intel says will create a ‘CPU-like’ programming experience, presumably based upon the company’s OneAPI portfolio. Intel expects this product to come to market in the 2024 timeframe.
Intel Sapphire Rapids HBM Benchmarks
Intel shared benchmarks for its HBM2e-equipped Sapphire Rapids processors, which we know come with up to 64GB of HBM2e memory to boost performance in memory throughput-constrained workloads. As with all vendor-provided benchmarks, take these with plenty of salt. We’ve included the test notes at the end of the above album.
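Why does extra bandwidth matter so much for these workloads? Kernels like those in weather and fluid simulation move many bytes through memory per floating-point operation, so they saturate DRAM bandwidth long before they saturate the cores; the STREAM “triad” kernel is the classic way to see this. Here is a minimal pure-Python sketch of the idea (illustrative only; interpreter overhead makes it far slower than the tuned C/Fortran STREAM benchmark):

```python
import array
import time

N = 200_000
a = array.array("d", [0.0] * N)
b = array.array("d", [1.0] * N)
c = array.array("d", [2.0] * N)
scalar = 3.0

t0 = time.perf_counter()
for i in range(N):
    # STREAM triad: a[i] = b[i] + scalar * c[i]
    a[i] = b[i] + scalar * c[i]
elapsed = time.perf_counter() - t0

# The triad moves three 8-byte doubles per element (2 reads + 1 write)
# for only 2 FLOPs, i.e. 12 bytes per FLOP; that ratio is why HBM's
# extra bandwidth pays off for kernels like this.
gbytes = 3 * 8 * N / 1e9
print(f"~{gbytes / elapsed:.3f} GB/s effective (interpreter overhead dominates)")
```

A bandwidth-bound kernel like this gains little from more cores or higher clocks, which is exactly the gap on-package HBM is meant to close.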
Intel claims a >2X performance gain over its own Ice Lake Xeon chip in WRF, a weather forecasting model benchmark that Nvidia recently used to tout its Grace CPU’s gains over Intel. Other highlights include a claimed >3X improvement in the YASK energy benchmark, a 2X improvement in OpenFOAM, and a >3X improvement in CloverLeaf. Intel also claims a 2X speedup in Ansys’ Fluent software, and a 2X improvement in ParSeNet.
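If you want a single headline number from those claims, the geometric mean is the standard way to average speedup ratios. The aggregation below is ours, not Intel’s, and it treats each ‘>NX’ claim as exactly NX, so it is at best a lower-bound summary of vendor numbers:

```python
import math

# Intel's claimed Sapphire Rapids HBM speedups vs. Ice Lake Xeon,
# taking each ">NX" claim as exactly NX (a lower bound).
speedups = {
    "WRF": 2.0,
    "YASK": 3.0,
    "OpenFOAM": 2.0,
    "CloverLeaf": 3.0,
    "Ansys Fluent": 2.0,
    "ParSeNet": 2.0,
}

# Geometric mean: the n-th root of the product of the ratios.
geomean = math.prod(speedups.values()) ** (1 / len(speedups))
print(f"geometric mean of claimed speedups: {geomean:.2f}x")
```

That works out to roughly 2.3X across the six claimed results, comfortably short of the headline ‘up to 3X’ but still a sizable generational jump if the numbers hold up in independent testing.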
Intel’s Zettascale Building Blocks
Intel’s quest to push forward from the just-minted Exascale era to the Zettascale era is fraught with challenges given the company’s ambitious 2027 goal, particularly as the company has yet to deliver its own exascale-class supercomputer. The move to Zettascale will require a 1,000X increase in performance, along with new process tech, architectures, memories, and packaging technology, not to mention the networking technology to tie it all together.
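For a sense of scale: exascale is 10^18 FLOPS and zettascale is 10^21, so hitting the 2027 target implies roughly quadrupling system performance every year. A quick back-of-the-envelope check (the 2022 exascale baseline year is our assumption):

```python
# Back-of-the-envelope: compound annual growth needed for a 1,000X jump in 5 years.
exa, zetta = 1e18, 1e21          # FLOPS
years = 2027 - 2022              # assumes a 2022 exascale baseline
growth = (zetta / exa) ** (1 / years)
print(f"required: ~{growth:.2f}x per year for {years} years")  # ~3.98x per year
```

For comparison, Moore’s Law-style doubling every two years compounds to only about 5.7X over the same five-year span, which is why Intel frames Zettascale as requiring advances across process, packaging, memory, and networking at once.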
Intel laid out some of the advances that it feels are needed to reach this next level of computing, with the Universal Chiplet Interconnect Express (UCIe) spec being chief among them. UCIe has the goal of standardizing die-to-die interconnects between chiplets with an open-source design, thus reducing costs and fostering a broader ecosystem of validated chiplets. Ultimately, the UCIe standard aims to be just as ubiquitous and universal as other connectivity standards, like USB, PCIe, and NVMe, while providing exceptional power and performance metrics for chiplet connections.
Intel also plans to expand its Ultra-Low Voltage tech, pioneered in its Bitcoin-mining Blockscale ASICs, which provides a 50% reduction in clock load voltage. Intel also envisions optical interconnects being brought on-package, with Xe Link being an interface that could theoretically be pivoted to optical interconnects to improve bandwidth and bandwidth density while reducing power consumption. All of these elements, and more, will be needed for Intel to meet its goal of delivering Zettascale computing power by 2027.
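The reason voltage reduction is such a big lever: first-order CMOS dynamic power scales with the square of supply voltage (P ≈ C·V²·f), so a 50% voltage cut alone quarters dynamic power at a fixed frequency. A quick sketch of that scaling (a simplified model only; it ignores leakage current and the clock-speed loss that usually accompanies lower voltage):

```python
def dynamic_power(cap: float, volts: float, freq: float) -> float:
    """First-order CMOS dynamic power: P = C * V^2 * f (leakage ignored)."""
    return cap * volts**2 * freq

baseline = dynamic_power(cap=1.0, volts=1.0, freq=1.0)   # normalized units
reduced = dynamic_power(cap=1.0, volts=0.5, freq=1.0)    # 50% voltage cut
print(f"dynamic power ratio: {reduced / baseline:.2f}")  # 0.25
```

That quadratic payoff is why low-voltage operation, first proven out in power-dense mining ASICs, is attractive for supercomputers whose budgets are increasingly set by the power bill.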
The Intel keynote is ongoing… updates to come.