Intel formally launched its fourth-gen Xeon Scalable Sapphire Rapids CPUs today, in both the standard and HBM-infused Max flavors, along with its "Ponte Vecchio" Data Center GPU Max Series, unveiling an expansive portfolio of 52 new CPUs that will face off with AMD's EPYC Genoa lineup that debuted last year. Intel also slipped in a low-key announcement of what will purportedly be its final line of Optane Persistent Memory DIMMs.
While AMD's chips maintain the core count lead with a maximum of 96 cores on a single chip, Intel's Sapphire Rapids brings the company up to a maximum of 60 cores, up from the previous peak of 40 cores with the third-gen Ice Lake Xeons. Intel claims this will result in a 53% improvement in general compute performance over its prior-gen chips.
Sapphire Rapids also leans heavily into new acceleration technologies. These purpose-built accelerator regions of the chip are designed to radically boost performance in several types of work that typically require discrete accelerators for maximum performance, like compression, encryption, data movement, and data analytics. Intel claims an average 2.9X improvement in performance-per-watt over its previous-gen models in some workloads when using the new accelerators. Intel also claims a 10X improvement in AI inference and training and a 3X improvement in data analytics workloads.
Intel's Sapphire Rapids, fabbed on the 'Intel 7' process, also brings a host of new connectivity technologies, like support for PCIe 5.0, DDR5 memory, and the CXL 1.1 interface (Type 1 and Type 2 devices), giving the company a firmer footing against AMD's Genoa. We're hard at work benchmarking the chips for our full review, which we'll post in the coming days, but in the interim, here's a brief overview of the new lineup.
Intel 4th-Gen Xeon Scalable Sapphire Rapids Pricing and Specs
Intel's Sapphire Rapids product stack spans 52 models carved up into performance and mainstream dual-socket general-purpose models, along with specialized models for liquid-cooled, single-socket, networking, cloud, HPC, and storage and HCI systems.
These chips are then divided into various Max, Platinum, Gold, Silver, and Bronze sub-tiers, with each tier denoting varying levels of socket scalability, support for Optane persistent memory, RAS features, SGX enclave capacity, and the like.
The Sapphire Rapids chips also now come with a varying number of enabled accelerator devices onboard, which differ by SKU. We'll dive into the different types of accelerators below. For now, it's important to know that each chip can have a varying number of accelerator 'devices' enabled (listed in the spec sheet above), with multiple devices for each type of accelerator available per chip (think of the 'devices' as akin to 'cores').
Customers will be able to buy chips that come fully loaded with all four devices for all four types of accelerators enabled, or they can opt for cheaper chip models with fewer enabled devices and then activate them later via a new pay-as-you-go mechanism called Intel On Demand. The '+' models have at least one accelerator of each type enabled by default, but there are two classes of chips with two different allocations of accelerators. We cover those details in the next section.
The new processors all support AVX-512, Deep Learning Boost (DL Boost), and the new Advanced Matrix Extensions (AMX) instructions, with the latter delivering an explosive performance uplift in AI workloads by using a new set of two-dimensional registers called 'tiles.' Intel's AMX implementation will primarily be used to boost performance in AI training and inference operations.
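Conceptually, each AMX instruction performs a matrix-multiply-accumulate across entire two-dimensional tile registers rather than one-dimensional vectors. The following pure-Python sketch illustrates the tile operation in the abstract; the real hardware works through intrinsics on fixed-size int8/bf16 tiles, and the tiny 2x2 tiles here are purely illustrative:

```python
# Illustrative model of an AMX-style tile multiply-accumulate:
# a single tile instruction computes C += A @ B over whole 2D
# tile registers. Tile sizes and data types here are simplified
# assumptions for demonstration, not the hardware's actual limits.

def tile_dot_product_accumulate(c, a, b):
    """Accumulate the matrix product a @ b into tile c, in place."""
    rows, inner, cols = len(a), len(b), len(b[0])
    for i in range(rows):
        for j in range(cols):
            c[i][j] += sum(a[i][k] * b[k][j] for k in range(inner))
    return c

# Two small input tiles and a zeroed accumulator tile
a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = [[0, 0], [0, 0]]
tile_dot_product_accumulate(c, a, b)
print(c)  # [[19, 22], [43, 50]]
```

On hardware, one such tile operation replaces many vector instructions, which is where the claimed AI uplift comes from.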
As before, Intel's 4th-Gen Xeon Scalable platform supports 1-, 2-, 4-, and 8-socket configurations, whereas AMD's Genoa only scales to two sockets. AMD leads in PCIe connectivity options, with up to 128 PCIe 5.0 lanes on offer, while Sapphire Rapids peaks at 80 PCIe 5.0 lanes.
Sapphire Rapids also supports up to 1.5TB of DDR5-4800 memory spread across eight channels per socket, whereas AMD's Genoa supports up to 6TB of DDR5-4800 memory spread across 12 channels. Intel has spec'd its 2DPC (DIMMs Per Channel) configuration at DDR5-4400, while AMD has not finished qualifying its 2DPC transfer rates (the company expects to release the 2DPC spec this quarter).
The Sapphire Rapids processors span from eight-core models up to 60 cores, with pricing starting at $415 and peaking at $17,000 for the 8490H. The flagship Xeon Scalable Platinum 8490H comes with 60 cores and 120 threads with all four accelerator types fully enabled. The chip also has 112.5MB of L3 cache and a 350W TDP rating. That 350W rating is significantly higher than the 280W peak of Intel's previous-gen Ice Lake Xeon series, but the inexorable push for more performance has the industry at large pushing to higher limits. For instance, AMD's Genoa tops out at a similar 360W TDP, albeit for a 96-core model, and can even be configured as a 400W chip. Sapphire Rapids spans from 120W to 350W.
The 8490H is the lone 60-core model, and it's only available with all of the acceleration engines enabled. Stepping back to the 56-core Platinum 8480+ will cost you $10,710, but that comes with only one of each type of acceleration device active. This processor comes with a 3.8 GHz boost clock, a 350W TDP, and 105MB of L3 cache.
Intel Xeon Sapphire Rapids Accelerators
Intel's new on-die accelerators are a key new element of its Sapphire Rapids processors. As mentioned above, customers can either purchase chips with all of the accelerator options activated or opt for cheaper models that allow them to purchase accelerator licenses as needed through the Intel On Demand service. Not all chips have the same accelerator options, which we'll cover below.
Intel hasn't provided a pricing guide for the accelerators yet, but the licenses will be sold through server OEMs and are activated via software and a licensing API. Instead of buying a full license outright, you can also opt for a pay-as-you-go feature with usage metering to measure how much of a service you use.
The idea behind this service is to let customers activate and pay for only the features they need, and also to provide a future upgrade path that doesn't require buying new servers or processors. Instead, customers can opt to use the acceleration engines to boost performance. This also allows Intel and its partners to carve multiple types of SKUs from the same functional silicon, thus simplifying supply chains and reducing their costs.
These features represent a continuation of Intel's long history of bringing fixed-function accelerators onto the processor die. However, the powerful units on Sapphire Rapids will require software support to extract their full performance capabilities. Intel is already working with multiple software providers to enable support in a broad range of applications, many of which you can see in the album above.
Intel has four types of accelerators available with Sapphire Rapids. The Data Streaming Accelerator (DSA) improves data movement by offloading data-copy and data-transformation operations from the CPU. The Dynamic Load Balancer (DLB) accelerator steps in to provide packet prioritization and dynamically balance network traffic across the CPU cores as the system load fluctuates.
Intel also has an In-Memory Analytics Accelerator (IAA) that accelerates analytics performance and offloads the CPU cores, thus improving database query throughput and other functions.
Intel has also brought its QuickAssist Technology (QAT) accelerators onboard the CPU; this function used to reside on the chipset. This hardware offload accelerator augments cryptography and compression/decompression performance. Intel has employed QAT accelerators for quite some time, so the technology already enjoys broad software support.
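To make the offload concrete, here is a plain software compress/decompress round-trip of the sort QAT takes off the CPU cores. This uses Python's stdlib zlib purely as a stand-in baseline; real QAT acceleration is reached through Intel's own libraries (such as QATzip), not this API:

```python
# Software baseline for the kind of work QAT offloads: a DEFLATE
# compression round-trip. On a QAT-enabled system this work would be
# handed to the on-die accelerator instead of burning CPU cycles.
import zlib

# Repetitive payload standing in for a database page (illustrative data)
payload = b"database page contents " * 100

compressed = zlib.compress(payload, level=6)
restored = zlib.decompress(compressed)

assert restored == payload
print(f"{len(payload)} bytes -> {len(compressed)} bytes compressed")
```

The win from QAT is not a different algorithm but freeing the cores from exactly this kind of loop while the accelerator does it in fixed-function hardware.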
The Sapphire Rapids processors comprise two types of designs (die chops), as listed in the SKU table. The XCC chips are composed of four total dies, and each die has one of each accelerator (IAA, QAT, DSA, DLB). That means you can activate a maximum of four accelerators of each type on these chips (for example, four IAA, four QAT, four DSA, four DLB).
In contrast, some of the chips use a single MCC die, which only comes with one IAA and one DSA accelerator and two QAT and two DLB accelerators. That means you can only activate two QAT and DLB accelerators and one IAA and DSA accelerator via the On Demand feature.
Intel recently announced many of the details of its forthcoming Xeon Max Series of CPUs and the Intel Data Center GPU Max Series (Ponte Vecchio). Today marks the formal launch.
Intel's HBM2e-equipped Max CPU models come to market with 32 to 56 cores and are based on the standard Sapphire Rapids design. These chips are the first x86 processors to use HBM2e memory on-package, providing a 64GB pool of local memory for the processor. The HBM memory will help with memory-bound workloads that aren't as sensitive to core counts, so the Max models come with fewer cores than the standard models. Target workloads include computational fluid dynamics, climate and weather forecasting, AI training and inference, big data analytics, in-memory databases, and storage applications.
The Max CPUs can operate in a multitude of configurations, such as with the HBM used for all memory operations (HBM-only, with no DDR5 memory required), in an HBM Flat Mode that presents the HBM as a separate memory region (this requires extensive software support), or in an HBM Caching Mode that employs the HBM2e as a DRAM-backed cache. The latter requires no code changes and will likely be the most frequently used mode of operation.
The Xeon Max CPUs will square off with AMD's EPYC Milan-X processors, which come with a 3D-stacked L3 cache called 3D V-Cache. The Milan-X models have up to 768MB of total L3 cache per chip that delivers an incredible amount of bandwidth, but it doesn't provide as much capacity as Intel's approach with HBM2e. Both approaches have their respective strengths and weaknesses, so we're eager to put the Xeon Max processors to the test.
Intel also launched its Max GPU Series, previously code-named Ponte Vecchio. Intel had previously unveiled the three different GPU models, which come in both standard PCIe and OAM form factors. You can read more about the Max GPU Series here.
Intel Optane Persistent Reminiscence (PMem) 300
As part of its Sapphire Rapids launch, Intel quietly launched what will be the final series of Optane Persistent Memory DIMMs. The final generation, codenamed Crow Pass but officially known as the Intel Optane Persistent Memory 300, will come in 128, 256, and 512GB capacities and operate at up to DDR5-4400. That's a big step up over the previous peak of DDR4-3200, but it also means that Sapphire Rapids systems will have to downclock the standard memory from the supported DDR5-4800 to DDR5-4400 if they plan on employing Optane.
Intel cites 56% more sequential bandwidth and 214% more bandwidth in random workloads, along with support for up to 4TB of Optane per socket, or 6TB total for a system. Just like the previous-gen Optane 200 series, the DIMMs operate at 15W. However, they now step up to a DDR-T2 interface and AES-XTS 256-bit encryption.
At its debut in 2015, Intel and partner Micron touted the underlying tech, 3D XPoint, as delivering 1,000x the performance and 1,000x the endurance of NAND storage, and 10x the density of DRAM. Intel had already stopped producing its Optane storage products for client PCs, which makes sense given that it is selling its NAND business to SK hynix. However, Intel retained its memory business for the data center, including its persistent memory DIMMs that can function as an adjunct to main memory, a capability only Intel offers. Now those products will not see any future generations after the Crow Pass modules that arrive with Sapphire Rapids processors.
Intel cites an industry shift to CXL-based architectures as a reason for winding down the Optane business, mirroring the sentiments of Intel's ex-partner Micron when it exited the business last year. Sapphire Rapids supports both Optane DIMMs and CXL, but this will be one of the last times the two are seen together; CXL will be the industry's preferred method of connecting exotic memories to chips in the future.
We're currently underway with our testing for our Sapphire Rapids review, so stay tuned for the full performance breakdown and architectural details in the coming days.