CPU Wars and Exascale Clarity: HPC in 2019

The year ahead for high performance computing promises some interesting twists and turns.

But we think two of the more defining developments for 2019, at least on the hardware side, are the start of a realigned processor market and the final sprint toward exascale supercomputers.

AMD and Intel Square Off in the HPC Arena

Let’s start off with the 800-pound gorilla in the room, Intel. Its chips are used in well over 90 percent of the HPC systems on the planet, which has been the case since the downfall of the AMD Opteron. While that won’t change appreciably in 2019, AMD has its best chance in more than a decade to gain HPC market share at Intel’s expense.

That’s primarily because, for the first time in history, this year AMD will be offering server silicon with smaller transistors than its arch rival. Rome, AMD’s second-generation Epyc processor, will be etched with Taiwan Semiconductor Manufacturing Company’s (TSMC’s) 7nm process technology, while Intel will be offering its Xeons in the form of Cascade Lake-SP and the HPC-tweaked Cascade Lake-AP chips with 14nm transistors. Intel is not expected to ship Xeon CPUs with 10nm technology (which is roughly equivalent to TSMC 7nm) until 2020. That will give AMD at least a year-long window with an advantage it has never had before.

Conventional wisdom says transistor size doesn’t matter much these days; processor design is much more important. But even in the waning days of Moore’s Law, fundamentals still matter. Some of Rome’s improved specs can certainly be attributed to design innovation, but having smaller transistors to work with gave AMD a lot more options.

For example, the 7nm Rome CPU will sport up to 64 cores, while the core count for the 14nm Cascade Lake processor will top out at 48. More significantly, AMD is saying Rome will be four times as fast in the flops department as its first-generation Epyc, while Intel is claiming only a 21 percent bump from Skylake-SP to Cascade Lake-AP (with Linpack as the metric). Keep in mind that the Skylake Xeon is more than twice as fast as the original Epyc at executing floating point operations, but if both vendors’ performance claims hold up for their respective next-generation chips, AMD could very well end up with the floppiest server CPU in 2019. And it’s conceivable that these CPUs could even be less expensive than their slower competition.
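The arithmetic behind that comparison can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration of the vendor claims quoted above, not a benchmark. Everything is normalized to the first-generation Epyc, and the Skylake ratio of exactly 2.0 is an assumption standing in for "more than twice as fast"; the function and constant names are made up for this sketch.

```python
# Relative floating point throughput, normalized so first-generation Epyc = 1.0.
# These are vendor claims as quoted in the article, not measured results.
EPYC_GEN1 = 1.0
ROME_OVER_EPYC_GEN1 = 4.0            # AMD's claim: Rome is 4x first-gen Epyc
CASCADE_LAKE_AP_OVER_SKYLAKE = 1.21  # Intel's claim: 21 percent bump (Linpack)
SKYLAKE_OVER_EPYC_GEN1 = 2.0         # ASSUMPTION: stands in for "more than twice"


def relative_flops(skylake_over_epyc=SKYLAKE_OVER_EPYC_GEN1):
    """Return (rome, cascade_lake_ap) throughput relative to first-gen Epyc."""
    rome = EPYC_GEN1 * ROME_OVER_EPYC_GEN1
    cascade_lake_ap = EPYC_GEN1 * skylake_over_epyc * CASCADE_LAKE_AP_OVER_SKYLAKE
    return rome, cascade_lake_ap


def break_even_skylake_ratio():
    """Skylake-over-Epyc ratio at which Cascade Lake-AP would match Rome."""
    return ROME_OVER_EPYC_GEN1 / CASCADE_LAKE_AP_OVER_SKYLAKE


if __name__ == "__main__":
    rome, cascade_lake_ap = relative_flops()
    print(f"Rome:            {rome:.2f}x first-gen Epyc")
    print(f"Cascade Lake-AP: {cascade_lake_ap:.2f}x first-gen Epyc")
    print(f"Rome advantage:  {rome / cascade_lake_ap:.2f}x")
    print(f"Break-even Skylake ratio: {break_even_skylake_ratio():.2f}x")
```

Under these assumptions, Cascade Lake-AP lands at roughly 2.4 times the original Epyc against Rome's claimed 4 times. Skylake would need to be about 3.3 times as fast as the first-generation Epyc for Intel to pull level, so the conclusion is sensitive to how much "more than twice" really is.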

Expectations of AMD’s renewed competitiveness are already being felt. Toward the end of 2018, Rome notched a couple of big supercomputer wins. The first is a 2.4-petaflop HPE-built system for the High-Performance Computing Center (HLRS) of the University of Stuttgart, which is slated for installation at the end of this year. The other is a 6.4-petaflop BullSequana system for the Finnish IT Center for Science, which is expected to be installed by Atos in 2020. We expect Rome will turn up in a growing number of HPC systems, large and small, throughout 2019. Unless AMD has a major hiccup, 2019 is going to be a watershed year for the company in the high end of the market.

Arm will make some inroads in HPC in 2019, but more than likely they will be modest. The first petascale Arm-powered supercomputer, Astra, garnered a spot on the TOP500 list in November 2018, but was the only Arm machine to do so. Astra, like a handful of other Arm-powered clusters in the field, relies on Marvell’s (previously Cavium’s) ThunderX2 processors. ThunderX2 is currently the only commercial Arm implementation designed for HPC systems, and it suffered from an agonizingly slow rollout. Initially unveiled in 2016, ThunderX2 wasn’t released into the wild until 2018.

All of the currently deployed ThunderX2 HPC systems, even the Astra supercomputer at Sandia National Laboratories, are, to one extent or another, testbed systems for evaluating Arm running HPC workloads. It’s entirely possible that the first production Arm supercomputer will be Japan’s Post-K supercomputer (see below), which isn’t scheduled to be up and running until 2021. In the meantime, we expect to see many more Arm-powered systems trialed at various national labs and research institutes in 2019, but nothing on a scale that would change the dominance of x86 at the high end of the market.

That said, Arm’s long-term prospects in high performance computing are reasonably good. Arm Holdings is taking an active interest in high-end computing by helping to build out the software and hardware ecosystem, while HPC vendors like HPE, Atos, and Cray all now offer ThunderX2-powered options. Plus, thanks to the licensing model, the technology is both accessible and malleable, which makes it very suitable for co-design and more customized implementations for HPC. Just don’t expect any miracles in 2019.

The Market Stakes Out Exascale Territory

With the first exascale systems expected to come online in the next year or two, supercomputer vendors are gearing up to get their entries into the field. In China, Europe and Japan, the builders are pretty much set. Oddly enough, it’s the US vendors that have yet to be decided upon.

In China, Sugon, the National University of Defense Technology (NUDT), and the National Research Center of Parallel Computer Engineering and Technology (NRCPC) are building the first trio of exascale systems for that country. Sugon is in line to develop China’s x86-powered exascale supercomputer, which apparently will be based on a licensed AMD Epyc implementation plus DCU accelerators, both of which will be supplied by Chinese chipmaker Hygon. NUDT is responsible for the Tianhe exascale system, which is expected to be powered by an Arm processor of some sort, possibly a Phytium Xiaomi chip, although that may have changed. The third system will be developed by NRCPC and will be based on some future ShenWei processor.

In 2018, all three developers put pint-sized prototypes (512 nodes) of these exascale systems into the field. The critical componentry in these prototypes – processors, network, and memory – is not exascale-ready, so it will have to go through at least one more iteration. As a consequence, if any of these developers expect to install a full-up exascale supercomputer in 2020, which was the original goal of the Chinese government, they’re going to have to step up their game.

Given the current absence of pre-exascale deployments in the country – systems that deliver hundreds of petaflops – and the lack of legacy experience with semiconductor technology, it seems unlikely that the first Chinese exascale machine will come online next year. At this point, 2021 or 2022 seems like a more reasonable target. By the end of 2019, we should have a better sense of where these projects stand, assuming of course that the work continues to be made public.

In some ways, Europe is now following the Chinese exascale approach, inasmuch as it is using the opportunity to develop its own domestic HPC processors. In this regard, the key effort is the European Processor Initiative (EPI), a project whose goal is to develop home-grown chips for the local market – not just for HPC gear, but also for the automotive market and the broader server and cloud arena.

Even though the EPI project kicked off last June, the technical direction already appears to be well established. The plan is to develop an Arm chip as the general-purpose processor and use the open-source RISC-V architecture as the basis for the HPC accelerator. The first generation of these processors is slated to end up in pre-exascale systems in 2021 and could be taped out as early as the end of this year. The second-generation chips will go into exascale systems that are scheduled for installation in 2023 to 2024. In any case, we should get some notion of the chip designs for both the Arm and RISC-V implementations in 2019. The latter will represent the world’s first implementation of RISC-V for HPC.

The EPI consortium is made up of 23 members, but the key commercial organization is Atos, which will be the system integrator and the lead for the development of the general-purpose processor. The Barcelona Supercomputing Center (BSC) will head up the accelerator work.

Compared to China or Europe, the plan for Japan’s initial exascale supercomputer, known as Post-K, is downright simple. Fujitsu is the prime contractor for both the system itself and the processor that will power it. The system is on track to be installed at RIKEN in 2021.

Post-K is the exascale update of the K computer, with a spiffed-up Tofu interconnect and a swap-out of Sparc64 processors for Arm silicon. Last year, Fujitsu unveiled its Arm implementation, the A64FX, a 64-bit Armv8.2-A processor equipped with the Scalable Vector Extension (SVE), which is specifically designed for HPC duty. We expect the chip will undergo one more iteration between now and its exascale debut in two years.

In the US, things are a bit more up in the air. While the first US exascale supercomputer, known as Aurora/A21, is already set to be built by Intel and Cray for the Department of Energy’s Argonne National Lab in 2021, there’s very little known about the particulars of the machine (although we have offered some speculation elsewhere). A21 will almost certainly be a Cray Shasta system of some sort, but with what componentry from Intel?

Hopefully, sometime in 2019 Intel will reveal at least the processor (or processors) used to power A21 and whether it has decided to forge ahead with its second-generation Omni-Path interconnect. The company’s roadmaps for Xeon, its AI portfolio, and its dataflow accelerator appear to be filling out, so we may see the end to all this secrecy from Intel in the next few months.

The DOE is also funding two or three other exascale supercomputers that will follow the A21 deployment. This is being done under the contract known as CORAL-2 (Collaboration of Oak Ridge, Argonne, and Lawrence Livermore Labs). CORAL-2 will fund the first exascale computers for Oak Ridge National Laboratory (ORNL) and Lawrence Livermore National Laboratory (LLNL), plus a potential third system to be deployed at Argonne National Laboratory (ANL). The Argonne machine could be an upgrade to the A21 supercomputer or a new installation altogether. ORNL is scheduled to install its system in Q3 of 2021, followed by the LLNL system in Q3 of 2022. If the Argonne option is exercised, that machine would also be deployed in Q3 of 2022.

None of the vendors for these two or three systems are known, but the most likely candidates are the ones with pre-exascale systems in the field, that is, IBM, NVIDIA, and Mellanox; or the A21 vendors, Cray and Intel. On the other hand, HPE and AMD have a good shot at upsetting the incumbents, either as the system vendor in the case of HPE, or as a CPU or GPU provider in the case of AMD. Marvell is perhaps a long shot with a third-generation ThunderX Arm CPU. Although all of the CORAL-2 systems are three or four years away, the awards should be announced in the first half of 2019.

As a result, before the end of the year, we should have a pretty good picture of the early exascale landscape around the world – which vendors will be building the initial systems and what processors, accelerators, and interconnects will be used to power them. And maybe even who will reach the milestone first.