It is hard to say for sure, but there is probably as much aggregate computing capacity in the academic supercomputing centers of the world as there is in the big national labs. It is just spread out across more institutions, and it is also supporting a wider array of workloads across a larger number of domains. This presents its own challenges, and is analogous to the difference between what happens at large enterprises compared to hyperscalers.
The University of Michigan is the largest public research institution in the United States, with a stunning $1.48 billion in research funded across all domains of science in 2017, and it is second only to Johns Hopkins University, which shells out about $1 billion more on research. (You can see trend data on research funding in this National Science Foundation study.) Driving this research takes a certain amount of supercomputing horsepower, and up until now, the main cluster at Michigan, called “Flux,” has grown organically over the course of eight years and has a mix of different kinds of compute, Brock Palen, director of the Advanced Research Computing – Technology Services division of the university, tells The Next Platform. But starting next year, with the installation of the “Great Lakes” supercomputer, the machine will all be based on a single generation of compute, namely the “Skylake” Xeon SP processors from Intel, with a little help here and there from some Tesla GPU accelerators.
The final configuration of the Great Lakes machine has not been set yet, but interestingly, even though the new system will have only roughly 15,000 cores when it is complete, it is expected to have substantially more performance than the hodge-podge of machinery in the current Flux cluster. Palen has not done a calculation to figure out the aggregate peak petaflops of the Great Lakes machine yet, so we have to guess.
If the Great Lakes system uses middle bin Xeon SP parts with reasonable core counts (say 16 or 18 per chip) and frequencies, it will probably come in at between 1 petaflops and 1.5 petaflops. We doubt very much that Michigan is shelling out the dough for top bin 28-core parts that cost $13,011 a pop. The processors alone at list price would cost nearly $7 million, and even with a 25 percent discount, the Xeon SP-8180M Platinum processors would be $5.2 million – more expensive than the $4.8 million that Michigan is spending on the PowerEdge servers in the cluster from Dell, the InfiniBand networking and InfiniBand-to-Ethernet gateways from Mellanox Technologies, and the flash-enhanced storage from DataDirect Networks.
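Our estimates above can be sketched out quickly. This is back-of-the-envelope math using our own assumptions – the 2.4 GHz clock and the per-cycle throughput for Skylake AVX-512 are our guesses, not a published Great Lakes configuration:

```python
# Rough sketch of the estimates above; inputs are our assumptions,
# not a confirmed Great Lakes configuration.

CORES = 15_000          # approximate total core count
GHZ = 2.4               # assumed mid-bin Skylake base clock
FLOPS_PER_CYCLE = 32    # Skylake AVX-512: two 512-bit FMA units, double precision

peak_pflops = CORES * GHZ * 1e9 * FLOPS_PER_CYCLE / 1e15
print(f"Estimated peak: {peak_pflops:.2f} petaflops")  # ~1.15 PF, inside the 1 - 1.5 PF window

# Pricing check for the top-bin scenario we doubt Michigan is paying for
chips = -(-CORES // 28)              # ceiling division: 28-core Xeon SP-8180M parts
list_cost = chips * 13_011           # list price per chip
discounted = list_cost * 0.75        # assumed 25 percent discount
print(f"{chips} chips, ${list_cost / 1e6:.2f}M list, ${discounted / 1e6:.2f}M discounted")
```

The top-bin scenario works out to 536 chips at roughly $6.97 million list and $5.23 million discounted, which matches the figures above.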
The resulting relatively homogeneous substrate and the new network topology will allow ARC-TS to better scale larger jobs on the system, even if the average job on the machine currently runs on a single node or less of capacity, according to Palen. Up until now, Michigan researchers have been encouraged to run large jobs on the XSEDE network of supercomputers that are funded by the National Science Foundation. The university has over 2,500 active users pushing the computing needs of over 300 research projects through Flux and the other systems in its datacenter. (More on these in a moment.)
The Flux machine is quite diverse, which is counterbalanced somewhat by the fact that the Message Passing Interface (MPI) parallel processing protocol can be set up with one rank per core to balance work out across the cluster. The mix of servers on the standard Flux part of the system consists of 5,904 “Nehalem” Xeon cores from 2009; 1,984 “Sandy Bridge” Xeon cores from 2012; 2,520 “Ivy Bridge” Xeon cores from 2013; and 3,912 “Haswell” Xeon cores from 2014. If you do the math on the Flux configuration shown on the ARC-TS site, you get 14,936 cores across 927 server nodes. The Nehalem and Sandy Bridge Xeon nodes are based on Dell’s PowerEdge C6100 hyperscale-inspired servers, which cram four two-socket server nodes into a 2U chassis. The Ivy Bridge Xeon nodes are based on IBM/Lenovo’s NextScale M4 nodes in the n1200 chassis, which also pack four two-socket nodes into a 2U space. It is unclear who manufactured the Haswell Xeon nodes, but the large memory nodes are a mix of Dell PowerEdge R910 and PowerEdge R820 machines and IBM/Lenovo X3850 servers.
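Tallying the per-generation core counts listed above shows where the 14,936-core total comes from – the attribution of the remainder to the large memory nodes is our inference, not a figure from ARC-TS:

```python
# Sum the per-generation core counts listed for the standard Flux partition.
flux_cores = {
    "Nehalem (2009)":      5_904,
    "Sandy Bridge (2012)": 1_984,
    "Ivy Bridge (2013)":   2_520,
    "Haswell (2014)":      3_912,
}
standard_total = sum(flux_cores.values())
print(standard_total)             # 14320 standard cores
print(14_936 - standard_total)    # 616 cores, presumably the large memory nodes
```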
This is a pretty sizable setup, mind you, but it ain’t 27,000 cores as some parts of the ARC-TS site say. (That may be the aggregate number of cores across all of its HPC systems.) No matter. What seems clear is that by moving to Skylake Xeons with even 15,000 cores, Michigan is going to get a lot more computing oomph. Our rough guess is that Flux is around 500 teraflops at double precision and that Great Lakes will have about 3X the performance. None of this was made clear in the announcement by Dell and Michigan, but the performance jump matters. When the configuration settles out, we can do some math on the price/performance, which is why we bother.
All of the Flux nodes are linked to each other using Voltaire GridDirector 4700 switches, which predate the acquisition of Voltaire by Mellanox Technologies in 2011. These are 40 Gb/sec QDR InfiniBand director switches, which are based on Mellanox InfiniBand switch ASICs. The Flux setup also has an Ethernet network for accessing storage.
According to Palen, a big chunk of the Flux setup was owned and funded by different faculty members and their projects, and ARC-TS just managed it for them. But looking ahead to the Great Lakes system, Michigan wants to drive up utilization by doing more timesharing across the cluster, and also by allowing some jobs to grab more computing than was possible when Flux was all partitioned by faculty and research project. It means changing it from a set of clusters doing their own thing to a true shared compute utility.
Palen says that the vendors who pitched machines for the Great Lakes cluster put together bids for both Intel Skylake and AMD Epyc processors, and that the competition was close between the two architectures on the suite of benchmarks that Michigan uses to buy machines. In the end, the Skylake machines won out. But don’t think for a second that Michigan is somehow a Xeon-only shop. The university has a slew of different systems, including a 75-node IBM Power8-Nvidia Tesla CPU-GPU hybrid called Conflux, which is exploring the intersection between HPC and machine learning, all linked by a 100 Gb/sec InfiniBand fabric from Mellanox. The university also has a cluster with 4,600 cores based on Cavium’s ThunderX processors that has 3 PB of disk capacity and is set up to run the Hadoop data analytics platform.
The Great Lakes machine will comprise three different types of nodes, just like the Flux machine that preceded it, including a number of standard nodes plus some large memory machines and some GPU-accelerated machines. The main compute will be based on the Dell PowerEdge C6420 machines, which also put four two-socket nodes into a 2U enclosure, with huge memory nodes based on the PowerEdge R640 and GPU nodes based on the PowerEdge R740.
For storage, Michigan is choosing DDN’s GridScaler 14KX arrays and its Infinite Memory Engine (IME) cache buffer, which will weigh in at 100 TB. The university is moving away from Lustre to IBM’s Spectrum Scale (formerly GPFS) parallel file system, and Palen explained that the university was not interested in deploying a separate burst buffer sitting between the parallel file system and the cluster. ARC-TS is using InfiniBand-to-Ethernet gateways from Mellanox to link the Great Lakes cluster to the GPFS clustered storage and to other storage available at the university, starting with 160 Gb/sec pipes and expanding to 320 Gb/sec pipes at some point in the near future.
“We support a large variety of workloads, and this is what drove us to an IME buffered memory setup,” says Palen. “Even if individual applications are doing nice with I/O, if you have hundreds of workloads trying to use the scratch storage, then scratch will essentially see a random load and thus perform horribly compared to its peak. The goal here was to have 5 percent of the scratch disk capacity available in flash buffers in an intelligent way so that hot data and random writes are absorbed by the flash for current activity. Burst buffers where you have to consciously activate the flash and then transfer data back to primary storage – that was not something we were interested in. Big labs have people only running a handful of workloads, but I have thousands and I need to train them and they come and go every three or four years. So we are looking for a technical solution rather than a perfect one.”
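If the 100 TB IME layer is meant to be roughly that 5 percent of scratch, the implied scratch capacity falls out directly – this is our inference from Palen’s comment, not a published figure:

```python
# Infer scratch capacity from the 100 TB IME layer and the 5 percent
# target Palen describes. The resulting 2 PB figure is our inference.
ime_tb = 100
flash_fraction = 0.05
scratch_pb = ime_tb / flash_fraction / 1_000   # TB -> PB
print(f"Implied scratch capacity: ~{scratch_pb:.0f} PB")
```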
As for the network used to link the compute nodes on the Great Lakes cluster, it is true that Michigan is going to be the first academic research institution to install 200 Gb/sec HDR InfiniBand, using Quantum switches and ConnectX-6 adapters from Mellanox. But here is the twist on that. Instead of looking for very high bandwidth per port, Michigan was looking to cut down on the cost and the oversubscription of the network, and so it is using cable splitters on the 40-port Quantum switches to turn them into higher-radix virtual 80-port switches, allowing one switch to have downlinks to all 80 servers in a rack. Prior InfiniBand switches had 32 ports per switch, so you needed three of them to cover a rack, and you had 16 stranded ports, which is wasteful. The fat tree network will be set up so Michigan can add 50 percent to the server node count without having to rewire the physical InfiniBand network.
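The port math behind the splitter approach is simple enough to check – the 80-server rack size and the 32-port figure for the older switches are taken from the description above:

```python
import math

# Port math for covering one 80-server rack with InfiniBand downlinks.
SERVERS_PER_RACK = 80

# Old approach: 32-port switches, several per rack, with stranded ports.
old_radix = 32
old_switches = math.ceil(SERVERS_PER_RACK / old_radix)    # 3 switches per rack
stranded = old_switches * old_radix - SERVERS_PER_RACK    # 16 unused ports

# HDR approach: each 200 Gb/sec port on a 40-port Quantum switch is split
# into two 100 Gb/sec links, yielding a virtual 80-port switch.
hdr_ports = 40 * 2
new_switches = math.ceil(SERVERS_PER_RACK / hdr_ports)    # 1 switch per rack

print(old_switches, stranded, new_switches)   # 3 16 1
```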
The Great Lakes machine will be installed during the first half of 2019. About 70 percent of the users on the system are doing hardcore engineering – combustion, materials, and fluid dynamics simulations; the GPU-accelerated workloads tend to be molecular dynamics.