The Way Business Is Moving
Issue Date: June 2007

Sun gets serious, finally, about supercomputing

30 June 2007
Timothy Prickett Morgan

When Sun's top brass brought Andy Bechtolsheim, its former chief technology officer and the first employee hired by the company's founders, back to the company in February 2004, they got a lot more than a techie who knows about chips, servers, and operating systems. They also got one of the smartest people in the world when it comes to networking technology, and someone who was probably going to give the big server makers a run for the money in the media and high performance computing space with his little company, Kealia. This week at the International Supercomputing Conference 2007 show in Dresden, Germany, we got a peek at exactly what Bechtolsheim was working on at Kealia.
Between the time Sun was founded back in the early 1980s and when Bechtolsheim came back to Sun three years ago, Sun acquired a large amount of supercomputing experience. It bought the carcasses of massively parallel supercomputer makers Thinking Machines and Kendall Square after they went bust, as well as the Cray CS6400 server line from Silicon Graphics - which may just go down as the best acquisition in the server space, given the popularity of the rebranded 'Starfire' Enterprise 10000 servers that fueled the dot-com boom.
It is important to remember that Sun had planned for dual-core UltraSparc-V servers to be available in servers with more than 1000 processors in a single system image and delivering about 6 teraflops of number-crunching power - back in 2002. This did not happen for a lot of reasons - chip delays, the dot-com bust, and the rise of X64 servers running Linux that offered much better bang for the buck than Sparc boxes ever could.
In the past several years Sun has embraced X64 processors and re-ported its Solaris operating system to this processor architecture, and it was well on its way down this path when Bechtolsheim was brought back in. But what is clear from the 'Galaxy' server designs and now the 'Constellation' blade server and related parallel supercomputer designs is that, for whatever reason, Sun is back in the game in the server racket in general and is determined to not be at the bottom of the Top 500 list of supercomputers any more.
Bechtolsheim understands, of course, that the network is the computer and that designing the interconnect in any cluster is more important than the elements of the compute nodes themselves.
At ISC today, Sun took the wraps off of the Constellation System, which brings various pieces of the Galaxy server line as well as the new 'Niagara' Sparc T1-based blade servers together with a high-speed, massive InfiniBand switch that creates a giant supercomputer cluster, one that utterly dwarfs whatever Sun was planning with its UltraSparc designs of years gone by with their 'WildFire' interconnect.
While the Sparc designs from a decade ago - which Bechtolsheim did not really participate in - had giant server nodes and a very fast interconnect to lash the machines together, the Constellation System goes the direction that the HPC market has gone, which is toward commodity rack or blade servers and lots of connectivity between server nodes. But the Constellation System that Sun announced today has a few interesting tweaks that make it different from normal X64-InfiniBand clusters.
The InfiniBand switch, code-named 'Magnum', that is literally at the heart of the Constellation System has 3456 double data rate (DDR) InfiniBand ports. Bechtolsheim, looking at the way people connect servers and storage together for media and HPC applications, took a simple approach with the X4500 'Thumper' data servers, putting 48 SATA ports on a motherboard and turning a two-socket Opteron server into a massive, dense data server.
Similarly, he looked at the clustering of InfiniBand core and leaf switches, which are necessary to lash servers together with InfiniBand these days, and thought that the best thing to do was to get rid of all of these layers of switches. Servers plug right into the Magnum switch, and there is no hierarchy of InfiniBand gear to buy. (Which may not make Bechtolsheim's prior employer, Cisco Systems, very happy.)
To do what the Magnum switch does would take 12 core InfiniBand switches and 288 leaf switches, and by moving to this simplified arrangement, Sun can cut down the number of cables in the cluster by a factor of six and cram a 3456-node cluster into 20% less space. The Magnum is a box that is twice as wide and half as tall as a standard rack, and it has a bisection bandwidth of 110 Tbps. Sun is using a 12x InfiniBand cable coming out of the Magnum switch, which splits down to three 4x InfiniBand links as the wire gets closer to the server nodes.
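The factor-of-six cable reduction can be sanity-checked with some back-of-envelope arithmetic. The figures below are illustrative assumptions, not Sun's published cabling plan: a conventional two-tier fabric is assumed to need one cable per node at each tier, and a 12x trunk carries the lanes of three 4x links.

```python
# Hypothetical cable count for a 3456-node DDR InfiniBand fabric:
# conventional leaf + core switches versus the single Magnum switch.
NODES = 3456

# Two-tier fat tree: one 4x cable from each server to a leaf switch,
# plus one uplink cable per node for full bisection bandwidth.
node_to_leaf_cables = NODES
leaf_to_core_cables = NODES
conventional_cables = node_to_leaf_cables + leaf_to_core_cables

# Magnum: one 12x trunk cable fans out into three 4x links near the
# nodes, so three servers share a single cable run from the switch.
magnum_cables = NODES // 3

reduction = conventional_cables / magnum_cables
print(conventional_cables, magnum_cables, reduction)  # 6912 1152 6.0
```

Under those assumptions the arithmetic lands exactly on the factor of six the article cites.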
The server nodes in the Constellation System are, of course, the new Constellation class blade servers, which plug into the Sun Blade 6000 chassis and which use dual-core Opteron, dual- or quad-core Xeon, or Sparc T1 processors. (The latter is not much good at number-crunching.)
With quad-core 'Clovertown' Xeon chips, Sun can deliver 6 teraflops of computing (768 cores) per chassis and that works out to 24 teraflops per rack. The way Sun is pitching the Constellation System, the nodes run Solaris, but obviously the X64 nodes can run Windows or Linux should customers opt for that. The HPC Cluster Tools and Studio 12 compilers are tweaked for Solaris and Linux, and Sun's Grid Engine grid computing middleware is also in the Constellation System if customers want it. Other workload and cluster management systems, such as Rocks and Ganglia, are also supported.
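The per-chassis and per-rack flops figures roughly check out if you assume, as sketched below, a Clovertown core retiring 4 double-precision flops per cycle at about 2.0 GHz and four chassis per rack; neither the clock speed nor the chassis count per rack is stated in the article.

```python
# Rough peak-flops check under assumed (not confirmed) part specs.
CORES_PER_CHASSIS = 768   # figure from the article
FLOPS_PER_CYCLE = 4       # assumed: 128-bit SSE add + multiply per cycle
CLOCK_GHZ = 2.0           # assumed clock speed

chassis_tflops = CORES_PER_CHASSIS * FLOPS_PER_CYCLE * CLOCK_GHZ / 1000
rack_tflops = chassis_tflops * 4   # assumed: four chassis per rack
print(round(chassis_tflops, 1), round(rack_tflops, 1))  # 6.1 24.6
```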
As for storage, the Constellation System uses X4500 storage servers, and using 1 TB disks (which are just becoming available), Sun can cram 1 petabyte of storage into two racks. These storage servers hook into the same InfiniBand switching structure as the server nodes, which was, after all, the whole point of InfiniBand. The storage servers run Sun's Solaris 10 Zettabyte File System, which has a fault tolerant data protection algorithm Sun calls RAID Z, and layers the open source Lustre object file system on top of that.
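The petabyte-in-two-racks claim is plausible on a quick count: a Thumper holds 48 drives in 4U, so ten of them fit a standard rack. The rack height and Thumpers-per-rack figures below are assumptions for illustration.

```python
# Density check for 1 TB drives in X4500 'Thumper' data servers.
DRIVES_PER_THUMPER = 48
DRIVE_TB = 1
THUMPERS_PER_RACK = 10    # assumed: 4U boxes in a standard 42U rack

tb_per_rack = DRIVES_PER_THUMPER * DRIVE_TB * THUMPERS_PER_RACK
two_rack_pb = 2 * tb_per_rack / 1000
print(tb_per_rack, two_rack_pb)  # 480 0.96
```

That lands just under a petabyte across two racks, consistent with the article's round figure.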
When you add it all up, Sun can today deliver a 1.7 petaflops supercomputer with up to 10 petabytes of disk capacity. Such a configuration would include four of the Magnum switches daisy-chained together and would have 13,824 blade server nodes and over 110,592 processor cores. There is absolutely nothing embarrassing about such scalability, and the real question is whether Sun can deliver it at a competitive price.
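The node and core counts in that maximum configuration follow directly from the switch port count, assuming two quad-core sockets per blade (an assumption consistent with, but not stated in, the article):

```python
# Maximum Constellation configuration: four daisy-chained Magnum
# switches, one blade server node per port.
PORTS_PER_MAGNUM = 3456
SWITCHES = 4
CORES_PER_NODE = 8        # assumed: two quad-core sockets per blade

nodes = PORTS_PER_MAGNUM * SWITCHES
cores = nodes * CORES_PER_NODE
print(nodes, cores)  # 13824 110592
```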
Last October, Sun announced that the Texas Advanced Computing Center (TACC) at the University of Texas at Austin had commissioned Sun to build a Solaris supercomputer rated at 400 teraflops using Galaxy servers. As it turns out, TACC is actually buying a Constellation System, and nicknaming it 'Ranger'. Since last October's announcement, the Ranger cluster machine has been upgraded to over 500 teraflops.
TACC is waiting, like many customers, for Advanced Micro Devices to deliver the quad-core 'Barcelona' Opteron processors for its machine. TACC plans to use 15,700 of these processors, 125 terabytes of main memory, and 72 of the Thumper arrays, which will have a total of 1.7 petabytes of disk capacity. Ranger is being built through a $59m grant from the National Science Foundation. About $30m of that is going to the university for hardware acquisition (it is unclear what Sun's take is) and the remaining $29m is for ongoing support costs for Ranger, which is expected to be operational on 1 December.
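Ranger's 500-plus teraflops rating squares with those processor counts if you assume a per-core rate for Barcelona of 4 double-precision flops per cycle at roughly 2.3 GHz; the clock speed and flops-per-cycle figures are assumptions, not from the article.

```python
# Back-of-envelope peak for Ranger's quad-core Barcelona Opterons.
PROCESSORS = 15700
CORES = PROCESSORS * 4
GFLOPS_PER_CORE = 4 * 2.3   # assumed: 4 DP flops/cycle at 2.3 GHz

peak_tflops = CORES * GFLOPS_PER_CORE / 1000
print(CORES, round(peak_tflops))  # 62800 578
```

That comes out at roughly 578 teraflops peak, comfortably over the 500 teraflops the upgraded announcement cites.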
