The Texas Advanced Computing Center (TACC) was busier than ever. As the designer and operator of some of the world's most powerful computing resources, the center had several thousand projects in the queue and needed to upgrade its already powerful Lonestar5 supercomputer to keep up.
But to keep making the massive scientific leaps for which TACC was famous, in fields like quantum mechanics, astrophysics, photovoltaics, and biological research, the center had to overcome major obstacles: available space, power, and budget.
With the launch of its new Lonestar6 supercomputer, TACC overcame each of those obstacles, reduced backlog, and accomplished project goals with the help of GRC, Dell Technologies OEM Solutions, and AMD.
TACC
TACC’s unassuming name belies the groundbreaking work for which it is widely known. Located at The University of Texas at Austin, its mission is to enable discoveries that benefit science and society through the application of advanced computing technologies. For this work, the center receives funding from the National Science Foundation (NSF) and other leading research and education institutions.
Since its inception in 2001, TACC has evolved its capabilities while making the best use of existing hardware investments whenever possible. As a result, the center has employed a variety of cooling strategies that include CRAC and chiller, in-row, liquid-to-chip, and single-phase liquid immersion cooling.
Immersion cooling, in particular, has allowed TACC to continue achieving key scientific advancements by pushing the limits of computing power. Thus, it comes as no surprise that the center is now home to the world’s longest-running immersion cooling system, designed by GRC.
Faced with increasing power demands, the advent of advanced server and processor technologies, and higher operating temperatures, TACC anticipated the limits of air cooling early on. Largely out of necessity, the TACC team soon discovered liquid immersion cooling’s potential to address these challenges.
Starting with a single-rack installation in 2009, TACC has continued stretching the boundaries of supercomputing using GRC’s innovative liquid immersion cooling solutions. The center has since quadrupled its ICEraQ-cooled supercomputing deployments, culminating in Lonestar6.
The progression of TACC’s single-phase immersion cooling systems
ICEraQ Prototype
- Commissioned in 2009.
- Proof of concept for immersion cooling performance and reliability.
- Installed on a loading dock.
- Demonstrated location flexibility of single-phase cooling.
- No chilled water; used an evaporative cooling tower.
Maverick2
- Deployed in 2018.
- Supports GPU-accelerated machine learning and deep learning research.
- Proof-of-concept for the future Frontera supercomputer.
- Provides 30 kW/rack density.
- Features 23 Broadwell-based compute nodes, each with four NVIDIA GTX 1080 Ti GPUs.
- Features four Skylake-based Dell PowerEdge R740 nodes, each with two NVIDIA V100 GPUs.
- Features three Skylake-based Dell PowerEdge R740 nodes, each with two NVIDIA P100 GPUs.
- Up to 10/40/56 Gbps network bandwidth with sub-microsecond latency.
Frontera
- Deployed in 2019.
- Single-precision compute cluster to run AI, machine learning, and molecular dynamics applications, accelerating new scientific discoveries.
- Hybrid cooling: a liquid immersion-cooled GPU subsystem alongside liquid-to-chip cooling for the primary system.
- Provides 60 kW/rack density.
- Intel Xeon E5-2620 v4 CPUs at 2.10 GHz.
- Has eight cores (16 threads) per socket, 16 cores (32 threads) per node.
- 360 NVIDIA Quadro RTX 5000 GPUs; 4 GPUs per node.
- Became the most powerful petascale supercomputer at any U.S. university.
Lonestar6
- Deployed in 2021.
- Can perform approximately 3 quadrillion mathematical ops/second.
- Operates within a discrete high-density zone.
- Provides 70 kW/rack density.
- Hybrid immersion and air-cooled system:
- 336 immersion-cooled compute nodes.
- 80 air-cooled GPU nodes.
- 200 air-cooled 1U servers in 10 racks.
- Has 84 Dell PowerEdge C6525 servers.
- Features two AMD EPYC 7763 64-core CPUs ("Milan") at 2.45 GHz (boost up to 3.5 GHz).
- Features 128 cores on two sockets (64 cores/socket).
Lonestar6
The newest in TACC's Lonestar series of high-performance computing systems, Lonestar6 was deployed specifically to support Texas researchers. Clocking in at an amazing three petaflops, it’s three times as powerful as its predecessor, and one of the fastest supercomputers at a U.S. university.
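For readers who want to see where a figure like that comes from, here is a minimal back-of-envelope sketch based on the CPU specifications listed above. It assumes 16 double-precision floating-point operations per cycle per Zen 3 core (two 256-bit FMA units) and uses the roughly 600-node count TACC cites below; both are working assumptions for illustration, not official TACC figures.

```python
# Back-of-envelope peak-throughput estimate for Lonestar6's CPU nodes.
# Assumptions (not official TACC figures): 16 double-precision FLOPs per
# cycle per Zen 3 core via two 256-bit FMA units, the 2.45 GHz base clock,
# and roughly 600 dual-socket EPYC 7763 nodes.

cores_per_node = 2 * 64        # two 64-core AMD EPYC 7763 ("Milan") CPUs
flops_per_cycle = 16           # assumed: 2 FMA units x 4 doubles x 2 ops/FMA
base_clock_hz = 2.45e9         # base clock; boost clocks raise this further
node_count = 600               # node budget quoted by TACC's Tommy Minyard

node_peak = cores_per_node * flops_per_cycle * base_clock_hz
system_peak = node_count * node_peak

print(f"Per-node peak: {node_peak / 1e12:.1f} TFLOPS")    # ~5.0 TFLOPS
print(f"System peak:   {system_peak / 1e15:.1f} PFLOPS")  # ~3.0 PFLOPS
```

That rough estimate lands right around the "approximately 3 quadrillion mathematical operations per second" cited above for Lonestar6.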
Of course, where data centers are concerned, with increased performance comes greater heat production. Working closely with TACC, along with partners Dell Technologies OEM Solutions and AMD, GRC has evolved its single-phase immersion cooling systems to overcome the heat dilemma.
Because of GRC’s proven performance, TACC chose to cool Lonestar6 with the ICEraQ Series 10 Quad.
Persistent TACC Challenges
While housing a succession of acclaimed supercomputing systems, TACC has never been immune to the many challenges less celebrated centers face every day. Perhaps the biggest is discovering that air cooling is simply incapable of handling the GPU-heavy compute loads in growing demand today, most notably high-performance computing (HPC), AI, and AR/VR applications.
Nor, despite its renown, is TACC exempt from everyday issues like finite space and limited funding. In the latter case, the ICEraQ Series 10 Quad helped optimize GPU processing within the allotted grant monies.
TACC has faced other distinctive challenges as well, starting with the triple-digit temperatures common to its region.
Complicating matters even more, “TACC is very unique in that they have a number of different cooling technologies, and a number of different computing evolutions, all in one facility,” said Brandon Moore, GRC’s senior solutions architect.
GRC’s liquid immersion cooling solution addressed all these concerns. Thus, for TACC, immersion cooling soon emerged as the only practical way forward.
Deciding factors
GRC’s ICEraQ Series 10 Quad enabled TACC to triple its raw computing power within the same space and power envelope. That alone stood as an overriding reason to choose GRC. But, other factors influenced the decision as well.
“We had the budget to install 600 nodes,” said Tommy Minyard, TACC’s director of advanced computing systems. “But we didn’t have the corresponding cooling capacity for it. We evaluated several different vendors and cooling technologies, and cost was a huge consideration.”
The main deciding factors were:
- Sheer cooling performance.
- Reduced CapEx and OpEx.
- Sustainability.
- Minimal changes to infrastructure.
- Location flexibility.
- Reliability and safety.
- Partnerships with technology providers.
- Results of previous GRC system deployments.
For both environmental and cost reasons, TACC also wanted to minimize modifications to the data center, which was another reason the team opted for the GRC solution. So was GRC’s long-standing willingness to partner with TACC and Dell Technologies to solve problems and create the right solutions.
Results
Thanks in no small part to single-phase immersion cooling, TACC’s Lonestar6 supercomputer delivers three times the performance of its predecessor, with less space, power, and expense. That level of productivity has proven critical to TACC’s ability to continue extending the boundaries of scientific discovery.
“When running parallel simulations, you need to squeeze every bit of performance out of these computers,” said Dan Stanzione, associate vice president for research at The University of Texas at Austin, and TACC’s executive director. “The only other option would be to run air cooling at hurricane speed, or else slow the chips down.”
The latter was not really feasible, considering that doing so would add to project lead times, reduce the number of projects that can be completed, increase costs due to longer runtimes, and be a very inefficient use of Lonestar6’s processors.
GRC’s ICEraQ Series 10 Quad provided a host of benefits to TACC’s operation, including:
- A huge jump in performance.
- Running high-power chips at very high density.
- Cooling 70 kW/rack, with capacity to spare.
- Doubling the number of servers in the same power envelope.
- Accommodating more nodes per rack.
- Maximizing a hybrid cooling environment.
“GRC’s immersion cooling solution has given us the ability to use the densest servers from Dell Technologies and hottest chips from AMD,” Minyard said. “These chassis have 280-W CPUs that run so hot they cannot be cooled by air.”
For reasons of reliability, sustainability, servicing, flexibility and sheer cooling power, TACC has been very pleased with GRC’s ICEraQ Series 10 Quad system for Lonestar6.
TACC continues to work closely with GRC, developing new strategies for reliably cooling systems with ever-increasing power densities.
Not surprisingly, TACC’s continued track record of performance has translated into more funding. “Support from The University of Texas Research Cyberinfrastructure Initiative has made a difference,” said Dan Stanzione of TACC. “It has allowed Texas researchers to leapfrog the competition, providing a competitive advantage to scholars in the UT System, at Texas A&M, Texas Tech, and now The University of North Texas.”