To keep up with all the demands of a high-performance computing (HPC) data center, Purdue, like most research universities, must respond to a constant and ever-increasing demand for higher computing power. As a result of its long and continuing expansion, the Rosen Center for Advanced Computing was planning the installation of a new cluster, but the facility had reached its maximum cooling capacity and density.
Purdue’s Rosen Center for Advanced Computing was of traditional design with a cold-air plenum in the ceiling fed by computer room air conditioning (CRAC) units around the periphery of the room. The air system could not cool full racks nor was there space in the data center to add more racks. Even with racks that were only half full, density was maxed out.
So Purdue began by looking at additional space. The cost for a brick-and-mortar expansion varied from $2 to $3 million for renovation of available space, to a $50 million project to build a new data center that could consolidate other university data centers. Whichever solution they chose, the price tag was going to be in the millions of dollars, which made it essential to find a way to maximize the existing data center space.
Michael Marsh, senior computer engineer at Purdue, was on the team charged with finding a better solution: one that Purdue could retrofit to its existing equipment and physical infrastructure and that would allow more densification in the existing 4,000 square foot (sq ft) space. Marsh looked at options that were just becoming available, including hot-aisle/cold-aisle containment, various versions of fan doors, water-cooled doors with fans, and passive water-cooled doors without fans. The evaluation criteria Marsh employed included water quality, door-frame compatibility, lifecycle operating costs, redundancy, reliability, and the opportunity to expand incrementally over time without incurring large capital expense. The number one priority was remaining operational.
After evaluating available options, Purdue turned to a liquid-cooling solution provided by Coolcentric. The Coolcentric solution involves the installation of a rear-door heat exchanger (RDHx) on the back of a rack enclosure. Chilled water is fed to the specially designed coil inside the RDHx. The Coolcentric RDHx is engineered to use existing airflow from fans built into the rack-mount devices. Hot exhaust air moves through the coil, transferring heat to the water, and cool, neutralized air flows back into the data center without the need for any additional fans. Chilled water is controlled by and fed to the heat exchangers by coolant distribution units (CDUs). Each CDU can effectively remove sensible heat from several 42U racks. The Purdue data center installation consisted of six CDUs cooling a total of 52 racks. Prior to the Coolcentric installation, the data center had eight CRAC units, which were barely handling the load. When Purdue put in the Coolcentric equipment, it was able to remove four of the CRAC units; the remaining four provide supplementary cooling for equipment lacking a rear-door heat exchanger and maintain airflow through the facility.
EVALUATING THE CRITERIA
Purdue’s campus infrastructure includes a central chilled water plant that provides chilled water to more than 100 buildings. Like many universities, Purdue has an aging physical plant—some of the piping has been underground for 100 years—making the quality of the chilled water a significant concern. Testing revealed that the particulate level in the water was unacceptably high for those solutions that circulated water from the chiller directly to the cooling devices. Because the Coolcentric design keeps the chilled water isolated from the water circulated to the doors, it was the only water-cooled solution that could accommodate the chilled-water quality.
Purdue seeks competitive bids for new equipment and racks potentially every year. As a result, it is faced with a constant change of equipment vendors. Some manufacturers’ racks are not compatible with other vendors’ equipment and vice versa. Coolcentric rear door heat exchangers have transition frames for attachment to all major rack manufacturers, allowing Purdue great flexibility with future enclosure purchases.
In addition to significant energy savings from eliminating four CRAC units, Purdue identified further operating cost savings from the passive Coolcentric solution compared to expanding its existing fan door cluster. It found that each fan door consumed 1 kilowatt (kW) per rack for its electric fans, which in turn added 1 kW of heat to the room, whereas the RDHx needs no fans and consumes no power. The CDU requires 2.5 kW but services 10 racks. A ten-year analysis showed energy cost savings of $175,000 using the Coolcentric CDU/RDHx solution compared to expanding the existing fan door strategy to 50 racks.
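The arithmetic behind this comparison can be sketched roughly as follows. The power figures (1 kW per fan door, 2.5 kW per CDU serving 10 racks, 50 racks, $175,000 over ten years) come from the text. Continuous operation at 8,760 hours per year and all function names are assumptions made for illustration, and the implied electricity rate is only a back-calculation, not a number from Purdue's analysis.

```python
import math

HOURS_PER_YEAR = 8760  # assumption: cooling equipment runs continuously

def fan_door_kw(racks, kw_per_door=1.0):
    """Fan power for one fan door per rack (1 kW each, per the text)."""
    return racks * kw_per_door

def cdu_kw(racks, kw_per_cdu=2.5, racks_per_cdu=10):
    """CDU power, rounding up to whole CDUs (2.5 kW per 10 racks, per the text)."""
    return math.ceil(racks / racks_per_cdu) * kw_per_cdu

def ten_year_kwh_saved(racks, years=10):
    """Electrical energy saved by passive doors plus CDUs versus fan doors."""
    return (fan_door_kw(racks) - cdu_kw(racks)) * HOURS_PER_YEAR * years

racks = 50
saved_kwh = ten_year_kwh_saved(racks)
implied_rate = 175_000 / saved_kwh  # dollars per kWh, back-calculated only

print(f"fan doors: {fan_door_kw(racks):.1f} kW, CDUs: {cdu_kw(racks):.1f} kW")
print(f"ten-year energy saved: {saved_kwh:,.0f} kWh")
print(f"implied rate: ${implied_rate:.3f}/kWh")
```

Under these assumptions the difference is 37.5 kW, or 3,285,000 kWh over ten years. The sketch counts only the direct electrical draw; it does not model the cost of removing the extra heat the fans add to the room, so Purdue's own analysis may have included factors not shown here.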
Dealing with redundancy was a question mark for Purdue. Various hot-aisle/cold-aisle containment designs provide a common strategy: adding one more cooling unit to a rack lineup yields N+1 redundancy if one fails. With the Coolcentric design, N+1 redundancy was a challenge. As an alternative, Purdue discovered a very viable strategy in the process of dealing with a simulated failure: interleaving racks and cooling units.
A conventional rack/CDU layout has one CDU serving 6 to 10 racks, typically lined up in a row side by side. Purdue engineered a better method by interleaving three of its CDUs so that adjoining racks were fed by different CDUs, meaning that a failure in one cooling unit would leave only one third of the racks in a cluster without cooling, and the neighboring racks could pick up the additional load. A study conducted by Purdue found that cooling could be maintained in the event of a CDU failure by delivering cooler water from non-failing CDUs, or with a small amount of supplemental room cooling.
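The effect of interleaving can be illustrated with a small sketch. It compares a conventional layout, where each CDU feeds a contiguous run of racks, with a round-robin layout where adjoining racks are fed by different CDUs. The cluster size of 18 racks and the function names are hypothetical, chosen only to make the pattern visible; this is not Purdue's actual piping plan.

```python
def assign_interleaved(num_racks, num_cdus):
    """Round-robin assignment: rack i is fed by CDU i % num_cdus."""
    return [i % num_cdus for i in range(num_racks)]

def assign_blocked(num_racks, num_cdus):
    """Conventional layout: each CDU feeds a contiguous run of racks."""
    per_cdu = -(-num_racks // num_cdus)  # ceiling division
    return [i // per_cdu for i in range(num_racks)]

def longest_uncooled_run(assignment, failed_cdu):
    """Longest run of adjacent racks that lose cooling when one CDU fails."""
    longest = current = 0
    for cdu in assignment:
        current = current + 1 if cdu == failed_cdu else 0
        longest = max(longest, current)
    return longest

racks, cdus = 18, 3  # hypothetical cluster size
for name, layout in [("blocked", assign_blocked(racks, cdus)),
                     ("interleaved", assign_interleaved(racks, cdus))]:
    affected = layout.count(0)  # racks fed by the failed CDU (CDU 0)
    print(name, "racks affected:", affected,
          "longest adjacent run:", longest_uncooled_run(layout, 0))
```

In both layouts a single CDU failure affects one third of the racks, but with interleaving no two affected racks sit next to each other, so every uncooled rack has neighbors that are still being cooled.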
More recently, Marsh further enhanced Purdue’s data center availability by enabling N+1 redundancy with an interconnected CDU/manifold implementation and a “hot spare.” According to Marsh, “With the piping arrangement shown, all the manifolds can share the water from the loop, and individual manifolds or CDUs can be isolated from the loop as well for maintenance or repairs. The racks are still interleaved in this arrangement, so if I shut off a manifold for some reason (say, to add a rack or make a repair), it only impacts every third rack as in our previous cooling system layout. Also, sharing the water among a larger pool of CDUs and compute nodes tends to even out the large heat load variations seen with the new class of processors as their computing load changes from moment to moment.”
Purdue has had its share of problems with regular CRAC units. Considering maintenance on compressors and condensation issues, they wanted to avoid continuing down that road if they could. The passive Coolcentric solution is basically a tube-and-fin heat exchange unit. Water circulates through a cooling module that does not involve a compressor, eliminating compressor-related mechanical failures, condensation, and high energy consumption. From an engineering standpoint, Purdue likes the reliability of the cooling doors. There are no moving parts in a cooling door as opposed to fan door units, which have moving parts that are a common failure point.
When Marsh was asked about concerns regarding water in the data center, he pointed out that water cooling in data centers has been field-proven going back decades with mainframes from IBM, Unisys, Control Data, and others. Marsh also said that if you have CRAC units, you already have water in the data center.
A bigger concern is condensation. Air conditioners condense water into drip pans that frequently clog up and, if nobody notices, overflow, putting water under the floor to mix with power and data cables. One of the welcome features of the Coolcentric system is that it monitors the dew point and prevents condensation from occurring, a big plus for Purdue.
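The general idea of dew-point protection can be sketched as follows. This is not Coolcentric's control logic, which the article does not describe; it is a minimal illustration that uses the widely used Magnus approximation for dew point and assumes a hypothetical 2 °C safety margin and hypothetical function names.

```python
import math

def dew_point_c(temp_c, rel_humidity_pct, b=17.62, c=243.12):
    """Magnus approximation of the dew point in degrees Celsius."""
    gamma = math.log(rel_humidity_pct / 100.0) + (b * temp_c) / (c + temp_c)
    return (c * gamma) / (b - gamma)

def safe_supply_temp_c(room_temp_c, rel_humidity_pct, requested_c, margin_c=2.0):
    """Raise the requested water temperature if it would fall below
    the room dew point plus a safety margin (margin is an assumption)."""
    floor_c = dew_point_c(room_temp_c, rel_humidity_pct) + margin_c
    return max(requested_c, floor_c)

# Example: a 25 °C room at 50% relative humidity.
room_dew_point = dew_point_c(25.0, 50.0)
print(f"dew point: {room_dew_point:.1f} °C")
print(f"supply water for a 10 °C request: {safe_supply_temp_c(25.0, 50.0, 10.0):.1f} °C")
```

Keeping the water delivered to the doors above the room dew point means the coil surface never gets cold enough for moisture to condense on it, which is why a system that tracks dew point does not need drip pans at the rack.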
The Coolcentric solution lets Purdue deal with increasing density and accommodate additional racks and clusters. It also gives the university the ability to rearrange its physical space for particular needs, such as changes in networking topology as new high-speed technologies evolve.
John Campbell, associate vice president, Rosen Center for Advanced Computing at Purdue, said, “It allows us to grow in a smaller way, which for me, is a lot kinder on the budget. It’s a lot easier if a project comes up where I need a couple more racks, to get a couple more doors rather than having to do a major implementation.”
Noise reduction in the center was an unanticipated and welcome result of the passive door solution. Purdue’s previous compute cluster, known as the “Steele” cluster, used fan doors for cooling. The noise generated by the fan-cooling doors was at the OSHA limit of 85 decibels. When a rear door on one of the racks was opened, the level jumped to 100 decibels, well above that limit. The Coolcentric passive rear-door heat exchangers have no moving parts and contribute no noise to the data center.
As Marsh said, “There’s no way we could cool our data center reliably with the power density we have now with traditional CRAC units. Coolcentric looked like the best bet to be able to handle that, and it did.”