For the last five years, data center owners and operators across the globe have focused on improving the energy efficiency of our data center power and cooling systems and reducing power usage effectiveness (PUE). As a result, just about every realistic power and cooling efficiency solution has become commonplace in the design of our newer facilities. Aisle containment, elevated supply air temperatures, outside air economizers, VFDs, and energy-efficient transformers and UPS systems are nearly always specified in our designs. PUEs of 1.3 and below are now consistently achieved, even for highly redundant facilities in the hottest and most humid regions of the country, and 1.15 is often our target in cooler and drier climates.
Figure: The 7-MW XT Jaguar supercomputer at Oak Ridge National Laboratory.
But the most efficient of all IT facilities are the cloud-computing data centers (the cloud) and the high-performance computing centers (HPCs), which are now achieving PUEs as low as 1.05. Those levels of efficiency can result in tremendous savings in a large facility and deserve a close look. The low PUEs also demonstrate how well we can control our power and cooling energy costs these days and how little opportunity remains to reduce our PUEs further.
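To put rough numbers on those savings, here is a minimal sketch that assumes the standard definition of PUE as total facility power divided by IT equipment power. The 10-MW IT load and the $0.10-per-kWh electricity rate are illustrative assumptions, not figures from any facility discussed in this column.

```python
HOURS_PER_YEAR = 8760


def overhead_kw(it_load_kw, pue):
    """Power and cooling overhead: total facility power minus IT power."""
    return it_load_kw * (pue - 1.0)


def annual_overhead_cost(it_load_kw, pue, dollars_per_kwh):
    """Yearly cost of the overhead alone, assuming a constant load."""
    return overhead_kw(it_load_kw, pue) * HOURS_PER_YEAR * dollars_per_kwh


IT_LOAD_KW = 10_000      # assumed IT load (10 MW), for illustration only
RATE_PER_KWH = 0.10      # assumed electricity rate in dollars per kWh

for pue in (1.30, 1.15, 1.05):
    kw = overhead_kw(IT_LOAD_KW, pue)
    cost = annual_overhead_cost(IT_LOAD_KW, pue, RATE_PER_KWH)
    print(f"PUE {pue:.2f}: overhead {kw:,.0f} kW, about ${cost:,.0f} per year")
```

Under these assumptions, going from a PUE of 1.30 to 1.05 cuts the overhead from roughly 3,000 kW to roughly 500 kW. The same arithmetic shows why so little opportunity remains: at 1.05, the overhead is only 5 percent of the IT load, so any further savings have to come from the IT load itself.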
In the last “Zinc Whiskers” (Sept./Oct. 2011), I identified four different types of IT facilities and took a close look at the differences between traditional data centers, with their highly redundant backup electrical power systems and low power densities, and HPC centers, with their IT failover strategies. The HPC failover strategy requires very little backup power and provides low-latency computing that results in extremely high power densities, often measured in the thousands of watts per square foot. Differences in the way the two types of centers are operated are very apparent.
However, the newest data centers, built for the cloud, are strikingly different from legacy data centers. In fact, they are becoming more and more like the HPC centers that have operated for decades in university and federal government research and development facilities.
Let’s look at how cloud data centers and HPCs are becoming more similar and why their PUEs are so low. I’d also like to look at how HPCs are changing and how cloud facilities can benefit from HPC operating experience and R&D. And, while we are at it, let’s explore how we might look beyond low PUEs and find opportunities to become more energy efficient in our data center designs for the future.
SIMILARITIES BETWEEN SUPERCOMPUTING AND THE CLOUD
As recently as five years ago, aisle enclosures and 80ºF server supply air temperatures were just a figment of the imaginations of a few engineers looking for ways to reduce costs. So, for the sake of planning for more efficient operations tomorrow, it isn’t too early to explore the possibilities of things that HPC centers are beginning to do today.
The similarities between cloud computing and supercomputing environments are becoming more evident as we build out our newest and most efficient data center spaces. After all, both cloud and HPC maximize computing performance and efficiencies by continuously operating processors at close to maximum speeds, creating very high compute densities and power densities alike. This means that a lot more power is directed into smaller spaces, creating heat loads so high that it becomes impossible to cool the space with air alone.
The most efficient HPC centers use an optimized combination of air and water cooling at the rack to remove all that heat. The newest of our cloud environments deploy 10- and 15-kilowatt (kW) racks that equate to power densities of as much as 600 watts per square foot. Cloud data center operators are preparing designs to go to even higher densities by using air and water cooling systems together, with outside air systems operating in conjunction with in-row coolers or rear-door heat exchangers to cool the same space.
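The relationship between rack power and power density can be sketched as follows, assuming that density is simply rack power divided by the floor area allotted to each rack, including its share of the aisles. The 25 square feet per rack is an illustrative assumption chosen to match the 600-watts-per-square-foot figure; it is not a measured value.

```python
def power_density_w_per_sqft(rack_kw, sqft_per_rack):
    """Power density for a rack of rack_kw spread over sqft_per_rack of floor."""
    return rack_kw * 1000.0 / sqft_per_rack


def sqft_per_rack(rack_kw, density_w_per_sqft):
    """Floor area each rack may occupy to stay at a given power density."""
    return rack_kw * 1000.0 / density_w_per_sqft


# Assumed allotment of 25 sq ft per rack (illustrative only)
print(power_density_w_per_sqft(15, 25))   # density for a 15-kW rack
print(power_density_w_per_sqft(10, 25))   # density for a 10-kW rack
print(round(sqft_per_rack(10, 600), 1))   # area per 10-kW rack at 600 W/sq ft
```

With that assumed allotment, a 15-kW rack works out to 600 watts per square foot and a 10-kW rack to 400. Packing the racks tighter or raising the rack power pushes the density up in direct proportion.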
And, for different reasons, both cloud and HPC computing environments depend less on redundancy and reliability systems than do traditional data centers. The cloud provides inherent reliability within the processing environment that allows for failover from one computer to another and now, with the right network architecture, from one facility to another. Supercomputers, on the other hand, are usually used for high-volume computational and problem-solving purposes that can stop and start without jeopardizing operations. So with the right failover strategies, supercomputers can come down “softly” by saving data to backed-up data storage devices and restarting their computing when power becomes available again. Backup electrical power, in the form of UPS systems, is less important in both environments. These are two good reasons why their PUEs are so low.
Last July, Facebook hosted a Critical Facilities Round Table meeting at its Palo Alto, CA, headquarters to present the design of its Prineville cloud data center (see www.cfroundtable.org/membership-meetings.html). Facebook presented 400-V electrical distribution centers, 100-percent free-cooling HVAC, and variable-speed server fan controls that, taken all together, represent a real breakthrough in data center design.
As much as we all applaud these advances in data centers, it is interesting to note that many of these features have been deployed in supercomputing spaces for years. The high-voltage electrical systems and the “combined air-and-water” cooling solutions that are only now finding their way into our most aggressive data center designs have been deployed in HPC environments for decades.
RECENT ADVANCES IN SUPERCOMPUTING EFFICIENCIES
PUEs are now so low it is becoming evident that the next advances in data center energy efficiencies will have to come from somewhere other than simple improvements in current methods of power and cooling, probably from the processing technologies themselves. And, in fact, the newest supercomputers are achieving higher and higher compute densities that actually require less power, space, and cooling to accomplish the same work performed by older computers.
The most efficient HPCs today utilize a computing strategy that integrates the operation of multiple processor types into the same computer. The combination of central processing units (CPUs) and several graphics processing units (GPUs) has proven to be much more efficient than adding more CPUs to the same computer. So instead of developing CPUs with many more cores in the same processor, we are now combining the capabilities of processors with different characteristics to perform the same amount of work more efficiently. Using this strategy over the last few years, supercomputers have increased their computing capacity by a factor of ten while using the same amount of energy required by older computers (see http://en.wikipedia.org/wiki/supercomputer).
Similar advances in server efficiencies are already making their way into our cloud environments. Seagate Technologies, for example, offers a server that utilizes a strategy founded upon the HPC model of combining processor types to achieve superior performance. Seagate accomplishes this by combining CPUs with a multitude of smaller processors much like those found in your cell phone. Their SM1000 server is said to require only 25 percent of the power, space, and cooling of the servers it replaces, while achieving compute densities four times higher than previous servers without increasing the power density at all. Seagate received an award and grant from the Department of Energy in 2009 for developing energy-efficient computing technologies like these, which were discussed at length in a previous “Zinc Whiskers” column (see Mission Critical, July/August 2010, p. 20).
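The figures quoted for this server are consistent with one another, as a quick check shows. This sketch assumes that the 25-percent figure applies equally to power and to floor space and that the amount of work performed is unchanged; the function and its parameter names are hypothetical and used only for illustration.

```python
def relative_densities(power_fraction, space_fraction, work_fraction=1.0):
    """Return (compute density, power density) relative to the old servers.

    Each argument is the new value expressed as a fraction of the old one.
    """
    compute_density = work_fraction / space_fraction
    power_density = power_fraction / space_fraction
    return compute_density, power_density


# Same work in 25 percent of the power and 25 percent of the space (assumed)
compute, power = relative_densities(power_fraction=0.25, space_fraction=0.25)
print(f"Compute density: {compute:.1f}x, power density: {power:.1f}x")
```

Doing the same work in a quarter of the space gives four times the compute density, and because power shrinks by the same ratio as the space, the watts per square foot stay where they were.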
HPC ADVANCES WILL CHANGE POWER AND COOLING STRATEGIES
Advances in technology will continue to drive changes in our facilities infrastructures. Recent research involving superior materials, cooling strategies, and testing methods is leading us to develop computers that are far more efficient than those we operate today and fundamentally different in several ways. These changes will require a new approach to the way we operate the data centers that house them and very different methods of providing the power, space, and cooling that support them. That will be the subject of our next issue of “Zinc Whiskers,” where we will go into detail about these changes and what they mean to our facilities.
CFRT hosted a panel at the Technology Convergence Conference at the Santa Clara Convention Center on February 2nd, during which panelists addressed the issues presented in this article (see www.teladatatcc.com). CFRT is also planning to visit a nearby high-performance computing facility to see newly installed 100-kW racks in operation.
CFRT is a non-profit organization based in Silicon Valley that is dedicated to the open sharing of information and solutions among our members, who are critical facilities owners and operators. Please visit our Web site at www.cfroundtable.org or contact us at 415-748-0515 for more information.