Data Center Air Cooling Solutions: Yesterday, Today, and Tomorrow
Historically, our data centers have been operated as open-air environments where supply and exhaust air were free to mix. We attempted to create homogeneous temperature and humidity levels throughout the facility while providing enough chilled air to neutralize the maximum heat load emitted from all the servers and other technology in the space. In other words, we dramatically overcooled our spaces and managed to invent something called hot spots along the way.
CONTAIN YOUR AISLES
But, about eight years ago, we all finally realized how terribly wasteful this practice really is. Heating up the supply air that we just paid so much to chill, before it can reach the servers that it was intended to cool, just doesn’t make sense, does it? So we have all quickly moved to contain our aisles or to segregate chilled supply air from the hot exhaust air in just about any way we could. And, if we do a good job of sealing the air passages, there should be no more hot spots to worry about.
I’m afraid to say that I have seen many data center operators stop at this point, which really bothers me because just getting rid of hot spots doesn’t offer much of a return on investment at all. We have to find a better reason to add curtains or barriers that get into everyone’s way, and saving energy is just that.
The first step to saving energy is raising your supply air temperature. When aisles are contained, all the supply air will be the same temperature. And that gives you the opportunity to chill air to temperatures no cooler than that specified by server manufacturers, often about 75° or 80°F, and that step should allow you to turn down your HVAC plant substantially.
However, if you are blessed with a closed-loop system like CRAC units, the increase in supply air temperatures alone won’t save you any energy at all. Even if CRAC temperatures increase, the change in temperature across the unit (delta T) may remain the same, and the CRACs will use as much energy as they did before. And, I have seen a lot of data centers overlook this issue as well.
Unless you raise the supply air temperature and “free cool” somehow to reduce your energy costs, you haven’t accomplished much of anything. Increasing your operating temperatures always creates the opportunity to free cool for more hours of the year than before, turning economizer improvements that didn’t seem to pay off before into quick returns on investment.
There are several ways that you can free cool with your HVAC systems. Bringing in outside air at much higher temperatures allows you to free cool for many more days of the year. Or, where outside air is not the way to go, you can elevate equipment setpoints in water economizers, chillers, or compressors and reduce or turn off the power required to run your equipment for nearly as many hours every day.
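To make the point about free-cooling hours concrete, here is a minimal sketch that counts the hours in which outside air is cool enough to use at two different supply setpoints. The hourly temperatures, the 5°F margin (`approach_f`), and the function name are all assumptions made up for illustration; they are not measurements from any real site.

```python
def free_cooling_hours(outdoor_temps_f, supply_setpoint_f, approach_f=5.0):
    """Count the hours in which outside air is cool enough to economize.

    approach_f is an assumed margin between the outdoor temperature and the
    supply setpoint. It is illustrative only, not a figure from the article.
    """
    limit = supply_setpoint_f - approach_f
    return sum(1 for t in outdoor_temps_f if t <= limit)


# Hypothetical hourly outdoor temperatures for one day (deg F).
sample_day = [52, 51, 50, 50, 49, 50, 53, 57, 61, 65, 69, 72,
              75, 77, 78, 78, 76, 73, 69, 65, 61, 58, 55, 53]

for setpoint in (65, 80):
    hours = free_cooling_hours(sample_day, setpoint)
    print(f"supply setpoint {setpoint} F: {hours} of {len(sample_day)} hours")
```

With a full year of local weather data in place of `sample_day`, the same count gives a rough first look at how many more economizer hours a higher setpoint might buy.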
HOW HIGH CAN I GO?
Five years later, we have moved beyond the simple and inexpensive solution of strip curtains hanging from a suspended ceiling framework and moved on to prefabricated aisle enclosures and even solid wall construction between aisles. And, we are elevating our supply air temperatures from 55° or 65°F to the new ASHRAE-recommended levels of as much as 80°F, saving a lot of energy in doing so.
But, when server supply air temperatures reach a certain point, the server fans ramp up to provide additional cooling to the processor and other electronics, and this creates a challenge. First of all, the server fans begin to draw more power as they ramp up in speed. In many cases, the incremental increase of energy consumed by the server fans is greater than the incremental energy saved by turning down HVAC systems. So, you may end up using more total energy if you take your temperatures too high.
Supply air temperatures of about 75°F have proven to be low enough to avoid server fan ramping in most cases.
Another interesting phenomenon that you should look out for is the effect that server fan ramping has on PUE readings. As server fan speeds increase, PUE measurements improve, even though more power is being consumed. So, you may be using more total power while believing that your energy performance is improving, when it is not.
This is really a fundamental flaw in the definition of PUE = (total IT and facility power)/(IT power), especially as it applies to variable-speed server fans, because IT power is measured before it enters the server box. Server power provides for both processing and cooling, and the server fans perform the same function as the facility’s HVAC systems, which is to cool the processors. So when server fan speeds increase, the added fan power is counted as IT power rather than as cooling overhead, and the PUE calculation moves in the wrong direction.
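A quick worked example shows how this can happen. The sketch below uses the PUE definition given above; every kW figure in it is hypothetical and chosen only to illustrate the effect, with server fan power counted as IT power because it is measured upstream of the server.

```python
def pue(it_power_kw, facility_power_kw):
    """PUE = (IT power + facility overhead power) / IT power."""
    return (it_power_kw + facility_power_kw) / it_power_kw


# Hypothetical loads in kW, invented for illustration.
compute_kw = 900.0      # processors, memory, storage
cool_fans_kw = 100.0    # server fans at normal speed
warm_fans_kw = 200.0    # server fans after ramping up
cool_hvac_kw = 500.0    # facility HVAC at the cooler supply temperature
warm_hvac_kw = 450.0    # facility HVAC at the warmer supply temperature

# Server fan power sits inside the IT power measurement.
before_it = compute_kw + cool_fans_kw
after_it = compute_kw + warm_fans_kw

before_total = before_it + cool_hvac_kw
after_total = after_it + warm_hvac_kw

print(f"before: total {before_total:.0f} kW, PUE {pue(before_it, cool_hvac_kw):.2f}")
print(f"after:  total {after_total:.0f} kW, PUE {pue(after_it, warm_hvac_kw):.2f}")
```

In this made-up case the facility draws more total power after the change, yet the reported PUE is lower, which is exactly the trap described above.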
According to the fan affinity laws, airflow increases in proportion to fan speed, pressure increases with the square of fan speed, and the power consumed by a fan increases with the cube of fan speed. So this misleading “PUE Effect” can be more pronounced than you might expect.
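For reference, the idealized fan affinity laws state that flow scales with fan speed, pressure with the square of speed, and power with the cube of speed. The sketch below simply tabulates those ratios. It assumes the same fan, a fixed system, and constant air density, so real server fans will deviate somewhat, and the function name is my own.

```python
def fan_affinity(speed_ratio):
    """Idealized fan affinity laws for a fan running at speed_ratio x normal.

    Assumes the same fan, an unchanged system, and constant air density.
    """
    return {
        "flow": speed_ratio,           # flow scales with speed
        "pressure": speed_ratio ** 2,  # pressure scales with speed squared
        "power": speed_ratio ** 3,     # power scales with speed cubed
    }


for ratio in (1.0, 1.5, 2.0, 2.5):
    r = fan_affinity(ratio)
    print(f"speed x{ratio}: flow x{r['flow']:.2f}, "
          f"pressure x{r['pressure']:.2f}, power x{r['power']:.2f}")
```

The cubic term is why even a modest fan ramp can outweigh what was saved at the HVAC plant.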
FUTURE ASHRAE GUIDELINES
The ASHRAE TC 9.9 Committee has for years encouraged data center designers to increase operating temperatures in our data centers to save energy. As noted previously, their last publication recommends that we operate our facilities with 80°F supply air temperatures.
The chairman of the ASHRAE TC 9.9 publications sub-committee, Don Beaty, recently announced that newly released 2011 thermal guidelines will allow us to operate IT facilities at even higher temperatures (see article on page 42). According to Beaty, “The new guidelines were developed with a focus on providing as much information as possible to the data center operator to allow them to operate in the most energy efficient mode and still achieve the reliability necessary as required by their business.”
New “allowable” temperatures are nearly 90°F for Class 1 data centers with enterprise servers and storage and they range from 95° to as much as 113°F with volume servers and storage of lower classifications. When asked at a recent conference if the “allowable” ranges were intended for short-term intermittent operations or for continuous operations, Beaty replied that data centers can be operated continuously at these temperatures with some possible consequence.
The guidelines report server failure rates at the higher temperatures so that operators can decide what risks they are willing to take to save energy. What strikes me as a great opportunity is that while operating continuously at 113°F, volume servers fail only 1.84 times as often as those operating at the 2008 “recommended” temperature of 80°F. Can you afford to lose less than twice the number of servers you lose now to allow yourself to turn off your HVAC forever? A lot of people will!
This progress is being met with great enthusiasm by the data center community, including representatives of the Green Grid and of the R&D divisions of major technology companies, such as Christian Belady of Microsoft, who has long been a proponent of expanding the operating envelope for HVAC in data centers. The ultimate objective of these advances is clearly stated in the ASHRAE white paper: to support the operation of data centers without any chillers or compressors. That will save tremendous amounts of energy across the world and improve our environment to boot.
The new ASHRAE white paper is available for download on the ASHRAE website at http://tc99.ashraetcs.org/.
AVOID THIS DISASTER, PLEASE
As promising as these improvements in server performance appear to be, we really need to take a closer look at what they mean to the facilities we house them in. This new environment of contained aisles and higher operating temperatures is sensitive to variations in supply air temperatures, pressures, and flow rates in ways that the old environments were not.
So, the balance between supply-air flow capacity and server fan demand, and not maximum heat load removal, is now the specification that establishes design best practices for HVAC systems capacities, flow rates, and pressures. And, providing enough air to accommodate the total server fan demand is crucial.
Newly manufactured server fans can ramp up to flow rates of two or three times the normal operating rates when hot. The ASHRAE white paper shows them to rise as high as 2.5 times normal flow rates at the highest allowable temperatures. With a data center full of these servers operating at high temperatures, all the fans can ramp up at the same time and overwhelm a conventional HVAC system. An event like this was simulated in an internet data center recently and showed that supply air starvation and uncontrollable overheating may occur as server fans ramp up to maximum speeds trying to cool without enough air to do so.
Data center designers should make sure to provide supply air volumes great enough to meet maximum anticipated total server fan demand when operating at supply air temperatures as high as ASHRAE may allow. That means that HVAC systems and equipment may require increased airflow capacities at the air handlers and economizers as well as in the ductwork, tile perforations, and all of the air passageways. These circumstances will limit the operating temperatures and efficiencies achievable in our legacy data centers with closed-loop air conditioning or with fixed systems and equipment already installed.
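One rough way to check this balance is to add up the airflow every server would demand with its fans fully ramped and compare it with what the HVAC system can deliver. The sketch below does that under stated assumptions: the server count, per-server airflow, and supply capacity are hypothetical, and the 2.5 ramp factor is the upper figure cited above from the ASHRAE white paper. The class and function names are my own.

```python
from dataclasses import dataclass


@dataclass
class ServerGroup:
    """A group of identical servers (all values used here are hypothetical)."""
    count: int
    normal_cfm: float   # airflow per server at normal fan speed
    ramp_factor: float  # peak airflow as a multiple of normal airflow


def peak_fan_demand_cfm(groups):
    """Total airflow demand if every server fan ramps up at the same time."""
    return sum(g.count * g.normal_cfm * g.ramp_factor for g in groups)


def airflow_shortfall_cfm(groups, supply_capacity_cfm):
    """Airflow the HVAC system cannot deliver at peak demand (0 if none)."""
    return max(0.0, peak_fan_demand_cfm(groups) - supply_capacity_cfm)


# Hypothetical room: 1,000 servers at 40 CFM each, ramping to 2.5x when hot.
groups = [ServerGroup(count=1000, normal_cfm=40.0, ramp_factor=2.5)]
supply_capacity_cfm = 60_000.0  # assumed air handler capacity

print(f"peak demand: {peak_fan_demand_cfm(groups):,.0f} CFM")
print(f"shortfall:   {airflow_shortfall_cfm(groups, supply_capacity_cfm):,.0f} CFM")
```

Any nonzero shortfall means the servers could starve for air during a simultaneous ramp, so the supply path would need more capacity or the supply temperature would need to stay lower.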
AIR HANDLER BUILDINGS
Solutions for these limitations can be found through analysis of the contributing factors for energy use in high-performance data center HVAC systems. One of the greatest opportunities to overcome limitations like these in new facilities is through the reduction of the resistance to airflow. In piping systems, the concept of large pipes and small pumps is fairly common.
In data center air systems, however, the conduit for flow is at a much larger scale. Reducing the resistance to flow as much as possible means that the conduits or ducts have to be much larger than normal. Because the building limits how large these conduits can be, we are led to treat the building as part of the conduit and not just a shell that houses it.
A sound way to minimize resistance to flow is to let the building be part of the air-handling system. The walls that segregate spaces are used to confine the air from one point to another. As in other facilities, the space below the raised floor is used as a plenum for the supply air. The truss space above the data center is used as a plenum for return air. Return air plenums are used to convey the air from the truss space down to the first floor air conveying units.
The concept of using the building as part of the air-handling system is not new. Rich Hering, technical director for M+W Group, has used this concept in the semiconductor fabrication industry for decades, as that industry went through a similar transformation for many of the same reasons. The transformation from re-circulating air-handling units to open plenum systems with multiple filter fan units occurred in the mid-1990s. The main differences in the two system designs are simply airflow direction and environmental parameter tolerances.
Making the building part of the air-handling system in data center design requires careful attention to both the design approach and the construction methods for the spaces that convey the airflow. Considering that the systems are negatively or positively pressurized, the walls that form the duct need to be sealed properly to minimize air leakage.
SERVERS OR CHILLERS?
At a recent Critical Facilities Round Table conference panel in Santa Clara, CA, consultant Subodh Bapat revealed test data developed for a data center operator in the Middle East. The test demonstrated the effects of servers operating continuously at inlet temperatures of 115°F, without humidity or particulate controls. The server failure rate in the test was about 11 percent per year. Even at this rate of failure, the server replacement cost was found to be dramatically less than the capital costs of installing chillers and HVAC systems plus the annual cost of electrical power to run them. And, in reality, the data center would only operate at such high temperatures for a few days a year, so the actual server failure rate would be much lower and more economical than the tests indicate.
With that said, all these concerns and solutions are based upon the current state of the art in server technology. New developments such as servers without fans may very well alleviate many of the concerns discussed above, and save energy, too. Other advances in technology will also change the design and construction of our data centers at a continuously accelerating pace. Advanced network architectures will allow for greater IT redundancy and reduce the need for backup electrical power in our data center spaces. Modular and container data centers will become more prevalent and will take on a more industrial profile in order to “right size” and better deploy our solutions. So, it shouldn’t surprise you to see advances in server technology lead us to a data center without any cooling equipment whatsoever.
So, I suggest that you consider designing your next data center without chillers, but choose NOT to work in a hot aisle on a hot summer day.
CRITICAL FACILITIES ROUND TABLE
In the second half of 2011, the Critical Facilities Round Table (CFRT) will host a series of data center energy solutions workshops led by Pacific Gas & Electric to review energy efficiency, demand response, and self generation solutions suitable for data centers. CFRT is a non-profit organization based in the Silicon Valley that is dedicated to the open sharing of information and solutions amongst our members made up of critical facilities owners and operators. Please visit our web site at www.cfroundtable.org or contact us at 415-748-0515 for more information.