What Defines the New ‘Safe’ Operating Environment?
Is free cooling risky business?
ASHRAE’s Technical Committee 9.9 (TC 9.9) was formed specifically to focus on data centers. Its conclusions have been authoritative in the very conservative mission-critical world since the first edition of the Thermal Guidelines came out in 2004 (the recommended range was 68°F to 77°F and 40 to 55 percent relative humidity [RH]). The second edition, released in 2008, expanded the recommended range slightly to 64.4° to 80.6°. Nonetheless, even today many data center managers still keep their sites at a “safe” temperature and humidity, very tightly controlled with 68° and 50 percent RH targets, just as was originally recommended in the 2004 edition.
The 2011 Third Edition openly encourages the more common use of non-mechanical (compressorless) cooling, so-called free cooling, using fresh air economizers to take maximum advantage of ambient air temperatures (within limits) to cool data centers. This would have been considered heresy just a few years ago.
Some bleeding-edge sites, most notably the social media and search giants, issue press releases with ultra-low PUE reports, which are primarily attributable to the use of outside air economizers and higher air intake temperatures. The rest of us are left to ask: what is the safe operating temperature and humidity range for the classic enterprise mission-critical data centers operated by financial institutions, airlines, and governments?
We wonder: are these sites really doing anything differently?
And what are the co-location operators, who need to satisfy their wide variety of customers, doing?
There are clearly some legitimate reasons to keep temperatures low. The first is concern over the loss of thermal ride-through time in the event of a brief loss of cooling, especially for higher-density cabinets, where an event of only a few minutes would cause an unacceptably high IT intake temperature. This rise can occur because of the loss of utility power and the subsequent transfer to the back-up generator, which, while it typically takes 30 seconds or less, will cause most compressors in chillers or CRAC units to recycle and remain off for 5 to 10 minutes or more. While there are some ways to minimize or mitigate this risk, it is a valid concern.
The second is another common issue: wide variations in IT equipment intake temperatures occur in most data centers due to airflow mixing and bypass air from less-than-ideal airflow management. Most sites resort to overcooling the supply air so that the worst spots in higher-density areas (typically the ends of aisles and the tops of racks) do not overheat from warm air recirculated from the hot aisles.
However, if better airflow management is implemented to minimize hotspots, intake temperatures may be slowly raised beyond 68° to 70°. If this is done properly, it is likely that within two years 75° to 77° in the cold aisle would no longer be a cause for alarm to IT users. The key is to improve communications and educate both IT and facilities management.
So what is the safe temperature going forward? While the mission-critical industry is very risk averse, each user and operator does play the basic odds, beginning with a choice of various redundancy levels in Tier 2, 3, and even Tier 4 data centers (where “failure is not an option,” but does sometimes happen). For those who may not need any redundancy or simply cannot afford it, there is the stepchild of the industry, Tier 1. But despite the different availability projections (you have got to love that word) expressed by the number of nines, there is an inherent presumption that the temperature will be a relatively stable 68° to 70°.
So what is this about the new “x-factor”? No, it is not a remake of the nineties series The X-Files. X-factor is clearly defined in the “Appendix C. IT Equipment Reliability Data” section and presented as a method “to allow the reader to do their own failure rate projections for their respective data center locale.”
At first blush, x-factor seems to run counter to everything that the previous editions of the ASHRAE guidelines held as sacrosanct for maximum reliability: a constant temperature of 68° (20°C), tightly controlled 7x24x365. This is still the preferred temperature, but it is now used as the baseline x-factor reference point for A2-rated (50° to 95°) volume servers, with a risk value of 1.0. This seems rational so far, and in fact, ASHRAE provides a chart that indicates a decline in averaged reliability (a higher projected relative failure rate) as temperatures rise above the 68° reference mark but are still within the “recommended range” (e.g., 72.5° = 1.13, 77° = 1.24). It also lists the projected averaged risk factors, which continue to increase as temperatures move into the “allowable” A2 range (81.5° = 1.34, 86° = 1.42, 90.5° = 1.48, and 95° = 1.55). This would seem to be in keeping with prior general assumptions regarding higher temperatures impacting projected IT reliability.
Now comes the beginning of the other side of the argument: colder is better. Below 68°, reliability improves and the risk factor becomes less than 1.0 (63.5° = 0.87 and 59° = 0.72). This information may not totally surprise anyone, although some may question the projected risk factors, even when keeping within the “recommended” range (0.87 at 63.5° vs. 1.24 at 77°). However, what comes next is the real shocker.
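To make the quoted figures easier to work with, here is a minimal Python sketch that stores the temperature and x-factor pairs cited above and estimates values in between. The linear interpolation and the names `X_FACTOR_POINTS` and `x_factor` are assumptions made for illustration; the guidelines as described here publish a chart of averaged values, not an interpolation method.

```python
from bisect import bisect_right

# (intake temperature in °F, relative failure rate) pairs quoted in this article.
X_FACTOR_POINTS = [
    (59.0, 0.72), (63.5, 0.87), (68.0, 1.00), (72.5, 1.13), (77.0, 1.24),
    (81.5, 1.34), (86.0, 1.42), (90.5, 1.48), (95.0, 1.55),
]


def x_factor(temp_f: float) -> float:
    """Estimate the x-factor at temp_f by linear interpolation.

    Interpolating between the quoted points is an assumption of this
    sketch, not something the source material prescribes.
    """
    temps = [t for t, _ in X_FACTOR_POINTS]
    if not temps[0] <= temp_f <= temps[-1]:
        raise ValueError("temperature outside the 59 to 95 °F range quoted here")
    i = bisect_right(temps, temp_f)
    if i == len(temps):  # exactly at the highest quoted temperature
        return X_FACTOR_POINTS[-1][1]
    (t0, x0), (t1, x1) = X_FACTOR_POINTS[i - 1], X_FACTOR_POINTS[i]
    return x0 + (x1 - x0) * (temp_f - t0) / (t1 - t0)
```

Calling `x_factor(68.0)` returns the 1.0 baseline, and any temperature outside the quoted 59° to 95° span raises an error rather than extrapolating.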
There are charts based on annualized temperature histograms for major U.S. and international cities that list the percentage of the year each city spends in various weather bin ranges (e.g., 59° to 68°, 68° to 77°, 77° to 81.5°, 81.5° to 86°, 86° to 90.5°, and 90.5° to 95°). From there, they list the weighted calculations of x-factors for each temperature bin, which result in the projected “net x-factor” for that city. There are separate charts for airside and waterside economizers. It is the airside free cooling results that may raise many eyebrows. Here is the information for the example city of Chicago.
Chicago has an x-factor of 0.99, effectively indicating that it is theoretically more reliable (lower relative failure rate) to allow the IT equipment intake temperatures to vary widely from 59° to 95° (over the course of a year using airside economizers) than to keep them stable at 68°. This is because the 59° to 68° bin temperatures represent 67 percent of the year, with a 0.865 x-factor, which pulls the weighted net x-factor below 1.0.
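The weighting behind a net x-factor can be sketched in a few lines of Python. This is only an illustration: it assumes the net value is a simple time-weighted average of per-bin x-factors, and it lumps every hour outside the 59° to 68° bin into a single remainder, because the full Chicago histogram is not reproduced here. The function names are hypothetical.

```python
def net_x_factor(bins):
    """Time-weighted average of per-bin x-factors.

    bins: iterable of (fraction_of_year, x_factor) pairs whose
    fractions must sum to 1.
    """
    bins = list(bins)
    total = sum(fraction for fraction, _ in bins)
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"bin fractions sum to {total}, expected 1.0")
    return sum(fraction * x for fraction, x in bins)


def implied_remaining_x(net, known_fraction, known_x):
    """Average x-factor the remaining hours must have to yield the given net."""
    return (net - known_fraction * known_x) / (1.0 - known_fraction)


# Chicago figures quoted above: net 0.99, with 67% of the year at 0.865.
remaining = implied_remaining_x(0.99, 0.67, 0.865)
print(round(remaining, 2))  # 1.24

# Sanity check: recombining the two lumps reproduces the quoted net value.
chicago = net_x_factor([(0.67, 0.865), (0.33, remaining)])
```

Under these assumptions, the other 33 percent of the year would have to average an x-factor of about 1.24, roughly the value quoted for a steady 77°, for the net figure to come out at 0.99.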
Here are several other examples of the x-factor for various cities using direct airside economizers:
Various cities using airside economizers:
It should be noted that all of the above are based only on dry-bulb temperatures and do not take into account the effects of pollution and humidity introduced by the use of airside economizers.
For waterside economizers the picture differs significantly, since the opportunity for improved reliability is diminished in most cases due to the lower number of hours/days available for free cooling that would result in air intake temperatures of less than 68°.
Various cities using waterside economizers:
However, in the case of waterside economizers the IT intake air is not subjected to the wider humidity variations from outside air or to any related pollution issues. These effects are ignored in these calculations, and avoiding them would improve reliability compared to exposure to outside air.
THE BOTTOM LINE
While I do not mean to denigrate the x-factor methodology of projecting IT equipment failure, it does take a bit more than a grain of salt to accept it at face value. While it may well be true that IT equipment operating at lower temperatures will have a statistically lower failure rate over time, and conversely a higher failure rate at higher temperatures, it does not necessarily follow that one factor will offset the other enough (or at all) to support the mathematical calculation resulting in the prediction of greater projected reliability vs. keeping the IT equipment at a constant 68° reference temperature.