Improving Data Center PUE Through Airflow Management
by Paul Bemis and Liz Marshall
November 1, 2009
Figure 1. The breakdown of power utilization in a typical data center
Cooling is part of PUE too
Rising energy prices and growing concerns about global warming due to carbon
emissions combine to increase the need to lower the power usage
effectiveness (PUE) of data centers worldwide. The PUE of a data
center is defined as total facility power/total IT power. Total
facility power comprises all the power delivered to the entire data
center, and the total IT power is defined as the power that is
delivered to the IT equipment. A careful look at this ratio (Figure
1) reveals that power to drive the data center cooling system (45
percent) and the power consumed by the IT equipment (30 percent)
dominate total facility power.
Figure 2. A 2500 sq. ft. data center that could operate more efficiently
Together, these two loads account for 75 percent of total facility
power, and the cooling system alone uses about 64 percent of the
non-IT power. By focusing on the power to drive the cooling system
and IT equipment as the dominant parameters, the relationship
simplifies to total cooling power/total IT power, which is
often referred to as the cooling load factor (CLF). The CLF is the
total power required by the chillers, CRACs, cooling towers, pumps,
and other cooling-related equipment, divided by the total IT
equipment power. The kind of cooling unit (gas or liquid), the
efficiency of the motors that drive the fans and compressors, as well
as the specific geographic location of the data center affect the
total annual cost of energy to drive the cooling system for a given
data center.

If power measurements of this
equipment are not feasible, estimates must be made using detailed
knowledge of the cooling equipment. For example, the cooling supplied
by the equipment can substitute for power required by the cooling
equipment. In this sense, the relationship becomes a ratio of the
total cooling supplied and the IT power. This ratio can be defined as
the “cooling supply to IT load ratio.” Driving this ratio as close as
possible to 1.0 will reduce the PUE in direct proportion.
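As a rough illustration of the ratios defined above, the following Python sketch computes the PUE and the CLF from the Figure 1 shares (45 percent cooling, 30 percent IT). The function names and the normalization of total facility power to 100 units are assumptions made for illustration only.

```python
def pue(total_facility_power, it_power):
    """Power usage effectiveness: total facility power / total IT power."""
    return total_facility_power / it_power


def cooling_load_factor(cooling_power, it_power):
    """CLF: total cooling power / total IT power."""
    return cooling_power / it_power


# Assumption: total facility power is normalized to 100 units so the
# Figure 1 percentages can be used directly as power values.
TOTAL = 100.0
COOLING = 45.0   # cooling system share from Figure 1
IT = 30.0        # IT equipment share from Figure 1

facility_pue = pue(TOTAL, IT)
clf = cooling_load_factor(COOLING, IT)

print(f"PUE = {facility_pue:.2f}")
print(f"CLF = {clf:.2f}")
```

With these shares the sketch gives a PUE of about 3.33 and a CLF of 1.5, which shows how strongly the cooling term weighs on the overall ratio.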
Figure 3. Base model rack inlet temperature profiles
The cooling for a given data center consists of two primary components:
the total capacity of the cooling system, typically measured in tons
or kilowatts (kW), and its related airflow, typically measured in
cubic feet per minute (CFM). Many data centers develop hot spots not
because of a lack of total cooling capacity (this is typically more
than adequate) but rather because the system cannot deliver cold air
where it is needed.

Computational fluid
dynamics (CFD) can help illustrate the point using a hypothetical
data center of 2500 square feet as illustrated in Figure 2. For this
data center, eight Liebert FH600C cooling units provide total cooling
capacity of 1724 kilowatts. The thermal load consists of six rows of
equipment racks, each row containing 20 racks, and each rack with a
thermal load of 7 kW for a total of 840 kW. This results in a cooling
supply to IT load ratio of 2.0, a full 100 percent higher than should
be required to cool the equipment. Notice, however, that the airflow
supplied by each of the eight FH600C units is only 17,100 CFM,
creating a total airflow capacity of 136,800 CFM. Each 7-kW rack
requires 1091 CFM to keep the temperature rise across the rack to a
20 F maximum, so with 120 racks in the room, the total rack demand is
130,920 CFM, which leaves the supply a margin of less than 5 percent.
This will become a significant consideration when attempting to reduce
the overall power consumption.

To optimize the PUE for
this data center, the cooling supply to IT load ratio must be reduced
to as close to 1.0 as possible. The Liebert FH600C uses an 11-kW
centrifugal blower to supply air to the data center. If the cost of
electricity were $0.10/kWh, the annual cost of operating just the
blower for this unit would approach $10,000 (about $9,600), and would
be nearly twice that amount when including the work done by the
compressor. Shutting down one of these units would reduce the PUE and
save money. The question, however, is whether this can be done without
causing excessive temperatures at any of the server inlets. While shutting
down a CRAC unit looks like a viable option, only a CFD model can
identify which CRAC is the best one to shut down and whether doing so
will result in troublesome hot spots on any of the equipment.
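The arithmetic behind this example can be reproduced with a short Python sketch. The room figures are those quoted above. The sensible heat factor of 1.08 for standard air, the kW-to-BTU/h conversion, and year-round continuous operation are assumptions added here, and `required_cfm` is a hypothetical helper name.

```python
# Figures quoted in the article for the hypothetical 2500 sq. ft. room.
NUM_CRACS = 8
TOTAL_COOLING_KW = 1724.0      # combined capacity of the eight units
CFM_PER_CRAC = 17_100          # airflow supplied by each unit
NUM_RACKS = 6 * 20             # six rows of 20 racks
RACK_LOAD_KW = 7.0
RACK_CFM = 1091                # airflow per rack for a 20 F rise
BLOWER_KW = 11.0
PRICE_PER_KWH = 0.10           # dollars

# Assumptions (not from the article): standard-air sensible heat
# factor and continuous, year-round operation.
SENSIBLE_HEAT_FACTOR = 1.08    # BTU/h per (CFM * deg F)
BTU_PER_HOUR_PER_KW = 3412.14
HOURS_PER_YEAR = 8760


def required_cfm(load_kw, delta_t_f):
    """Airflow needed to remove load_kw with a delta_t_f temperature rise."""
    return load_kw * BTU_PER_HOUR_PER_KW / (SENSIBLE_HEAT_FACTOR * delta_t_f)


it_load_kw = NUM_RACKS * RACK_LOAD_KW
supply_to_load_ratio = TOTAL_COOLING_KW / it_load_kw
supply_cfm = NUM_CRACS * CFM_PER_CRAC
demand_cfm = NUM_RACKS * RACK_CFM
airflow_margin = (supply_cfm - demand_cfm) / demand_cfm
blower_cost = BLOWER_KW * HOURS_PER_YEAR * PRICE_PER_KWH
estimated_rack_cfm = required_cfm(RACK_LOAD_KW, 20.0)

print(f"IT load: {it_load_kw:.0f} kW")
print(f"Cooling supply to IT load ratio: {supply_to_load_ratio:.2f}")
print(f"Airflow supply: {supply_cfm} CFM, demand: {demand_cfm} CFM")
print(f"Airflow margin: {airflow_margin:.1%}")
print(f"Annual blower cost per unit: ${blower_cost:,.0f}")
print(f"Estimated airflow per rack: {estimated_rack_cfm:.0f} CFM")
```

Under these assumptions the ratio comes out near 2.05, the airflow supply exceeds the rack demand by only about 4.5 percent, and one blower costs roughly $9,600 per year to run. The standard-air estimate of about 1,106 CFM per rack is close to the 1091 CFM quoted; the small gap presumably reflects different air-property assumptions.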
Figure 3 illustrates the rack inlet
temperatures in the data center with all CRACs operating normally.
There are already hot spots located at the ends of the rack rows. In
some cases, the rack inlet temperatures exceed the ASHRAE recommended
maximum of 80.6 F. The maximum ambient temperature in the room for
this case is 96 F. Turning off both the fan and coil on any of the
eight CRAC units could cause extreme temperatures even though the
total cooling capacity would be sufficient, due to the lack of proper
airflow to some servers. Using CFD is a straightforward way to test
this possibility and to determine the best CRAC to disable.
Table 1. Comparison of maximum room and rack inlet temperatures for eight
trials where a CRAC was shut off. Simulations 3 and 4 generated the
worst results; simulation 6 generated the best.
CFD simulations compared the eight scenarios: a series of eight
simulations ran concurrently, each with a different CRAC unit off. The
temperature scale was preset to a range of 57-90 F to allow for an
easy comparison. A summary of the simulations is presented in Table
1. The best case, highlighted in green, corresponds to the
elimination of CRAC F (lower right hand corner). It has the least
impact on the maximum rack inlet temperature and drives up the
maximum ambient temperature in the room by 4 degrees from 90 F to 94
F, according to the detailed CFD output reports. The resulting
cooling supply to IT load ratio decreases by about 0.25 (12.5 percent),
from roughly 2.05 to 1.8, when this CRAC is disabled, reducing the
annual operating cost by at least
$10,000. But even in the best case, when CRAC F is shut off, the rack
inlet temperatures still peak at 84 F in one of the racks, exceeding
the ASHRAE recommended standard for inlet temperature. Therefore, the
approach of simply turning off one or more CRAC units will not work
for this data center without first adjusting the room configuration
to improve the thermal efficiency.
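The capacity side of this trade-off can be checked with simple arithmetic. The sketch below is illustrative only: `scenario` is a hypothetical helper name, and it assumes the eight units are identical, so capacity and airflow scale linearly with the number running. It says nothing about where hot spots form, which is the question the CFD runs address.

```python
# Figures quoted in the article; identical CRAC units are assumed.
NUM_CRACS = 8
TOTAL_COOLING_KW = 1724.0
CFM_PER_CRAC = 17_100
IT_LOAD_KW = 840.0
RACK_DEMAND_CFM = 130_920


def scenario(units_on):
    """Capacity ratio and airflow balance with units_on CRACs running."""
    capacity_kw = TOTAL_COOLING_KW * units_on / NUM_CRACS
    supply_cfm = CFM_PER_CRAC * units_on
    return {
        "ratio": capacity_kw / IT_LOAD_KW,
        "supply_cfm": supply_cfm,
        # Positive means surplus airflow, negative means a shortfall.
        "airflow_balance_cfm": supply_cfm - RACK_DEMAND_CFM,
    }


all_on = scenario(8)
one_off = scenario(7)

for label, result in (("8 units", all_on), ("7 units", one_off)):
    print(
        f"{label}: ratio {result['ratio']:.2f}, "
        f"supply {result['supply_cfm']} CFM, "
        f"balance {result['airflow_balance_cfm']:+d} CFM"
    )
```

With seven units running, the ratio is still about 1.8, yet the airflow falls about 11,220 CFM short of the rack demand. That is consistent with the observation that hot spots arise from airflow delivery rather than from a lack of total cooling capacity.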
Improving Thermal Efficiency
The two common methods for improving the thermal efficiency of data
centers are hot- and cold-aisle containment. Cold-aisle containment
is typically less expensive to implement because perforated tiles are
often located near the rack inlets and therefore less ductwork is
required. However, containing the cold supply air drives up the ambient
room temperature. Depending on the resulting room temperature, this
approach may not be comfortable for service technicians or
administration personnel working in the room.
Table 2. Comparison of maximum room and rack inlet temperatures for the
cold- and hot-aisle containment strategies.
The opposite problem occurs with hot-aisle containment, as the entire
room becomes part of the cold supply, driving the ambient room
temperature downward. In this scenario, however, walls, UPSs, lights,
and other equipment contribute additional heat. The additional heat
tends to increase the ambient temperature in the room, but if the
supply air is well directed towards the rack inlets, the heat will
have less impact on the equipment. In addition, possible pressure
variations due to containment solutions may result in inadequate
airflow for some servers. For example, the rack exhaust of a fully
loaded rack could restrict the exhaust flow of an adjacent partially
loaded rack.

The CFD model can be quickly
modified to consider each scenario so that these methods can be
evaluated. Table 2 shows a comparison of the
two approaches using the maximum rack inlet temperature and maximum
ambient room temperature as common metrics. In both cases, no other
heat sources in the room were included, and a small amount of leakage
was permitted through the containment walls. Such leakage is
inevitable because the racks demand more air than the CRACs can
supply, so there is recirculation into the cold aisle when that
strategy is used or recirculation out of the hot aisle when that
strategy is used. Both containment methods lower the maximum rack
inlet temperature compared to the original case. But for this
data center, the hot-aisle containment strategy is preferable. The
difference between the strategies has to do with mixing. The air that
leaks out of the hot aisle mixes with the room air, increasing its
temperature. The air that leaks into the cold aisle has the same
effect. However, better mixing in the hot-aisle case leads to lower
maximum temperatures at the rack inlets while poor mixing in the
cold-aisle case allows hot spots at higher temperatures to occur at
the rack inlets. While this result does not hold in general, for this
particular data center, hot-aisle containment appears to be
preferable. In short, a hot-aisle containment scheme gives rise to a
maximum inlet temperature of 77 F, so sustained operation using seven
cooling units is feasible.
Table 3. Maximum rack inlet and room temperatures using hot-aisle
containment; simulation 4 yields the best results, simulation 2 the
worst.
With the optimal method of containment determined, the issue of optimizing
power consumption by the cooling system can be addressed. By
containing the hot air, the data center can operate with only seven
CRAC units in operation at any one time and still have rack inlet
temperatures well below the ASHRAE recommended inlet temperature of
80.6 F, as seen in Table 3. This scenario results in an estimated
savings of between $10,000 and $20,000 per year in operational costs
without sacrificing equipment performance.

Without any containment, the CRAC failure analysis predicted worst-case
rack inlet temperatures as high as 91 F. The hot-aisle containment
solution, however, also increased the reliability of the data center to
an “N+1” level of sustainability. This means the data center can be
run with all eight CRACs on, and if any single unit fails or must be
taken down for servicing, rack inlet temperatures will not exceed 77
F, which is well within the ASHRAE rack inlet temperature
standard.

In summary, this particular data
center illustrates how CFD can be used to compare some of the many
techniques available to improve PUE. When striving to improve PUE,
data center managers should focus on the CLF as a primary target,
along with the purchase of Energy Star equipment when replacing or
adding equipment. If cooling power values are not readily accessible,
the cooling supply to IT load ratio will work as well. Using this
ratio, CFD can be effectively used as a decision support tool to
compare and contrast alternative approaches. Of course, modeling
makes assumptions that must be validated with measurements to ensure
that the model represents real-world phenomena, and it is not meant
to be a substitute for good engineering. Yet modeling will always
produce a relative comparison of one design approach with another and
is a helpful mechanism for supporting the decision-making process.
Paul Bemis is currently the president and CEO of Applied Math Modeling, a supplier of CAE software tools and services. He has over 20 years' experience in the high technology market, holding executive positions at ANSYS, Fluent, HP, and Apollo Computer.
Liz Marshall has worked in the field of computational fluid dynamics since 1989. Most of this time was spent at Fluent Inc., doing customer support, consulting, product management, and marketing. For eight years she served as the editor of Fluent News magazine. She has a BS in mathematics from St. Lawrence University and a PhD in physics from Dartmouth College.