Figure 1. The breakdown of power utilization in a typical data center

Rising energy prices and growing concerns about global warming due to carbon emissions combine to increase the need to lower the power usage effectiveness (PUE) of data centers worldwide. The PUE of a data center is defined as total facility power/total IT power. Total facility power comprises all the power delivered to the entire data center, and the total IT power is defined as the power that is delivered to the IT equipment. A careful look at this ratio (Figure 1) reveals that power to drive the data center cooling system (45 percent) and the power consumed by the IT equipment (30 percent) dominate total facility power.
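As a rough illustration of the definition, the following Python sketch computes the PUE implied by the IT share quoted above. The function name `pue` and the normalization of the facility to 100 kW are illustrative assumptions, not part of the original analysis.

```python
def pue(total_facility_kw: float, it_kw: float) -> float:
    """Power usage effectiveness: total facility power / total IT power."""
    if it_kw <= 0:
        raise ValueError("IT power must be positive")
    return total_facility_kw / it_kw


# Assumption: normalize the facility to 100 kW so that the percentages
# quoted in the text can be read directly as kilowatts.
total_kw = 100.0
it_kw = 30.0  # IT equipment share quoted in the text (30 percent)

print(f"PUE = {pue(total_kw, it_kw):.2f}")
```

With IT equipment at 30 percent of total facility power, the PUE works out to about 3.3. A facility in which every watt reached the IT equipment would have a PUE of 1.0.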

Figure 2. A 2500 sq.ft. data center that could operate more efficiently

Another way to say this is that the cooling system and the IT equipment together account for 75 percent of total facility power. By focusing on these two loads as the dominant parameters, the relationship simplifies to total cooling power/total IT power, which is often referred to as the cooling load factor (CLF). The CLF is the total power required by the chillers, computer room air conditioners (CRACs), cooling towers, pumps, and other cooling-related equipment, divided by the total IT equipment power. The kind of cooling unit (gas or liquid), the efficiency of the motors that drive the fans and compressors, and the geographic location of the data center all affect the total annual cost of energy to drive the cooling system for a given data center.
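The CLF can be sketched the same way. The example below uses the cooling and IT shares quoted for Figure 1 and, as an assumption made here for illustration, lumps everything that is neither cooling nor IT into a single "other" term so that the PUE can be rebuilt as 1 + CLF + other-load factor. The names `clf` and `other_load_factor` are hypothetical.

```python
def clf(cooling_kw: float, it_kw: float) -> float:
    """Cooling load factor: total cooling power / total IT power."""
    return cooling_kw / it_kw


# Shares quoted in the text, read as kW of a normalized 100-kW facility.
total_kw = 100.0
cooling_kw = 45.0
it_kw = 30.0
# Assumption: the remainder (power distribution losses, lighting, and so on)
# is treated as one lumped "other" load.
other_kw = total_kw - cooling_kw - it_kw

clf_value = clf(cooling_kw, it_kw)
other_load_factor = other_kw / it_kw

# PUE rebuilt from its parts: the IT load itself (1.0) plus each overhead.
pue_from_factors = 1.0 + clf_value + other_load_factor

print(f"CLF = {clf_value:.2f}")
print(f"other-load factor = {other_load_factor:.2f}")
print(f"PUE = {pue_from_factors:.2f}")
```

Under these assumptions the CLF is 1.5, which makes cooling the largest single contributor to a PUE of about 3.3.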

If power measurements of this equipment are not feasible, estimates must be made using detailed knowledge of the cooling equipment. For example, the cooling capacity supplied by the equipment can stand in for the power the equipment draws. In this sense, the relationship becomes a ratio of the total cooling supplied to the IT power. This ratio can be defined as the “cooling supply to IT load ratio.” Driving this ratio as close as possible to 1.0 lowers the PUE accordingly.

Figure 3. Base model rack inlet temperature profiles

The cooling for a given data center consists of two primary components: the total capacity of the cooling system, typically measured in tons or kilowatts (kW) and its related airflow, typically measured in cubic feet per minute (CFM). Many data centers develop hot spots not because of a lack of total cooling capacity (this is typically more than adequate) but rather because the system cannot deliver cold air where it is needed.

Computational fluid dynamics (CFD) can help illustrate the point using a hypothetical data center of 2500 square feet, as illustrated in Figure 2. For this data center, eight Liebert FH600C cooling units provide a total cooling capacity of 1724 kW. The thermal load consists of six rows of equipment racks, each row containing 20 racks, and each rack with a thermal load of 7 kW, for a total of 840 kW. This results in a cooling supply to IT load ratio of about 2.0, roughly 100 percent higher than should be required to cool the equipment. Notice, however, that the airflow supplied by each of the eight FH600C units is only 17,100 CFM, creating a total airflow capacity of 136,800 CFM. Each 7-kW rack requires 1091 CFM to keep the temperature rise across the rack to a 20 F maximum, so with 120 racks in the room, the total rack demand is 130,920 CFM, only about 4 percent less than the supply. This thin airflow margin will become a significant consideration when attempting to reduce the overall power consumption.
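The arithmetic in this example can be checked with a short Python sketch. All input values come from the text. The helper `required_cfm` is an addition: it applies the common standard-air rule of thumb (BTU/h = 1.08 x CFM x temperature rise in F), which is an assumption here and yields a slightly higher per-rack figure than the 1091 CFM quoted above, presumably because of different air properties.

```python
# Values quoted in the text for the hypothetical 2500 sq. ft. data center.
N_CRACS = 8
TOTAL_COOLING_KW = 1724.0
CFM_PER_CRAC = 17_100
N_RACKS = 6 * 20
KW_PER_RACK = 7.0
CFM_PER_RACK = 1091

it_load_kw = N_RACKS * KW_PER_RACK                   # total IT heat load
supply_to_load_ratio = TOTAL_COOLING_KW / it_load_kw

airflow_supply_cfm = N_CRACS * CFM_PER_CRAC
airflow_demand_cfm = N_RACKS * CFM_PER_RACK
# Airflow margin expressed as a fraction of what the CRACs can supply.
airflow_margin = (airflow_supply_cfm - airflow_demand_cfm) / airflow_supply_cfm


def required_cfm(load_kw: float, delta_t_f: float) -> float:
    """Airflow needed to carry load_kw at a given temperature rise.

    Assumption: standard sea-level air, using the rule of thumb
    BTU/h = 1.08 * CFM * dT(F) and 3412 BTU/h per kW.
    """
    return load_kw * 3412.0 / (1.08 * delta_t_f)


print(f"IT load: {it_load_kw:.0f} kW")
print(f"Cooling supply to IT load ratio: {supply_to_load_ratio:.2f}")
print(f"Airflow supply: {airflow_supply_cfm} CFM, demand: {airflow_demand_cfm} CFM")
print(f"Airflow margin: {airflow_margin:.1%} of supply")
print(f"Rule-of-thumb airflow per rack: {required_cfm(KW_PER_RACK, 20.0):.0f} CFM")
```

The contrast is the point of the example: cooling capacity is roughly double the load, while airflow exceeds demand by only a few percent.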

To optimize the PUE for this data center, the cooling supply to IT load ratio must be reduced to as close to 1.0 as possible. The Liebert FH600C uses an 11-kW centrifugal blower to supply air to the data center. If the cost of electricity were $0.10/kWh, the annual cost of operating just the blower for this unit would be roughly $10,000, and would be nearly twice that amount when including the work done by the compressor. Shutting down one of these units would reduce the PUE and save money. The question, however, is whether this can be done without causing excessive temperatures at any of the server inlets. While shutting down a CRAC unit looks like a viable option, only a CFD model can identify which CRAC is the best one to shut down and whether doing so will result in troublesome hot spots on any of the equipment.
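The blower cost estimate can be reproduced with a minimal Python sketch. It assumes the blower draws exactly its 11-kW rating around the clock for a 365-day year; the actual electrical input, after motor and drive losses, would likely be somewhat higher. The function name `annual_energy_cost` is hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # continuous operation, non-leap year


def annual_energy_cost(power_kw: float, price_per_kwh: float,
                       hours: float = HOURS_PER_YEAR) -> float:
    """Energy cost of a constant electrical load over the given hours."""
    return power_kw * hours * price_per_kwh


# Values from the text: 11-kW blower, electricity at $0.10/kWh.
blower_cost = annual_energy_cost(11.0, 0.10)
print(f"Annual blower cost: ${blower_cost:,.0f}")
```

At the rated 11 kW this comes to about $9,600 per year for a single blower, before counting the compressor.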

Figure 3 illustrates the rack inlet temperatures in the data center with all CRACs operating normally. There are already hot spots located at the ends of the rack rows. In some cases, the rack inlet temperatures exceed the ASHRAE recommended maximum of 80.6 F. The maximum ambient temperature in the room for this case is 96 F. Turning off both the fan and coil on any of the eight CRAC units could cause extreme temperatures even though the total cooling capacity would be sufficient, due to the lack of proper airflow to some servers. Using CFD is a straightforward way to test this possibility and to determine the best CRAC to disable.

Table 1. Comparison of maximum room and rack inlet temperatures for eight trials in which one CRAC was shut off. Simulations 3 and 4 generated the worst results; simulation 6, the best.

CFD simulations compared the eight scenarios, running a series of eight simulations concurrently, each with a different CRAC unit off. The temperature scale was preset to a range of 57-90 F to allow for an easy comparison. A summary of the simulations is presented in Table 1. The best case, highlighted in green, corresponds to the elimination of CRAC F (lower right-hand corner). It has the least impact on the maximum rack inlet temperature and drives up the maximum ambient temperature in the room by 4 degrees, from 90 F to 94 F, according to the detailed CFD output reports. The resulting cooling supply to IT load ratio decreases by 12.5 percent, from about 2.05 to about 1.80, when this CRAC is disabled, reducing the annual operating cost by at least $10,000. But even in the best case, when CRAC F is shut off, the rack inlet temperatures still peak at 84 F in one of the racks, exceeding the ASHRAE recommended standard for inlet temperature. Therefore, the approach of simply turning off one or more CRAC units will not work for this data center without first adjusting the room configuration to improve the thermal efficiency.
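The capacity and airflow consequences of switching off one unit can be sketched in Python from the figures given earlier. The sketch assumes that the eight units are identical, so each contributes one eighth of the 1724-kW total. The function `room_summary` is a hypothetical helper, and the calculation says nothing about where the air goes, which is what the CFD model addresses.

```python
def room_summary(units_on: int,
                 total_units: int = 8,
                 total_cooling_kw: float = 1724.0,
                 cfm_per_unit: int = 17_100,
                 it_load_kw: float = 840.0,
                 rack_demand_cfm: int = 130_920) -> dict:
    """Bulk capacity and airflow balance with some CRAC units switched off.

    Assumption: all units are identical, so capacity scales with units_on.
    """
    capacity_kw = total_cooling_kw * units_on / total_units
    airflow_cfm = cfm_per_unit * units_on
    return {
        "ratio": capacity_kw / it_load_kw,
        "airflow_cfm": airflow_cfm,
        # Positive means surplus air; negative means the racks want more
        # air than the running CRACs deliver.
        "airflow_balance_cfm": airflow_cfm - rack_demand_cfm,
    }


all_on = room_summary(8)
one_off = room_summary(7)
relative_drop = 1.0 - one_off["ratio"] / all_on["ratio"]

print(f"8 units: ratio {all_on['ratio']:.2f}, "
      f"airflow balance {all_on['airflow_balance_cfm']:+d} CFM")
print(f"7 units: ratio {one_off['ratio']:.2f}, "
      f"airflow balance {one_off['airflow_balance_cfm']:+d} CFM")
print(f"Relative drop in ratio: {relative_drop:.1%}")
```

With seven units the capacity ratio is still about 1.8, but total airflow falls below the rack demand, which is consistent with hot spots appearing even though bulk capacity remains ample.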

Improving Thermal Efficiency

The two common methods for improving the thermal efficiency of data centers are hot-aisle and cold-aisle containment. Cold-aisle containment is typically less expensive to implement because perforated tiles are often located near the rack inlets, so less ductwork is required. However, containing the cold supply air drives up the ambient room temperature. Depending on the resulting room temperature, this approach may not be comfortable for service technicians or administration personnel working in the room.

Table 2. Comparison of maximum room and rack inlet temperatures for the cold and hot aisle containment strategies.

The opposite problem occurs with hot-aisle containment, as the entire room becomes part of the cold supply, driving the ambient room temperature downward. In this scenario, however, walls, UPSs, lights, and other equipment contribute additional heat. The additional heat tends to increase the ambient temperature in the room, but if the supply air is well directed towards the rack inlets, the heat will have less impact on the equipment. In addition, possible pressure variations due to containment solutions may result in inadequate airflow for some servers. For example, the rack exhaust of a fully loaded rack could restrict the exhaust flow of an adjacent partially loaded rack.

The CFD model can be quickly modified to consider each scenario so that these methods can be evaluated.

Table 2 compares the two approaches using the maximum rack inlet temperature and maximum ambient room temperature as common metrics. In both cases, no other heat sources in the room were included, and a small amount of leakage was permitted through the containment walls. Such leakage is inevitable because, with only seven CRACs running, the racks demand more air than the CRACs can supply, so there is recirculation into the cold aisle when that strategy is used, or recirculation out of the hot aisle when that strategy is used. Both containment methods lower the maximum rack inlet temperature compared to the original case, but for this data center the hot-aisle containment strategy is preferable. The difference between the strategies has to do with mixing. The air that leaks out of the hot aisle mixes with the room air, increasing its temperature. The air that leaks into the cold aisle has the same effect. However, better mixing in the hot-aisle case leads to lower maximum temperatures at the rack inlets, while poor mixing in the cold-aisle case allows hotter spots to persist at the rack inlets. This result does not hold for data centers in general, but it does for this one. In short, a hot-aisle containment scheme gives rise to a maximum inlet temperature of 77 F, so sustained operation using seven cooling units is feasible.

Table 3. Maximum rack inlet and room temperatures using hot-aisle containment. Simulation 4 yields the best results; simulation 2, the worst.

With the optimal method of containment determined, the issue of optimizing power consumption by the cooling system can be addressed. By containing the hot air, the data center can operate with only seven CRAC units running at any one time and still keep rack inlet temperatures well below the ASHRAE recommended maximum of 80.6 F, as seen in Table 3. This scenario results in an estimated savings of between $10,000 and $20,000 per year in operational costs without sacrificing equipment performance.

Without any containment, the CRAC failure analysis predicted worst-case rack inlet temperatures as high as 91 F. The hot-aisle containment solution, by contrast, also raises the reliability of the data center to an “N+1” level of redundancy. This means the data center can be run with all eight CRACs on, and if any single unit fails or must be taken down for servicing, rack inlet temperatures will not exceed 77 F, which is well within the ASHRAE rack inlet temperature standard.
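The N+1 criterion described here amounts to checking that the worst single-unit outage still keeps every rack inlet within the ASHRAE recommended maximum. A minimal Python sketch of that check follows. It uses only the peak temperatures quoted in the text, not the full contents of Tables 1 and 3, and the function names and dictionary keys are hypothetical.

```python
ASHRAE_RECOMMENDED_MAX_F = 80.6


def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def survives_any_single_failure(peak_inlet_f_by_case: dict,
                                limit_f: float = ASHRAE_RECOMMENDED_MAX_F) -> bool:
    """True if the peak rack inlet temperature stays within the limit in
    every single-unit-off case supplied."""
    return all(temp_f <= limit_f for temp_f in peak_inlet_f_by_case.values())


# Only the peaks quoted in the text are used; the per-simulation values
# from the tables are not reproduced here.
no_containment = {"best case (CRAC F off)": 84.0, "worst case": 91.0}
hot_aisle_containment = {"worst case": 77.0}

print(f"Limit: {ASHRAE_RECOMMENDED_MAX_F} F "
      f"({fahrenheit_to_celsius(ASHRAE_RECOMMENDED_MAX_F):.1f} C)")
print("No containment is N+1:", survives_any_single_failure(no_containment))
print("Hot-aisle containment is N+1:",
      survives_any_single_failure(hot_aisle_containment))
```

Because the check is an all-cases test, supplying only the worst case is enough to show a pass, and any one case above the limit is enough to show a failure.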

In summary, this particular data center illustrates how CFD can be used to compare some of the many techniques available to improve PUE. When striving to improve PUE, data center managers should focus on the CLF as a primary target, along with the purchase of Energy Star equipment when replacing or adding equipment. If cooling power values are not readily accessible, the cooling supply to IT load ratio will work as well. Using this ratio, CFD can serve as an effective decision support tool for comparing alternative approaches. Of course, modeling relies on assumptions that must be validated with measurements to ensure that the model represents real-world phenomena, and it is not meant to be a substitute for good engineering. Yet modeling will always produce a relative comparison of one design approach with another and is a helpful mechanism for supporting the decision-making process.