Measuring And Improving Data Center Cooling Efficiency

Two primary trends are driving data center efficiency efforts: the rising cost of electricity and increasing server density. Cooling systems consume about half of a typical data center’s power, making cooling infrastructure the most energy-intensive subsystem. This, together with the need to maximize cooling capacity, is driving efforts to improve data center cooling efficiency. Eliminating cooling over-capacity is often the easiest and lowest-cost way to reduce data center utility costs. Research by Upsite Technologies shows that across 45 sites studied, the average site had nearly four times more rated cooling capacity than heat load. Once an afterthought, efficient and effective computer room cooling must now be a top priority for data center and facilities managers.

KNOWING YOUR COOLING CAPACITY FACTOR

Although power usage effectiveness (PUE) has been embraced both domestically and internationally as a comprehensive metric for overall data center efficiency, it does not indicate cooling effectiveness or efficiency. A metric focused solely on cooling efficiency is essential to identifying, describing, and isolating cooling-related efficiency problems. One such metric is the cooling capacity factor (CCF).

The CCF expresses how much rated cooling capacity a room has relative to the heat load it must remove; it is calculated by dividing total rated cooling capacity (kW) by 110% of the IT critical load (kW). The 110% figure accounts for additional room load, such as lights, people, and the building envelope, that is not reflected in the UPS output. A CCF of 1.0 therefore means rated capacity exactly matches the estimated heat load. (A complimentary CCF calculator can be found at http://upsite.com/cooling-capacity-factor-calculator.)
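
The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not Upsite's calculator; the function name and the sample room (1,000 kW of rated cooling serving a 300 kW IT load) are hypothetical.

```python
def cooling_capacity_factor(rated_cooling_kw, it_critical_load_kw):
    """Return CCF = rated cooling capacity / (110% of IT critical load)."""
    if it_critical_load_kw <= 0:
        raise ValueError("IT critical load must be positive")
    # The 1.10 multiplier covers lights, people, and building envelope load.
    return rated_cooling_kw / (1.10 * it_critical_load_kw)


# Hypothetical room, for illustration only.
ccf = cooling_capacity_factor(rated_cooling_kw=1000.0, it_critical_load_kw=300.0)
print(f"CCF = {ccf:.2f}")  # CCF = 3.03
```

A result near 3 would mean the room has roughly three times the rated cooling capacity that its estimated heat load requires.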

The CCF can indicate the extent of stranded capacity, over-deployed cooling and redundancy, and can therefore be used to determine the extent of possible reductions in cooling energy costs. When airflow is optimized and redundant systems are placed in inactive standby, operational cost savings are immediately realized from reduced expenditures for maintenance and cooling-unit/fan-motor electricity.

CASE STUDY: CALCULATING THE CCF

Upsite used the CCF to calculate the potential cost savings of improving airflow management (AFM) in the computer room of an international insurance company. The CCF of the facility was determined to be 3.1, which means the rated capacity of the cooling infrastructure was 310% of the estimated heat load. The CCF showed that in this case, an astonishing 18 cooling units could be turned off if AFM improvements were made. The potential electrical load reduction was an estimated 122 kW. The total cost savings that could be realized from AFM improvements were conservatively estimated at $165,972/year. The electricity savings were calculated as:

122 kW x 24 hours/day x 30.4 days/month x $0.12/kWh = $10,681/month = $128,172/year

Cost savings from reduced maintenance were estimated similarly at:

18 x $175/unit-month = $3,150/month = $37,800/year

The total direct savings estimate was $13,831/month, or $165,972/year.
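
The arithmetic above can be reproduced with a short Python sketch. The function and constant names are illustrative; the inputs are the figures quoted in the case study. Note that the article rounds the monthly electricity figure to whole dollars before annualizing, and the sketch does the same so that the totals match.

```python
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30.4  # average month length used in the case study


def monthly_energy_savings(load_reduction_kw, price_per_kwh):
    """Electricity cost avoided per month for a constant load reduction."""
    return load_reduction_kw * HOURS_PER_DAY * DAYS_PER_MONTH * price_per_kwh


def monthly_maintenance_savings(units_turned_off, cost_per_unit_month):
    """Maintenance cost avoided per month for units placed in standby."""
    return units_turned_off * cost_per_unit_month


energy = round(monthly_energy_savings(122, 0.12))  # 10681
maintenance = monthly_maintenance_savings(18, 175)  # 3150
total_month = energy + maintenance                  # 13831
total_year = total_month * 12                       # 165972

print(f"Electricity: ${energy:,}/month")
print(f"Maintenance: ${maintenance:,}/month")
print(f"Total:       ${total_month:,}/month, ${total_year:,}/year")
```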

COMPUTER ROOM COOLING ASSESSMENT ESSENTIALS

Once the CCF has been calculated, the next step for optimizing a data center’s cooling infrastructure is conducting a comprehensive assessment of airflow and thermal management metrics, for both IT (e.g., servers) and supporting infrastructure equipment. This assessment, whether performed by internal staff or external consultants, provides the foundation for a plan of action to improve AFM with measurable goals for efficiency gains. When assessing the effectiveness and efficiency of current conditions, the following six areas must be evaluated thoroughly.

1. COLD SPOTS

Cold spots have now become even more prevalent than hot spots. Data from the last eight sites studied by Upsite show cold spots in an average of 35% of cabinets, while hot spots were present in only 7% of cabinets at the same eight sites. In one computer room studied, 85% of cabinets contained cold spots. That 7,000-sq-ft site had a CCF of 3.8. With an estimated $10,000 in proposed AFM remediation costs, the site could expect to see a 100% ROI in just five months.

Like hot spots, cold spots can be harmful to equipment. Some IT equipment does not operate properly when temperatures become too cold, so ASHRAE recommends IT equipment intake air temperatures of 64.4°F to 80.6°F (18°C to 27°C) for maximum reliability. Most data centers operate well below the 80.6°F upper limit. While reliability is maintained anywhere within the recommended range, efficiency and capacity are maximized at its upper end. When temperatures in the computer room are cooler than necessary, the capacity and efficiency of cooling equipment are reduced.
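
As a rough illustration, survey readings can be checked against the ASHRAE recommended envelope of 18°C to 27°C (64.4°F to 80.6°F). The sketch below assumes that a cabinet counts as having a cold spot (or hot spot) if any of its intake readings falls below (or above) that range; this counting rule, the function name, and the sample readings are assumptions made for illustration, not Upsite's survey method.

```python
RECOMMENDED_MIN_F = 64.4  # ASHRAE recommended lower limit (18 C)
RECOMMENDED_MAX_F = 80.6  # ASHRAE recommended upper limit (27 C)


def spot_percentages(intake_temps_by_cabinet):
    """Return (% of cabinets with a cold spot, % of cabinets with a hot spot)."""
    total = len(intake_temps_by_cabinet)
    if total == 0:
        raise ValueError("no cabinets surveyed")
    cold = sum(
        1 for temps in intake_temps_by_cabinet.values()
        if any(t < RECOMMENDED_MIN_F for t in temps)
    )
    hot = sum(
        1 for temps in intake_temps_by_cabinet.values()
        if any(t > RECOMMENDED_MAX_F for t in temps)
    )
    return 100.0 * cold / total, 100.0 * hot / total


# Hypothetical intake readings (deg F) at bottom, middle, and top of each cabinet.
readings = {
    "A01": [60.1, 66.0, 70.2],
    "A02": [68.0, 72.5, 75.0],
    "A03": [62.0, 79.0, 83.4],
    "A04": [70.0, 71.0, 72.0],
}
cold_pct, hot_pct = spot_percentages(readings)
print(f"Cold spots: {cold_pct:.0f}% of cabinets, hot spots: {hot_pct:.0f}%")
# Cold spots: 50% of cabinets, hot spots: 25%
```

Note that a single cabinet (A03 here) can contain both a cold spot and a hot spot, which is typical of poor airflow management.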

2. HOT SPOTS

Hot spots can be difficult to identify, especially when they are sparsely distributed, so an infrared (IR) survey using an IR camera is often required to locate and quantify them. Simpler approaches, like data center infrastructure management (DCIM) sensors and manual walkthroughs, often fail to identify damagingly high IT intake temperatures in isolated locations. A recent Upsite study of a large university revealed that 80% of its cabinets contained hot spots, despite the facility’s CCF of 4.1. A survey of IT intake temperatures using an IR camera can identify these locations so that they can be remediated before equipment fails.

3. COOLING UNIT SETPOINTS

Manufacturers rate their cooling units at standard return-air conditions of 75°F and 45% relative humidity (rh), measured at the unit. But because most enterprises run their cooling units with setpoints lower than standard conditions, the rated capacity cannot be delivered. As a result, many data centers strand cooling capacity and run more cooling units than are necessary. For example, a common 20-ton (70 kW) cooling unit has 20 tons (70 kW) of total capacity with return air at 75°F and 45% rh. But with return air at 70°F and 48% rh, the same 20-ton cooling unit has a cooling capacity of only 17 tons (59.7 kW).
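
To see how this derating translates into extra running units, consider the sketch below. It uses the 70 kW and 59.7 kW figures from the example above; the 400 kW heat load is hypothetical, and redundancy requirements are deliberately ignored.

```python
import math

RATED_KW = 70.0    # capacity at 75 F / 45% rh return air (from the example)
DERATED_KW = 59.7  # capacity at 70 F / 48% rh return air (from the example)


def units_required(heat_load_kw, unit_capacity_kw):
    """Minimum number of running units needed to cover the heat load."""
    return math.ceil(heat_load_kw / unit_capacity_kw)


stranded_fraction = 1.0 - DERATED_KW / RATED_KW
heat_load_kw = 400.0  # hypothetical room heat load

at_rated = units_required(heat_load_kw, RATED_KW)      # 6
at_derated = units_required(heat_load_kw, DERATED_KW)  # 7

print(f"Capacity stranded per unit: {stranded_fraction:.1%}")  # 14.7%
print(f"Units needed: {at_rated} at rated conditions, {at_derated} at the lower setpoint")
```

In this illustration the lower setpoint alone forces one additional unit to run for the same heat load.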

Raising a cooling unit’s setpoint by 1°F reduces the power required for cooling by 1% to 3%. But once a server’s inlet air temperature exceeds a certain level (around 78°F), increased server fan speed can negate the energy savings of raising the cooling unit setpoint. Determining the threshold for a given site requires a detailed study.
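
A first-pass estimate of the savings range can be sketched as follows. The sketch assumes the 1% to 3% per degree figure scales linearly over a small setpoint change and ignores the server fan effect described above; both are simplifying assumptions, and the 100 kW cooling load is hypothetical.

```python
SAVINGS_PER_DEG_LOW = 0.01   # 1% per deg F (low end of the quoted range)
SAVINGS_PER_DEG_HIGH = 0.03  # 3% per deg F (high end of the quoted range)


def setpoint_savings_range_kw(cooling_power_kw, setpoint_raise_f):
    """Return (low, high) estimated cooling power savings in kW.

    Assumes linear scaling with the size of the setpoint change and
    does not model increased server fan power at higher inlet temperatures.
    """
    if setpoint_raise_f < 0:
        raise ValueError("setpoint raise must be non-negative")
    low = cooling_power_kw * SAVINGS_PER_DEG_LOW * setpoint_raise_f
    high = cooling_power_kw * SAVINGS_PER_DEG_HIGH * setpoint_raise_f
    return low, high


# Hypothetical: 100 kW of cooling power, setpoint raised by 4 deg F.
low_kw, high_kw = setpoint_savings_range_kw(100.0, 4.0)
print(f"Estimated savings: {low_kw:.0f} to {high_kw:.0f} kW")  # 4 to 12 kW
```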

When changing the setpoint, avoid rapid fluctuations in temperature, especially in an environment where devices may have been operating at a cold temperature. Changes should be made no faster than 9°F per hour. This ensures that thermal expansion happens slowly and equipment is not damaged.
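
One way to respect the 9°F per hour limit is to raise the setpoint in small, evenly spaced steps. The sketch below generates such a schedule; the function name, the 1°F step size, and the example setpoints are assumptions for illustration. It limits the rate of the setpoint change only, so actual room temperatures should still be monitored.

```python
def ramp_schedule(start_f, target_f, max_rate_f_per_hr=9.0, step_f=1.0):
    """Return a list of (minutes_elapsed, setpoint_f) steps.

    Each step is applied only after enough time has passed that the
    average rate of change never exceeds max_rate_f_per_hr.
    """
    if max_rate_f_per_hr <= 0 or step_f <= 0:
        raise ValueError("rate and step must be positive")
    minutes_per_step = 60.0 * step_f / max_rate_f_per_hr
    direction = 1 if target_f >= start_f else -1
    schedule = []
    setpoint, elapsed = start_f, 0.0
    while abs(target_f - setpoint) > 1e-9:
        # The final step may be smaller than step_f.
        change = min(step_f, abs(target_f - setpoint))
        setpoint += direction * change
        elapsed += minutes_per_step * change / step_f
        schedule.append((round(elapsed, 1), round(setpoint, 2)))
    return schedule


# Hypothetical change from 68 F to 72 F.
steps = ramp_schedule(68.0, 72.0)
for minutes, setpoint in steps:
    print(f"t+{minutes:>5} min: set {setpoint} F")
```

For a 4°F increase, the schedule spreads the change over roughly 27 minutes, which corresponds to the 9°F per hour limit.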

4. HUMIDITY

It is important to ensure that humidity is neither too high nor too low and that rh setpoints are consistent across air conditioning units.

With a high rh setpoint and a low temperature setpoint, condensation can form on cooling unit coils (i.e., latent cooling). Moisture condensing on the coils gives off heat that consumes some of the unit’s cooling capacity, stranding capacity that could otherwise be used to lower the temperature of the supply air to IT equipment.

When a cooling unit’s setpoint configuration causes moisture to condense on the coils, the moisture gathers in drain pans and is drained out of the building. The resulting dehumidification of the room then causes the cooling units to switch into humidification. This condition increases operating cost and adds heat load to the room. A related issue is the importance of calibrating the return-air rh sensors. In the worst case, multiple data center air conditioners can work against each other while trying to maintain different humidity levels. With humidity control consuming up to 30% of the cooling system’s energy, this situation must be avoided.

5. PERFORATED TILE PLACEMENT

Correct perforated tile placement is one of the easiest and least expensive ways to manage airflow. Adjusting tiles is nearly free; the only cost is the labor needed to move them. Though it is one of the simplest ways to improve AFM, tile placement is often overlooked. Misplaced perforated tiles strand cooling capacity, because tiles or grates located in an open area or hot aisle allow valuable conditioned air to leave the raised-floor plenum. The volume of air lost through these tiles is not available to cool IT equipment. In an Upsite study of 45 data centers, on average 77% of perforated tiles were properly placed. One site had properly placed only 7% of its tiles.

6. RAISED FLOOR BYPASS OPEN AREA

Managing openings in the raised floor is fundamental to all AFM strategies. As with misplaced perforated tiles, unsealed cable openings release bypass airflow that strands cooling capacity, because the conditioned air escaping through them can no longer be used by IT equipment. Of 45 sites assessed by Upsite, the average raised floor bypass open area percentage was 48%. Sealing these openings is widely discussed across the industry, yet in the average data center almost half of the raised-floor open area is still leaking air rather than delivering it to IT equipment. This figure is down only 17 percentage points from 65%, the average found in original Uptime Institute research performed a decade ago.
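
The metric can be illustrated with a short sketch. It assumes that bypass open area percentage is the unsealed (bypass) open area divided by the total raised-floor open area, that is, bypass openings plus properly placed perforated tile open area; that definition, the function name, and the sample areas are assumptions made here for illustration. The sketch also shows the difference between a percentage-point change and a relative change for the 65% and 48% figures quoted above.

```python
def bypass_open_area_pct(bypass_open_area, delivery_open_area):
    """Share of total raised-floor open area that is bypass (leaking) area.

    Both arguments must be in the same units (e.g., square feet).
    """
    total = bypass_open_area + delivery_open_area
    if total <= 0:
        raise ValueError("total open area must be positive")
    return 100.0 * bypass_open_area / total


# Hypothetical room: 48 sq ft of unsealed openings, 52 sq ft of tile open area.
pct = bypass_open_area_pct(48.0, 52.0)
print(f"Bypass open area: {pct:.0f}%")  # Bypass open area: 48%

# Change from the earlier 65% average to the 48% average.
point_change = 65 - 48                       # 17 percentage points
relative_change = 100.0 * (65 - 48) / 65     # about 26% relative reduction
print(f"{point_change} percentage points, {relative_change:.0f}% relative reduction")
```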

The data in Table 1 were gathered by Upsite Technologies in assessments of 45 separate data centers. The data indicate the extent to which typical data centers are managing key airflow metrics.

ASSESSING FUTURE GROWTH AND REQUIREMENTS

The comprehensive computer room cooling assessment must consider not only current data center cooling performance, but also projections for future data center growth. Data center overcooling often results from a desire to accommodate planned growth that ends up being delayed or never materializes. The assessment must project accurate timelines of future growth in order to anticipate optimal placement of future IT equipment and to ensure that the deployment of AFM solutions (such as aisle containment and DCIM) is scheduled only when prudent and necessary.

REDUCING STRANDED CAPACITY, THE ENEMY OF COOLING EFFICIENCY

Stranded cooling capacity is the portion of the cooling system that is running but not contributing to cooling IT equipment. The impact of stranded capacity is wasted energy, money, and capacity. Stranded capacity can cause organizations to invest in new cooling infrastructure unnecessarily, and in the worst case, cause a new data center to be built prematurely.

REMEDIATION

To be successful at managing computer rooms for capacity, reliability and cost goals, both IT and facilities personnel must cooperate to develop a shared understanding of cooling capacity and management practices. Without a collective awareness of the science behind AFM best practices, important AFM elements can “drift” over time. For example, perforated tiles move to poor locations, new openings are cut and not sealed, and IT equipment is placed without proper regard to airflow patterns. The most successful organizations have a team with members from four quadrants of the organization as shown in Figure 1.

To achieve optimal cooling efficiency and avoid stranded capacity, Upsite recommends executing thermal and airflow management at each of a facility’s four “Rs”: raised floor, racks, rows, and room. The methodology of the four Rs should be used to seal openings, make improvements, monitor the results, and then continue to implement as needed. The four Rs should be addressed in the sequence listed to achieve the best results. In previous studies, Upsite observed facilities that installed full containment (the third R) but limited its potential benefits because AFM prerequisites were missing, including raised floor grommets (first R) and blanking panels (second R). After a site has made improvements at any of the first three R levels, it is in a position to make changes at the room level, that is, to its cooling infrastructure, to take full advantage of the improved AFM.

CONCLUSION

Data centers now consume approximately 3% of all electricity in the United States, and that share is expected to keep growing for the foreseeable future. In addition, electricity costs per kWh are rising, so energy accounts for a growing portion of the total cost of ownership of IT equipment. Data center efficiency will only become more important as these trends continue. The good news is that there is much room for improvement, as most data centers operate nowhere near their peak efficiency. Calculating a data center’s CCF gives a sense of what efficiency gains are possible. From there, the next step is to conduct a thorough assessment of hot/cold spots, temperature/humidity setpoints, perforated tiles, and raised-floor openings within the context of planned future growth to determine which modifications provide the best efficiency gains and return on investment. With careful research, planning, and implementation, substantial data center efficiency and capacity gains are possible.