Data Center Cooling Metrics
What, how, and why to measure.
Modern data centers continue to evolve at a rapid pace. While cooling methodologies and techniques continue to advance, some basic lessons that have proven themselves over time remain underutilized. New technology and techniques can often be helpful, but without fundamental airflow management metrics the full benefits of advanced cooling methods cannot be realized. Fundamental data center metrics have been the basis of many publications and presentations since the industry’s founding, but emphasis on them has dropped off over the last several years as advances such as containment, free cooling, and evaporative cooling have held the spotlight. However, applying the fundamentals is crucial to getting the best results, whether a site has a legacy cooling configuration or the latest free cooling methodology. Experts in the field have therefore placed a renewed focus on fundamentals and have recently broached the topic at many of the industry’s biggest events.
Airflow management (AFM), in a nutshell, is about improving data center airflow so that the least amount of conditioned air, at the highest supply temperature, can be used to effectively cool IT equipment. The metrics discussed below can help you identify which fundamental items can be improved, thus increasing your data center’s cooling capacity, IT performance, and energy savings regardless of the configuration of your room or the cooling methodology being used.
Power Usage Effectiveness (PUE)
Created by The Green Grid, PUE has become the most widely used metric for assessing the energy efficiency of a data center. In fact, PUE data reveals that cooling infrastructure is the single largest consumer of data center power (typically around half), and therefore the largest contributor to a high PUE value. Considered the highest-level metric for overall efficiency, PUE is a great starting point for measuring data center performance and tracking the improvements made to a data center over time. It is represented by the formula:
PUE = Total Facility Energy / IT Equipment Energy
PUE is determined by dividing the total energy entering a data center (total facility energy) by the energy used by the IT equipment over the same period.
While an extremely important tool, PUE cannot tell you specifically what to improve to make a data center more energy efficient. Nor does it provide much useful information as a standalone reference point when it is calculated infrequently. While the average PUE of data centers surveyed has been dropping over recent years, there is still a great deal of room for improvement. There has also been a growing trend of misusing PUE: many sites calculate a partial PUE (pPUE) by leaving some loads out of the total site power, yet report the result as total site PUE. pPUE can be a valuable measurement, but it should be reported as such.
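As a rough illustration of the PUE formula and of the pPUE misreporting problem described above, the following Python sketch computes both figures for the same site. The energy values and load names are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Return PUE: total facility energy divided by IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


# Hypothetical annual energy figures (kWh), for illustration only.
it_kwh = 1_000_000
other_loads_kwh = {
    "cooling": 600_000,
    "power_distribution": 150_000,
    "lighting": 50_000,
}

# Total facility energy includes the IT load plus every supporting load.
total_kwh = it_kwh + sum(other_loads_kwh.values())
full_pue = pue(total_kwh, it_kwh)

# Leaving a load out of the total (here, cooling) yields a partial PUE.
# It looks better than the site PUE and should be labeled as pPUE.
partial_kwh = total_kwh - other_loads_kwh["cooling"]
partial_pue = pue(partial_kwh, it_kwh)

print(f"PUE  = {full_pue:.2f}")
print(f"pPUE = {partial_pue:.2f} (cooling excluded)")
```

The gap between the two numbers is the reason a pPUE should never be reported as a site PUE.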
Cooling Capacity Factor (CCF)
After the IT equipment, cooling equipment consumes the most power in a data center. Developed by Upsite Technologies, CCF is a metric used to estimate the utilization of the computer room cooling capacity. By determining how well the cooling infrastructure is being utilized, you can identify potential gains from AFM improvements and controls adjustments. This is fundamental to improving the cooling of the entire data center (free cooling, chiller plants, etc.) and has the greatest leverage toward improving your PUE. CCF is calculated by dividing the total rated cooling capacity (kW) by 110% of the IT critical load (kW):
CCF = Total Rated Cooling Capacity (kW) / (IT Critical Load (kW) x 1.1)
Total rated cooling capacity is the sum of the running cooling units’ rated capacities. If all cooling units are running, then this will be the same value as the total installed rated cooling capacity. A CCF of around 1.2 is most desirable, although a score of 1.5 to 3.0 is most common. In the latter case, there is likely significant stranded cooling capacity that can be recovered through improvements to AFM.
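The CCF calculation described above (rated capacity of the running units divided by 110% of the IT critical load) can be sketched in a few lines of Python. The unit ratings and IT load below are hypothetical, and the function name is an illustrative choice rather than part of any published tool.

```python
def cooling_capacity_factor(running_unit_capacities_kw, it_critical_load_kw):
    """Return CCF: rated capacity of running units / (1.1 x IT critical load)."""
    if it_critical_load_kw <= 0:
        raise ValueError("IT critical load must be positive")
    total_rated_kw = sum(running_unit_capacities_kw)
    return total_rated_kw / (1.1 * it_critical_load_kw)


# Hypothetical room: six running cooling units rated at 90 kW each,
# supporting a 250 kW IT critical load.
ccf = cooling_capacity_factor([90.0] * 6, 250.0)
print(f"CCF = {ccf:.2f}")

# Compare against the guidance in the text: about 1.2 is desirable,
# while 1.5 to 3.0 suggests stranded capacity worth recovering.
if ccf >= 1.5:
    print("Likely stranded cooling capacity; review AFM.")
```

Only running units are summed, so turning a unit off changes the CCF even though the installed capacity stays the same.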
While many data centers have monitoring in place via a multitude of sensors placed in a variety of locations, very few regularly check the effectiveness of the cooling for every U space of every cabinet in the computer room. This is important because hot spots can occur in very isolated locations that sensors often miss. To help avoid this, infrared cameras and infrared thermometers should be used regularly to identify hot spots. It is also important to identify the percentage of cabinets with cold spots and the percentage of cabinets with hot spots so that you can determine which areas need focus.
There is a direct correlation in data centers between the range in intake air temperatures and the efficiency of the cooling infrastructure. Ideally, the difference between the warmest intake temp and the coldest intake temp should be 5 degrees or less. If not, there is room for improvement. ASHRAE’s recommended range for intake temperatures is between 64.4°F and 80.6°F (18°C to 27°C). While intake temps below this range are not going to impact IT reliability, they are an indication that an excessive amount of energy is being used to cool the room. Hot spots (intake temps above the desired maximum for the site) are an indication that the cooling system is not effective and that IT equipment reliability may be compromised, a situation that needs to be remedied as soon as possible.
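The checks described above (intake temperature spread, and the share of cabinets with hot or cold spots) are easy to automate once readings are collected. The Python sketch below is one possible way to do it; the cabinet names, the readings, and the function name are hypothetical, and the limits are passed in as parameters because each site sets its own desired maximum.

```python
def intake_summary(cabinet_intakes_f, cold_limit_f, hot_limit_f):
    """Summarize intake readings (°F), keyed by cabinet.

    Returns the overall spread plus the percentage of cabinets with
    at least one reading below cold_limit_f or above hot_limit_f.
    """
    if not cabinet_intakes_f:
        raise ValueError("no cabinets supplied")
    readings = [t for temps in cabinet_intakes_f.values() for t in temps]
    hot = [c for c, temps in cabinet_intakes_f.items() if max(temps) > hot_limit_f]
    cold = [c for c, temps in cabinet_intakes_f.items() if min(temps) < cold_limit_f]
    count = len(cabinet_intakes_f)
    return {
        "spread_f": max(readings) - min(readings),
        "pct_hot": 100.0 * len(hot) / count,
        "pct_cold": 100.0 * len(cold) / count,
    }


# Hypothetical readings taken at the bottom, middle, and top of each cabinet.
readings_f = {
    "A01": [66.0, 68.5, 71.0],
    "A02": [62.0, 65.0, 70.0],   # cold spot at the bottom
    "A03": [70.0, 76.0, 82.5],   # hot spot at the top
    "A04": [67.0, 69.0, 72.0],
}

# Limits here follow the ASHRAE recommended range of 18°C to 27°C.
summary = intake_summary(readings_f, cold_limit_f=64.4, hot_limit_f=80.6)
print(summary)
```

A spread well above 5 degrees, as in this made-up data set, points to an AFM problem even when most cabinets sit comfortably inside the recommended range.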
Raised Floor Bypass Open Area
For data centers that use raised floors, this is one of the simplest and most important metrics. It’s merely what percentage of the holes in the raised floor are in a “good” location and what percentage of the holes in the raised floor are in a “bad” location. Good means that the air coming out of a tile is directly used by IT equipment. Bad means supply air coming out of the opening is not being consumed by IT equipment. The only good type of open area is the supply tiles (perforated tiles or grates) directly in front of IT equipment. The two types of bad open areas are unsealed cable openings under cabinets and around the perimeter of the room, and misplaced supply tiles (in open areas or hot aisles).
For example, if a computer room had one cabinet with one standard perforated tile in front of it with 25% open area (1 sq ft) and one unsealed 12 x 12 in. (1 sq ft) cable cutout, then the total raised floor open area would be 2 sq ft. The raised floor bypass open area would be 1 sq ft, or 50% of the total open area.
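The same arithmetic can be written as a short Python sketch. It assumes a standard 2 ft x 2 ft (4 sq ft) floor tile, which is what makes a 25% open tile equal to 1 sq ft of open area; the function name is illustrative.

```python
def bypass_open_area_pct(good_open_sq_ft: float, bad_open_sq_ft: float) -> float:
    """Return bad (bypass) open area as a percentage of all raised floor open area."""
    total = good_open_sq_ft + bad_open_sq_ft
    if total <= 0:
        raise ValueError("total open area must be positive")
    return 100.0 * bad_open_sq_ft / total


# Good open area: one perforated tile in front of IT equipment.
# Assumes a 2 ft x 2 ft tile (4 sq ft) with 25% open area.
tile_open_sq_ft = 4.0 * 0.25

# Bad open area: one unsealed 12 x 12 in. cable cutout.
cutout_open_sq_ft = (12 / 12) * (12 / 12)

pct = bypass_open_area_pct(tile_open_sq_ft, cutout_open_sq_ft)
print(f"Bypass open area: {pct:.0f}%")
```

In a real survey, the good and bad totals would be summed over every supply tile and every unsealed opening in the room.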
Although many data centers have made an effort to seal cable openings and other potentially harmful holes in the raised floor, very few have completed the job. The remaining openings can easily release significant flow rates of conditioned air, which limits the capacity and efficiency of the cooling infrastructure. The goal is to have no bypass open area, so that the only openings in the floor are the supply tiles in front of IT equipment. It is particularly important to seal or (depending on the design) at least reduce the open area under electrical equipment, such as power distribution units (PDUs) or remote power panels (RPPs).
Perforated Tile Placement
The use of perforated tiles is one of the simplest and easiest ways to manage airflow in a computer room. However, few data centers do this well, despite the fact that perforated tiles can be a quick and relatively inexpensive fix to improve cooling. Even in well-managed sites, there is often still room for improving perforated tile placement.
There seems to be a growing trend of not adjusting the placement of perforated tiles to the actual load of the computer room. In a research study I conducted of 45 data centers across the world, only six sites (13%) had properly placed every perforated tile. This is especially sobering when you consider the amount of wasted energy needed to keep these data centers properly cooled.
The definition of a properly placed perforated tile is within two tile positions of IT equipment intakes. Conversely, an improperly placed perforated tile is typically any tile in a hot aisle or open area of the room. However, there are important exceptions. For example, if IT equipment has been mounted backward (from a hot aisle/cold aisle perspective) with the intake in the hot aisle, then a perforated tile is likely needed in the hot aisle until the equipment can be turned around. This should never occur in the first place but still does surprisingly often.
Bypass Airflow (Ratio of Supply Airflow to IT Airflow)
The definition of bypass airflow is any conditioned air that does not pass through IT equipment before returning to cooling units. The only way to improve bypass airflow is to reduce the total flow rate of air moving through the room via the cooling units. In many cases, the total flow rate of air supplied by cooling units is two to three times the total airflow rate required by the IT equipment. This much excess bypass airflow is often necessary to overcome poor AFM. However, if you improve the AFM, it may be possible to reduce the total volume of air flowing through the room. To identify how much bypass airflow is occurring in the room it is necessary to determine the total flow rate of air moving through the IT equipment and compare this to the total flow rate moving through all the cooling units.
Historically, blade servers have produced higher Delta Ts (∆Ts) than traditional rack-mount servers (“pizza box” servers). In other words, the cool supply air entering a blade chassis would exit as hotter air than would the supply air entering a pizza box server. This difference is described by the equation of heat transfer:
q = Cp x W x ∆T
q = amount of heat transferred
Cp = specific heat of air
W = mass flow
∆T = temperature rise of air across the heat source
When we normalize the terms for units we typically deal with, this relationship is described as:
CFM = 3.16 x Watts / ∆T
CFM = cubic feet per minute of airflow through the server
3.16 = factor for density of air at sea level in relation to °F
∆T = temperature rise of air passing through the server in °F
Based on this relationship, a 5kW blade server chassis with 16 servers and a 35°F ∆T would draw 451.4 CFM:
451.4 CFM = 3.16 x 5,000 / 35
In contrast, ten 500W pizza box servers with a 20°F ∆T would draw 790 CFM:
790 CFM = 3.16 x 5,000 / 20
In a data center with 1,600 blades (100 chassis), the servers would consume 45,140 CFM of chilled air (100 chassis x 451.4 CFM per chassis = 45,140 CFM) as opposed to a data center with 1,000 pizza box servers, which would consume 79,000 CFM of chilled air (1,000 servers x 79 CFM per server = 79,000 CFM).
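The airflow figures above follow from the normalized relationship, in which CFM equals 3.16 times the load in watts, divided by the ∆T in °F. A minimal Python sketch of that relationship, using the server loads and ∆Ts from the text, looks like this (the function name is an illustrative choice):

```python
def airflow_cfm(watts: float, delta_t_f: float) -> float:
    """Return airflow in CFM for a heat load in watts and a ∆T in °F.

    Uses the 3.16 factor from the text (air density at sea level).
    """
    if delta_t_f <= 0:
        raise ValueError("delta T must be positive")
    return 3.16 * watts / delta_t_f


# One 5 kW blade chassis at a 35°F ∆T.
blade_chassis_cfm = airflow_cfm(5_000, 35)

# One 500 W pizza box server at a 20°F ∆T, and ten of them together.
pizza_box_cfm = airflow_cfm(500, 20)
ten_pizza_boxes_cfm = 10 * pizza_box_cfm

print(f"Blade chassis:       {blade_chassis_cfm:.1f} CFM")
print(f"Ten pizza box units: {ten_pizza_boxes_cfm:.1f} CFM")

# Scaled to the room sizes used in the text.
print(f"100 chassis:         {100 * blade_chassis_cfm:,.0f} CFM")
print(f"1,000 pizza boxes:   {1_000 * pizza_box_cfm:,.0f} CFM")
```

For the same 5 kW of load, the higher ∆T of the blade chassis is what cuts the required airflow by a little over 40%.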
Table 2 shows the CFM required to cool a kW of IT load relative to the IT equipment ∆T.
By estimating the average ∆T of the IT equipment in the room you can estimate the average CFM required to cool a kW of IT load. Then you can calculate the total IT equipment cooling flow rate with the following equation:
UPS load (kW) x average CFM/kW = Total IT CFM
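The CFM needed per kW of IT load can be derived from the same 3.16 factor by setting the load to 1,000 W. The Python sketch below tabulates it for a few ∆T values and then applies the total IT CFM equation. The ∆T values are illustrative and are not taken from Table 2, and the function names are hypothetical.

```python
def cfm_per_kw(delta_t_f: float) -> float:
    """Return the airflow (CFM) needed to cool 1 kW of IT load at a given ∆T in °F."""
    if delta_t_f <= 0:
        raise ValueError("delta T must be positive")
    return 3.16 * 1_000 / delta_t_f


def total_it_cfm(ups_load_kw: float, average_cfm_per_kw: float) -> float:
    """Return total IT airflow: UPS load (kW) x average CFM/kW."""
    return ups_load_kw * average_cfm_per_kw


# Illustrative ∆T values in °F.
for delta_t in (15, 20, 25, 30, 35):
    print(f"∆T {delta_t}°F: {cfm_per_kw(delta_t):6.1f} CFM/kW")

# Example: a 325 kW UPS load with an average IT ∆T of 25°F,
# rounding the per-kW figure to a whole number before multiplying.
rounded_rate = round(cfm_per_kw(25))
print(f"Total IT airflow: {total_it_cfm(325, rounded_rate):,.0f} CFM")
```

Rounding the per-kW figure before multiplying changes the total slightly, which is acceptable here because the average ∆T is itself an estimate.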
The bypass airflow rate can be determined simply by subtracting the total IT equipment cooling flow rate from the total cooling unit flow rate. The total cooling unit flow rate can easily be determined from cooling unit specifications. For example:
- There are eight cooling units that each deliver 12,000 CFM
- Total cooling unit flow rate is 96,000 CFM (8 x 12,000 CFM = 96,000 CFM)
- The average IT equipment ∆T is 25°F
- IT required cooling flow rate is 126 CFM/kW
- UPS load is 325 kW
- Therefore the total IT equipment cooling flow rate is 40,950 CFM (325 kW x 126 CFM/kW = 40,950 CFM)
- Bypass airflow rate is 55,050 CFM, or 57% of the total cooling unit flow rate (96,000 CFM – 40,950 CFM = 55,050 CFM; 55,050 CFM / 96,000 CFM = 0.57 = 57%)
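The worked example above can be reproduced with a short Python sketch, which also reports the ratio of supply airflow to IT airflow. The inputs are the figures from the example; the function name and return layout are illustrative choices.

```python
def bypass_airflow(cooling_unit_cfm, it_cfm):
    """Return (bypass CFM, bypass fraction, supply-to-IT airflow ratio)."""
    total_supply_cfm = sum(cooling_unit_cfm)
    if total_supply_cfm <= 0 or it_cfm <= 0:
        raise ValueError("flow rates must be positive")
    bypass_cfm = total_supply_cfm - it_cfm
    return bypass_cfm, bypass_cfm / total_supply_cfm, total_supply_cfm / it_cfm


# Eight cooling units delivering 12,000 CFM each (from unit specifications).
unit_cfm = [12_000] * 8

# 325 kW UPS load at 126 CFM/kW (average IT ∆T of 25°F).
it_cfm = 325 * 126

bypass_cfm, bypass_fraction, supply_ratio = bypass_airflow(unit_cfm, it_cfm)

print(f"Bypass airflow:     {bypass_cfm:,} CFM")
print(f"Bypass share:       {bypass_fraction:.0%}")
print(f"Supply to IT ratio: {supply_ratio:.2f}")
```

Rerunning the calculation after AFM improvements, with fewer units running or reduced fan speeds, shows how far the ratio has moved toward 1.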
While these metrics may seem rudimentary to the experienced data center operator, they very often reveal opportunities for further improvement that have been overlooked in pursuit of the newest trends and technology. Starting with fundamental steps, such as properly placing every perforated tile, can improve conditions to the point where significant energy savings can be realized without investing in new equipment. Cooling unit fan speeds can be reduced and supply air temperatures increased to levels previously thought impossible, all without impacting the IT equipment intake temperatures. By employing these basic metrics at the outset of an efficiency evaluation or AFM upgrade, data center operators can begin to make accurate and necessary changes to their sites and improve the total cooling capacity of the room, often improving the reliability of the equipment and reducing energy costs.