If you do an internet search for liquid cooling, you will find several articles expounding the benefits of cooling your data center with this "new" technology. It's reasonable to wonder, "Is liquid cooling a fad or an opportunity?" This article defines liquid cooling and helps you decide whether it is a good option for your data center.
What is liquid cooling?
The best way to explain liquid cooling is to contrast it with air cooling.
Air-cooled data centers
The mission of a cooling system is to get heat generated by hot computer chips out of the data center. Figure 1 shows the typical cooling process for an air-cooled data center. The electric triangles show the percentage of cooling energy required for each part of the cooling process.
Figure 2 shows how air flows through the data center and HVAC system. The data center air is drawn through the servers by server fans. This slightly warmer air is then cooled off by computer room air conditioning units blowing air into pressurized server room floors. Chilled water is pumped from the chiller at 45°F to the air-handling coils and returns at 55°F. The chiller cools that 55°F water back down to 45°F. The chiller itself must then be cooled by cooling tower water entering at 85°F and leaving at 95°F. That 95°F water is pumped through a cooling tower, which finally rejects the data center heat to the outdoors.
Liquid-cooled data center
The liquid-cooled data center cuts out the highest energy-consuming devices in the air-cooled system. Since the computer chips are operating at such high temperatures, these chips can be directly cooled by water. The server fans are not required in a direct chip cooling system. No data center fans are required to move large amounts of air. No chiller is required.
The largest energy consumers in the traditional cooling loop are the fans blowing air through the data center and the chiller that supplies the air-handling coils with chilled water. Liquid cooling eliminates both of these big energy users.
The case for energy efficiency
The greatest motivator for liquid cooling in data centers is energy efficiency, which has been a focus of data center owners ever since the concept of power usage effectiveness (PUE) was introduced. PUE is the total facility power consumed divided by the power consumed by IT equipment. While energy-efficient data centers brag about having PUEs of 1.2 or less, the average PUE is closer to 1.8.
Based on the direct comparison above, liquid cooling uses just 20% of the energy of air cooling. Since cooling accounts for 80% of the non-IT energy a data center requires, a conversion to liquid cooling turns a 1.8 PUE into roughly 1.3.
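That PUE arithmetic can be checked in a few lines. The sketch below assumes only the figures stated in this article (cooling is 80% of non-IT energy; liquid cooling needs about 20% of the energy of air cooling):

```python
# Estimate PUE after a liquid cooling conversion, using the article's figures.
def pue_after_liquid_cooling(pue_air: float) -> float:
    overhead = pue_air - 1.0          # non-IT energy per unit of IT energy
    cooling = 0.80 * overhead         # cooling's share of that overhead (per the article)
    other = overhead - cooling        # lighting, UPS losses, etc. (unchanged)
    cooling_liquid = 0.20 * cooling   # liquid cooling uses ~20% of air cooling's energy
    return 1.0 + other + cooling_liquid

print(round(pue_after_liquid_cooling(1.80), 2))  # → 1.29, i.e., roughly 1.3
```

Running the same function on a best-in-class 1.2 PUE facility shows why the gains shrink as the starting point improves.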
The case for power density
Power density is another motivation to convert to liquid cooling.
Each year, semiconductor technology becomes smaller and faster. As chip manufacturers make faster, higher-capacity central processing units (CPUs) and high-end graphics processing units (GPUs), the wattage required also increases (200 W per chip in 2015 versus 500 W per chip in 2022). In the next decade, consumption is expected to reach 1 kW per chip. Each server contains five to eight of these chips, and each rack holds 10 to 20 servers, which means the power consumption of a server rack is rising from roughly 5 kW to 80 kW. The good news is that data centers can pack far more computing and storage power into a very small space. The bad news is that rejecting the massive amount of heat thrown off by these servers is becoming more difficult with air cooling, because air simply cannot remove that much heat from such a small space.
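The rack-level numbers above are simple multiplication. A quick sketch using the high end of the article's ranges (real configurations vary widely):

```python
# Back-of-the-envelope rack power from the article's chip counts.
watts_per_chip = 500     # high-end CPU/GPU, circa 2022
chips_per_server = 8     # article cites five to eight
servers_per_rack = 20    # article cites 10 to 20

rack_kw = watts_per_chip * chips_per_server * servers_per_rack / 1000
print(rack_kw)  # → 80.0 (kW per rack at the high end, vs. ~5 kW for older racks)
```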
The thermodynamic properties of air limit its ability to transfer large amounts of heat. The ability of a substance to transfer heat is measured by its thermal conductivity. Air has a thermal conductivity of 0.01580 Btu/(h·ft·°F); water's is 0.3632 Btu/(h·ft·°F). This means water conducts heat about 23 times better than air (0.3632 / 0.01580 ≈ 23).
Picture a computer chip in the depths of a server rack getting hotter and hotter. For a relatable example, imagine you kept a 40-W incandescent lightbulb lit for a few hours and then touched it — you would get burned. Now, imagine that same scenario but with a 500-W lightbulb. That’s what a high-performance CPU will feel like. If you put that hot lightbulb in the middle of a small, steel box, blowing air across it would only remove a portion of the heat, and it would take a while. If you tried to move that much heat in a data center full of high-performance server racks, it would feel like a supersonic wind tunnel. Even then, you couldn’t remove enough heat to keep the servers operational.
Liquid cooling offers a way to increase heat rejection by 23 times because of the thermal conductivity of water versus air. It also allows cooling systems to extract heat as close to the generation source as possible.
Liquid cooling options
Not every data center will house high-performance servers or need high power density. Nevertheless, the efficiency gains associated with liquid cooling are still available. Whether you retrofit your existing servers or wait until you replace them, there are multiple options for successfully implementing liquid cooling in your data center.
Air cooling (3 kW to 10 kW per rack)
Air cooling is effective at cooling data centers with low power densities. Liquid cooling can be used for improving energy efficiency, but it’s not needed for server operation.
Rear door heat exchanger (10 kW to 25 kW per rack)
This option involves installing a panel containing a chilled water heat exchanger on the back of the server rack. Server fans draw room air across the electronic components and push the heated air through this cold coil, so the air reenters the data center space at room temperature. While this option is considered liquid cooling, it still relies on a substantial amount of fan energy and lower water temperatures; both factors erode much of the energy efficiency improvement available from other liquid cooling options.
Direct-to-chip cooling (25 kW to 50 kW per rack)
The highest heat-producing chips in a server are the CPU and GPU; the next highest heat-producing devices are storage media, like mechanical and solid-state drives. In direct-to-chip cooling, liquid-cooled heat exchangers are mounted directly to the chips in lieu of air-cooled fins. This method of liquid cooling has become the most popular with video gamers. It also increases the speed and overall performance of these critical chips because they run much cooler than is possible in an air-cooled environment. While this method extracts heat from the highest energy-using devices in the server, other electronic components still need to be cooled, so a small amount of air cooling is required for the general data center space.
Immersion cooling (Over 50 kW per rack)
Imagine a horizontal vat of nonconductive liquid, and then submerge the entire server in it. Since the liquid is nonconductive, electrical shorts and component failures are less likely than with server components in air. The advantage of this system is that all of the electronic circuitry is cooled by liquid. The liquid heats up in the vat and is then cooled by a heat exchanger pumping water to a free-cooling heat exchanger, gaining the full advantage of liquid cooling. Because the racks are horizontal vats, however, servicing is more difficult: pulling a server means lifting a heavy, liquid-soaked machine up and out.
A data center owner may decide to convert to any of these liquid cooling options with lower power density servers to reduce energy costs. However, when power density is increased, data center operators must consider the best liquid cooling option for the load they decide to accommodate.
What could go wrong?
If liquid cooling is so energy efficient and modern server technology is forcing this upgrade, why aren’t all data centers using liquid cooling? Each version of liquid cooling described above comes with unique challenges. Here are a few to consider.
Since liquid cooling systems have the potential to leak, and IT and other electrical equipment fails when wet, data center owners/operators are reluctant to take the risk.
The best way to manage leaks is to create a response plan. What will you do if you must take a complete rack offline? How will you detect leaks as soon as they happen? How will you isolate leaks to ensure the rest of the hydronic system is operational? How do you protect computer equipment in the event of a leak? What type of liquid will be the least damaging to computer hardware? Should you use dielectric liquid to avoid damage to electronic components?
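Parts of such a response plan can be automated. The sketch below is purely illustrative; the function names are placeholders for whatever building management or DCIM hooks a given site exposes, not any real product's API. It shows only the order of actions a plan might script when a leak sensor trips:

```python
# Hypothetical leak-response sequence: isolate, de-energize, escalate.
# Every function here is a placeholder for a site-specific integration.
events = []

def close_isolation_valve(rack_id):
    # Isolate the rack's branch so the rest of the hydronic loop stays up.
    events.append(("valve_closed", rack_id))

def power_down_rack(rack_id):
    # Protect wet IT equipment by removing power.
    events.append(("powered_down", rack_id))

def alert_operators(rack_id):
    # Hand off to the manual response plan.
    events.append(("alerted", rack_id))

def on_leak_detected(rack_id):
    close_isolation_valve(rack_id)
    power_down_rack(rack_id)
    alert_operators(rack_id)

on_leak_detected("rack-12")
print(events)
```

The ordering matters: isolating the leak before powering down limits the water released, and both happen before anyone is paged.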
Any hydronic system must be balanced to avoid hot spots. In an air-distribution system, airflow must make it to the areas that need cooling and be blocked from the areas that don’t need cooling. Liquid cooling is no different.
In a rear door heat exchanger system, each door's coil has liquid flowing through it at a certain rate. If that rate is lowered, that rack will not get the cooling it needs. Likewise, if the flow is too high, that heat exchanger robs flow from other racks that need more.
In the case of direct-to-chip cooling, each server may have up to six chip heat exchangers, with six supply tubes and six return tubes that must be balanced. A server rack may contain up to 200 mini water circuits, all of which must be balanced so that no chip is starved of flow and left to overheat.
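The flow each of those mini circuits needs follows from the basic heat balance Q = ṁ·cp·ΔT. A minimal sketch, using the article's 500 W chip figure and an assumed 10 °C water temperature rise across the cold plate (the ΔT is an illustrative assumption, not from the article):

```python
# Rough per-chip flow sizing from Q = m_dot * cp * dT.
q_watts = 500.0   # heat load of one chip (article's 2022 figure)
cp = 4186.0       # specific heat of water, J/(kg·K)
dt = 10.0         # assumed temperature rise across the cold plate, K

m_dot = q_watts / (cp * dt)   # mass flow in kg/s
lpm = m_dot * 60              # ≈ liters per minute (1 kg of water ≈ 1 L)
print(round(lpm, 2))          # → 0.72 L/min per chip
```

Multiply that by 200 circuits per rack and it becomes clear why an unbalanced manifold can quietly starve individual chips.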
In the case of immersion cooling, flow must be present in the entire tank to avoid development of hot spots.
Data centers must provide redundancy for servers to avoid downtime. In an air-cooled data center, if a computer room air conditioner (CRAC) fails, there are other CRACs that can provide sufficient cooling to the data center plenum. However, if a liquid cooling system collocated with server racks fails, that server rack will be down until the cooling is repaired. Because liquid cooling systems are dedicated to server racks, individual servers, or individual computer chips, it’s challenging to provide redundancy through common plenum systems.
Rear door heat exchangers use chilled water to remove heat from server racks. Chilled water temperatures range from 40°F to 65°F, and the chilled water temperature for a rear door heat exchanger must be maintained above the dew point of the data center. If the water temperature drops below that dew point, water will condense on the coil, creating a dangerous moisture problem in the facility.
Condensation should not be a problem with chip cooling as long as water temperatures are well above the dew point in the data center.
Any of these challenges can be overcome with planning, but they should be taken into consideration for any data center operator considering implementing a liquid cooling strategy.
Pros and cons of liquid cooling
Like any new technology, liquid cooling has pros and cons. A few are listed below. As you can imagine, each data center must objectively review pros and cons and make the best decision for their data center.
Pros
- Energy efficiency — By eliminating the highest-energy-using cooling equipment, cooling energy can be reduced by 80%.
- Eliminate cooling equipment — CRACs and chillers are not needed in a liquid cooling system. By eliminating these pieces of equipment, capital budgets can be reduced, leaving more funds to invest in other priorities. This reduction of equipment also allows for smaller data center facilities. By eliminating refrigerant compressors, maintenance costs will be reduced.
- High-performance computing — New CPUs and GPUs will need liquid cooling; air cooling cannot reject enough heat for this new technology. Even older technology that doesn't need liquid cooling will run much better with it. The cooler a chip is, the more calculations it can complete: liquid-cooled chips process 25% more information than air-cooled chips.
- High power density — With the increasing price of real estate, data centers that are able to handle high-density server racks will be more profitable when they can accommodate more computing power with less square footage.
Cons
- New technology — No one wants to experiment with new systems in the critical data center environment. While many data center operators are familiar with chillers and CRACs, they are not familiar with liquid cooling systems.
- Balancing water flow to chips — Air-cooled systems will develop hot spots if air distribution is not controlled properly. Liquid cooling is no different. If direct chip coolers are connected to a large manifold, there’s no guarantee that the flow to each chip will be properly balanced. Such balancing is possible but is one more complication of liquid cooling.
- Sensitive water treatment — Water quality is a much more critical aspect of liquid cooling than traditional chilled water systems. Running water through microchip heat exchangers can easily create blockages. Cooling loop water must be of the highest quality. If immersion cooling is used, the nonconductive liquid used for this process must be handled with care.
- Risk of leaks in costly IT equipment — The greatest fear is water spraying inside a rack and ruining a million dollars' worth of server equipment.
- Redundancy is a challenge — Redundancy of cooling varies based on the liquid cooling option chosen but is often less effective with liquid cooling than with air cooling.
It's estimated that data centers consumed close to 400 terawatt-hours of electricity in 2020, costing close to $34 billion. Data center electricity usage keeps rising as cloud-based storage, artificial intelligence, and other computer technologies advance. Most data centers are air-cooled; if they converted to liquid cooling, $14 billion per year in electricity costs and 68.4 million tons of carbon emissions could be avoided.
What is best for your data center?
The decision to utilize liquid cooling for your data center should be based on multiple factors, including electricity cost, power density, environmental objectives, maintenance staff, colocation versus corporate ownership, uptime requirements, local climate, utility and government incentives, retrofit versus new construction, capital expenditure budgets, etc. Choosing the right cooling technology requires weighing many variables and evaluating competing options.