The hot and humid days of summer are upon us again, stress-testing the limits of our cooling systems. And while many organizations have moved to colocation facilities and the cloud, there are still many small- to mid-sized data centers and “server rooms” in operation. I have written about this before, but many firms still see their data center cooling systems pushed to, and sometimes beyond, their limits. So you may want to consider some of these cooling tips to keep your servers from overheating.
There are many posts about ASHRAE’s expanded thermal guidelines and free cooling, but that doesn’t really help if you are in a site with marginal cooling units. This is also a common issue for server rooms located in mixed-use buildings that are not using large dedicated cooling systems or systems with enough extra capacity for those very hot summer days. Virtually any cooling system’s performance will decrease with higher outdoor temperatures and humidity. Many IT departments are “sweating” out the summer (again), hoping that they will not have servers suddenly crashing from temperature shutdowns.
These tricks and techniques are not intended to solve the long-term problem, but they may just help enough to get you through the summer. Often, when the heat load of the equipment does not severely exceed the actual capacity of the cooling system, optimizing the airflow may improve the situation until a new or additional cooling system is installed.
- If it feels warm, don’t panic — even if you see 80°F in the cold aisle! Yes, this is hotter than the proverbial 70° to 72° data center “standard” you’re used to (and you may not enjoy working in the room), but it’s not as bad for the servers as you think. If the highest temperature reading in the front of the rack is 80° or less, you are still within ASHRAE’s TC 9.9 “recommended” guidelines. Even if the intake temperature creeps up to 90°, it is still within the A1 “allowable” guidelines.
- Take temperature measurements at the front of the servers. This is where the servers draw in the cool air, making it the most important measurement. Take readings at the top, middle, and bottom of the front of the racks (assuming that you have a hot aisle/cold aisle layout). Where possible, move servers toward the bottom (the coolest area) of the racks. Make sure that you use blanking panels to block off all open spaces in the front of the racks. This will prevent hot air from the rear from recirculating into the front of the racks.
- Don’t worry about rear temperatures — even if they are at 100° or more (this is not unusual)! Do not place random fans at the backs of racks to “cool them down” — this just causes more mixing of warm air into the cold aisles (I wish I had a dollar for every time I have seen this).
- If you have a raised floor, make sure that the floor grates or perforated tiles are properly located in front of where the hottest racks are. If necessary, rearrange or change floor grates to match the airflow to the heat load. Be careful not to locate floor grates too close to the CRACs, as this will “short circuit” the cool airflow immediately back into the CRACs and rob the rest of the room/row of sufficient cool air.
- Avoid bypass airflow. Check the raised floor for openings inside the cabinets. Cable openings in the floor allow air to escape the raised-floor plenum where it is not needed, reducing the cold air available to the floor vents in the cold aisles. Use brush-type air containment collar kits to minimize this problem.
- If possible, redistribute and evenly spread the heat loads into every rack to avoid or minimize “hot spots.” At the very least, manually check the temperature in the racks at the top, middle, and bottom, before you move the servers. Install permanent temperature sensors with central monitoring in each rack or at least every third rack if possible.
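The ASHRAE bands from the earlier tip can be turned into a simple sanity check on your front-of-rack readings. A minimal Python sketch, using the 80°/90°F figures cited above (the sample readings are illustrative, not from any real room):

```python
# Classify front-of-rack (server intake) temperatures in Fahrenheit
# against the ASHRAE TC 9.9 bands cited in this article:
# up to 80F is within the "recommended" range, up to 90F is still
# within the A1 "allowable" range.

def classify_intake(temp_f):
    """Return a coarse status for one intake temperature reading."""
    if temp_f <= 80:
        return "recommended"
    elif temp_f <= 90:
        return "allowable (A1)"
    else:
        return "too hot - investigate"

# Illustrative readings at the top, middle, and bottom of one rack.
readings = {"top": 88.5, "middle": 79.0, "bottom": 72.4}
for position, temp in readings.items():
    print(f"{position}: {temp}F -> {classify_intake(temp)}")
```

Note that only the top of the rack is in the “allowable” band here, which is typical: the warmest air collects at the top, which is why readings at all three heights matter.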
- Check the rear of racks for cables blocking exhaust airflow. This will cause excessive back pressure for the IT equipment fans and can result in overheating — even when there is enough cool air in front. This is especially true of racks full of 1U servers with a lot of long power cords and network cabling. Consider purchasing shorter (1- to 2-foot) power cords and network cables to replace the longer ones. Use a cable management system to ensure airflow is not impeded.
- If you have an overhead, ducted cooling system, make sure the cool air outlets are directly over the front of the racks and the return ducts are over the hot aisles. If the ceiling vents and returns are poorly located, the room becomes very hot without exceeding the cooling capacity simply because the cool air is not directed at the front of the racks and the hot air is not properly extracted. The most important issue to avoid is recirculation — make sure the hot air from the rear of the cabinets can get directly back to the CRAC return without mixing with the cold air. If you have a plenum ceiling, consider using it to capture the warm air and add a ducted collar going into the ceiling from your CRAC unit’s top return air intake. Some basic ductwork will have an immediate impact on the room temperature. In fact, the warmer the return air, the higher the efficiency and actual cooling capacity of the CRAC.
- Consider adding temporary roll-in cooling units, but only if you can exhaust the heat into an external area. Running the exhaust ducts into a ceiling plenum that returns to the CRAC does not work; the heat exhaust ducts of the roll-in unit must discharge into an area outside of the controlled space.
- When the room is not occupied, turn off the lights. This can save 1% to 3% of the electrical and heat load, which, in a marginal cooling situation, may lower the temperature 1° to 2°.
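The arithmetic behind the lighting tip is worth spelling out: every watt the lights draw also becomes a watt of heat the cooling system must remove. A quick illustrative calculation (the 30 kW room load is an assumption, and 2% is simply the midpoint of the 1%-3% range above):

```python
# Illustrative only: assumed total electrical load of a small server room,
# not a measurement from any particular site.
total_load_kw = 30.0

# Lighting as a share of total load; 0.02 is the midpoint of the
# 1%-3% range cited in the article.
lighting_fraction = 0.02

lighting_kw = total_load_kw * lighting_fraction
print(f"Lights off saves ~{lighting_kw:.1f} kW of power "
      f"and removes ~{lighting_kw:.1f} kW of heat load")
```

In a room whose cooling is already marginal, shedding even a fraction of a kilowatt of heat can be the difference between holding temperature and slowly losing ground.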
- Check to see if there is any equipment that is still plugged in and powered up but is no longer in production (aka the ever-popular zombie servers). This is a fairly common occurrence and has an easy fix: Just shut them off!
- If you have blade servers, consider activating the “power capping” feature when cooling systems are not able to handle the full heat load. This may slow down the processors a bit, but it is much better than having an unexpected server crash due to thermal shutdown.
The Bottom Line
Of course, make sure that your cooling system is properly serviced and that all exterior heat rejection equipment has been cleaned. While there is no true quick fix when your heat load far exceeds your cooling system’s capacity, sometimes just improving the airflow may increase the overall efficiency by 5% to 20%. This may get you through the hottest days until you can upgrade your cooling systems. In any event, it will lower your energy costs, which is always a good thing.
The COVID-19 pandemic has made it more difficult for IT and other support personnel to work on-site, making remote monitoring and control more important than ever. Plan ahead. At the very least, install some basic remote temperature monitoring inside some or all of the cabinets. Set alarm thresholds to provide an early warning system of developing problems. If all else fails, have a fall-back plan to shut down the least critical systems so that the more critical servers can remain operational. Make sure to locate the most critical systems in the coolest areas. A little rearranging is a lot better than getting (or perhaps not getting) high temperature alerts or an unexpected shutdown from an overheating critical system.
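The alarm-threshold and fall-back advice above can be sketched as a small decision routine. This is a minimal illustration, assuming a placeholder for sensor access (real sensors might expose readings via SNMP, IPMI, or a vendor API) and a hypothetical shutdown order; none of these names come from any specific product:

```python
# Sketch of a basic rack temperature alarm with a staged fall-back plan.
# Thresholds mirror the ASHRAE bands cited earlier in the article.

WARN_F = 80   # top of the "recommended" range: raise an early warning
CRIT_F = 90   # top of the A1 "allowable" range: start shedding load

# Hypothetical list of the LEAST critical systems, listed first, so they
# are shut down before anything important. Names are illustrative.
SHUTDOWN_ORDER = ["test-box-1", "batch-worker-2", "file-server"]

def check_rack(rack_id, intake_temp_f):
    """Return an (action, target) pair for one rack's intake reading."""
    if intake_temp_f >= CRIT_F:
        # Shed the least critical system first to protect the rest.
        return ("shutdown-least-critical", SHUTDOWN_ORDER[0])
    if intake_temp_f >= WARN_F:
        # Early warning: developing problem, nothing shut down yet.
        return ("alert", None)
    return ("ok", None)

print(check_rack("rack-07", 92))
print(check_rack("rack-03", 84))
```

The point of the staged order is the article’s closing advice: deciding in advance which systems to sacrifice is far better than improvising while a critical server is already overheating.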