The Inefficient Pursuit Of Data Center Energy Efficiency
Thirty years ago, when I first started working in data centers, energy efficiency wasn’t a concern. It wasn’t even considered other than to ensure systems were sized correctly to meet the load. IT pursued improved performance and facilities pursued increased reliability. But as the costs of energy went up and the loads increased, the spotlight began to swing towards energy consumption and, more importantly, to the “bottom line,” AKA the electric bill.
Around 15 years ago, I asked an IT equipment manufacturer why each new generation of computers had to bring increases in heat density and overall watts per square foot. I explained how this was creating difficult challenges for critical facilities operations, such as faster and more extreme thermal transients following a cooling outage. His response was a true eye-opener for me. He said those who select and purchase computer equipment are not responsible for paying the electric bill, and they are not responsible for cooling the equipment. All they care about is better IT performance. The IT department measured “improved performance” by clock speed, processor speed, throughput, increased memory, etc. No customer ever purchased a server because it used less energy than a competitor’s, so why would manufacturers invest in making their products more efficient?
Even though the facilities departments of most critical operations paid the electric bill, they still held a similar mindset. Reliability trumped energy efficiency every time. Infrastructure was intentionally oversized. Data halls and computer rooms were as cold as meat lockers. The “optimum” chiller was the one that was hardest to knock offline and quickest to restart. If all other things were equal, maybe the more energy-efficient chiller would get selected, but how often are all other things equal? Many data centers refused to use their existing waterside economizers because of the perceived difficulties (real or imagined) of transitioning successfully from free-cooling back to mechanical cooling. The electric bill was just another utility like the water bill, sewer bill, fuel oil bill, etc.
It was maybe 10 years ago when CFOs began to realize that data center electric bills were on par with, and even surpassing, the cost of the IT equipment. In fairly short order the industry became energy conscious. IT equipment manufacturers started “right-sizing” power supplies and seeking Energy Star® certification. HVAC manufacturers started adding “energy recovery” and “economizers” to their product lines. Even the electric power and distribution industry started pursuing energy efficiency improvements, such as designing transformers, UPS modules, etc., to operate more efficiently at part-load conditions (where they operate more than 99.9% of the time). Engineering and design firms started touting strategies, designs, topologies, etc., that were not only reliable but also energy efficient. Facilities started measuring their energy use and efficiency, and the term PUE (power usage effectiveness) was coined. The industry-wide pursuit of efficiency was on!
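For readers who have not worked with the metric, PUE is the ratio of total facility energy to the energy delivered to the IT equipment, so a value of 1.0 would mean every kilowatt-hour goes to computing. The short Python sketch below only illustrates the arithmetic. The function name and the sample meter readings are made up for illustration and do not come from any particular site.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Return power usage effectiveness: total facility energy / IT energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    if total_facility_kwh < it_equipment_kwh:
        raise ValueError("total facility energy cannot be less than IT energy")
    return total_facility_kwh / it_equipment_kwh


# Hypothetical monthly meter readings (illustrative numbers only).
legacy = pue(total_facility_kwh=2_000_000, it_equipment_kwh=1_000_000)
tuned = pue(total_facility_kwh=1_300_000, it_equipment_kwh=1_000_000)
print(f"legacy PUE: {legacy:.2f}, tuned PUE: {tuned:.2f}")
```

In this made-up comparison the legacy site spends as much energy on cooling, power distribution, and lighting as it does on computing, while the tuned site spends only 30% on top of the IT load.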
At first it was easy to find opportunities to improve energy efficiency. There was low-hanging fruit everywhere. Most data centers had computer room air handlers (CRAHs) with constant-speed fans and return air temperature control, and the units were routinely observed “fighting” each other (units in close proximity were simultaneously heating and cooling and/or humidifying and dehumidifying). Raised floor systems, and especially cable penetrations, were unsealed. Partially filled racks lacked blanking panels. Lights were left on 24x7.
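The “fighting” condition is simple enough to spot in a snapshot of unit operating modes. The Python sketch below is one possible check, not a feature of any real building management system: the unit names, zone labels, and mode strings are hypothetical, and it assumes each unit reports its active modes as a set.

```python
from collections import defaultdict

# Pairs of operating modes that work against each other.
OPPOSING = [("heating", "cooling"), ("humidifying", "dehumidifying")]


def find_fighting_zones(unit_states):
    """Return {zone: [conflict descriptions]} for zones with opposing modes.

    unit_states: iterable of (unit_id, zone, set_of_active_modes).
    """
    units_by_zone_and_mode = defaultdict(lambda: defaultdict(list))
    for unit_id, zone, modes in unit_states:
        for mode in modes:
            units_by_zone_and_mode[zone][mode].append(unit_id)

    conflicts = {}
    for zone, modes in units_by_zone_and_mode.items():
        found = []
        for a, b in OPPOSING:
            a_units = modes.get(a, [])
            b_units = modes.get(b, [])
            # Only count a conflict between two different units; one unit
            # reheating while it dehumidifies is not "fighting" a neighbor.
            if any(u != v for u in a_units for v in b_units):
                found.append(f"{a} {a_units} vs {b} {b_units}")
        if found:
            conflicts[zone] = found
    return conflicts


# Hypothetical snapshot of three units in two zones.
snapshot = [
    ("CRAH-1", "north", {"cooling", "dehumidifying"}),
    ("CRAH-2", "north", {"cooling", "humidifying"}),
    ("CRAH-3", "south", {"cooling"}),
]
conflicts = find_fighting_zones(snapshot)
print(conflicts)
```

Here the two units in the north zone would be flagged because one is adding moisture while its neighbor removes it. Grouping by zone is a simplification; a real check would need some measure of physical proximity.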
Today it is safe to say most of the low-hanging and even middle-hanging fruit has been picked. Integrated waterside economizers were designed to work in series with chillers, which mitigated most of the challenges associated with transitioning from free-cooling back to mechanical cooling. Rotating equipment was purchased or retrofitted with variable-speed motors. Hot/cold aisle containment and other airflow management strategies are now common. The resulting PUE values have dropped to levels deemed impossible just a few years ago. Where in the not-so-distant past we sought out products and designs promising significant improvements in efficiency, we are now pursuing the remaining smaller opportunities.
This continuing pursuit of the most efficient product and facility has in some (maybe many) instances resulted in increased complexity that adds serious risk to overall reliability and, ironically, even to the ability to “optimize” operations to achieve the promised efficiencies.
An example comes to mind. I recently encountered a data center that deployed computer room air conditioners (CRACs) with integral free-cooling economizers. These products were available decades ago but were typically advertised as “dual-source cooling units” that had both chilled water coils and compressors so if the chilled water plant failed, they would switch over to mechanical cooling/compressors as a backup. The old units were slightly more complicated than a simple chilled water unit (CRAH) or compressorized unit (CRAC), but still they were simple. The new version has well over a hundred different settings and “configurations,” multiple levels of menus, and myriad available options and ancillaries.
Not long after these units were placed in operation, the site experienced a chilled water outage. The new, highly efficient CRACs failed to transfer to compressorized cooling and the data center overheated. During the ensuing troubleshooting and forensic investigation, it became evident that the units didn’t actually fail; they were set up and configured incorrectly for the site-specific conditions, which resulted in their not transitioning to mechanical cooling. Furthermore, no one actually understood what all the setpoints and configurable settings did. This included the site’s seasoned facilities operations staff, the project engineering firm, the commissioning agents, the installing contractor, the controls contractor, the manufacturer’s field technicians, and in some instances, the manufacturer’s application engineer. The owner’s operations and maintenance manual was unclear and in several instances incorrect. Many of the new capabilities that were intended to achieve the promised energy efficiency improvements were being defeated by improper setup and programming of setpoints, ranges, options, etc. Trend reports showed at least one unit had been unnecessarily dehumidifying since startup because of a misunderstood control parameter. Even after exhaustive study of the submittals and various technical manuals, questions remained unanswered.
And suddenly we were back where we started. Everyone was reminded that reliability still trumps energy efficiency in mission critical facilities. Any savings in energy costs were lost to this one anomaly. The entire focus of the owner, the engineers, the contractors, and even the manufacturer was on restoring reliability and confidence that the units would not fail to cool again. Optimizing the units for energy conservation was no longer discussed.
You can see a simple analogy the next time you shop for a washing machine. The washing machine has been around for decades and still does essentially the same chore. Fifty years ago a typical washing machine had two selector knobs and maybe an on/off button (if not incorporated into one of the knobs). You could adjust water temperature and maybe water level. Most people set it to operate at one set of “normal” conditions (warm water wash, cold water rinse, and medium water level or such) and left it that way. Those machines lasted 15 or 20 years, were easy to troubleshoot and repair, and did a good job of washing clothes.
My previous washing machine had several knobs, lights, and buttons. It lasted about six years and required a trained “technician” to troubleshoot and replace a printed circuit board that had to be ordered, after which the auto-self-balancing device failed and totaled the unit. My washing machine today has five knobs (with a combined total of 23 settings), six indicating lights, an LCD screen, and a pushbutton. The first time the new washing machine was used it was set for “normal” conditions: warm water wash, cold water rinse, and medium water level. It’s remained there ever since.
The continuing pursuit of energy efficiency improvements is and should remain an inherent goal for most if not all facilities. This is even truer for data centers, where even very small improvements in efficiency can result in large energy savings because of the enormous IT loads coupled with 7x24xForever operations. But realizing these improvements requires products, designs, and infrastructure that can be consistently deployed as intended, can be understood by reasonably experienced and trained staff, and do not induce operational risk through overly complex solutions. They need to be intuitive, user-friendly, and fail-safe.
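To put a rough number on how a very small improvement adds up under around-the-clock operation, here is a back-of-the-envelope Python sketch. The 5 MW IT load, the PUE figures, and the $0.10 per kWh rate are all assumptions chosen for illustration, and the calculation assumes the IT load stays constant all year.

```python
HOURS_PER_YEAR = 8760  # 24 hours x 365 days of continuous operation


def annual_savings_kwh(it_load_kw: float, pue_before: float, pue_after: float) -> float:
    """Annual facility energy saved when PUE improves at a constant IT load."""
    return it_load_kw * (pue_before - pue_after) * HOURS_PER_YEAR


# Hypothetical site: 5 MW of IT load, PUE trimmed from 1.40 to 1.38.
saved_kwh = annual_savings_kwh(it_load_kw=5_000, pue_before=1.40, pue_after=1.38)
saved_dollars = saved_kwh * 0.10  # assumed blended rate of $0.10 per kWh
print(f"{saved_kwh:,.0f} kWh per year, about ${saved_dollars:,.0f}")
```

Under these assumed figures, trimming PUE by just 0.02 saves roughly 876,000 kWh a year. A single outage like the one described above can wipe out that kind of saving.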