Extreme weather events are wreaking havoc across the country, and they are only set to become more frequent and unpredictable in the coming years. While this obviously has wider societal and ecological implications, the impact of these weather events on data centers can be pronounced. Extreme weather can quickly put the power supply at risk by forcing conditions to shift outside of normal operating range, causing downtime that operators cannot afford. In fact, an Uptime Institute Survey found that nearly a third of all reported outages cost their victims more than $250,000, with many exceeding $1 million.
Heatwaves themselves are becoming an area of concern. Many data center professionals, especially those in the northern U.S., haven’t had to prepare for extreme heat before. This has made data center cooling a top priority for operators seeking to maintain optimum environments amid fluctuating temperatures. As outside temperatures rise, the strain of keeping data center temperatures in range rises sharply, and with it the risk of downtime. During an extended heatwave, that strain can persist for several days, compounding the risk.
Fortunately, there are ways operators can prepare for and mitigate the consequences of these unforeseen outages. Chief among them is employing predictive analytics and data lakes, also known as industry shared data repositories (ISDRs), as part of an overall data center management strategy.
The Role of Data Lakes
Employing a data lake — a storage repository filled with anonymous data points from companies spread throughout the world — and leveraging the right analytics can help predict and avoid downtime, even in the face of unpredictable weather events.
This is due to the nature of data lakes. The more data they have access to, the better the predictive analytics. Richer data allows analytic software to learn from a larger number of situations that have caused outages. In some cases, this can be millions of data points. Even if downtime is caused by a never-before-seen incident — like an extreme weather event — it could be avoided because the predictive analytics solution recognizes the potential problem, thanks to data from a different company that has experienced something similar.
Because extreme weather is so unpredictable, this becomes especially relevant. The winter storm that hit Texas earlier this year caused widespread power outages; it was unlike anything data center operators there had ever seen. However, today’s software provides the potential to mitigate these circumstances. For instance, if local professionals had access to a data lake that included insights from locations in northern parts of the country that regularly experience cold weather, there may have been options to prevent some of the downtime experienced.
That doesn’t mean the strategy should simply be to house all available data, which may turn your data lake into a swamp. It’s about having the right data paired with the latest analytic software tools to turn data into actionable intelligence.
Once an issue is identified, predictive analytics systems relay findings to a DCIM system that can then push notifications to data center staff via their mobile devices to alert them of a potential outage risk. Staff can protect vulnerable equipment to prevent some or all of the potential downtime.
By leveraging a data lake, machines can keep a record of past asset and data center failures and use predictive analytics to learn from them and avoid similar situations in the future. Predictive analytics software monitors data from data centers and uses AI to recognize patterns that lead to equipment failure or downtime. This data isn’t limited to a single data center; the analytics software can also glean insights from sister facilities and data lakes.
Predictive analytics can help data center operators uncover and avoid potential risks. One way is by relating battery failures to patterns in data on voltages, expected operating temperatures, and other variables. Others include leveraging data points about fan speeds, temperatures, and pressures, or ingesting data from thermography cameras that sense the temperatures of switchboard connections. Ultimately, a systemwide analytics approach allows for an understanding of the interdependencies of assets in a system, which will lead to better predictability in the future.
To truly understand the role analytics plays in preventing downtime caused by extreme weather, it’s helpful to look at several examples of this technology in action.
Standard methods of procedure (MOPs) for data centers require generator testing and transfer of load to the generator on a periodic basis. During this exercise, the analytics system monitors and measures data points that indicate normal versus anomalous behavior. During a critical weather incident, the analytics program can then tell whether the system is in a normal state and ready to transfer. If it is not, the system can alert operators to the anomalous conditions that require intervention to ensure uptime is maintained.
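To make the idea of "normal versus anomalous" concrete, here is a minimal sketch of one way such a check could work. It is not a description of any particular analytics product. It assumes that each periodic generator test logs a handful of measurements, builds a baseline from those past tests, and flags any new reading that sits more than three standard deviations from its historical mean. The metric names, the values, and the three-sigma limit are all hypothetical.

```python
from statistics import mean, stdev

# Hypothetical measurements logged during past periodic generator tests.
# Metric names and values are illustrative, not taken from a real facility.
test_history = [
    {"transfer_time_s": 9.8, "frequency_hz": 60.0, "oil_pressure_psi": 55.0},
    {"transfer_time_s": 10.1, "frequency_hz": 59.9, "oil_pressure_psi": 54.0},
    {"transfer_time_s": 10.0, "frequency_hz": 60.1, "oil_pressure_psi": 56.0},
    {"transfer_time_s": 9.9, "frequency_hz": 60.0, "oil_pressure_psi": 55.5},
    {"transfer_time_s": 10.2, "frequency_hz": 60.0, "oil_pressure_psi": 54.5},
]


def build_baseline(runs):
    """Return {metric: (mean, sample standard deviation)} over past test runs."""
    baseline = {}
    for metric in runs[0]:
        values = [run[metric] for run in runs]
        baseline[metric] = (mean(values), stdev(values))
    return baseline


def anomalous_metrics(baseline, reading, z_limit=3.0):
    """List the metrics whose reading lies more than z_limit deviations from the mean."""
    flagged = []
    for metric, (mu, sigma) in baseline.items():
        if sigma == 0:
            # No historical spread: any difference from the mean counts as a deviation.
            deviates = reading[metric] != mu
        else:
            deviates = abs(reading[metric] - mu) / sigma > z_limit
        if deviates:
            flagged.append(metric)
    return flagged


baseline = build_baseline(test_history)

# A hypothetical reading taken as a storm approaches: the transfer is unusually slow.
latest = {"transfer_time_s": 14.5, "frequency_hz": 60.0, "oil_pressure_psi": 54.8}
flagged = anomalous_metrics(baseline, latest)
print("Metrics needing intervention:", flagged)
```

A real system would likely use far more history and more robust statistics, but the principle is the same: the routine tests define what "ready to transfer" looks like, so a deviation can be raised before the transfer is actually needed.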
There are also numerous occasions where data center components could be stressed and cause a failure or outage event, including the change of state of a system, starting a generator, transfer to bypass for a UPS, or coming out of standby for a cooling system. By continuously monitoring and benchmarking the performance criteria of these systems, the analytics system can give operators a higher level of confidence that these systems will perform during a critical incident. This is due to the fact they have benchmarked the normal window of operation and are prepared to identify anomalous conditions. For example, when a generator battery is under start conditions, the analytics will understand normal voltage and current changes as the generator powers on. If it begins to see a decline in voltage or current, it can flag and preempt a weakening battery long before the unit is needed in a critical situation.
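The generator battery example can be sketched in a few lines. The following is an illustration under stated assumptions rather than a vendor's method: it assumes the lowest battery voltage seen during each generator start is recorded, fits a straight line through those values, and estimates how many more starts remain before the fitted line reaches a minimum acceptable voltage. The function names, the voltage readings, and the 9.6 V floor are hypothetical.

```python
def fit_line(ys):
    """Least-squares slope and intercept of ys against their index 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two start events to fit a trend")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    slope = num / den
    return slope, mean_y - slope * mean_x


def starts_until_floor(min_crank_volts, floor_v):
    """Estimate how many more starts remain before the trend line reaches floor_v.

    Returns None when the trend is flat or rising, i.e. no decline is visible.
    """
    slope, intercept = fit_line(min_crank_volts)
    if slope >= 0:
        return None
    crossing_index = (floor_v - intercept) / slope
    last_index = len(min_crank_volts) - 1
    return max(0.0, crossing_index - last_index)


# Hypothetical minimum battery voltage recorded during each of the last six starts.
weakening = [10.4, 10.3, 10.3, 10.1, 10.0, 9.8]
healthy = [10.4, 10.4, 10.5, 10.4, 10.5, 10.4]

FLOOR_V = 9.6  # assumed minimum acceptable cranking voltage

remaining_weak = starts_until_floor(weakening, FLOOR_V)
remaining_healthy = starts_until_floor(healthy, FLOOR_V)
print("Weakening battery, estimated starts remaining:", remaining_weak)
print("Healthy battery, estimated starts remaining:", remaining_healthy)
```

A straight-line fit is the simplest possible trend model. Battery degradation is often not linear, so a production system would probably combine this kind of signal with temperature, current, and age data.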
Trends and correlations of data points within and across data lakes may even reveal unforeseen dependencies between systems that create additional wear or stress on components and would otherwise go unnoticed. When a heatwave hits, higher than normal temperatures could result in condenser and evaporator fan speeds ramping up. Over time, the additional vibration could cause a premature loosening of torqued connections in the UPS or switchgear. The combined loosening and vibration create a hot spot within the equipment that results in a premature failure. This would be almost impossible to measure in an ad hoc manner, but with access to a continuous data stream going into a data lake, combined with machine learning and artificial intelligence tools leveraged to analyze that data, the issue may present itself.
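As a rough illustration of how such a hidden dependency might surface, the sketch below scans every pair of signals for a strong Pearson correlation. It assumes daily averages for three hypothetical signals have already been pulled from the data stream; the signal names, the values, and the 0.8 threshold are invented for the example. A strong correlation is only a prompt for investigation and does not prove that one system is wearing out another.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length, at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def correlated_pairs(signals, threshold=0.8):
    """Return {(name_a, name_b): r} for every pair with |r| at or above threshold."""
    result = {}
    for (name_a, a), (name_b, b) in combinations(signals.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            result[(name_a, name_b)] = r
    return result


# Hypothetical daily averages over one week that includes a heatwave.
signals = {
    "condenser_fan_rpm": [900, 950, 1200, 1400, 1450, 1500, 1480],
    "switchgear_conn_temp_c": [41.0, 41.5, 43.0, 45.5, 46.0, 47.5, 47.0],
    "ups_load_kw": [310, 305, 312, 308, 311, 306, 309],
}

related = correlated_pairs(signals)
for pair, r in related.items():
    print(pair, round(r, 3))
```

In this made-up week, fan speed and switchgear connection temperature move together while UPS load does not, which is the kind of pattern that would prompt an engineer to check torque on those connections.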
Analytics are also essential for the energy management and decarbonization of mission critical environments. In the future, it will be essential for us to benchmark and drive down our carbon footprint, and the ability to do this successfully will rely on solid analytics.
Preparing For the Worst
Unfortunately, extreme weather events are only predicted to increase in frequency in the coming years. Data center operators have a choice: leave it to chance and hope to avoid costly downtime when a weather event hits, or get the right tools in place now to mitigate an ever-increasing risk. Those who want to future-proof operations and save on costs long term will embrace data lakes and systemwide predictive analytics as the best path forward to contend with the changing climate.