The World Economic Forum estimates that by 2025, 463 exabytes of data will be created each day. That’s the equivalent of more than 212 million DVDs per day. To deal with the ever-increasing torrent, data centers are looking for new ways to address four key objectives: maximizing uptime, optimizing energy usage, detecting potential risks, and defending against cyberattacks. Making use of machine learning (ML) technology is high on the list of potential solutions.
ML and AI might seem like a no-brainer, yet technology executives have their reasons for proceeding with caution. These include uncertainty around return on investment (ROI), complicated policies around data sharing, and a lack of awareness and support from upper management. However, given how reliant businesses are on data, tech leaders can ill afford to overlook the importance of ML and other AI applications, especially when it comes to maintaining uptime.
Paying Dearly For Downtime
The cost of an unplanned data center outage can range widely, from around $140,000 to $540,000 per hour, depending on the size of the enterprise and the industry. Back in 2017, when British Airways suffered a major data center failure, the airline had to swallow a record-breaking loss of more than $75 million. Thanks to advances in ML and smarter infrastructure, data centers today can streamline operations considerably to protect uptime.
By 2022, more than 50% of technology in data centers could run autonomously using embedded AI and ML functionality, according to market research firm International Data Corp. Here are four ways ML can be applied to strengthen data centers.
- Maximize energy efficiency
Data centers account for roughly 1% of global electricity use. This may sound like a small number, but even a modest increase in efficiency would yield significant cost savings and keep millions of tons of carbon out of the atmosphere. The good news is that energy management is one of the easiest areas in which to implement ML. Google’s use of DeepMind, for example, has consistently delivered energy savings of around 30%, vastly reducing overheads. The same can be true for other data centers.
- Accurate capacity planning
To meet the ever-increasing volume of complex workloads, data center administrators must have accurate forecasts of resource needs well ahead of time. These forecasts need to be updated in real time, reflecting any changes in environmental conditions. Predictive models built using advanced ML algorithms can churn through petabytes of data and intelligently project capacity utilization and performance. This planning helps data centers avoid resource shortages that could result in downtime and disrupt operations.
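As a rough illustration of the forecasting idea, the sketch below fits a straight-line trend to past utilization readings and estimates how many periods remain before a capacity limit is reached. It is a deliberately simple stand-in for the predictive models described above, not a production approach; the function names, the sample readings, and the capacity figure are all hypothetical.

```python
from typing import Optional, Sequence, Tuple


def fit_linear_trend(usage: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of usage[t] ~ intercept + slope * t, for t = 0..n-1."""
    n = len(usage)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_t = (n - 1) / 2
    mean_u = sum(usage) / n
    cov = sum((t - mean_t) * (u - mean_u) for t, u in enumerate(usage))
    var = sum((t - mean_t) ** 2 for t in range(n))
    slope = cov / var
    intercept = mean_u - slope * mean_t
    return slope, intercept


def periods_until_capacity(usage: Sequence[float], capacity: float) -> Optional[float]:
    """Periods after the last reading until the trend line reaches capacity.

    Returns None when the trend is flat or falling (no projected shortage).
    """
    slope, intercept = fit_linear_trend(usage)
    if slope <= 0:
        return None
    last_t = len(usage) - 1
    t_hit = (capacity - intercept) / slope
    return max(0.0, t_hit - last_t)


# Hypothetical weekly storage utilization, in percent of installed capacity.
weekly_usage = [50.0, 52.0, 54.0, 56.0, 58.0]
weeks_left = periods_until_capacity(weekly_usage, capacity=70.0)
```

Refitting the trend each time a new reading arrives is one simple way to keep such a forecast current; a real deployment would likely also account for seasonality and workload mix.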
- Faster risk analysis
ML models can be trained to detect anomalies more accurately and faster than a human. At best, an engineer might take hours to spot something and, at worst, miss the anomaly altogether. For example, several data-center-management-as-a-service (DMaaS) programs can analyze performance data from critical data center equipment, such as power management and cooling systems, and predict when it might fail. By notifying facility managers of the impending failure ahead of time, ML technology can keep downtime to a minimum.
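To make the idea of automated anomaly detection concrete, here is a minimal sketch that flags equipment readings that deviate sharply from their recent history using a rolling z-score. This is a basic statistical baseline rather than a trained ML model, and it is not how any particular DMaaS product works; the function name, window size, threshold, and sample temperatures are assumptions chosen for illustration.

```python
from statistics import mean, stdev
from typing import List, Sequence


def find_anomalies(
    readings: Sequence[float], window: int = 10, threshold: float = 3.0
) -> List[int]:
    """Return indices of readings far from the mean of the preceding window.

    A reading is flagged when it lies more than `threshold` sample standard
    deviations from the mean of the `window` readings just before it.
    """
    if window < 2:
        raise ValueError("window must be at least 2")
    anomalies = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: any change at all is unusual.
            if readings[i] != mu:
                anomalies.append(i)
            continue
        if abs(readings[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical cooling-unit outlet temperatures in degrees Celsius.
temps = [20.0, 20.1, 19.9, 20.0, 20.1, 19.9, 20.0, 20.1, 19.9, 20.0, 27.5, 20.0]
suspect_indices = find_anomalies(temps)
```

In this sample, only the sudden jump to 27.5 is flagged. A check like this can run continuously on streaming telemetry, which is where the speed advantage over manual inspection comes from.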
- Resilience against attacks
Defending against distributed denial-of-service (DDoS) attacks requires fast detection with a low false-positive rate. Detection methods are broadly classified into two types: signature-based and anomaly-based. Signature-based detection looks for known attack characteristics in general traffic and is widely implemented. Anomaly-based detection flags traffic that falls outside normal patterns. ML models, including regression-based classifiers, can be used to identify the type of traffic anomaly, helping to minimize false alarms.
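The difference between the two detection types can be sketched in a few lines. The example below is illustrative only: the `TrafficSample` fields, the signature labels, and the fixed rate multiplier are hypothetical, and a real system would learn its baseline and thresholds from traffic data rather than hard-coding them.

```python
from dataclasses import dataclass

# Hypothetical labels standing in for known attack fingerprints.
KNOWN_SIGNATURES = frozenset({"syn-flood", "udp-amplification"})


@dataclass(frozen=True)
class TrafficSample:
    source_ip: str
    requests_per_second: float
    pattern_label: str  # assumed output of an upstream traffic parser


def matches_signature(sample: TrafficSample) -> bool:
    """Signature-based check: does the traffic match a known attack pattern?"""
    return sample.pattern_label in KNOWN_SIGNATURES


def is_rate_anomaly(
    sample: TrafficSample, baseline_rps: float, tolerance: float = 5.0
) -> bool:
    """Anomaly-based check: is the request rate far above the normal baseline?"""
    return sample.requests_per_second > baseline_rps * tolerance


def classify(sample: TrafficSample, baseline_rps: float) -> str:
    """Label a sample as 'signature', 'anomaly', or 'normal'."""
    if matches_signature(sample):
        return "signature"
    if is_rate_anomaly(sample, baseline_rps):
        return "anomaly"
    return "normal"


# Example: an unrecognized pattern arriving at 40 times the baseline rate.
burst = TrafficSample("203.0.113.7", requests_per_second=4000.0, pattern_label="unknown")
verdict = classify(burst, baseline_rps=100.0)
```

The signature check is cheap and precise but blind to new attack types, while the anomaly check catches novel behavior at the cost of more false alarms. Replacing the fixed multiplier with a trained model is one way to bring that false-alarm rate down.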
Overcoming Challenges
Several data centers are running AI and ML pilots, but some have struggled with full-scale rollouts. That is because a pilot typically uses smaller data sets and operates under lab conditions. In the real world, by contrast, there could be two or three terabytes of data that need to be processed in a matter of minutes. As such, scaling AI from the lab to the field is a major challenge that data centers must overcome. Other challenges include the difficulty of accessing quality data to train models, long implementation timelines to achieve accuracy, and compliance with the complex legal policies on data sharing.
So, how can data centers surmount these challenges? There is no one-size-fits-all solution. At the risk of stating the obvious, start with an AI road map. It may seem surprising, but many businesses neglect to take this initial step. Identify specific needs and the potential ROI. Create a comprehensive data strategy that focuses on data availability and acquisition as well as accurate labeling of data.
Next, use ML models with enterprise-grade performance so that they are easy to scale. For algorithm training, use data center infrastructure that is automated and containerized, which also makes scaling easier. Focus on data quality, and set up test centers of excellence, or put similar structures in place, for AI pilots. These structures need to take into account the relevant technology skills, expertise, and capabilities within the organization. Scaling pilots into broader applications will deliver greater impact.
To stay in the game, data centers need to rearchitect how they operate in a landscape that is constantly evolving. In today’s interconnected, always-on society, data centers will need to keep pushing the boundaries with ML to avoid becoming outmoded, outflanked, and overwhelmed.