Data failure is every business or organization’s worst nightmare. We need only point to the recent Delta Air Lines outage to see how disruptive a data center failure can be. Hundreds of flights had to be cancelled, entitling customers to costly refunds.

The total cost of a data center outage varies depending on the type of business, but according to a survey from the Ponemon Institute, the average cost of a single outage in 2015 was $740,000. Airlines grounding flights as a result of data center outages is becoming a fairly regular occurrence. In one such incident last year, United Airlines grounded flights for over an hour, attributing the disruption to network connectivity issues. And earlier this year, a power outage at a Verizon data center caused flight delays for customers of JetBlue, which was hosting infrastructure at the facility. To help avoid these expensive and bruising catastrophes, finding smart ways to prevent data failure should be a top priority across the board.

In data centers, a low and balanced power usage effectiveness (PUE) indicates stability, and optimal HVAC performance is central to keeping PUE in equilibrium. Together, these two factors are priorities for the data center industry because they help operators gauge opportunities to increase efficiency and maintain a healthy PUE. Regularly scheduled check-ups have long been the standard way to keep HVAC systems running smoothly by identifying and correcting issues, but what if you could reduce the likelihood of failure significantly?

Preventive maintenance (PM) was the only method used in the past to reduce the chance of failure — until the predictive maintenance (PdM) model appeared. Predictive technology empowers early fault detection through testing, diagnostics, and machine learning to identify and prevent HVAC system failures and consequently diminish data center downtime. PdM is key in optimizing HVAC, streamlining PUE, and reducing the chances of critical overall system failure.

 

THE UNIQUE NEEDS OF A DATA CENTER

Database servers store and manage information, performing myriad tasks from analysis to storage to archiving. Protecting functioning servers requires ancillary HVAC equipment to keep data center environments at optimum temperatures. It is critical for companies to include HVAC as part of their IT infrastructure. With data centers experiencing on average 2.5 outages per year, each lasting approximately 134 minutes, one of the main concerns today is business continuity. Companies rely on their information systems for continuous operations, so finding ways to eliminate or reduce data center outages is a must. Regardless of what causes an outage (human error, equipment failure, or external power disruption), each has the potential to inflict serious financial damage.
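
To put those averages in perspective, the arithmetic can be sketched in a few lines of Python. The outage figures come from the paragraph above; the function name and the 365-day year are assumptions made for illustration only.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; assumes a non-leap year


def annual_downtime(outages_per_year: float, minutes_per_outage: float) -> tuple:
    """Return (downtime in minutes per year, availability as a percentage)."""
    downtime = outages_per_year * minutes_per_outage
    availability = 100.0 * (1.0 - downtime / MINUTES_PER_YEAR)
    return downtime, availability


# Survey averages quoted in the article: 2.5 outages per year, 134 minutes each
downtime, availability = annual_downtime(2.5, 134)
print(f"{downtime:.0f} minutes of downtime per year")
print(f"{availability:.3f}% availability")
```

At these averages, a typical facility is dark for roughly 335 minutes (about five and a half hours) per year, which corresponds to availability of about 99.94%.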

A recent survey by CA Technologies of 200 companies across North America and Europe, conducted to determine the cost of downtime incurred from IT outages, showed that $26.5 billion in revenue was lost each year, which comes out to an average $150,000 annual hit per business. These IT outages can be especially costly and disruptive during traffic spikes, so reducing downtime and achieving low latency are critical factors in eliminating breakdowns. Maintaining a stable environment for IT server rooms and data centers is a constant challenge to both businesses and their solutions providers. In order for today’s data centers to operate efficiently, they must have the flexibility to quickly adapt to advances in technology. The economy, as well as new data center hardware, is forcing businesses to implement best-practice policies, find cost-effective cooling solutions, and re-evaluate IT server room planning and design.

 

ALTERNATIVE STRATEGIES FOR REDUCING COSTS

In recent years, data centers have become one of the largest consumers of electricity worldwide. In the U.S. alone, data centers consumed an estimated 90 billion kWh of electricity in 2013, which is equal to the total capacity of the country’s top 30 power plants. The average cost of running HVAC in a data center can be incredibly high, forcing companies to find innovative methods to reduce energy costs. In fact, many companies, notably Google and Facebook, have started building new data centers in cold-weather climates, such as Norway and Finland, to take advantage of colder ambient temperatures and the lower electricity costs that result.

In 2009, Google purchased a 60-year-old paper mill in Hamina, Finland, with plans to convert it into a modern data center. Within two years, it was fully operational, using seawater from the Gulf of Finland in its cooling system. This facility is now one of Google’s most advanced and efficient data centers in the world. In 2013, Facebook followed suit by opening a 30,000-square-meter server farm in Luleå, a small sub-arctic town in northern Sweden. The location allows the company to significantly increase the efficiency of its data center by using naturally cold air to chill its servers, avoiding the high cost of air conditioning a building of that size full of hot-running equipment.

 

OPTIMIZING PUE

Recent technological advances have led to a greater demand for high-end computing equipment, increasing data centers’ cooling and power requirements. Companies can draw on a number of best practices to balance PUE and achieve optimal efficiency.

As discussed earlier, relocating to a cooler climate can result in significant energy savings. Choosing greener options for a center’s infrastructure can also reduce overall power consumption.

The design and positioning of server racks in a data center environment have a direct impact on a PUE reading. The hot aisle/cold aisle approach to set-up can enhance PUE without much investment. Modular data centers that segregate high-power servers and host them in a different location within the data center, as well as virtualization, can greatly improve PUE. Other best practices include proper insulation of the data center, strategies for balanced data storage, and matching the UPS load to the system load.
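
For readers who want to see how such changes show up in the metric, PUE is conventionally calculated as total facility energy divided by the energy delivered to IT equipment, so a value closer to 1.0 means less overhead. The short Python sketch below applies that ratio to made-up monthly meter readings; every number and name in it is hypothetical and chosen only to illustrate the calculation.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power usage effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


# Hypothetical monthly readings in kWh (not taken from any real facility)
it_load = 500_000
other_overhead = 50_000       # lighting, UPS losses, and similar
cooling_before = 350_000      # HVAC energy before airflow improvements
cooling_after = 250_000       # HVAC energy after, e.g., hot aisle/cold aisle

pue_before = pue(it_load + cooling_before + other_overhead, it_load)
pue_after = pue(it_load + cooling_after + other_overhead, it_load)
print(f"PUE before: {pue_before:.2f}, after: {pue_after:.2f}")
```

In this invented example, trimming cooling energy moves PUE from 1.80 to 1.60 without any change to the IT load, which is why HVAC performance weighs so heavily on the figure.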

 

THE EVOLUTION OF PDM

Also known as condition-based maintenance, PdM first surfaced about 20 years ago in the military sector and was eventually adopted by high-end industries such as oil and gas, utilities, and aviation. According to the U.S. Department of Energy, past predictive maintenance studies have shown that a program using PdM can result in savings of 8% to 12% over a program utilizing preventive maintenance alone. Predictive maintenance can reduce energy and maintenance costs by up to 30%, eliminate 35% to 45% of breakdowns, and reduce downtime by up to 75%. In addition, optimizing a working machine can cut energy consumption by 20% to 25%, because a machine operating in a non-optimal state uses more energy to deliver the same output.

Yet despite the clear benefits of PdM, only 12% of commercial buildings employ this type of technology. One reason is that the high cost of implementing PdM previously made it unaffordable for the lower-end market. Besides the upfront infrastructure investment, companies had to maintain an in-house staff properly trained in PdM applications. But with the advent of mobile hardware and cloud-based computing, PdM is now available at a fraction of the cost of systems that were once hardwired and required highly trained technicians to analyze results. This has allowed PdM to trickle down into new markets, such as data centers, helping to streamline and optimize operations.

 

VIBRATION ANALYSIS

There are a number of techniques available today in PdM analysis, and they vary depending on the type of business or industry. Thermal imaging uses infrared (IR) technology to identify high-temperature areas on the surface of equipment, and is primarily used on electrical panels to identify loose contacts or overheating cables. Oil analysis can detect wear of internal components by testing the chemical properties of a lubricant and comparing them against the original specification. Ultrasound technology identifies failing rolling element bearings and over- and under-lubrication conditions by detecting sound pressure waves that act upon a resonant sensor to create a small electrical charge.

One of the more powerful techniques used in PdM is vibration diagnostic technology, which focuses on vibration and sound analysis. This approach can detect and predict over 90 malfunctions, including misalignment, imbalance, electrical malfunction, agitation, breaks, and issues with belt equipment systems. It is built around the concept that every mechanical system can be characterized by the sound that it makes. “Talking” machines make PdM easier, allowing HVAC contractors and regular maintenance staff to record the data and perform diagnostics.

Vibration analysis is used as a tool to determine a machine’s condition and the specific cause and location of problems, increasing uptime and reducing costs. Machine-mounted sensors are critical to vibration monitoring and analysis because they detect the root causes of most fault conditions at an early stage. Sensors can track progressing stages of bearing failure, identify imbalance and mechanical wear, and reveal misalignment and resonance so they can be corrected.
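
One simple way to picture how a sensor "tracks" a developing fault is to trend the overall vibration level against a healthy baseline. The Python sketch below does this with root-mean-square (RMS) amplitude. The sample readings and the 2x and 4x thresholds are assumptions picked for illustration, not values from any standard or product.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a list of vibration samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def classify(current_rms, baseline_rms, warn_ratio=2.0, alarm_ratio=4.0):
    """Compare a reading with the baseline; the ratio thresholds are assumed."""
    ratio = current_rms / baseline_rms
    if ratio >= alarm_ratio:
        return "alarm"
    if ratio >= warn_ratio:
        return "warning"
    return "normal"


# Hypothetical readings (arbitrary units) from one sensor on one bearing
baseline = rms([0.9, -1.1, 1.0, -1.0])
periodic_readings = [
    [1.0, -1.0, 1.1, -0.9],   # similar to baseline
    [2.2, -2.1, 2.0, -2.3],   # vibration roughly doubled
    [4.5, -4.4, 4.6, -4.3],   # vibration more than quadrupled
]
states = [classify(rms(r), baseline) for r in periodic_readings]
print(states)
```

The three invented readings step from "normal" to "warning" to "alarm", mirroring the way a worsening bearing would show up long before it seizes.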

Implementation of a maintenance program based on vibration analysis is key to helping companies achieve maximum efficiency. Applying technology and work processes prevents unexpected downtime, extends machine life, and optimizes performance. The first step in PdM is to identify critical machines and determine needed resources. What effect does a particular machine have on production? Should the support staff be in-house, outsourced, or a combination of both? The second step is to determine collection methods (e.g., route-based periodic or online) and create a database (identifying machine configuration and measurement points).
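
These first two steps can be pictured as a small asset register. The Python sketch below is one hypothetical way to record machines, rank them by criticality, and assign a collection method. The machine names, the 1-to-3 criticality scale, and the rule that only top-priority machines get online monitoring are all assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    name: str
    criticality: int  # 1 = most critical to operations (assumed scale)
    measurement_points: list = field(default_factory=list)


def collection_method(machine: Machine) -> str:
    """Assumed policy: online monitoring for criticality 1, routes for the rest."""
    return "online" if machine.criticality == 1 else "route-based periodic"


# Hypothetical register for a small data center
machines = [
    Machine("Office exhaust fan", 3, ["fan bearing"]),
    Machine("CRAC unit 1", 1, ["fan motor drive end", "fan motor non-drive end"]),
    Machine("Chilled water pump A", 2, ["pump inboard bearing"]),
]

# Most critical machines first, each paired with its collection method
plan = {
    m.name: collection_method(m)
    for m in sorted(machines, key=lambda m: m.criticality)
}
for name, method in plan.items():
    print(f"{name}: {method}")
```

A real program would weigh production impact and staffing in more detail, but even a simple ranking like this makes clear where continuous monitoring is worth the investment.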

Next is the collection of data (a periodic walkaround survey vs. a continuous survey with online monitoring), which is used to detect developing faults. Since each machine fault generates a specific vibration pattern, a single vibration measurement provides information about multiple components. Combined with a measurement of the vibration’s frequency, this yields a diagnosis that indicates the nature of the fault. The final step is to document the results (diagnoses, recommendations, recurring faults, productivity gains, and cost savings) to determine the maintenance implications and financial impact.
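
To illustrate how frequency points to the nature of a fault, the Python sketch below finds the strongest frequency in a vibration signal with a plain discrete Fourier transform and compares it with the shaft's running speed. The fault table is a deliberately simplified rule of thumb (imbalance tends to appear at 1x running speed, misalignment often at 2x) and is an assumption here, as are the synthetic signal and all names. Commercial diagnostic tools look at far more than a single peak.

```python
import cmath
import math


def dominant_frequency(samples, sample_rate_hz):
    """Return the frequency (Hz) of the largest non-DC spectral peak."""
    n = len(samples)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2):  # skip bin 0 (DC offset)
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin * sample_rate_hz / n


# Simplified, assumed mapping from multiples of running speed to likely faults
FAULT_HINTS = {1: "possible imbalance", 2: "possible misalignment"}


def diagnose(samples, sample_rate_hz, shaft_hz):
    order = round(dominant_frequency(samples, sample_rate_hz) / shaft_hz)
    return FAULT_HINTS.get(order, "no simple match; needs analyst review")


# Synthetic one-second signal: fan shaft at 30 Hz with a strong 2x component
sample_rate, shaft_hz = 480, 30.0
signal = [
    0.3 * math.sin(2 * math.pi * 30 * i / sample_rate)
    + 1.0 * math.sin(2 * math.pi * 60 * i / sample_rate)
    for i in range(sample_rate)
]
peak_hz = dominant_frequency(signal, sample_rate)
print(peak_hz, diagnose(signal, sample_rate, shaft_hz))
```

Because the synthetic signal is dominated by a component at twice the shaft speed, the sketch flags possible misalignment. In practice an analyst would confirm this against the machine's configuration and history.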

 

CONCLUSION

The unique and intricate infrastructure of a data center warrants effective, proactive maintenance to prevent costly breakdowns. Monitoring for optimal HVAC functionality, focusing on efficient data center design, and employing innovative PdM technology all contribute to an optimized PUE. System failure not only results in financial loss but can also significantly damage a company’s reputation. Implementing PdM gives the industry a crystal ball of insight to ensure future performance and prevent critical failure.