“Data center downtime has become unacceptable to almost every business and yet most downtime is preventable,” said Peter Panfil, vice president and general manager, Emerson Network Power’s AC Power business in North America. “By implementing simple and cost-effective best practices, data center managers can reduce or eliminate the risk of these root causes while simultaneously reducing stranded capacity and improving energy efficiency, flexibility, total cost of ownership, and end-user satisfaction.”
The top causes of data center outages, as identified by the Ponemon Institute survey, involved data center power systems, thermal issues and human error. The following best practices can help organizations avoid outages resulting from these common root causes.
- Implement battery monitoring and maintenance. According to the Ponemon Institute report, battery failure is the leading cause of unplanned downtime events. Comprehensive monitoring evaluates battery health and allows data center professionals to anticipate--and prevent--problems like battery expirations. Monthly preventive maintenance tactics, including visual inspections (internal and external), acceptance testing and load testing, can help ensure components are serviced and/or replaced before they pose a risk to continuity.
- Ensure appropriate UPS capacity. More than half of the data center professionals surveyed said their data centers had experienced downtime events as a result of exceeding UPS capacity. Measuring output multiple times per day via an integrated monitoring and management solution can help gauge the typical power draw of IT equipment over time. Establishing an appropriate UPS architecture can enable data center professionals to increase the capacity of their backup power system and eliminate single points of failure.
- Choose the correct UPS. Forty-nine percent of data center professionals reported a UPS equipment failure within the past two years. Implementing an online double conversion UPS system, as opposed to a line-interactive system, enables the battery to be dedicated to the load and eliminates the need for power transfer if the primary utility fails. Additionally, deploying integrated UPS systems--including fans, power supplies and communications cards--enhances reliability, enabling the UPS to maintain availability between service visits even in the event of an internal component failure.
- Invest in the right components. Downstream from the UPS, circuit breaker and power distribution unit (PDU) failures can also affect IT equipment availability. Rack-based PDUs or PDUs with integrated branch circuit monitoring capabilities allow data center professionals to make precise capacity management decisions based on holistic data across interdependent systems, reducing the likelihood of equipment overload failure downstream. Installing a static transfer switch upstream from the UPS ensures that critical IT equipment remains powered in the event of bus failure.
- Weigh cooling options carefully. Cooling-related failures were cited as a root cause of at least one outage by more than a third of data center operators, with water incursions and heat-related computer room air conditioner (CRAC) failures cited as the leading causes of cooling-related downtime. Adopting a cold-aisle containment strategy increases the effectiveness of the CRAC system and helps ensure that cooling capacity is used as efficiently as possible. Using a refrigerant-based, row-based cooling solution instead of a water-based system minimizes the risk of catastrophic system failures in the event of a cooling fluid leak.
- Make the data center accident-proof. More than half of all data center professionals responding to the Ponemon survey reported at least one outage as a direct result of accidental shutdown or user errors within the past 24 months. Shielding emergency OFF buttons, accurately labeling components and implementing secure access rules can all minimize the potential for catastrophic errors and accidents.
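
As an illustration of the UPS capacity practice above (measuring output several times per day to gauge typical power draw), the following is a minimal sketch and not a description of any vendor's monitoring product. The function name `ups_headroom`, the sample readings and the 80 percent warning threshold are all assumptions chosen for the example; an actual threshold would depend on the UPS architecture and redundancy design.

```python
from statistics import mean

def ups_headroom(readings_kw, rated_kw, warn_fraction=0.8):
    """Summarize UPS output samples against rated capacity.

    readings_kw   -- load samples in kW (hypothetical data)
    rated_kw      -- rated UPS capacity in kW
    warn_fraction -- assumed warning threshold, not taken from the survey
    """
    if not readings_kw:
        raise ValueError("need at least one reading")
    peak = max(readings_kw)
    utilization = peak / rated_kw
    return {
        "peak_kw": peak,
        "average_kw": mean(readings_kw),
        "peak_utilization": utilization,
        # True when the highest sample reaches the warning threshold
        "over_threshold": utilization >= warn_fraction,
    }

# Four hypothetical samples taken over one day on a 100 kW UPS
samples = [62.0, 71.5, 68.0, 83.0]
report = ups_headroom(samples, rated_kw=100.0)
print(report)
```

Tracking the peak rather than only the average matters here because a UPS is overloaded by its highest draw, even if typical load looks comfortable.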