For a business, downtime can mean many things, especially in today's economic environment. Slow traffic, sluggish sales, and yes, even a complete halt of business operations due to a malfunction in the data center are things no business can afford.
Data center downtime differs from downtime in other business areas because in most cases it can be averted through a preventive maintenance (PM) program that, in turn, ensures business continuity. Minimal or no PM greatly increases the chance that business operations will be disrupted if the power equipment fails, exposing a business to potential loss of revenue, reduced productivity, and lower customer satisfaction and loyalty. That's not to mention the cost of any repairs and replacements.
The Case for PM
The current economic situation has many companies tightening budgets for new equipment. As a result, data center managers should put greater focus on maintaining their existing assets. Taking this approach can greatly reduce the need to repair or replace important components.
Also, PM ensures business continuity. As organizations become increasingly dependent on data center systems, the need for greater reliability in the critical power system increases. For many organizations, the IT infrastructure has evolved into an interdependent business-critical network that includes data, applications, storage, servers, and networking. A power failure at any point along the network can impact the entire operation, with serious consequences for the business.
Benefits of a PM Program
End users can minimize unit-related failures by instituting a comprehensive PM program implemented by original equipment manufacturer (OEM) trained and certified technicians. When equipment is not maintained, especially in adverse conditions such as dirty environments or high temperatures, it can deteriorate and cause load loss.
PM programs maximize the reliability and performance of the uninterruptible power supply (UPS) systems on which organizations depend to keep critical systems running. When correctly implemented, PM visits ensure maximum reliability of data center equipment by providing systematic inspection, detection, and correction of incipient failures, either before they occur or before they develop into major defects that could translate into costly downtime. Typical PM programs include inspections, tests, measurements, adjustments, parts replacement, and housekeeping practices.
PM has a number of benefits for the end user. First, better reliability is delivered by adding another layer of redundancy. This is achieved by combining leading service with cutting-edge equipment.
Other benefits include extending the product lifecycle and optimizing capital expenditures for the equipment. In addition, risk management provided at a fixed cost aids budget preparation, promotes fiscal responsibility, and provides better control of your business environment.
Begin with the UPS
To keep running through power outages, utility spikes, and other unforeseeable power issues, critical systems depend on the reliability of the UPS system. Therefore, keeping these systems in working condition is crucial.
While UPS systems are designed to offer the utmost reliability and performance at an affordable price, they are not failure proof. Application, installation, design, real-world operating conditions, and maintenance practices can affect the reliability and performance of the UPS systems.
A system is only as reliable as its shortest-lived component. Some manufacturers, including Liebert, address this issue by reducing the number of parts that need to be replaced, thus decreasing the chance of a failure. Failures still occur, however, so being proactive with maintenance can greatly reduce exposure to downtime.
Frequency of PM
The frequency of PM visits depends on the type of UPS being utilized in the organization. Small UPS devices should be inspected annually to ensure alarms, filtering, and internal batteries are all operating within specifications. For medium and large systems, which most likely include ancillary equipment, vendors often recommend that inspection and maintenance take place at least twice a year to confirm that the system functions properly and is operating within the manufacturer's specifications.
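The inspection intervals described above can be sketched as a small scheduling helper. This Python example is illustrative only: the size categories and the function name `next_pm_date` are assumptions made for the sketch, and the intervals simply encode the annual and twice-yearly guidance given here. A vendor's own recommendations take precedence.

```python
from datetime import date

# Minimum PM visits per year, per the guidance above.
# The category names are assumptions made for this sketch.
VISITS_PER_YEAR = {"small": 1, "medium": 2, "large": 2}


def next_pm_date(ups_size: str, last_visit: date) -> date:
    """Return the date by which the next PM visit should take place."""
    months = 12 // VISITS_PER_YEAR[ups_size]
    # Advance by whole months; clamp the day to 28 so the result is
    # always a valid calendar date (e.g. Aug 31 + 6 months).
    month_index = last_visit.month - 1 + months
    year = last_visit.year + month_index // 12
    month = month_index % 12 + 1
    return date(year, month, min(last_visit.day, 28))
```

For example, a large system last serviced on September 30, 2024 would be due again by `next_pm_date("large", date(2024, 9, 30))`, which is six months later.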
Typical tasks performed during a semi-annual service visit include:
- Check all breakers.
- Check temperature, connections, and associated controls. Repair and/or report all high temperature areas.
- Complete visual inspection of the equipment including subassemblies, wiring harnesses, contacts, cables, and major components.
- Check air filters for cleanliness.
- Check the rectifier and inverter snubber boards in each module for discoloration.
- Check power capacitors for swelling or leaking oil, and check dc capacitor vent caps for extrusion of more than 1/8 in.
- Record all voltage and current meter readings on the module control cabinet or the system control cabinet.
- Measure and record harmonic trap filter currents.
- Check inverter and rectifier snubbers for burned or broken wires.
- Check all nuts, bolts, screws, and connectors for tightness and heat discoloration.
- Verify fuses on the dc capacitor deck for continuity (if applicable).
- With customer approval, perform operational test of the system including unit transfer and battery discharge.
- Check all electronics, record the readings, and adjust to system specifications as needed.
- Install or perform any engineering field change notices (FCN) as needed.
- Measure and record all low-voltage power supply levels.
- Measure and record phase-to-phase input voltage and currents.
- Review system performance with customer to address any questions and to schedule repairs.
The Batteries
Battery maintenance begins with installation. Batteries must be fully charged, battery room conditions verified, and baseline readings set for proper trend analysis throughout the life of the battery. If this information is not properly gathered and documented, finding bad batteries could prove to be difficult.
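To illustrate why baseline readings matter, the following Python sketch compares current per-cell readings against the values recorded at installation and flags cells that have drifted. It is a minimal example under stated assumptions: the function name, the cell identifiers, and the 25 percent threshold are all hypothetical placeholders, not values taken from a manufacturer or standard, and the readings could be any measurement that rises as a cell degrades.

```python
def flag_deviating_cells(baseline, current, max_increase=0.25):
    """Return IDs of cells whose reading rose more than max_increase
    (as a fraction of the baseline value) since installation.

    baseline and current map a cell ID to a numeric reading.
    The default threshold is an assumed placeholder.
    """
    flagged = []
    for cell_id, base in baseline.items():
        reading = current.get(cell_id)
        if reading is None:
            continue  # no new measurement for this cell
        if (reading - base) / base > max_increase:
            flagged.append(cell_id)
    return flagged
```

Without the `baseline` mapping there is nothing to compare against, which is the practical reason missing installation data makes bad batteries hard to find.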
For best practices in battery maintenance, refer to the manufacturer's recommendations, IEEE-1188 for valve-regulated lead-acid (VRLA) batteries, and IEEE-450 for vented lead-acid (VLA, or flooded) batteries. However, best practices do not always equate to common practices. Governed by real-world factors, many facility managers must weigh the cost of performing the recommended IEEE schedule against the criticality of the application.
The following table represents a typical PM schedule for both VRLA and VLA (Flooded) batteries.
High ambient temperature and frequent discharges are most commonly responsible for reducing useful life across all types of batteries. (Dryout is the most common cause of VRLA battery failure.) Battery aging accelerates dramatically as ambient temperature increases. This is true of batteries in service and in storage. Even at specified temperatures, batteries are designed to provide a limited number of discharge cycles during their expected life. While that number may be adequate in some applications, there are instances where a battery can wear out prematurely.
Other factors that can cause premature battery failure include:
- High or low charge voltage
- Excessive charge current
- Strained battery terminals
- Manufacturing defects
- Improper room temperature
- Overcharging and over cycling
- Loose connections
- Poor and improper maintenance
While there are many battery services available, the best way to maximize battery performance is to utilize an integrated battery monitoring service that combines state-of-the-art battery monitoring technology with proactive maintenance and service response. This type of proactive solution integrates onsite and remote preventive maintenance activities with predictive analysis to identify problems before they occur.
If a power outage occurs, even a single bad cell in a string could compromise an entire backup system. In addition to implementing proper maintenance practices and monitoring batteries, safely replacing failing batteries will help keep IT systems running to specifications and minimize the risk of costly downtime to business operations.
IEEE standards recommend replacing a battery when its capacity falls to 80 percent of its rated capacity.
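The 80 percent rule can be expressed as a simple check. This is a minimal Python sketch, assuming capacity is measured in ampere-hours during a capacity test; the function and parameter names are hypothetical.

```python
REPLACEMENT_THRESHOLD = 0.80  # fraction of rated capacity, per the rule above


def needs_replacement(measured_capacity_ah, rated_capacity_ah):
    """Return True when measured capacity is at or below 80 percent of rated."""
    if rated_capacity_ah <= 0:
        raise ValueError("rated capacity must be positive")
    return measured_capacity_ah / rated_capacity_ah <= REPLACEMENT_THRESHOLD
```

A battery rated at 100 Ah that delivers 79 Ah would be flagged, while one that delivers 95 Ah would not.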
Professional PM
Most preventive maintenance measures should be left to qualified and trained personnel. UPS systems and batteries operate at high voltage among other things, and only qualified personnel should attempt preventive maintenance or repair. End users can provide preventive support such as replacing air filters when dirty, ensuring environmental specifications are met and maintained, and monitoring the UPS for alarms.
Qualified service providers offer a comprehensive portfolio of services. Service can be customized to satisfy customer requirements. In addition, preventive maintenance service should at least include the following to minimize time to recovery:
- 24 x 7 emergency services
- Replacement parts available in the shortest possible time
- End-user training seminars detailing best practices and service tips