In today’s complex data center environment, operators must continuously juggle multiple integrated mechanical, electrical, and data systems to minimize the risk of critical failure. Data center managers have traditionally relied on preventive maintenance (PM) programs to manage risk and keep facilities and systems running. The goals are reliability and performance, and detailed maintenance routines include tasks such as overhauling a rotary uninterruptible power supply (UPS) or generator and replacing system components on a schedule to prevent failure during live operation.
However, some of these standard operating procedures are not optimal and can even introduce more risk into the system. For example, replacing certain non-critical components as part of routine maintenance can increase the risk of a critical failure rather than reduce it. Counterintuitive as it may seem, data center managers would be better served by letting certain components “run-to-fail” (that is, continue operating in place until the component wears out or fails naturally).