The repository of data collected by the critical facility industry now contains zettabytes. That is an inordinate amount of information, difficult to wrap your mind around, and the pool keeps growing. The emergence of advanced, automated data center infrastructure management (DCIM) tools has allowed data center operators to aggregate, analyze, and integrate the massive amounts of data collected from the multiple, disparate platforms that monitor servers, cooling, power, and other ancillary systems. Even so, identifying the specific factors most likely to influence the value of a discrete body of data, especially one you actually care about, can feel unworkable.

Most data center facility professionals focus on the design, implementation, control, and monitoring of infrastructure, which produces data overload and a dearth of actionable insight. Our industry is poorly served by overemphasizing the “I” in DCIM. That terminology fosters a “break-fix” mindset; in reality, infrastructure is only as good as the people managing it. Consider that, on average, 65% of downtime in critical facilities is the result of human error. With all the advanced technologies available today for reducing human error, should this really remain the sole reliability strategy for your critical facility?
As professionals, we need a renewed focus on data center operations management, or DCOM. DCOM encompasses the people, processes, documentation, maintenance, testing, training, lifecycle, change management, and risk mitigation measures employed to ensure long-term reliability. Marrying DCOM with more traditional DCIM tactics, in conjunction with predictive and prescriptive analytics and algorithms, as well as industry intelligence and external data sources, or “influence information,” is the more strategic path toward the Holy Grail of the critical facility industry: the perfect standard of 100% uptime, or unity.