Seemingly overnight, COVID-19 elevated data centers from resources to utilities. While global financial markets have been severely affected by the pandemic, that shock would have cut much deeper were it not for workers’ and students’ ability to carry on from home. There is widespread agreement that remote work will continue for a long time and, in many cases, is here to stay.
Data centers today serve a utilitarian role. Just as assuredly as we expect the lights to come on when we flick the switch, we expect to be able to connect to online content, classrooms, and corporate networks on demand and without fail. Yet, according to the Uptime Institute’s (UI) 2020 Global Data Center Survey, outages continue to occur with disturbing frequency, and the bigger outages are becoming more damaging and expensive.
The No. 1 cause of unscheduled downtime is human error. The UI survey also showed that three-quarters of operators admit that, in hindsight, most outages were preventable. With more attention and some simple measures put in place to maintain operations, outage frequency would almost certainly fall significantly.
One simple way to minimize human error is the good old checklist. While that may sound simplistic, think of how critical checklists are to the military, nuclear energy, surgery, aviation, and other high-risk fields.
In our multitasking, distracted world, the simple task of checking a box creates consistency, eliminates mistakes, and ensures that steps are followed and processes are completed. This pays dividends in the data center environment. In our technology-based industry, checklists live on handheld digital devices, ideally with two-step authentication and validation for critical steps on a given checklist. This structure all but guarantees the efficacy and reliability of a host of data center operational processes.
Checklists in Action
As the need to deliver data centers faster than ever before escalates, many providers are adopting more routinized design and operational processes. This consistency, standardization, and uniformity can be an operational double-edged sword, however. The uniformity of rows of racks in a vast data hall makes it easy to mistake one piece of equipment for another. Conducting maintenance on a piece of equipment that doesn’t need it is a problem. But it’s a problem that can be easily circumvented with a barcode or QR code and a two-step validation built into a digital checklist to verify that the right work is being done on the right equipment.
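As a rough illustration of that two-step validation, here is a minimal Python sketch. It assumes each work order carries the asset tag encoded in the equipment's barcode or QR label, and that a second person confirms the step. The names `WorkOrder`, `validate_before_work`, and the sample tags are hypothetical and do not come from any particular checklist product.

```python
from dataclasses import dataclass, field


@dataclass
class WorkOrder:
    """A maintenance task tied to one specific piece of equipment."""
    order_id: str
    asset_tag: str   # value encoded in the barcode/QR label (assumed format)
    technician: str


@dataclass
class ValidationResult:
    approved: bool
    reasons: list = field(default_factory=list)


def validate_before_work(order, scanned_tag, confirmer):
    """Step 1: the scanned tag must match the work order's asset tag.
    Step 2: someone other than the technician must confirm."""
    reasons = []
    # Compare tags case-insensitively and ignore stray whitespace from the scanner.
    if scanned_tag.strip().upper() != order.asset_tag.strip().upper():
        reasons.append(
            f"scanned {scanned_tag!r} but work order targets {order.asset_tag!r}"
        )
    if not confirmer or confirmer == order.technician:
        reasons.append("a second person must confirm this step")
    return ValidationResult(approved=not reasons, reasons=reasons)


if __name__ == "__main__":
    order = WorkOrder("WO-1001", "PDU-A-017", "alice")
    # A transposed tag, the kind of slip that is easy to make among identical racks.
    print(validate_before_work(order, "PDU-A-071", "bob"))
    print(validate_before_work(order, "PDU-A-017", "bob"))
```

The point of the design is that work cannot be approved on a near-identical neighbor: a mismatched scan or a missing second confirmation blocks the step and records why.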
Standardized design and operation lend themselves to repeatable processes that can be built into a checklist to avoid downtime. With an application derived from years of use in the aviation industry, checklists can be applied to the following areas.
Maintenance — In addition to keeping tabs on all the maintenance jobs in a given workday and ensuring the right piece of equipment is serviced, the checklist helps ensure proper sequences are followed from powering down to restarting the equipment being worked on.
Physical Security — As data campuses grow increasingly large and are subjected to external threats, applying checklists to security protocols gives operators security validation, a record of visits, and a way to track visitors’ comings and goings. It’s also a good way to keep tabs on physical security: Are all cameras and door keypads operational? Is the exterior fencing secure? Are vehicle gates functioning properly? A host of security protocols can be applied to a handheld checklist feature.
Crisis Management — Depending on location, data centers can be vulnerable to natural threats such as earthquakes, flooding, tsunamis, volcanic eruptions, and hurricanes. At the start of the year, many tech platforms faced domestic terror threats after they took steps to eliminate certain content, and the data centers hosting them feared for their own stability. Crisis preparedness protocols, contacts, information-gathering needs, and other resources are important, systematic elements of secure operations. Checklists are and will continue to be a large part of preparedness. I am reminded of the hero pilot who landed an Airbus A320 on the Hudson River in New York City 12 years ago and saved 155 lives. He used a checklist to prepare for that landing!
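The maintenance item above depends on steps being followed in the proper sequence, from powering down to restarting. One way a digital checklist might enforce that ordering is sketched below in Python. The class name `SequencedChecklist` and the step names are illustrative assumptions, not an actual maintenance procedure.

```python
class SequencedChecklist:
    """Checklist whose steps can only be checked off in the listed order."""

    def __init__(self, name, steps):
        self.name = name
        self.steps = list(steps)
        self.completed = []  # (step, who) pairs, in order of completion

    def next_step(self):
        """Return the next pending step, or None when every step is done."""
        if len(self.completed) < len(self.steps):
            return self.steps[len(self.completed)]
        return None

    def check_off(self, step, who):
        """Record a step, refusing anything that is out of sequence."""
        expected = self.next_step()
        if expected is None:
            raise ValueError("checklist already complete")
        if step != expected:
            raise ValueError(f"out of sequence: expected {expected!r}, got {step!r}")
        self.completed.append((step, who))

    def is_complete(self):
        return len(self.completed) == len(self.steps)


if __name__ == "__main__":
    # Hypothetical step names, for illustration only.
    checklist = SequencedChecklist(
        "UPS maintenance",
        ["transfer load", "power down", "perform service", "restart", "verify"],
    )
    checklist.check_off("transfer load", "alice")
    try:
        checklist.check_off("perform service", "alice")  # skips "power down"
    except ValueError as err:
        print(err)
    print(checklist.next_step())
```

Keeping a record of who completed each step also gives operators the audit trail that the security and crisis items rely on.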
Other technologies, such as machine learning and AI, will continue to gain prominence in data center operations, and operations staff will still play a large part in running data centers effectively. But the humble checklist has prevented many a disaster in high-risk industries and should be exploited to achieve maximum benefit.
We in the data center business can borrow lessons from the military, medical, and aviation industries. By applying checklist rigor to the way we operate data centers, we can improve uptime for these increasingly integral assets.