It’s a simple question — how do you define mission critical? But the answer can be a bit complicated. As facility owners, operators, designers, and engineers, we need to consider what aspects of our businesses, enterprises, and facilities are truly critical to the missions we are tasked to perform.
Most data centers are considered “mission critical.” But some enterprises now have so much reliability built into their enterprise software and site failover capability that an entire data center can fail and the “mission” is essentially unaffected. So are their data centers still truly mission critical?
On the other hand, typical office buildings are generally considered non-critical. But what if the people working in a particular office environment are performing business functions that, when interrupted, have an impact on the corporate mission? Is this building still non-critical?
Let’s complicate the discussion further. Many work functions associated with typical office environments can be interrupted for a brief time without a noticeable impact on a corporation’s bottom line. The same holds true for facility support systems. Now consider an extended outage lasting hours or days. Can the business continue to support its mission when these ancillary “office” functions are halted? Eventually, the loss of these functions will probably result in a significant business impact (why else would they exist?).
The answer requires a thorough understanding not only of what the mission is, but also of which elements are essential to support it, how their loss impacts the mission, and how long it takes for that impact to occur.
This concept applies at the highest level, such as how the loss of a business unit impacts a corporate mission, down to the lowest level, such as how the loss of a fuel oil pump impacts an emergency power plant. If you have one pump and no day tanks, the impact could be almost immediate. If you have redundant pumps and large day tanks, there could be no impact at all.
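The fuel oil pump example can be put into rough numbers. The sketch below is only an illustration: the function name, the gallon and gallons-per-hour figures, and the simplification that any surviving pump can keep the day tank full are all assumptions, not data from a real plant.

```python
def hours_until_impact(pumps_still_working, day_tank_gallons, burn_rate_gph):
    """Hours the generators keep running after a fuel transfer pump fails.

    Simplifying assumption: any surviving pump can keep the day tank full,
    so the result is infinite (no impact) when one remains. Otherwise the
    plant runs only until the day tank is empty.
    """
    if burn_rate_gph <= 0:
        raise ValueError("burn_rate_gph must be positive")
    if pumps_still_working >= 1:
        return float("inf")
    return day_tank_gallons / burn_rate_gph


# Hypothetical figures, for illustration only.
single_pump_no_tank = hours_until_impact(0, 0, 100)      # sole pump lost, no day tank
single_pump_with_tank = hours_until_impact(0, 500, 100)  # sole pump lost, 500 gal tank
redundant_pumps = hours_until_impact(1, 500, 100)        # one of two pumps lost
```

With these made-up figures, the first case gives zero hours, the second gives five hours to repair the pump or arrange a workaround, and the third gives no impact at all. The same "how long until it hurts" question can be asked of every element that supports the mission.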
If you lose the corporate administrative office and the associated human resources, accounting, payroll, legal counsel, etc., for an hour or maybe a day, there might not be a noticeable effect on the overall corporate mission, especially for the company’s external clients. But if the outage happens to affect payday for all the employees, or timesheets can’t be completed, contracts can’t be executed, and work tickets and service requests can’t be processed, the impact could be substantial.
This becomes the realm of disaster recovery and business continuity. Consider the domestic city water system that pretty much all facilities have. For data centers that rely on water-cooled chiller plants or evaporative coolers, city water typically supplies water to the “critical” make-up water systems. Most of these sites have provisions for the potential loss of city water by installing make-up water storage tanks, tanker truck connections, or wells (or a combination of all of the above). These provisions can facilitate long-term continuous operations of the chiller plant without reliance on city water.
The shortcoming is that the facility cannot remain occupied for an extended period without bathrooms and drinking water. Eventually bathrooms become mission critical, unless the site is truly autonomous (no need for on-site staff) or there are provisions for alternate bathroom facilities and drinking water. Otherwise the health inspector will close the facility and send everyone home. So the question becomes not only how critical the on-site staff are, but also how long you can sustain the mission without them.
In the case of a long-term crisis event such as a hurricane, regional flood, or blizzard, it is not sufficient to consider only how much fuel oil and drinking water you need. The site staff have homes and families to take care of, and they may need hotel rooms, food, etc. Just arranging for transportation to and from the site becomes critical. These needs may be urgent for the staff of a typical data center site, but eventually they could become necessary for all those office staff as well.
In some ways, as data centers, IT, and networks become more robust, distributed, and virtual, and as they move to the cloud, the question of what remains critical with regard to staff, equipment, systems, and even facilities has become harder to answer. When a critical application moves to the cloud or gets virtualized, just exactly where is it physically? Are you sure the server it resides on has both cords plugged into diverse power sources when you aren’t sure which server it resides on? Are you sure the redundant servers are on redundant networks? And what about the site staff that manages the cloud? You may be able to allow your own critical employees to work from home, but what about the staff managing your outsourced cloud, over whom you may have little or no control?
Let’s get back to facilities. Consider the lightning protection system that most facilities have on their roofs. Is this a critical system? Not really, until the building takes a lightning strike. I know of extreme cases where a single lightning strike not only instantaneously impacted a site’s mission, but also caused such extensive damage to the facility’s critical support systems and associated monitoring and controls that the site was down for an extended period and the repairs were exorbitant. The site had maintained years of continuous operations with a flawed lightning protection system, so that system was never inspected, tested, or maintained as carefully as the systems considered more critical, such as chillers, UPS systems, and IT networks. But when lightning struck, all those systems were useless in protecting the mission.
Consider plumbing and drain systems. Are they critical? Their specific function may not be, but many outages have been caused by leaks that impacted critical electrical or IT equipment, and in worst-case scenarios those leaks led to electrical faults and even fires.
So as usual, the devil is in the details. You have to look at the big picture, but you also have to dissect it, reduce each aspect to its fundamentals, and consider what the interdependencies are and what contingencies are appropriate to prepare for potential outages and failures. This applies to all aspects of a mission critical operation. It applies to the business units, to IT and data centers, and to the staff that manages and performs the mission. The military understands this as well as any organization. You can’t win a war if you can’t feed your troops.
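One way to make these indirect impacts visible is to write down which systems are harmed when another one fails and then trace the chain. The sketch below is a minimal illustration: the system names and the links between them are hypothetical, not taken from any particular site.

```python
from collections import deque

# Hypothetical map: each system -> the systems harmed if it fails.
IMPACTS = {
    "roof drain": ["electrical room"],
    "electrical room": ["UPS", "chiller plant"],
    "UPS": ["IT load"],
    "chiller plant": ["IT load"],
    "IT load": [],
}


def downstream_impacts(failed, impacts):
    """Return every system reachable from `failed`, in breadth-first order."""
    seen = []
    queue = deque(impacts.get(failed, []))
    while queue:
        system = queue.popleft()
        if system in seen:
            continue  # already recorded via another path
        seen.append(system)
        queue.extend(impacts.get(system, []))
    return seen


affected = downstream_impacts("roof drain", IMPACTS)
```

In this made-up map, a failed roof drain reaches the electrical room, the UPS, the chiller plant, and finally the IT load, even though nobody would list a roof drain as a critical system.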