As humans, we evaluate personal risk every day: when crossing the street, changing lanes on the highway, or picking out a weather-appropriate outfit. But despite our innate ability to identify and address risk factors in our everyday lives, measuring risk within certain professional settings can be difficult. This is particularly true for the data center. But why?
For the past 20 years, the data center industry has relied on the resiliency and redundancy tier standard developed by the Telecommunications Industry Association (TIA) when evaluating a data center’s risk profile. Tier I is the simplest, with an expected availability of 99.671%, and Tier IV is intended to be the most reliable, with an availability of 99.995%, which equates to approximately 26 minutes of downtime per year. While this standard has helped guide the design and build process, the tiers are still far too broad to tell operators where their particular data center actually sits on the risk spectrum.
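The availability percentages convert into expected downtime with simple arithmetic: multiply the unavailable fraction by the number of minutes in a year. The Python sketch below assumes a 365-day year (525,600 minutes) and uses only the Tier I and Tier IV figures quoted above; the function name `annual_downtime_minutes` is illustrative and not part of any standard.

```python
# Assumption: a non-leap, 365-day year. A leap year adds about 0.3%.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def annual_downtime_minutes(availability_pct: float) -> float:
    """Convert an availability percentage into expected downtime per year, in minutes."""
    unavailability = 1.0 - availability_pct / 100.0
    return unavailability * MINUTES_PER_YEAR


if __name__ == "__main__":
    for tier, availability in [("Tier I", 99.671), ("Tier IV", 99.995)]:
        minutes = annual_downtime_minutes(availability)
        print(f"{tier}: {availability}% -> {minutes:.1f} min/yr ({minutes / 60:.1f} h/yr)")
```

Run as a script, this reports roughly 1,729 minutes (about 28.8 hours) per year for Tier I and roughly 26.3 minutes per year for Tier IV. An availability figure says nothing about what causes that downtime, which is one reason the tiers alone are a coarse measure of risk.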
Take, for instance, two Tier III data centers with equivalent UPS support. Based on this, one could assume that both facilities have the same risk profile. Now consider that the first facility may be housed in a commercial warehouse that is not hardened to Miami-Dade County hurricane building standards, while the second sits in a hardened facility able to withstand an EF3 tornado. Technically, both may be Tier III facilities, but which would be the safer home for IT equipment during hurricane season?
Often, the gap between the risks that should be addressed and those that actually are comes down to cost and time. At the end of the day, any risk (within reason) can be reduced or managed with enough time or money. However, when both are constrained, data center operators cannot assume that two facilities with the same tier rating are directly comparable.
Instead, I encourage data center operators to address risk as early as possible, during the planning phase, by first identifying their IT needs, then evaluating which risks should be mitigated, eliminated or accepted to protect that IT, and finally designing a facility, or facilities, around these factors. This can be done by:
- Starting with the business application: clarifying what the business does and which requirements the IT kit must meet to maintain business continuity
- Working backwards: going through operational failure modes (a UPS taken offline for maintenance, an output breaker tripped open, etc.) and determining what impact each scenario has on power and cooling to the IT kit. From there, data center operators must choose whether to accept or eliminate each risk they uncover. This exercise helps control cost and keeps the design aligned with the operator’s ‘risk appetite.’ It may also expose inconsistencies in the design, such as a Tier II design for domestic water alongside a Tier IV design for diesel fuel distribution.
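The “working backwards” step lends itself to a simple tabulation: list the failure modes, subtract the capacity each one takes offline, and flag any scenario that leaves the IT load short of power or cooling. The Python sketch below is a deliberately simplified, hypothetical model. The `Facility` and `FailureMode` classes, the capacities (four 250 kW UPS modules, five 200 kW cooling units, a 750 kW IT load), and the assumption that a tripped output breaker takes the entire UPS system offline are all illustrative, not drawn from a real design or tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Facility:
    """Hypothetical facility: identical UPS modules and identical cooling units."""
    ups_modules: int
    ups_module_kw: float
    cooling_units: int
    cooling_unit_kw: float  # heat-removal capacity per unit, in kW


@dataclass(frozen=True)
class FailureMode:
    """One operational failure scenario and the equipment it takes offline."""
    name: str
    ups_modules_lost: int
    cooling_units_lost: int


def evaluate(facility: Facility, mode: FailureMode, it_load_kw: float) -> dict:
    """Check whether the remaining power and cooling capacity still cover the IT load.

    Simplification: cooling demand is assumed equal to the IT load in kW."""
    power_kw = max(facility.ups_modules - mode.ups_modules_lost, 0) * facility.ups_module_kw
    cooling_kw = max(facility.cooling_units - mode.cooling_units_lost, 0) * facility.cooling_unit_kw
    return {
        "mode": mode.name,
        "power_ok": power_kw >= it_load_kw,
        "cooling_ok": cooling_kw >= it_load_kw,
    }


def risks_to_review(facility: Facility, modes: list, it_load_kw: float) -> list:
    """Return the names of failure modes that leave the IT load short of power or cooling.

    Each flagged mode is a risk the operator must explicitly accept or design out."""
    flagged = []
    for mode in modes:
        result = evaluate(facility, mode, it_load_kw)
        if not (result["power_ok"] and result["cooling_ok"]):
            flagged.append(result["mode"])
    return flagged


if __name__ == "__main__":
    # Illustrative numbers only: 4 x 250 kW UPS modules, 5 x 200 kW cooling units, 750 kW IT load.
    site = Facility(ups_modules=4, ups_module_kw=250, cooling_units=5, cooling_unit_kw=200)
    modes = [
        FailureMode("UPS module off for maintenance", ups_modules_lost=1, cooling_units_lost=0),
        FailureMode("UPS maintenance plus a second module fault", ups_modules_lost=2, cooling_units_lost=0),
        # Assumption: a single output breaker serves the whole UPS system in this design.
        FailureMode("UPS output breaker tripped open", ups_modules_lost=4, cooling_units_lost=0),
        FailureMode("One cooling unit failed", ups_modules_lost=0, cooling_units_lost=1),
    ]
    for name in risks_to_review(site, modes, it_load_kw=750):
        print("Accept or eliminate:", name)
```

With these numbers, a single UPS module out for maintenance or a single cooling unit failure is absorbed, while a second module fault during maintenance or a tripped output breaker leaves the IT kit without sufficient power. The code only flags those scenarios; deciding whether to accept or eliminate each one remains the operator’s call.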
For data center providers who support clients across numerous industries, each with its own risk profile, defining the type of business being run and whom the services are performed for is essential. For example, a provider could accept higher risk when supporting internet clients who also colocate their cloud infrastructure in a sister facility. Should one facility go down, failover to the other would be seamless, with no disruption to the client’s business. Because of this built-in redundancy in the clients’ IT kit, the provider can take on certain operational risks (perhaps with an N+1 design). Clients in industries whose IT kit lacks this kind of redundancy will likely require a more robust facility infrastructure (2N).
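To make the N+1 versus 2N trade-off concrete, the Python sketch below compares two hypothetical UPS designs for the same 750 kW IT load built from 250 kW modules. It assumes an idealized model: the N+1 design is a single distribution path with one spare module, and the 2N design is two fully independent paths that can each carry the whole load. Real designs vary (for example, in how distribution downstream of the UPS is arranged), so treat `PowerDesign` and its numbers as illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerDesign:
    """Hypothetical, idealized UPS design: `paths` independent distribution paths,
    each holding `modules_per_path` identical UPS modules."""
    label: str
    paths: int
    modules_per_path: int
    module_kw: float

    @property
    def total_modules(self) -> int:
        return self.paths * self.modules_per_path


def survives_module_failure(design: PowerDesign, load_kw: float) -> bool:
    """True if the remaining modules, in aggregate, can carry the load after any one module fails."""
    return (design.total_modules - 1) * design.module_kw >= load_kw


def survives_path_loss(design: PowerDesign, load_kw: float) -> bool:
    """True if the remaining paths can carry the load after one entire path is lost."""
    return (design.paths - 1) * design.modules_per_path * design.module_kw >= load_kw


if __name__ == "__main__":
    it_load_kw = 750  # N = 3 modules of 250 kW
    designs = [
        PowerDesign("N+1", paths=1, modules_per_path=4, module_kw=250),
        PowerDesign("2N", paths=2, modules_per_path=3, module_kw=250),
    ]
    for d in designs:
        print(
            f"{d.label}: {d.total_modules} modules, "
            f"survives module failure={survives_module_failure(d, it_load_kw)}, "
            f"survives path loss={survives_path_loss(d, it_load_kw)}"
        )
```

Both designs ride through the loss of any single UPS module, but only 2N survives the loss of an entire path, at the cost of six modules instead of four. That extra resilience is what a provider whose clients already fail over to a sister facility may reasonably choose not to pay for.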
Beyond designing the physical infrastructure to support a specific business need and to withstand large-scale risks such as natural disasters, operators should also ensure they have the right risk mitigation tools in their toolbox, including but not limited to:
- Software support in the form of data center infrastructure management solutions and building management systems, among others
- Perimeter security
- Disaster preparedness plans
- Appropriate personnel training and maintenance procedures
Lastly, data center operators should consider consulting third parties that can provide additional expertise. While humans are naturally reluctant to acknowledge their weaknesses and seek help, doing so is necessary in the data center industry. No business or individual can be the best at everything; by bringing in a third party, data center operators can approach planning, building and maintenance more intelligently.
Threats that come to fruition can bring a business to its knees. To assess risk in the data center, operators need not look much further than how they manage risk in their everyday lives. It simply needs to be done smartly, with the right tools and the right support.