At a recent 7x24 Exchange conference, MTechnology proposed a replacement for, or a supplement to, the data center Tier levels. The Tier levels have been around since the ’90s and were developed to provide a common set of terms for describing the availability of a data center. The certification classes were developed by the Uptime Institute (UI) and later provided to TIA for inclusion as an informative annex in TIA-942.
Several problems surround the Tier levels, in particular non-accredited certification consultants, miscommunication of what the Tiers actually mean, and confusion over certification for design vs. construction vs. ongoing operations. While the UI has a consulting practice to certify data centers, the truth is that most data centers and design engineering firms self-certify, sometimes based on inaccurate criteria. People generally grasp the concepts, though, and use the Tier names as working terminology. The UI has an entire web page dedicated to dispelling myths about the classifications. Further, official Tier certification is available only from the Uptime Institute, and not all organizations want to foot the bill for the plaque. While the Tier levels have been useful over the years, MTechnology believes it's time for a new standard or metric to address risk and consequence, which the Tiers do not address.
While the Tier levels describe which systems are redundant, the proposed Class system specifically addresses reliability as a risk metric. The proposed metric covers electrical power, cooling, communications, maintenance, and security. The Class system attempts to capture the risk that a site will fail, which is not the same as overall availability. Risk adds the consequence of the outage to the overall rating. In their scenario, a Class 10 data center would have a 10% chance of failure in a given year. MTechnology stated that most failures occur during maintenance.
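To make the Class arithmetic concrete, here is a minimal Python sketch. Only the mapping "Class 10 means a 10% chance of failure in a year" comes from the presentation. The multi-year calculation assumes each year is independent, and treating risk as probability times outage cost is my own simplification, not MTechnology's published method. The function names and the dollar figure are hypothetical.

```python
def annual_failure_probability(class_number: float) -> float:
    """Class N is read as an N% chance of at least one failure per year."""
    return class_number / 100.0


def failure_probability_over(years: int, class_number: float) -> float:
    """Chance of at least one failure across several years.

    Assumption: years are independent and the annual probability is constant.
    """
    p = annual_failure_probability(class_number)
    return 1.0 - (1.0 - p) ** years


def expected_annual_loss(class_number: float, outage_cost: float) -> float:
    """Assumed risk measure: probability of failure times cost of one outage."""
    return annual_failure_probability(class_number) * outage_cost


if __name__ == "__main__":
    # Hypothetical inputs: a Class 10 site and a $1,000,000 outage.
    print(failure_probability_over(5, 10))
    print(expected_annual_loss(10, 1_000_000))
```

The point of the sketch is that the same Class number can mean very different things to two companies once the cost of an outage is attached to it.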
The work behind the Class system is based on fault trees and other methods for determining failure, some observed, some mathematical. The risk assessment takes into account any risk that could occur at the site and the likelihood of its occurrence. Further risk analysis is available from IEEE in standard 3006.7-2013, IEEE Recommended Practice for Determining the Reliability of 7x24 Continuous Power Systems in Industrial and Commercial Facilities. This is the update to the old Gold Books that were used in the ’90s.
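For readers who have not worked with fault trees, the basic arithmetic can be sketched in a few lines of Python. This is a generic textbook illustration, not MTechnology's model: it assumes every basic event is independent, and the tree layout and all probabilities below are hypothetical.

```python
from math import prod


def and_gate(probabilities):
    """Output fails only if every input fails (independent events assumed)."""
    return prod(probabilities)


def or_gate(probabilities):
    """Output fails if any input fails (independent events assumed)."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical annual failure probabilities for basic events.
utility_feed = 0.30
generator = 0.05
cooling_plant = 0.02

# Power is lost only if the utility feed AND the generator both fail.
power_loss = and_gate([utility_feed, generator])

# The site fails if power is lost OR cooling is lost.
site_failure = or_gate([power_loss, cooling_plant])

if __name__ == "__main__":
    print(power_loss, site_failure)
```

Real analyses add common-cause failures, repair times, and maintenance windows, which is where most of the effort goes.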
Is Facility Risk Enough?
All of these defined risk, reliability, and availability levels try to offer some guarantee that the systems contained in the data center will be available for use. The hard part is determining which one works for your organization and whether putting a label on the uptime expectation provides a benefit, or if that expectation is only meaningful at design.
To complicate things just a bit more, BICSI also uses the Class nomenclature. At this point, MTechnology seems very interested in input on the proposed new metric. While I think the discussions at 7x24 Exchange were fruitful, they were only a start. Where this proposal and all of its predecessors fall short is in not taking into account IT capabilities and a company’s overall ability to provide access to data.
Something else that is important is a company’s risk tolerance relative to its willingness to spend money. For instance, if a company has two Tier II facilities, geographic diversity gives it a greater probability of access to data. The pair also costs significantly less than a single Tier IV data center (roughly one-third of the cost). Virtualization and software failover are not included in these measures either.
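The two-site argument can be checked with a quick Python sketch. The availability percentages are the figures commonly quoted for Tier II and Tier IV, used here purely as illustrative inputs. The calculation assumes the two sites fail independently and that data can be served from either one, which in practice depends on replication and failover that the sketch does not model.

```python
def combined_availability(site_availabilities):
    """Availability of data when any one site being up is enough.

    Assumption: sites fail independently of each other.
    """
    all_down = 1.0
    for availability in site_availabilities:
        all_down *= 1.0 - availability
    return 1.0 - all_down


# Commonly quoted figures, treated here as illustrative inputs only.
TIER_II = 0.99741
TIER_IV = 0.99995

two_tier_ii = combined_availability([TIER_II, TIER_II])

if __name__ == "__main__":
    print(f"Two Tier II sites: {two_tier_ii:.6%}")
    print(f"One Tier IV site:  {TIER_IV:.6%}")
```

Under these assumptions the pair of Tier II sites comes out ahead of the single Tier IV site, before cost is even considered.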
Facilities Meet IT; IT Meet Facilities
At the end of the day, all a company needs from a data center is access to the information contained therein. In my opinion, it is about time for a joint set of metrics that looks at the entire ecosystem that houses company data. You can have all the power resiliency you want, but if a server goes down, it means nothing. Similarly, if you have the best software failover on the planet and the power goes out, you still have nothing.
It is long past time to end the separation of IT and Facilities. One quick solution would be to combine the budgets and let them fight it out. But the best solution is communication. Companies are wasting money and resources putting redundant systems on redundant systems at redundant sites. Risk to reward: now there is a metric that makes sense. I agree that risk must be part of the equation, but I don’t think it is for facilities to decide alone.
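A joint metric would have to treat facilities and IT as links in one chain. As a rough Python sketch, assuming the layers are independent and that data is reachable only when every layer is up, the end-to-end figure is simply the product of the layers. The layer names and numbers are hypothetical.

```python
def end_to_end_availability(layers):
    """Availability of data access when every layer must be up at once.

    Assumption: layers fail independently, so availabilities multiply.
    """
    result = 1.0
    for availability in layers.values():
        result *= availability
    return result


# Hypothetical layers: a very resilient facility hosting less resilient IT.
layers = {
    "facility_power_and_cooling": 0.99995,
    "servers_and_software": 0.99,
}

access_to_data = end_to_end_availability(layers)

if __name__ == "__main__":
    print(f"End-to-end availability: {access_to_data:.4%}")
```

The result can never be better than the weakest layer, which is why a facility rating on its own says little about access to data.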
A new metric not tied to facilities but overall access to data seems to make more sense to me.