Efficiency and reliability are both critical to successfully operating a data center; however, only one of the two is currently used as a reportable metric. Most operators are familiar with power usage effectiveness (PUE), which was introduced in 2006 to measure energy efficiency in the data center.
But what about reliability? Are we missing a meaningful performance metric to assess the risk of a data center power outage?
Steve Fairfax at MTechnology believes we are, according to his presentation and subsequent panel discussion at the 7x24 Exchange Spring Conference in early June. Uptime Institute’s Tier Classification System provides us a common language to talk about system redundancy and availability. However, availability cannot measure risk, suggests Fairfax, because it doesn’t take into account the number of outages or the mean time to recovery (rebooting, recovering and repairing corrupted data, etc.). As he explains, two data centers may both claim 99.999% availability, but one loses power 10 times per year for 30 seconds each time while the other loses power once a year for five minutes. On paper their availability is identical, yet the former has a far different risk and recovery profile than the latter.
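To make the arithmetic behind that example concrete, here is a minimal sketch in Python. The function name and the choice of a 365-day year are my own assumptions for illustration, not something from Fairfax's presentation. It shows that both outage patterns add up to the same five minutes of downtime, so the availability figure alone cannot tell the two sites apart.

```python
# Illustrative sketch: names and the 365-day year are assumptions.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # non-leap year


def availability(outages_per_year, seconds_per_outage):
    """Fraction of the year with power, counting only the outage time itself."""
    downtime = outages_per_year * seconds_per_outage
    return 1 - downtime / SECONDS_PER_YEAR


# Site A: ten 30-second outages. Site B: one five-minute outage.
site_a = availability(10, 30)
site_b = availability(1, 5 * 60)

print(f"Site A: {site_a:.5%} available, 10 outages to recover from")
print(f"Site B: {site_b:.5%} available, 1 outage to recover from")
```

Both sites come out at the same availability, yet Site A has to go through reboot and data recovery ten times as often. That recovery time is exactly what the availability number leaves out.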
The new “Class” metric proposed by Fairfax measures the probability of future system failure and is expressed as a percentage chance of failure over one year of operation. For example, a facility designated Class 1 has a 1% chance of failure per year while a Class 5 facility has a 5% chance of failure per year.
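One way to get a feel for what a Class number implies over a longer planning horizon is to compound it across several years. The sketch below assumes each year is independent and carries the same failure probability. That is my simplifying assumption for illustration, not part of Fairfax's definition, and the function name is hypothetical.

```python
# Illustrative sketch: assumes independent years with a constant failure
# probability, which the "Class" definition itself does not state.
def chance_of_failure(class_rating, years):
    """Probability of at least one failure over the given number of years."""
    annual_probability = class_rating / 100  # Class 1 -> 0.01, Class 5 -> 0.05
    return 1 - (1 - annual_probability) ** years


for class_rating in (1, 5):
    ten_year = chance_of_failure(class_rating, 10)
    print(f"Class {class_rating}: {ten_year:.1%} chance of at least one failure in 10 years")
```

Under those assumptions, the gap between a Class 1 and a Class 5 facility widens from four percentage points in a single year to roughly thirty over a decade.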
I like that “Class” takes a holistic approach to measuring the probability of failure, taking into account design, operations and communications. The metric could enable organizations to make better design decisions by knowing the consequences of certain approaches, and it could give operators a more transparent way to compare vendors and designs.