There is a disturbance in the Force.
A long time ago, in the mid-1990s, Ken Brill, Jedi Knight, brilliantly created the concept of a “tier system of availability,” based primarily on the redundancy level of the facility power and cooling infrastructure, and subsequently founded the Uptime Institute (UI). Although he passed on to a galaxy far, far away in 2013, many data center designers, builders, and operators have adopted his well-respected concepts, either directly or indirectly. Nonetheless, many debate whether the UI “Tier” rating system is a “standard” or just a proprietary method for rating a facility. Over the past few years, this debate has spurred other organizations to create their own versions of a “tier” system, which has now escalated into the “Tier Wars.” To put this into perspective, here is a brief summary of the history and the new players in the Tier Wars.
As noted above, the Uptime Institute (UI) formally originated and defined the “Tier system of availability” (Tier I, II, III, or IV; UI considers the specific use of Roman numerals its copyrighted and trademarked intellectual property). UI was acquired in 2009 by the 451 Group, a for-profit corporation. UI charges fees for its design review and facility construction inspection services, both of which it requires before bestowing the coveted UI “Tier” rating.
The Telecommunications Industry Association (TIA) also has a similar framework, originally known as Tier 1-4, which was formalized and first released in 2005 as ANSI/TIA-942 and updated in 2010 as TIA-942-2. It was updated again in 2014 as TIA-942-A, and another revision is expected to be released as ANSI/TIA-942-B in July of this year. The TIA documents define the requirements for each of the four levels in line with UI precepts. Although it is not widely known, UI originally allowed the TIA to use the four-tier concept. However, in 2014 the UI/451 Group went to federal court to force the TIA to stop using the term “Tier,” and the TIA now uses the term “Rated.”
In addition to its UI-based rating system, the TIA-942 series also includes recommendations for copper and fiber network cabling. However, unlike UI, the TIA does not review designs or certify constructed data centers.
BICSI also got into the rating game with BICSI-002-2010, based on Classes 0-4 (updated as BICSI-002-2014). Although this is technically a five-level rating system, Classes 1-4 are directly coupled to the TIA-942 levels (Class 0 essentially defines a wiring closet). Like the TIA, and unlike UI, BICSI only sells the document; it does not offer or require design reviews or construction inspections, nor does it issue certifications.
THE TIER 5 DATA CENTER
Four years ago I wrote an article, “The Advent of the Tier 5 Data Center.” At that time, I postulated that in the age of virtualization and software-based replication, the redundancy level of the facility power and cooling infrastructure should not be the only criterion for rating a data center’s “availability.”
Just the mention of a “Tier 5” back in 2013 prompted many readers to express very strong opinions, both pro and con, about the term and about the concept that software-based failover and redundancy could reduce reliance on facility-level redundancy while raising “availability.” Of late, there seems to be an increased flurry of activity regarding data center facility classes, levels, ratings, tiers, and other categorization terms that purport to define (or imply) reliability, redundancy, resilience, and presumably availability.
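The intuition behind the “Tier 5” argument can be sketched with the textbook formula for parallel redundancy. This is an idealized illustration, not a rating method: it assumes that the n sites (with availabilities A_1 … A_n, symbols introduced here for illustration) fail independently and that software failover between them is instantaneous and perfect, neither of which holds exactly in practice.

```latex
A_{\text{combined}} = 1 - \prod_{i=1}^{n} \left(1 - A_i\right)

% Example: two independent sites, each with A = 0.999 ("three 9s")
A_{\text{combined}} = 1 - (1 - 0.999)^2 = 1 - (0.001)^2 = 1 - 10^{-6} = 0.999999
```

Under those assumptions, two modest “three 9s” sites replicated in software would, on paper, reach “six 9s,” which is why correlated failures (shared networks, software bugs, cyber attacks) matter so much when such claims are made.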
BATTLE STATIONS - SET YOUR DISRUPTORS TO FUD!
Most recently, I sensed a growing disturbance in the “Force.” Several other organizations are trying to unseat UI and “protect” people, or at least to sow fear, uncertainty, and doubt (FUD).
This past June, Switch declared that it had created the new data center standard of excellence, the “Tier 5 Platinum” standard. A daringly defiant claim on its website states, “Switch views the UI rating system as insufficient.” A clear red line in the Nevada desert sand!
If this overt declaration were not clear enough, Switch goes even further, stating, “Moreover, in Switch’s opinion, UI standards have grown more focused on gathering clients rather than protecting data center consumers; more focused on income than the data center industry.”
That sure sounds like the ping that will be heard around the data center universe. It appears the folks at Switch are at DEFCON-3 and about to escalate to “DEFCON-1.” So let’s get those trademark troopers and intellectual property attorneys suited up and let the litigation begin!
PROTECTING THE WORLD AND THE USER!
It looks like Switch is not just trying to go “one up” on UI’s Tier standard; this is only the beginning, because it is also forming its own “foundation.” Its website states: “In addition to announcing the Tier 5 Platinum standard, Switch is partnering with some of the original authors of the UI to create a new, independent, non-profit standards body for the data center industry. Known as the Data Center Standards Foundation, or “DCSF,” the foundation will independently protect the industry, the world, and the users.” Wow, what a relief to know that Switch and DCSF will be protecting the world and users.
Moreover, even today, many among the general public, and even some in the IT industry, seem to hold a fundamental belief that “the cloud” is inherently reliable and fault-tolerant, despite having little or no information about the redundancy level of the physical facility infrastructure supporting the IT systems.
Toward that end, Underwriters Laboratories (UL) has decided that it should be the one protecting the world and data center users, with its own standard, UL 3223 (http://bit.ly/2sTo5vt).
UL states, “With few existing options to protect the enduser, UL is developing a new certification program that provides enduser transparency, provider accountability, and proper data center documentation to further mitigate operational risk.”
WHAT ABOUT ENERGY EFFICIENCY?
With everyone’s finger in the proverbial pie, ASHRAE has played an interesting role in data center standards, first by including data centers in the 2010 version of Standard 90.1 for building energy efficiency, and most recently by introducing ANSI/ASHRAE Standard 90.4-2016, Energy Standard for Data Centers. Standard 90.4 establishes the minimum energy efficiency requirements for data center design and construction, for the creation of an operation and maintenance plan, and for the use of on-site or off-site renewable energy resources. It sets multiple efficiency requirements: electrical efficiency based on the level of power redundancy, and cooling efficiency based on geographic location. However, unlike the other data center “standards” in the U.S., ASHRAE standards are incorporated into building codes by many states and federal agencies, making them a legal requirement rather than a recommendation. See my article at http://bit.ly/20effl2.
The Green Grid - Redundancy and Efficiency
The Green Grid, which created the PUE metric for data center facility energy efficiency, is in the process of developing the Open Standard for Data Center Availability (OSDA). Although it is not intended to compete directly with Uptime’s four-level Tier system, the OSDA is a concept based on embracing data center innovation and driving toward sustainability and energy efficiency. The OSDA scale runs from 0 to 10 (0 = low availability and low reliability; 10 = high availability and high reliability). The model encompasses the fault tolerance of both the facility and the IT architecture. The goal is to design a site that meets business requirements, can use sustainable energy sources, and can be positioned relative to (compared with, and placed between) the existing four tier levels.
Data Center Automation – Machine Learning
However, virtually every industry survey shows that, despite massive levels of redundancy, human error is still the most common cause of data center outages. This may lead many data center operators who want to eliminate human error to consider a fully autonomous data center controlled by artificial intelligence (AI) and machine learning (ML). Google has shown that ML can improve energy efficiency: it acquired DeepMind in 2014 and used DeepMind’s ML software to further optimize its already very good facility energy efficiency. According to the DeepMind website, using the ML controls “consistently achieved a 40% reduction in the amount of energy used for cooling, which equates to a 15% reduction in overall PUE overhead after accounting for electrical losses and other non-cooling inefficiencies.”
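Why does a 40% cooling reduction translate into only a 15% cut in PUE overhead? The arithmetic can be sketched as follows. This is a back-of-the-envelope reading of the quoted figures, not DeepMind’s published method; the symbols E_IT, E_cool, E_other, and f are illustrative, and the sketch assumes IT energy is unchanged.

```latex
\mathrm{PUE} = \frac{E_{\text{IT}} + E_{\text{cool}} + E_{\text{other}}}{E_{\text{IT}}},
\qquad
\mathrm{PUE} - 1 = \frac{E_{\text{cool}} + E_{\text{other}}}{E_{\text{IT}}}

% Let f = E_cool / (E_cool + E_other) be cooling's share of the overhead.
% Cutting E_cool by 40% shrinks the overhead by 0.40 f, so
0.40\, f = 0.15 \;\Rightarrow\; f = \frac{0.15}{0.40} = 0.375
```

Taken at face value, the two figures imply that cooling accounted for roughly three-eighths of the non-IT overhead, with electrical losses and other non-cooling loads making up the rest.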
However, a fully autonomous data center devoid of humans is another matter. It is unclear whether this will be an improvement over human operators or just another layer of things that could go very wrong, very quickly, if events occur that the AI/ML software developers did not anticipate.
THE BOTTOM LINE
In the age of virtualization, software-based replication, and geographically diverse failover capabilities, as well as cloud-based resources, how much longer will the redundancy of the physical facility remain the defining yardstick?
It is believed that, as early as the tenth century, the Saxon King Edgar kept a “yardstick” at Winchester as the official standard of measurement. Another tale tells of Henry I (1100-1135), who decreed that the yard should be “the distance from the tip of the King’s nose to the end of his outstretched thumb.” So, with all due respect to Ken Brill, is his original “Tier” yardstick still the appropriate way to measure data center “availability”?
Regardless of the various internet and cloud service hyperscalers, which create and update their own open standards (through consortiums such as OCP and Open 19) at will, the traditional “enterprise” data center (and by implication the colocation data center facility) is still the critical foundation that supports and secures the IT equipment, and it still needs to be well defined and judged by some form of redundancy-based rating, at least for the foreseeable future.
However, this raises the question: how should we judge availability? Should it be a projected probability for a design, or actual operational history? So, data center designers and operators, get your drawings and operating history logs ready, and let the battle begin over predicted mean time between failures (MTBF) hours, marketing claims for the number of “9s,” 125% uptime (less than 0% downtime), and double, triple, or even quadruple (N+X) levels of redundant infrastructure. And if that is not enough, add turrets, moats, and drawbridges wherever possible to keep intruders away from the data center castle.
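The “9s” and MTBF figures traded in these battles reduce to simple arithmetic. The Python sketch below converts a number of nines into expected annual downtime and computes steady-state availability from MTBF and mean time to repair (MTTR). The function names and the MTTR input are illustrative assumptions, not part of any rating standard, and the MTBF formula assumes a simple repairable system in steady state.

```python
# Illustrative availability arithmetic; function names are hypothetical,
# not taken from UI, TIA, BICSI, or any other rating scheme.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def availability_from_nines(nines: int) -> float:
    """Return availability as a fraction, e.g. 3 nines -> 0.999."""
    if nines < 0:
        raise ValueError("number of nines must be non-negative")
    return 1.0 - 10.0 ** (-nines)


def availability_from_mtbf(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability A = MTBF / (MTBF + MTTR).

    Assumes a simple repairable system; MTTR is an assumed input here.
    """
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_minutes(availability: float) -> float:
    """Expected downtime per year for an availability fraction in [0, 1]."""
    if not 0.0 <= availability <= 1.0:
        # "125% uptime" would imply negative downtime, which is impossible.
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for n in range(2, 7):
        a = availability_from_nines(n)
        print(f"{n} nines ({a:.6%}): {annual_downtime_minutes(a):,.1f} min/year")
```

For example, five 9s (99.999%) works out to roughly 5.3 minutes of downtime per year, and the range check makes the point that 125% uptime is not a physically meaningful claim.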
In a lighter vein, here at the Hot Aisle Institute, I have created the Cloud Data Center Rating Standard, which uses a binary (0-1) scale: all the IT systems, applications, data, and networks are either “UP” or “DOWN” (“UP” = 1 being the highest rating).
On a more serious note, while the Tier Wars over the physical facility standard continue, “dark side” cyber threat levels keep escalating, and these threats are far more likely to severely impact uninterrupted access to computing resources and to secure, uncorrupted data, whether in a traditional, hybrid, or cloud data center.
So here is my final piece of advice: others in the data center galaxy may yet join the Tier Wars for control of the Empire, so use the “Force” (aka common sense) to choose wisely. Of course, everyone has an opinion, but in this highly dynamic, rapidly evolving computing environment, deciding what constitutes a data center “standard,” as opposed to good design or simply best practice, is a choice to be made based on its purpose. Ultimately, “fitness for purpose” should be the yardstick that we all use when deciding what we expect a data center to deliver.