Darwin In The Data Center
Evolution or de-evolution of the data center?
Just before writing this article, I finished teaching a series of data center classes over a two-week period. The course covered the basic elements of data center design, examined more recent advancements driven by energy efficiency, and delved into cooling and power systems. It begins with a summary of the data center in the age of the mainframe and punch cards, explains why they were the original drivers of very tight environmental requirements, and then examines the developments that led up to ASHRAE’s X-Factor. The X-Factor, released in 2011, projects relative IT failure rates vs. operating temperature. It postulates that modern equipment has become much more rugged and can endure free cooling with direct outside air over a wide temperature range, with minimal impact on overall relative annualized reliability.
While I have taught these classes many times over the past several years, by the end of the class this time it suddenly struck me that, stepping back from the technical aspects of the material, there appears to be a potential dichotomy developing. As IT hardware and software continue to evolve (presumably forward), is the data center facility moving forward with them, or perhaps backward?
So what about Darwinism and de-evolution in regard to the data center? Darwin’s theories cover a wide swath and are normally referred to in relation to the evolutionary development of humans, as well as other living organisms. However, one aspect in particular, natural selection and the survival of the fittest, would seem to apply to the sometimes perverse relationship between the development of IT and the data center facility (please note this is not meant as any type of political or religious statement).
I think virtually everyone would agree that IT equipment has evolved greatly since the mainframe days on many levels: speed, energy efficiency, size, form factor, as well as greater reliability over wider environmental operating ranges. If IT equipment were to be viewed as a living organism, it has adapted and thrived to the point where it is ubiquitous and embedded in daily life, and for some people it has become almost as essential as food and shelter.
Therefore, if IT hardware is viewed as a living organism, the data center itself is the protective shell that ensures a safe environment and provides proper “nourishment” (power). In the beginning, the IT equipment was very fragile and relied on the facility to provide a very constant temperature and humidity, as well as highly conditioned power (i.e., double conversion UPS), or it would fail. That was the case for the first 50 years or so, and it was epitomized by the original Uptime Institute “Tier IV” 2(N+1) infrastructure redundancy model. Energy efficiency, however, was never a consideration.
More recently, the need for the data center to provide these cocoon-like surroundings has diminished greatly for some applications. Over the last 10 years, internet-centric organizations, which rely on software and geographic diversity for redundancy rather than facility infrastructure, have forgone this “belt and suspenders” model and have minimized or eliminated many or almost all of these “protective” elements for greater energy efficiency. Examples include Facebook’s Open Compute designs, Google, Yahoo’s Chicken Coop, and others, which minimize or totally negate the need for mechanical cooling.
Besides cooling, some internet-focused data centers are even forgoing centralized UPS-conditioned power in favor of localized or onboard battery backup. This can be seen in Google’s early custom servers using an onboard VRLA battery, Facebook’s Open Compute designs using rack-based power, or most recently Microsoft’s Open Compute server power supply with an internal lithium-ion battery. We have even seen this drive for energy efficiency infiltrate the design of the traditional double conversion UPS with the introduction of the so-called “eco-mode” option, which puts the UPS into static bypass and essentially operates the critical load on utility power, but reverts to double conversion (typically within 4 ms) should incoming power become unstable or fail.
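The eco-mode behavior amounts to a simple decision rule, which can be sketched in a few lines of Python. This is only an illustration: the 480 V nominal input, the plus or minus 10% voltage window, and every name in the code are hypothetical placeholders rather than values from any UPS vendor, and a real unit also monitors frequency and waveform quality and performs the transfer in hardware.

```python
from enum import Enum


class UpsMode(Enum):
    ECO_BYPASS = "static bypass (load on utility power)"
    DOUBLE_CONVERSION = "double conversion (load on inverter)"


# Hypothetical values, chosen only to make the sketch concrete.
NOMINAL_VOLTS = 480.0
TOLERANCE = 0.10  # accept input within +/-10% of nominal


def select_mode(input_volts: float) -> UpsMode:
    """Stay in eco-mode while utility voltage is inside the window; otherwise revert."""
    low = NOMINAL_VOLTS * (1 - TOLERANCE)
    high = NOMINAL_VOLTS * (1 + TOLERANCE)
    if low <= input_volts <= high:
        return UpsMode.ECO_BYPASS
    return UpsMode.DOUBLE_CONVERSION


# A stable feed, a mild swell, a sag, and a complete outage.
for volts in (480.0, 500.0, 410.0, 0.0):
    print(f"{volts:6.1f} V -> {select_mode(volts).value}")
```

The efficiency gain comes from spending most of the time in the first branch; the risk lies in how quickly and reliably the second branch is taken when the input leaves the window.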
Further undermining the dependence on the facility infrastructure, there is a somewhat similar change in the direct dependencies between the software application and the IT hardware. Thanks to virtualization (and other failover and load sharing technologies), a hardware failure no longer causes the software application to “die.” This also reduces the need for high levels of infrastructure redundancy in the individual data center facility if we can have real-time data replication between geographically diverse sites. As a result, the presumably symbiotic, but somewhat contentious, relationship between IT and facilities has begun to diverge as well.
Living organisms ranging from microbes to mankind depend on some form of food source to exist. They also adapt to the type and amount of food available in order to survive. Continuing that analogy, energy is the food source for IT hardware, and “they” have adapted to a lower energy diet in order to increase “their” numbers while using less energy. This applies to standard data center servers and also extends to low-power servers based on Atom and ARM processors (originally designed as mobile chipsets to maximize battery life). Organisms also change their form to adapt to conditions, and IT hardware has certainly done that almost continuously.
DARWIN’S THEORIES MEET MOORE’S LAW
At the micro level, IT equipment performance has generally continued to follow Moore’s Law over the past 50 years, since Gordon Moore predicted that every two years would bring a doubling of transistor density on a chip (a few years later it was amended to reflect that overall chip performance would double every 18 months, when factoring in speed increases). Fortunately, this projection has been generally matched by memory and storage capacity and, to a varying degree, network bandwidth. IT hardware has also become much denser, and while it has become far more energy efficient overall, rising power densities have challenged many traditional data center cooling systems.
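To give a sense of scale, the compounding implied by those two doubling periods can be worked out directly. This sketch assumes idealized, uninterrupted doubling over the full 50 years, which real hardware only approximates; the function name is a hypothetical one chosen for illustration.

```python
def growth_factor(years: float, doubling_period_years: float) -> float:
    """Cumulative growth after `years` if a quantity doubles every `doubling_period_years`."""
    return 2.0 ** (years / doubling_period_years)


# Transistor density doubling every two years for 50 years: 25 doublings.
density_gain = growth_factor(50, 2.0)

# Overall performance doubling every 18 months for 50 years: about 33 doublings.
performance_gain = growth_factor(50, 1.5)

print(f"Density gain over 50 years:     {density_gain:.3e}x")
print(f"Performance gain over 50 years: {performance_gain:.3e}x")
```

Under these assumptions, 25 doublings multiply density by roughly 33.5 million, and the shorter 18-month period compounds to a factor on the order of ten billion, which is why a seemingly small difference in the doubling period matters so much over decades.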
SURVIVAL OF THE FITTEST: THE ECONOMICS OF IT SYSTEMS
Natural selection and survival of the fittest are generally used interchangeably today, and most people attribute both to Darwin. However, “survival of the fittest” was coined by Herbert Spencer in the late 1800s, when he drew parallels between his own economic theories and Darwin’s biological ones (at least according to a Google search pointing to Wikipedia).
Economics is one of the major reasons that the IT equipment vendors changed their positions regarding IT equipment operating conditions, accepting a much broader operating environmental envelope. This can be seen in the evolution of the ASHRAE specifications, which originally prescribed very tight temperature and humidity envelopes, as mandated by the earlier IT equipment’s inherent limitations (and reinforced by cooling equipment vendors’ economic incentive to sell more cooling systems).
In a rather startling 2011 announcement and almost total reversal of direction (driven by IT vendors), ASHRAE declared the long-term goal of “eliminating mechanically based cooling” in the data center. The drive to reduce facility cooling energy is also reflected in mainstream OEM IT hardware, as noted in the current edition of the ASHRAE Thermal Guidelines, which includes class A4 IT hardware that can operate at air intake temperatures up to 113°F (45°C).
At the macro level, data center cooling limitations have directly or indirectly impacted some IT equipment deployments and, to a certain degree, sales. Moreover, traditional mechanical cooling systems, coupled with high levels of redundancy, mean that for every kilowatt or megawatt of utility feeds and generator ratings, a substantial portion of those systems’ capacities goes toward supporting cooling systems rather than the IT equipment, further potentially influencing sales of IT equipment. For example, an older site with a high level of redundancy, operating at a PUE of 2.1, would be able to deliver less than half of its power (kW) to the IT equipment, while a site with minimal infrastructure delivering a PUE of 1.1 would be able to power approximately 90% more IT load for the same capital investment, and would therefore benefit from a correspondingly lower net recurring energy cost per unit of computing capacity.
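The arithmetic behind that comparison can be sketched in a few lines of Python. This is a rough illustration that assumes PUE is simply total facility power divided by IT power and that both sites draw on the same utility feed; the 1,000 kW feed and the function name are hypothetical, chosen only to make the numbers concrete.

```python
def it_power_kw(facility_kw: float, pue: float) -> float:
    """IT load that a given total facility power can support, using PUE = total / IT."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return facility_kw / pue


FEED_KW = 1000.0  # hypothetical utility feed, for illustration only

legacy_it = it_power_kw(FEED_KW, 2.1)  # older, highly redundant site
lean_it = it_power_kw(FEED_KW, 1.1)    # minimal-infrastructure site

# How much more IT load the lean site supports from the same feed.
extra_it_pct = (lean_it / legacy_it - 1.0) * 100.0

print(f"PUE 2.1: {legacy_it:.0f} kW of IT load ({legacy_it / FEED_KW:.0%} of the feed)")
print(f"PUE 1.1: {lean_it:.0f} kW of IT load ({lean_it / FEED_KW:.0%} of the feed)")
print(f"Additional IT load at PUE 1.1: {extra_it_pct:.0f}%")
```

Because the feed cancels out of the ratio, the result depends only on the two PUE values: 2.1 / 1.1 is about 1.91, so the lean site supports roughly 91% more IT load, in line with the “approximately 90%” figure, while the PUE 2.1 site delivers about 48% of its power to IT equipment.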
By vastly improving the IT equipment’s environmental tolerance, more CAPEX and OPEX could be used to purchase and operate IT hardware, and far less would be allocated to the facility. It is a very simple economic case for survival of the fittest: improving IT sales and numbers, while reducing revenue for cooling equipment.
Furthermore, ASHRAE also covers liquid cooled IT equipment in classes W1-W5, of which W5 (which encompasses “hot water” cooling) uses water or other fluids above 113°F (45°C), not just to free cool the equipment, but also to allow for effective energy recovery and reuse. Although not yet mainstream, this could be the holy grail for harnessing the massive amount of waste heat generated by the IT equipment, providing even greater economic justification for liquid cooled IT hardware.
THE BOTTOM LINE
In the 1968 movie 2001: A Space Odyssey, based on a story by Arthur C. Clarke, the computer that operates the spaceship (ironically called HAL) can converse with the crew using speech. The plot revolves around the fact that HAL seems to have become sentient. What is technically interesting is that the main character, Dave, needs to disable HAL and starts to remove computing modules, which look like clear plastic slabs, each about the size of a deck of cards. While operating in the chassis, they are brightly illuminated (there are no power or network connectors or cooling fans, just bright light). Once removed, the modules go dark, and as Dave continues to remove more and more modules, HAL’s speech slows down and HAL eventually “dies.” Nearly 50 years after the movie, it would seem that we may finally be getting closer to this fault-tolerant, low-power, optical supercomputing scenario. In May, IBM announced a silicon photonics-based chip that it claims will transform data centers as we know them. I wonder what the data center will look like if this and other technologies such as graphene processors and memristors become the basis of mainstream computing and networking.
While computers and other digital devices (in their many forms) may not be considered sentient life forms yet, they have clearly evolved. However, so far they have not become self-sustaining or able to reproduce themselves, perhaps only because humans fear AI and are not ready to allow it. Although I am sure my column and this reference to Darwinism would not pass muster as a doctoral thesis in biology, sociology, or economics, I do see a parallel, perhaps using “fuzzy logic” or quantum computing.
The increasing demands and challenges posed by the expected tsunami of “IoT” and “wearables” will further highlight the need for cost-effective computing. Nonetheless, computing in the data center has now become an industrial process and, as such, the rule of “form follows function” and economic efficiency will drive business decisions, which will ultimately be reflected as design decisions.
We are just now beginning to consider the morphing of traditional OEM-branded IT equipment into “bare metal” or open source hardware operating as software-defined everything, which will presumably shape the software-defined data center. In reality, however, constant and rapid changes in IT equipment requirements are limited by the physical realities of the facility, simply because, unlike IT equipment, data center facility systems cannot be easily upgraded via software, and a facility’s “firmware” upgrade involves a forklift or perhaps a crane.
Therefore, if computing hardware can operate sans mechanical cooling and conditioned power and the software provides redundancy, the data center cocoon as we know it today will continue going through a metamorphosis. Over the long term, it may eventually de-evolve to a simple warehouse type shell, designed primarily to keep out rain, snow, and intruders. Or perhaps just a series of geographically dispersed globally linked containers (full of silicon photonic processors), situated in parking lots, powered by solar panels, and only operating when the sun shines.
Until then, I wonder what Darwin would think of the Apple Watch and why it needs to advise the wearer when to stand up.