Untapped Potential: Big Data in the Data Center
We built the ship, shouldn’t we get on board?
Wikipedia defines big data as the term applied to data sets too large for commonly used software tools to capture, manage, and process. Big data sizes are constantly growing. In a 2001 research report and related conference presentations, analyst Doug Laney of the META Group (and now Gartner) defined data growth challenges (and opportunities) as being three-dimensional, including increasing volume (amount of data), velocity (speed of data in/out), and variety (range of data types, sources). Data sets grow in size, in part, because they are increasingly being gathered by ever-present information-sensing mobile devices, aerial sensory technologies (remote sensing), software logs, cameras, microphones, radio-frequency identification readers, wireless sensor networks, and so on. Every day, 2.5 quintillion bytes (that’s 18 zeros) of data are created and 90 percent of the data in the world today was created within the past two years.
This sounds a lot like all the data that’s available but seldom used in our data centers. For the last 40 years, we have talked about metrics, energy efficiency, etc., and from time to time a few have collected some relevant but statistically insignificant data. Today, we still lack the organizational means to collect, analyze, and report on this enormous collection of data to improve how we operate, even as the growing number of sensors and meters dramatically increases the amount of data we collect.
Here are just a few examples:
• Utility planning and guidance for user requests. I have been dealing with power utilities since the mid-1970s, and it seems that no matter what number they are given regarding power needs for a new facility, the utility comes up with a much lower number. Then the battle for the right-sized service to support user growth begins. It shouldn’t have to be this way.
The data centers built in the last five, 10, and even 15 years mean that there is a significant amount of data sitting in utility company databases that, if analyzed, would tell us the typical load growth rates and the spread of those rates (e.g., where 80 percent of them fall). These data could be analyzed by categories such as size, Tier, and type/use. Just think what this type of information could provide in the way of utility planning.
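To make the idea concrete, here is a minimal sketch of how a typical growth rate and the band holding the central 80 percent of sites might be computed. The record layout, the function names, and the sample numbers are all hypothetical; no utility data set is being quoted here.

```python
from statistics import median, quantiles


def annual_growth_rate(start_kw, end_kw, years):
    """Compound annual growth rate of peak load between two readings."""
    if start_kw <= 0 or years <= 0:
        raise ValueError("start_kw and years must be positive")
    return (end_kw / start_kw) ** (1.0 / years) - 1.0


def central_80_percent(rates):
    """Return the 10th and 90th percentiles, which bracket the middle 80 percent."""
    deciles = quantiles(rates, n=10, method="inclusive")  # nine cut points
    return deciles[0], deciles[-1]


# Hypothetical sites: (peak kW at energization, peak kW today, years in service)
sample_sites = [
    (500.0, 900.0, 5),
    (1200.0, 1500.0, 10),
    (300.0, 1100.0, 15),
    (2000.0, 2600.0, 5),
    (800.0, 1900.0, 10),
]

rates = [annual_growth_rate(*site) for site in sample_sites]
low, high = central_80_percent(rates)
print(f"Median growth: {median(rates):.1%} per year")
print(f"Middle 80 percent of sites: {low:.1%} to {high:.1%} per year")
```

The same two functions could be run separately for each size, Tier, or type/use category once the records carry those labels.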
• PUE. A few years ago I sat in on conference calls the U.S. Department of Energy was holding for the development of the PUE standards. There was tremendous interest in the process, yet, if I recall correctly, only 100 companies contributed valid data. Since then thousands of data centers have started tracking their PUEs, and tens of thousands have yet to start. The problem is that no one knows the PUE trend because the data reside with individual companies and not in a collective database.
Last evening, I came across a web story about how data center efficiency actually drops 20 to 30 percent after five years. The problem I had as a reader was that there wasn’t any data to support the expert opinion of the writer who stated this as a fact.
So, because there is no central recording database, as an industry we cannot even track how we are doing with respect to improving PUE. Further, does PUE vary much by size, Tier, industry, age, etc.? Further, what is the best PUE in each category and each geographic region? This is valuable knowledge with respect to planning, designing, and operating data centers.
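As a rough sketch of what a shared database would make possible, the snippet below computes PUE (total facility energy divided by IT equipment energy) for each record and reports the best and median value per category. The field names and the sample records are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import median


def pue(total_facility_kwh, it_equipment_kwh):
    """Power usage effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


def summarize_pue(records, category):
    """Group records by a category field and report count, best, and median PUE."""
    groups = defaultdict(list)
    for record in records:
        groups[record[category]].append(pue(record["total_kwh"], record["it_kwh"]))
    return {
        name: {"count": len(values), "best": min(values), "median": median(values)}
        for name, values in groups.items()
    }


# Hypothetical annual energy records submitted by participating sites.
sample_records = [
    {"tier": "III", "region": "Midwest", "total_kwh": 1500.0, "it_kwh": 1000.0},
    {"tier": "III", "region": "Southeast", "total_kwh": 2000.0, "it_kwh": 1000.0},
    {"tier": "IV", "region": "Midwest", "total_kwh": 1800.0, "it_kwh": 1000.0},
]

for name, stats in summarize_pue(sample_records, "tier").items():
    print(name, stats)
```

Passing "region" instead of "tier" as the category would answer the geographic version of the same question.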
• Outages. Outage recording is another interesting area in which huge amounts of data are collected, yet access to the information in usable form is non-existent. When I place a large UPS, generator, or chiller order, I can usually extract some printed statistics from the vendor, but that is normally too late to tell me if I am buying the most reliable product. If we were buying jumbo jets, we would have stats on virtually every component. We could look at a GE or a Rolls-Royce engine and have years of good data to help us make a selection. Outside of the aviation industry, the ability to produce this type of data seldom exists.
As I go through our data center designs and operations, I find there are hundreds of data reference points I would like to know:
• What are people really averaging with respect to kW/cabinet?
• Is my power draw becoming more dynamic with new technology?
• How many gallons/day are the cooling towers consuming?
• How does this volume vary by month?
• How quickly does the average data center get populated after construction?
The records are out there to answer these and many other questions, but they sit in facility servers, manufacturer mainframes, and/or on individuals’ PCs, iPads, and smartphones.
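Two of the questions above, average kW per cabinet and monthly variation in cooling tower water use, need very little arithmetic once the records are in one place. The sketch below assumes a hypothetical log of daily makeup-water totals and a single IT load reading; the names and figures are illustrative, not measured values.

```python
from collections import defaultdict
from datetime import date


def average_kw_per_cabinet(it_load_kw, cabinet_count):
    """Average IT load per populated cabinet."""
    if cabinet_count <= 0:
        raise ValueError("cabinet_count must be positive")
    return it_load_kw / cabinet_count


def mean_daily_gallons_by_month(readings):
    """Average gallons/day for each (year, month) from daily (date, gallons) totals."""
    totals = defaultdict(lambda: [0.0, 0])  # (year, month) -> [gallons, days]
    for day, gallons in readings:
        bucket = totals[(day.year, day.month)]
        bucket[0] += gallons
        bucket[1] += 1
    return {month: total / days for month, (total, days) in sorted(totals.items())}


# Hypothetical daily cooling tower makeup-water readings.
sample_readings = [
    (date(2013, 1, 1), 1000.0),
    (date(2013, 1, 2), 3000.0),
    (date(2013, 7, 1), 9000.0),
]

print(average_kw_per_cabinet(480.0, 120))
for month, gallons in mean_daily_gallons_by_month(sample_readings).items():
    print(month, gallons)
```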
As an industry we need to come together and form an independent non-profit organization for the collection and reporting of this data so we can improve and do it faster.
In the long run, if we do not create the necessary clearinghouse and support such an organization, the government will step in and start to regulate the reporting. Do we really need more regulatory burden?
I would like to hear your opinions.
Is there enough interest to start an organization dedicated to this collection and reporting?
Can we gain a 50 percent participation rate or better?
Reprints of this article are available by contacting Jill DeVries at firstname.lastname@example.org or at 248-244-1726.
In my last column, I referred to 380 Vdc and HVDC (high-voltage direct current) interchangeably; however, 380 Vdc is actually classified as low voltage by IEEE.
The context in which “HVDC” was used was one of common trade language rather than the more accurate reference to the standards, just as many non-engineers refer to 277/480 Vac as high voltage when in fact it is not.
This language was intended to differentiate the discussion from applications using 12 or 48 Vdc, which are the widely recognized LVDC voltages established in the industry today.
So let’s be clear: 380 Vdc should correctly be referred to as LVDC.