Industry professionals who attend more than one discussion on data centers can be forgiven if they sometimes complain about the overwhelming feeling of déjà vu. Microsoft’s Christian Belady recently challenged one such gathering over the lack of progress on greening data centers, despite all the presentations on the topic over the past year.
The distinguishing characteristic of the Uptime Institute’s symposia and other events has been its obsession with gathering attendee response and measuring activity in the field. Everyone seems to agree that action is needed to improve the overall energy performance and reliability of data centers, hence the proliferation of public and private initiatives. Yet no group or initiative has broken the inertia surrounding data center energy practices.
Meet the Panelists
Editor’s note: Mission Critical and ASCO Power Technologies collaborated to hold a Technology Roundtable on Power Reliability in the Data Center. The event was held April 2nd at the MGM Grand in Las Vegas, NV. Mission Critical would like to thank Bhavesh Patel, director of Marketing for ASCO, for his help organizing this event. Armand Visioli, president of ASCO Power Technologies, introduced the group. Panelists were:
“I lead the design team within Rosendin, which focuses mainly on data center construction. We do a considerable amount of new data center construction as well as renovation of existing facilities. It’s a rapidly evolving business, where yesterday’s data center can quickly become non-competitive.”
Sudhir Kalra, Morgan Stanley, Global Head, Enterprise Data Centers Engineering and Operations
“I’ve been involved with data center engineering management and operations work for the last 20 years or so. We’re very focused on energy efficiency, reliability, and new and exciting technologies as they relate to data center construction and/or efficiency and reliability.”
Michael Manos, Microsoft Corporation, General Manager, Data Center Services
“I have responsibilities for the design, construction, and long-term operation of all data centers across Microsoft. We began a fairly significant construction program and design program around data centers about two years ago, really driving towards efficiency and overall utilization of our facilities as we build them.”
Ted Martin, Digital Realty Trust, Vice President, Technical Operations
“Digital Realty is one of the largest REITs [Real Estate Investment Trusts] on the New York Stock Exchange. We specialize strictly in data centers. We acquire, build, design and manage them. One of the biggest challenges we face as we acquire legacy buildings, or existing buildings, is trying to measure and manage their energy consumption.”
Glen Neville, Deutsche Bank, Director, Engineering
“My responsibilities include directing all engineering works for DB [Deutsche Bank] in the Americas Region. I have spent 20 years on the consulting side of this industry performing commissioning, design, and testing of data centers. At DB, we’re constantly reviewing our data center strategy, trying to refine what our strategy will be for the next 5, 10, 12 years. We want to make sure that our data center can develop and evolve into a data center we can still utilize years from now.”
Greg Sawyer, P.E., Burr Computer Environments, Electrical Engineer
“Burr Computer Environments is a design-build consulting engineering firm, and we offer full-time on-site construction management services. I’m a project engineer who focuses on electrical design. I manage equipment procurement and commissioning for all types of data center projects.”
Brian Schafer, Highland Associates, Director of Business Development and 7x24 Exchange Metro New York Chapter, President
“Highland Associates is an architectural engineering firm, with a focus on mission critical facilities. Most of our clients are in the financial world, Goldman Sachs, JP Morgan, Citigroup, Deutsche, Morgan Stanley. We design data centers as well as infrastructure upgrades. We’re also working with Digital Realty Trust on several projects.”
Bob Schuerger, PE, EYP Mission Critical Facilities, Principal
“My particular role has been doing reliability modeling for EYP. A number of years ago, we decided that we needed to get more advanced in what we were doing, and I became involved with the IEEE Gold Book.”
Joseph Soroka, Total Site Solutions, Senior Vice President, Facilities Management
“I manage our Commissioning and Maintenance Division, so we’re involved in all aspects of staffing facilities and commissioning. Some of the things we’re trying to do is tie in energy efficiency and information from Green Grid, capture that data during commissioning, and make it automated, so it can be turned over to the maintenance side.”
Evangelos Stoyas, P.E. Power Systems Consulting Consultant, U.S. Army Corps of Engineers, retired.
“I was very active in the new Article 708 of the National Electrical Code. Two major projects that I worked on before retiring were reliability analyses for two Department of Defense sites in Colorado Springs.”
Kevin Heslin, Editor, Mission Critical, Roundtable Moderator
Key Drivers
The panel discussed a number of other topics, including containerized solutions, application-based reliability, NEC Article 708, and how legacy equipment best fits into increased power density environments. Michael Manos, who had delivered a keynote address at AFCOM’s Data Center World, began the proceedings with remarks on power reliability, and soon everyone had contributed a thought.
Manos: There’s an evolution happening and a ‘tiering out.’ There are going to be customers who will continue to drive towards heavy reliability, heavy redundancy requirements. In our world, we’re starting to see a split where we’re driving a lot of redundancy into the applications themselves. And that actually lowers the redundancy requirements of the buildings. Because once you’ve figured out how to drive reliability and redundancy into the applications, you only have to do that once, and then you don’t have to do that anywhere else. It’s there forever.
We’ve recently announced our first facility in Chicago, which is going to be predominantly container based. We’re going to have 220 containers, or up to 220 containers, on the first floor of that facility. They will be drawing pretty significant amounts of power, but redundancy is built into the applications.
Most applications aren’t ready for that geo-diversity. Most applications are building into that. So I look at it in three tiers. There’s a cloud services data center, which has a different set of requirements. Then there’s a large user, people who are looking for more redundancy, 2N plus 1 or however you want to qualify that. Then there are lower-tier folks, who will predominantly go into leased facilities and inherit whatever infrastructure those facilities have.
The data centers have become so convoluted that having a problem with a server or with a bunch of servers ends up impacting the majority of your data center and that means the majority of your business.
The grid computing that everybody talks about these days is supposed to be based on that model. The idea is that you have a grid farm you don’t necessarily need to be Tier IV. It can be Tier II because if the grid farm goes down, that workload is picked up by another grid farm somewhere else. I just haven’t seen that work very well.
Neville: We are also focused on sustainability. One of the concepts we’re reviewing is having a single primary and single backup data center strategy. We’re considering moving away from that a little bit, pulling out our blade servers and going into perhaps a three data center arrangement, with a primary and secondary site and with blade server processing in a separate location. The modular approach that Mike talks about is certainly something everybody’s looking at, because it just makes sense from an engineering standpoint.
Martin: Legacy sites were never designed for the demands that we place on them today. It’s a huge investment from our company’s standpoint to find out how we’re going to make a building work. We have multi-tenant buildings in a lot of locations, and our customer base hears all these key words, such as so many “nines” or that a facility is Tier IV, and they can start out with some faulty assumptions about what they need. When you start to dig down, it’s important to ask questions, such as: What is your business requirement? What do you want from this site? Some customers just absolutely demand it (high reliability or redundancy levels). We don’t always think it’s their best investment.
Emert: It wasn’t all that long ago that IT or computer departments shut the computers off at the end of the day and went home. No one had the 7x24 sense of operation businesses now have. It’s been a paradigm shift as to how important data centers are to the success of a business. Everybody’s asking for Tier IV data centers. They never want to go down, at least until they start appreciating the cost implied in building a true Tier IV center.
Soroka: Densities continue to grow and grow, as does computing power. We’re running into issues: not only the cost of operating a facility, which is very large, but also having power availability through the utility company. In the past, there was a large disconnect between operations (the facility side) and the IT side. Some of the facilities weren’t tightly controlled. We can save a lot of money just by improving the performance of facilities operation and measuring it.
During the dot-com era, I commissioned a couple of large facilities, and I went back after they were resold and found the commissioning documents in the same closet I had left them five years ago, with two inches of dust on them. If the operator never took that documentation and did anything with it, it wasn’t valuable information. It was worthless.
Sawyer: Every company pretty much needs a data center. Some of the smaller companies have a hard time financially justifying the more radical energy approaches, like going to DC power. I believe it will take legislation to make all data centers as efficient as everybody wants them to be. It’s also going to take server and infrastructure equipment manufacturers to make equipment more energy efficient.
Schafer: We’ve got to look at the different businesses. Hospitals and retail stores now all have data centers. The Microsofts, and the investment banks, too. The challenge for the big guys who have the 150,000 to 200,000 square feet, the $300 million to $500 million projects, is planning. We’ve recently seen in the financial world that things can change over a weekend. So you may be looking at sites, looking at buildings, identifying pre-purchase equipment, with 52 weeks of lead time for buying generators and switchgear. Planning requires long-term thinking, but the plan could change on a dime because of market dynamics or customer reactions.
Kalra: Years ago, I wouldn’t have thought I’d be sitting at the same table with somebody from Microsoft and discussing data center challenges. So these challenges have become much more widely discussed.
Twenty years ago, it used to be about data center reliability. But there now seems to be a shift from reliability to energy efficiency, or being green, or being sustainable. It almost seems like we’re going from one extreme to the other, because I hear some people talking about having a PUE of 1.2 in their data centers.
If you want to drive down your PUE, start installing steam-driven chillers or natural gas-driven chillers. That drives down PUE. It doesn’t necessarily drive down your energy costs or energy efficiency. In the next few years, the data center industry will probably end up finding a middle ground somewhere. Six nines of reliability at whatever dollars per square foot, if I bring that down a bit and sacrifice my reliability a bit, I can save significant dollars in first costs. On the other hand, I can drive my energy efficiency up by taking certain factors into account when designing and planning for a data center.
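Kalra’s point about chillers can be put in numbers. The following is a minimal sketch in Python, assuming PUE is computed from the electric meters only, as his remark implies. Every figure is a hypothetical annual value chosen for illustration, and the function and variable names are not from the roundtable.

```python
# Illustrative only: every number below is a hypothetical annual figure.

def pue(facility_electric_kwh: float, it_kwh: float) -> float:
    """Power usage effectiveness: facility electricity divided by IT electricity."""
    if it_kwh <= 0:
        raise ValueError("IT energy must be positive")
    return facility_electric_kwh / it_kwh

IT_KWH = 10_000_000               # electricity delivered to IT equipment
OTHER_OVERHEAD_KWH = 2_000_000    # UPS losses, fans, lighting
ELECTRIC_CHILLER_KWH = 3_000_000  # electricity used by electric chillers
GAS_CHILLER_FUEL_KWH = 4_000_000  # fuel burned by gas-driven chillers, in kWh equivalent

# Case A: electric chillers. All cooling energy shows up on the electric meter.
electric_a = IT_KWH + OTHER_OVERHEAD_KWH + ELECTRIC_CHILLER_KWH
pue_a = pue(electric_a, IT_KWH)
total_energy_a = electric_a

# Case B: gas-driven chillers. Cooling energy leaves the electric meter,
# so the electricity-based PUE drops, but the fuel is still consumed.
electric_b = IT_KWH + OTHER_OVERHEAD_KWH
pue_b = pue(electric_b, IT_KWH)
total_energy_b = electric_b + GAS_CHILLER_FUEL_KWH

print(f"Electric chillers: PUE {pue_a:.2f}, total energy {total_energy_a:,} kWh")
print(f"Gas chillers:      PUE {pue_b:.2f}, total energy {total_energy_b:,} kWh")
```

With these assumed inputs, the electricity-based PUE falls from 1.50 to 1.20 while total energy consumed rises from 15 million to 16 million kWh, which is the gap between a better metric and better efficiency that Kalra describes.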
Schuerger: Much of what we’ve been talking about this morning is kind of an evolutionary process. As Microsoft builds a new application, they need hardware that can drive this application, which means somebody’s got to get the environment that the application hardware needs.
If you look at the utility world, it’s gone through this exact same process. They have developed a methodology that solves a problem. Then it evolves, and it evolves, and it evolves. At the end of the day, they have an even better one that does what they started doing three times as well for half the price.
But the piece of the world that they never have applied is reliability-centered maintenance [RCM]. One of these days, we’re going to start engineering our maintenance like we engineer our infrastructure and optimize everything. But in the paradigm of the past, it was always the guy doing maintenance who said, “Let’s see. If something goes down, I can get fired. If I over-maintain it, nobody says anything. Guess what, forget about reliability-centered maintenance.”
Schuerger: Providing the data center is not sitting empty, is it really overbuilt? Maybe you have more redundancy than you absolutely need. The data centers that I see as a waste are the ones sitting empty.
Neville: A common problem with the data center environment is that the designs become so complex that the operations team has trouble operating it. It goes right to what Joe (Soroka) was talking about. Integrating the commissioning with the training of the operations team is so important. Training the operations team on how the site runs during commissioning leads to fewer issues down the road. However, design engineers, in an effort to try and get the most reliable system possible, tend to over-design controls on UPS and generator systems, to the point that when there is a failure, the system is so complicated to troubleshoot that restoration times are lengthy.
You have to have metrics, whether it’s to help determine cost or to determine a degree of system availability. The government cares about metrics. We care about cost, and we care about RCM.
So, in the 1990s, we embarked on an effort to get statistical data that were accurate and consistent. We launched a newer initiative to get additional data in 2000. Once we had these reliability data, it benefited the Department of Defense (DoD) in the utility system analysis of these critical sites. However, we had to translate the data into a useful tool. We applied the data analysis to mundane DoD utility systems, because that’s what we found at our defense sites. We had these multimillion-dollar computer systems, operated by a “five cent rubber band.”
My other responsibility was to transfer a tool that we knew would work to private industry in order to gain acceptance and cooperation. One way (we did this) was through the IEEE. The other one was through the NFPA 70B Electrical Equipment Maintenance Committee. Finally, we were able to apply this new technology or tool to our critical sites. We took the most difficult sites in the Department of Defense, and we applied RCM to lower the costs.
Today I can say “What’s important to your mission? Do you want six nines, or do you want three nines?” If they don’t know, it’s our collective responsibility to help them understand what it is that they want for their particular manufacturing process, business, or whatever it is. The only way to do this is through the standards. This is how data centers will be affected by NEC Article 708. They will become more reliable by applying the knowledge and risk analysis tools we developed in the Army Corps of Engineers (utilizing the data that we collected). We (the government) had to put that information in the NFPA NEC to give industry and government in general a door or opening, or hook, if you will. It was our same approach with the IEEE Gold Book.
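For readers weighing “six nines” against “three nines,” a minimal sketch in Python of what each availability level allows in downtime may help. It assumes a 365-day year and treats “N nines” as an availability of 1 minus 10 to the power of minus N; the function names are illustrative and not taken from any standard discussed here.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a non-leap year

def availability(nines: int) -> float:
    """Availability for a given count of nines, e.g. 3 -> 0.999."""
    return 1 - 10 ** (-nines)

def downtime_minutes_per_year(nines: int) -> float:
    """Expected annual downtime, in minutes, permitted at that availability."""
    return 10 ** (-nines) * MINUTES_PER_YEAR

for n in (3, 4, 5, 6):
    minutes = downtime_minutes_per_year(n)
    print(f"{n} nines ({availability(n):.6f}): "
          f"{minutes:.2f} minutes/year ({minutes * 60:.1f} seconds)")
```

Under these assumptions, three nines permits roughly 8.8 hours of downtime a year, while six nines permits about half a minute, which is why the choice belongs to the business requirement rather than to habit.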
Manos: I totally agree that driving towards standards is key. But here’s where I’m conflicted, because I believe in standards. Groups like the Green Grid and Uptime Institute and others are trying to drive standards in a sort of “what to do” way. There’s a lot of what to do. I look around this table and see that everyone here has a fairly significant organization, which can figure out the next step: how to do it.
But when you get to the next tier down to that small- to mid-tier group of data center operators, there’s a huge gap. They say, “We understand this is what we have to do, but how do we get there? How do we do that?” There’s not a lot of information around on how to do it.
We have to look at that lower tier as well and help guide those folks through that process.
Heslin: The EPA is driving a metrics program and Digital Realty is a pretty sophisticated company, yet Digital suggested that it couldn’t offer all the data to the EPA that it wanted, only some of it. Is that going to be a barrier as we go forward with the data?
Martin: Our current model is to meter at three locations. We meter at the mains. We meter ahead of the UPS, and then we’ll meter at the PDU. That’s where we stop because we leave it up to the customer to decide where they want to run their power.
Stoyas: Can I add something on how to do it? These are the 16 TMs (Technical Manuals) that we produced in the last three years. I have six just on reliability. The theory. The how to do the analysis. How to collect data. The data itself. One on RCM. And then we’re coming out with the failure modes and effects analysis. I think it’s important that you know this is available. Taxpayer money paid for this, and it’s very efficiently done by a group of seven people (US Army Corps of Engineers), with the help of some very good contractors. We’re getting to the point now where the information is there and we have to start using it.
Soroka: I’m a real advocate and supporter of RCM, when it’s applied correctly. Yet I did a six-hour tutorial on Monday, and when I asked the audience, “How many people have monitoring systems?” Less than 20 percent said yes.
So we have to remember that as we apply standards and look at standards that there are companies out there that don’t have full-blown monitoring systems. You cannot apply an RCM system effectively without having metrics.
Manos: I’m not arguing that the material isn’t out there. In most cases, an IT person has inherited the data center operations, and he has no idea how to properly support the facility or how to maintain it. He can read those documents, but he may as well be reading something in a language he doesn’t understand, because he’s not going to get it.
Neville: I think you hit the nail right on the head. At DB we’ve acquired several smaller companies over the years. When we integrate one of these companies into our infrastructure systems, we notice that the practices that each follows are completely different from the others. They’ve gone with what they think is the best route, but in many cases it’s not.
While documents like NFPA 70B and NETA maintenance standards have tremendous value, they aren’t tailored to data center environments and they don’t address the data center from a design standpoint at all.
Heslin: We’ve been talking about the needs of small and less sophisticated users, and yet NEC Article 708 seems to indicate that we’re going to apply and define critical operations power systems more broadly and to more small and less sophisticated users.
Stoyas: There are different scales. Deutsche Bank has a different scale than the guy down the street who is doing a little payroll job for the 7-11. The 7-11 doesn’t need quite the redundancy. When we worked on NEC 708, we knew this because we had inspectors on the committee. We also had people from various industrial sectors. And, of course, the NEC is intended for all users, and it should be simple enough where everyone can apply it. So we have the authority having jurisdiction provide the answer. But let’s face it. If you live in New York, and you have all these sophisticated operations going on there, I think the authority having jurisdiction is going to be a little bit more demanding than the little 5,000-person town.
Schafer: Let’s throw out an example. XYZ company is going to build a 200,000-square-foot data center in Smithtown. They have local building inspectors. They don’t know how to classify a data center to begin with, let alone how to analyze the NEC. They’re used to dealing with schools, hospitals, and apartment buildings. So you’ve got the jurisdictional governing body that can make your operation dead in the water. But in the big picture, you’ve got to figure out a way to communicate to the layman what this really means, and that’s through all the other organizations, whether it’s a private group that gets together, or whether it’s AFCOM, 7x24, Uptime, to educate the building inspectors.
Schuerger: I’ve been a member of the International Association of Electrical Inspectors for a long time and have been involved in a little bit of the Code process. The electrical inspector’s not trying to enforce something that isn’t doable. He just is trying to be prudent in what he does enforce. So this is the opportunity for evolution. A lot of the electrical inspectors are exactly like he described. They’re not experts in the area, but they’re also not stupid people.
Neville: I am a member of the New York City Code interpretation committee, and right now we’re looking at the 2008 NEC and making decisions as to which changes in 2008 we’re going to recommend be accepted and which ones we would not recommend. The very specific requirements that are outlined in NEC 708 bring real-world problems. Some members of the interpretation committee have concerns with being too specific on some of these issues. The real concern is how these rules will apply in a New York City environment.