A Meeting of the Minds (Part 2)
November 7, 2008
Editor’s note: Mission Critical and ASCO Power Technologies collaborated to hold a Technology Roundtable on Power Reliability in the Data Center. The event was held April 2nd at the MGM Grand in Las Vegas, NV. Mission Critical would like to thank Bhavesh Patel, director of Marketing for ASCO, for his help organizing this event. Armand Visioli, president of ASCO Power Technologies, introduced the group. Please visit our web site to see Part 1.
2N or Not 2N
Heslin: We’ve discussed Tiering and reliability before, but how does a business know when it needs to start looking at 2N or other high redundancies?
Neville: It’s really a business decision as to what risk exposure we have in a particular facility. If we were constructing a blade server farm or any grid computing area, we would question if we really need 2N infrastructure systems in that environment. If we were constructing a main data center, we would probably conclude that we need that reliability. I think it comes down to what Angie (Stoyas) was talking about, risk assessment.
Stoyas: We don’t use Tier I, Tier II, or The Uptime Institute stuff in the Department of Defense. We go by the number of nines, and we ask what you will accept as downtime. How much can you stand in terms of economic loss? You have to define what that means. That’s 32 seconds in a whole year, for example.
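The "number of nines" arithmetic behind that figure can be checked with a few lines of Python. This is only an illustrative sketch, not something presented at the roundtable: the function name is hypothetical, and it assumes a 365-day year and that "n nines" means an availability of 1 - 10^-n. Under those assumptions, 32 seconds a year corresponds to roughly six nines (99.9999 percent).

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000; assumes a non-leap year


def allowed_downtime_seconds(nines: int) -> float:
    """Annual downtime allowed at an availability of `nines` nines.

    Hypothetical helper for illustration: 3 nines means 99.9% availability,
    so the allowed unavailability is 10 ** -3 of the year.
    """
    if nines < 1:
        raise ValueError("nines must be at least 1")
    unavailability = 10 ** (-nines)
    return SECONDS_PER_YEAR * unavailability


# Print the yearly downtime budget for two through six nines.
for n in range(2, 7):
    print(f"{n} nines: {allowed_downtime_seconds(n):,.1f} seconds per year")
```

Each additional nine cuts the yearly downtime budget by a factor of ten, which is why the cost conversation changes so sharply between three nines and five.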
Heslin: How can we be sure, when we’re going through a process of designing reliability in a data center, that we’re making good value-based judgments?
Emert: Fundamentally, these are all business decisions. Most of the IT guys bring in a piece of equipment and say, “Well this has five nines, or six nines on this equipment. I want my data center infrastructure to match what I’ve got installed in my rack.” And they don’t really appreciate the economics that drive the infrastructure to get to that same level of reliability.
Neville: I’ve been there where someone says, “I want five nines in my infrastructure,” and you agree. In the following meeting, the costs for five nines are discussed and the individual who made the request no longer wants five nines. They only need three nines. I think it always comes down to having a standard based upon sound engineering and business principles.
Martin: We try to look strategically at how to plan future data centers. We’ve just gone ahead and put in some simple solutions, additional conduits, some pipes, cap them off, and maybe add transformer pads. Or, we have agreements with the local utilities so that two years, three years down the road we can upgrade to Tier IV at a cheaper cost because we’ve already done some of the infrastructure cost builds ahead of time.
We’ve already got 30 generators that we’ve paid for in the queue, so we’re going to get those delivered. We don’t have a spot for them, but at least we’re trying to be ahead of the curve.
Soroka: We see a wide range of customers, and they’re like an inverted bell curve. We all have to be extremely careful to remember when we talk about issues that we’re on one side of the curve, at the high end. There are a lot more people at the lower end of the curve; N+1 is all they need, and that’s all they’re ever going to do. There are a lot of users who only require N capability.
Kalra: The fundamental problem that I see with the entire industry, with every single one of us here, including myself, is that we don’t talk to people enough. We don’t talk to our customers enough. You told me what I need, but you haven’t convinced me why I need what you told me I need. What does it mean to me in terms of total cost of ownership? What does it mean in terms of difference in relative reliability?
Digital Realty Trust will tell you, “We’ll give the customers what they need.” But they haven’t gone on to say, “In our experience, this is what we’ve seen a customer like yourself do and you might want to think about that.”
All of us here are very smart about construction, engineering, and data centers. But none of us knows enough about the interaction among the data center, IT, and the business.
Manos: I think it’s tied to the application space that you live in. Today it almost breaks down into an organizational structure challenge. In our organization today, I do have all the facilities and I do have all the IT people reporting in from my organization. So that has allowed me to build a lot of the synergies that you talked about.
Elsewhere there are very specific camps. There are the facilities camps and the IT camps, and I think broadly across the industry, you find that that is exactly the case. But there are definitely many cases of organizations that are looking at how to combine these organizations.
Sawyer: I’m seeing that the less sophisticated users need to be more educated. And a lot of the time, scheduling really plays a part in that. It’s the consulting engineer’s responsibility to explain to them what they’re buying, what they’re getting, and if they choose to reduce their redundancy, how that impacts their business. But I think a lot of times, when we get the call they’re already overloaded on their UPS and they’ve waited months to make the decision to go ahead, and build and expand, and they want that data center right away. The planning involved was not necessarily thought through, so they called the consulting engineer and said, “We need to get this up right away. Otherwise our business is going to be impacted.”
Confusion at the Top?
Heslin: Why is it that the split between IT and facilities exists even at the highest levels of sophistication? How do you bring those two together in a way that makes sense?
Kalra: I think traditionally the facilities world and the IT world have been somewhat disengaged. IT will go to facilities folks and say, “This is what we need. We’re thinking about building a data center.” Maybe they’ll do some rough analysis on how much power, space, and cooling they need. But traditionally, beyond that they leave it up to the facilities folks to deliver what they think they need, without really explaining to them what type of reliability levels they’re looking for, what the business criticality of that data center is, and what type of equipment is going to be in that data center.
More and more companies are starting to take the steps of putting their services or facilities people and IT people into the same organization, bringing IT closer to facilities and vice versa.
The next step is for facilities and IT folks to jointly have a conversation with the business folks to understand the business perspective and for business to understand the IT and the facilities perspective. The business can ramp up and down almost overnight or over the weekend. IT can ramp up and down a little bit faster than facilities, but facility takes forever to ramp up and down.
Stoyas: At the DoD, we used monitoring to address this split. For the IT operators operating the system, we put a monitoring system for the power and mechanical in with the computer monitoring system. That person (IT) now became aware that there’s a whole other world that he relies on.
Schafer: We’ve been ignoring the world Steve (Emert, Rosendin Electric) lives in. IT can talk and the facility can talk, but then they want Steve to magically make this all come together without disrupting anything, without creating any dust. A lot of times the contractor doesn’t get involved until they’ve made decisions that make his job impossible.
Kalra: If the facility person understood Steve’s challenges, he’d be able to bring that into the conversation.
Schafer: Whether it’s a small 1,000-square-foot or a 200,000-square-foot facility, we like to lock in during strategy meetings, where you’re bringing in all the players that are going to have an input.
Set clear understandings, goals, and objectives, so everyone knows from day one this is where we’re going. This is the time to say, yes, we want Tier IV, Tier II. We need 2N on this. We don’t on this. This is the risk versus reward. You make that the basis of the design document together with all the players and all the stakeholders at the table and you get written signoffs. And you don’t go to the next step of design until you walk out of that meeting with clear expectations and goals.
Soroka: All my presentations always talk about requirements analysis and basis design. The requirements analysis comes from the IT side saying, “Business is going to change overnight, but as I see now here’s my three- to five-year outlook.” The basis design document says, “This is what we design.”
But often when I go out and commission a facility, I ask, “Let me see your basis design. Let me see your owner’s project requirements.” I find out these don’t exist. It’s amazing.
Schafer: When you’re buying 12, 15, 18 engines, you have to get your orders in because it’s 50 weeks to get those items. You have to come to those decisions early in the design and planning process.
Martin: One thing that really helped us, with Highland’s help and a couple of good engineering firms: we have standard design documents that specify everything, from the UPS that we’re going to use and preferred providers to the paint on the wall, and these are what we build by. Then we have a gating process: Okay, this is a good property. Okay, everybody has signed off on it. They approve of the location. The portfolio manager’s got to pay for the P&L and signs off. Operations, we can manage it. Then it goes on to the next gate.
Stoyas: In the government, let’s say Congress allocated so much money. Someone thinks, “Okay, let me get this really mission critical equipment first, and then I’ll think about the rest later.” Then they get this really sophisticated equipment. Now you’re faced with an imbalance. If you see these government facilities, you see this hodgepodge of additions and nothing works together. All of a sudden, now I need billions of dollars to redo everything and make sure everything communicates.
Neville: The way we run it is we put the basis of design together, and then we price and put a budget estimate together. I think if you try to do the budget estimates before you do the basis of design, the cart is before the horse, and you end up value engineering off what you think you really need.
Kalra: I have come across that in the past. While in the exploration stages, we actually prepared a 50-60 page project assumptions and project requirements document. But the business people are not going to understand that document. So before that document, we prepared what we called the assumptions document.
I say to everybody, “These are the baseline assumptions. Whether this is what we build or not we can still debate, but these are the baseline assumptions that we’re going to use to prepare our schedule and a budget.”
In my mind, the scope, schedule, and budget are the three corners of a triangle. They all have to tie together. If you pull on one corner of the triangle, it’s going to change the shape of the triangle. If you change the scope, it’s going to have a budget impact and a schedule impact. If you want it done quickly, we’ll change the scope and maybe we can bring it in quickly. If you don’t want to spend a certain amount of money, maybe we have to make some trade-offs on reliability.
Energy Cost
Heslin: These comments seem to focus on first cost.
Where and how does energy and energy consumption come into this discussion, and what are we prepared to do to make energy efficiency part of the power reliability discussion in data centers?
Manos: When you think about the upfront costs, it’s about 11 percent of the total cost of ownership. Power costs, especially in the larger facilities that we’re building, are pretty significant. It’s been proven that the lifetime power costs of a server far exceed the cost of one new server at this point. It’s a fairly big change and it’s something we are already adopting into how we manage our business. Our data center managers are compensated and measured based on uptime, efficiency, and utilization of the facilities for which they’re responsible. It’s driving some unique responses from those folks.
Schuerger: Actually in the last couple of years, power has been a bigger driver than reliability, and what you’re seeing is more of a shift from the Tier IV to Tier III, because the Tier III level gives you a better trade-off for reliability versus efficiency. If you look at a 2N, if you’re 100 percent matched and everything is perfectly balanced, the best you’re ever going to do is 50 percent. If you go to a two out of three, you’re automatically at 66 percent. In the Internet hosting world, it’s a way bigger factor than reliability.
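The 50 percent and 66 percent figures follow from simple ratio arithmetic. The sketch below is illustrative only: the function name is hypothetical, and it assumes identical units that share the load evenly, with the surviving units required to carry the full load after a failure. It reports the ceiling on per-unit loading and says nothing about how efficiency varies with load for any particular UPS.

```python
def max_utilization(units_needed: int, units_installed: int) -> float:
    """Highest fraction of installed capacity that can carry load while
    still leaving enough spare capacity to meet the full load on
    `units_needed` units.

    Hypothetical helper: assumes identical units and even load sharing.
    """
    if units_needed < 1 or units_installed < units_needed:
        raise ValueError("need 1 <= units_needed <= units_installed")
    return units_needed / units_installed


two_n = max_utilization(1, 2)         # 2N: one unit needed, two installed
two_of_three = max_utilization(2, 3)  # "two out of three" arrangement

print(f"2N ceiling:           {two_n:.1%}")
print(f"Two-of-three ceiling: {two_of_three:.1%}")
```

With two units needed out of three installed the ceiling is two-thirds, which matches the "66 percent" quoted above once the decimal is dropped.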
Sawyer: That’s very attractive for a lot of our clients. We’ve been designing this method for years. But I still disagree that efficiency is more important than reliability or that reliability is often compromised for efficiency.
Kalra: I’m actually starting to see the scale tipping more towards energy efficiency than reliability now, and one of my fears is that it’s going to tip to a point where we’re going to go from one extreme to the other. Five years ago, it used to be all about reliability. In the last two years or so, it seems to be all about energy efficiency. I think we still need to find that happy medium.
Kalra: I think the other problem is when people in one sector start to compare their statistics with people from a different sector without understanding the different business models. I have guys that come into my office all day long that say, “Intel is doing this,” or “Google is doing this for their data centers. Why can’t we try some of those things?” You can. But do you want to try that in your production data center? Or do you want to get comfortable with the technology first in a lab-type environment before you actually deploy it in a production environment?
Sawyer: I think it’s important from a marketing standpoint to be energy efficient right now. Some of our data center clients are already pretty energy efficient, and they get upset when their competitors make headlines for dropping their PUE or becoming more efficient than before, when in reality the competitor is still far less efficient than the client even after the change.
Heslin: Would an Energy Star label be attractive for most data center operators? Or would reliability still, at the end of the day, have to trump that?
Neville: The key driver is overall cost. If we can run the data center more efficiently and keep reliability up to a certain level that the business is willing to accept, that’s the optimal case. If we can’t get the efficiency for the reliability we need, then we would look at focusing more on reliability. If we can remediate some of the business risk on the loss of that facility, then we can relax some of the reliability concerns and leverage this for increases in efficiency.
Manos: If the EPA wants to do an Energy Star data center program, we need to work with them. They will understand a lot of this complexity. Otherwise, what’s likely to happen is that they’ll come up with a set of metrics that you have to live by, and you’ll say, “Well, I’ll never live by them,” and that measurement becomes meaningless.
Martin: It’s risk: What you can afford to do. Of course everybody wants to be environmentally friendly, but it does boil down to asking yourself about your risks and costs. There are some easy things to do that can help you get to LEED certification. Low-hanging fruit, such as just sealing up the data center and closing those openings. Pull out that old cable or that server that’s sitting there idle, not really doing anything.
Kalra: Even within each reliability band or each tier band, there are certain ways to design a data center that are not as energy efficient as they could be, whether it’s in equipment selection, or your design or operating practices.
Heslin: How well do even the more sophisticated users control their entire facilities? Sometimes it seems like business units dictate and things happen outside the realm of what the IT people control.
Soroka: We actually see a lot of LAN closets deployed in large corporations. They have all the sophistication going with the data center, but then they drop a LAN closet in for this floor or this building over there that ties back to the main data center. Typically the data center really isn’t involved in these little closets that pop up, and the next thing you know, you have all these little closets everywhere. Then it turns out that this closet actually has grown to 1,000 or 1,200 square feet, and if it goes down, 10,000, 20,000, or 30,000 people aren’t working because they lost access back to the data center. We’re also seeing that they’re starting to understand that these are very important and are actually employing the same technologies and reliabilities that they do in their data center.
Manos: Sometimes the IT guys will walk into a closet and say, “Oh, there’s a power outlet here. Let me go stick my access switch here,” and the facilities folks may not necessarily understand or even know of it. If you don’t have good communication between those groups, you can fall victim to that pretty easily.
Sawyer: Many times, the design teams for the LAN closets and data centers are totally different teams. The data center experts focus on the data center, and then you might have a tenant build team that’s inexperienced in reliability and unfamiliar with data center design techniques designing the LAN closets in the office space, so there’s a breakdown.
Neville: I think a key is change control. Those changes that occur in the data center and the LAN closets need to be transparent. If you are two separate groups, they’ve got to be transparent to both groups, so that everybody can see what’s going on and comment on the strategy.
Kalra: I’m starting to see a lot of convergence in the data center industry between IT, corporate services and, to a certain extent, business as well, but not enough. The industry is sort of going through an evolution phase similar to what we went through way back when I first started my career, when the thinking was that centralized data centers were going to be a thing of the past because it was all about distributed computing. It was going to be all about LAN closets and things like that.
I know of many of our peers who have recently built large-scale data centers and others that are still building. Microsoft is doing it. Intel is doing it, etc. So the data centers haven’t gone away, just like the mainframes haven’t gone away.
But I am starting to see a lot more dialogue between facilities and IT people, which can only help make our lives easier.
We’ve discussed a lot of good ideas here. I think it’s incumbent upon us to take that back to our daily lives, and not forget those and start implementing some of those ideas in our own little world, whether it’s talking to your IT counterparts more or meeting with the business guys.
Emert: I just wanted to add that, if anything, we are still on the steep side of the curve as we build infrastructures to try to keep up with the IT equipment evolution. The innovations and technologies keep rapidly coming down the road. One of our tasks in design is trying to anticipate the future. I’m not sure if we’ll ever get to the flat part of the curve.
SIDEBAR: Meet the Panelists
Steve Emert, PE, Rosendin Electric, Engineering Team Leader
Sudhir Kalra, Morgan Stanley, Global Head, Enterprise Data Centers Engineering and Operations
Michael Manos, Microsoft Corporation, General Manager, Data Center Services
Ted Martin, Digital Realty Trust, Vice President, Technical Operations
Glen Neville, Deutsche Bank, Director, Engineering
Greg Sawyer, P.E., Burr Computer Environments, Electrical Engineer
Brian Schafer, Highland Associates, Director of Business Development and 7x24 Exchange Metro New York Chapter, President
Bob Schuerger, PE, EYP Mission Critical Facilities, Principal
Joseph Soroka, Total Site Solutions, Senior Vice President, Facilities Management
Evangelos Stoyas, P.E., Power Systems Consulting, Consultant, U.S. Army Corps of Engineers, retired
Kevin Heslin, editor, Mission Critical, Roundtable Moderator