Having been in the technology business for 30 years, I continue to be amazed when I come upon a data center with no centralized monitoring solution. Believe me, there are still such organizations out there, and many are large businesses where this type of automation should be crucial.
Performing mission critical tasks without monitoring is like running in the dark and expecting not to hit something. Designing a data center without a monitoring tool is like designing a car without a dashboard. Monitoring allows you to identify the assets in your data center and keep track of where they are and what they’re doing. It is the foundation for operating a sound mission critical facility. With a monitoring solution in place, the most basic layer of that foundation rests on solid ground, and you can confidently build your future growth and optimization on it.
Author H. James Harrington (Business Process Improvement) put it best, “Measurement is the first step that leads to control and eventually to improvement. If you can’t measure something, you can’t understand it. If you can’t understand it, you can’t control it. If you can’t control it, you can’t improve it.” And even once you make improvements, how do you know if they’ve been successful?
Simply put, a solid data center infrastructure management (DCIM) monitoring solution is well worth the effort and expense for all data centers, regardless of size. In the past, a DCIM solution was considered a “nice to have” asset; in today’s ever-more-sophisticated mission critical facilities, it’s a “must have.” The bigger the data center, the more mission critical the operations, and the more locations you are managing, the stronger the “must have.” A good DCIM implementation will quickly pay for itself through reduced power consumption, more effective thermal management, streamlined operations, increased uptime and better SLA performance, and more capacity squeezed out of existing space, power, and cooling infrastructure. DCIM tools meet the top challenges of most IT and facilities users, and they answer the following concerns of most data center operators, both owner-operators and third-party multi-tenant data center or co-lo operators.
QUESTIONS AND ANSWERS
What is DCIM monitoring?
A DCIM monitoring solution is a software tool that consolidates a myriad of measurements from across the data center, particularly power and temperature, and from both the facilities and the IT equipment. Furthermore, it normalizes and presents all this data in a common “single pane of glass” so data center staff, facilities, IT, IT operations, and executive management can fully understand what is going on inside and outside their facility, both in real-time and historically.
We’ve already got a BMS, EPMS, battery monitors, SCADA system, intelligent power strips, etc. What the heck do I need one more system for?
Consider this: You may have all the monitoring systems running in your data center, but do all of them talk to one another? My guess is, probably not! Why is that important? In a complex system like a data center, a holistic, big-picture view is key to making sound business decisions.
More importantly, because all systems are interconnected, installing a bunch of new blade servers will most definitely impact the temperature in the area where the servers are installed and will add to the energy consumption. There are factors other than available space, power, and cooling that come into play — IT cares about the applications that run on these servers, and the placement relative to storage and other applications. An optimal solution from a facilities perspective may be sub-optimal for IT. And an optimal placement from an IT perspective may result in an overload on the facilities side. The more IT and facilities share operational data, the more the solution can be optimized across multiple requirements and constraints.
Enter a DCIM monitoring system. If you’ve already invested a lot of time and money in installing all those other monitoring systems, don’t get rid of them. Instead, use a DCIM solution to bring all their information together, normalized into a single, consistent format, so it’s easy to read and interpret, and so you can predict with a degree of certainty where your assets will fit best.
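To make “normalized into a single, consistent format” concrete, here is a minimal Python sketch. It is only an illustration, not how any particular DCIM product works: the source names, point names, sample values, and the choice of Celsius and kilowatts as the common units are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    source: str   # which system reported it (hypothetical names below)
    point: str    # what was measured
    value: float  # value in the canonical unit
    unit: str     # canonical unit: "C" for temperature, "kW" for power


# Assumed conversion table: raw unit -> (canonical unit, converter)
CONVERTERS = {
    "F": ("C", lambda v: (v - 32.0) * 5.0 / 9.0),
    "C": ("C", lambda v: v),
    "W": ("kW", lambda v: v / 1000.0),
    "kW": ("kW", lambda v: v),
}


def normalize(source, point, value, unit):
    """Convert one raw reading into the common format."""
    if unit not in CONVERTERS:
        raise ValueError(f"unsupported unit: {unit}")
    canonical_unit, convert = CONVERTERS[unit]
    return Reading(source, point, round(convert(value), 2), canonical_unit)


# Hypothetical readings from three separate "islands of monitoring"
raw = [
    ("BMS", "row3-inlet-temp", 77.0, "F"),
    ("PDU", "rack12-power", 4200.0, "W"),
    ("EPMS", "ups1-output", 310.0, "kW"),
]

readings = [normalize(*r) for r in raw]
for r in readings:
    print(f"{r.source:5} {r.point:18} {r.value:8.2f} {r.unit}")
```

Once every reading shares the same units and shape, a temperature from the BMS and a power draw from an intelligent power strip can sit side by side in one view.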
We’ve got a DCIM system. It came free with the hardware.
This is usually a case of “you get what you pay for.” Does that monitoring system speak only to the hardware with which it was bundled? If so, it’s not a complete system; it is just one more “island of monitoring.” What about everything else around it? Don’t get rid of what you got for free, though. Just make sure your DCIM monitoring system can interoperate with it and collect its data along with all the other information you need. A true DCIM monitoring tool is the integrator of all your existing islands, and the connector to any other devices and sensors.
This is a lot of money. Prove to me that it’s worth it. What’s the ROI?
A DCIM monitoring system offers tremendous and measurable value. As soon as it’s installed and working, it becomes the platform for projects that save you money. And if you choose the right one, and use the data, the payback on your initial investment can be a matter of months. An EMA study of major DCIM installations has documented the substantial savings. Download a copy of the study at http://www.fieldviewsolutions.com/roi/.
For example, a good temperature monitoring system can offer a quick ROI. By monitoring temperature at various points in your data centers via wired or wireless sensors, you gain a real-time view of your data center’s temperature so you can rearrange assets, shift workloads, move perforated tiles, or adjust fans to eliminate hot and cold spots and even out the temperature. Because you are monitoring in real time, you can then raise the temperature in a controlled and safe manner. You raise the average a half degree or a degree at a time, make your adjustments, wait for the system to stabilize, then monitor the system carefully for several days. If all’s well, raise the temperature again. Lather, rinse, repeat. The EMA study documents users who were able to raise the temperature safely, and for every degree Celsius of increase, they saved about 4% in overall energy costs. Now that’s a savings you can’t ignore. The research documented an average power reduction of 15% in data centers that used DCIM monitoring to raise their effective temperatures.
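As a back-of-the-envelope illustration of the 4%-per-degree figure, the Python sketch below estimates annual savings for a few setpoint increases. The function name, the $1,000,000 energy bill, and the assumption that savings scale linearly with each degree are all hypothetical; real results depend on the facility.

```python
def estimated_savings(annual_energy_cost, degrees_raised_c, savings_per_degree=0.04):
    """Rough linear estimate of annual savings from raising the setpoint.

    Assumes each degree Celsius saves the same fraction of the original
    energy cost (about 4%, the figure cited from the EMA study). Linearity
    is an assumption made for illustration only.
    """
    if degrees_raised_c < 0:
        raise ValueError("degrees_raised_c must be non-negative")
    fraction = min(savings_per_degree * degrees_raised_c, 1.0)
    return annual_energy_cost * fraction


annual_cost = 1_000_000.0  # hypothetical annual energy bill in dollars
for degrees in (1, 2, 3):
    saved = estimated_savings(annual_cost, degrees)
    print(f"+{degrees} C: about ${saved:,.0f} saved per year")
```

Even a rough estimate like this gives you a number to put in front of the people who approve the budget, and the monitoring data lets you check it against what actually happens after each step.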
Beyond savings, you keep catastrophic fluctuations in temperature at bay, and increase the reliability of your data centers.
With a real-time capacity view of available space, power, and cooling, you can add assets in a lot less time. Consider this scenario: at one particular bank, before a DCIM monitoring solution was deployed, placing a new server took 14 working days of coordination between IT and facilities. With a DCIM monitoring system in place, that task is now completed in an average of two hours. That’s two weeks earlier that the asset is up and running and generating revenue, and it is a substantial boost in the productivity of facilities, IT operations, and IT staff.
And if you think you’re out of capacity, whether power, cooling, or space, a DCIM monitoring solution identifies “stranded” unused capacity so you can use your existing facility more efficiently. You can also do so more safely, since you are allocating power based on actual power loads and not on “nameplate” or “de-rated nameplate” information. With all this information in your hands, you may just be able to delay that expansion project or new data center buildout, or even eliminate it altogether. That’s a huge potential savings. One data center was able to cancel a planned $400 million new data center buildout because enough unused capacity was found, and enough efficiencies identified, among the infrastructure already in place.
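The gap between nameplate allocation and measured load can be sketched in a few lines of Python. This is an illustrative calculation under stated assumptions, not a vendor’s method: the function name, the 20 kW circuit, the per-server figures, and the 20% safety margin added to measured peaks are all hypothetical.

```python
def stranded_capacity_kw(circuit_capacity_kw, nameplate_kw, measured_peak_kw,
                         safety_margin=0.2):
    """Estimate capacity hidden by nameplate-based allocation.

    nameplate_kw and measured_peak_kw are per-device lists for one circuit.
    safety_margin is an assumed cushion added on top of measured peaks.
    """
    headroom_by_nameplate = circuit_capacity_kw - sum(nameplate_kw)
    headroom_by_measurement = (
        circuit_capacity_kw - sum(measured_peak_kw) * (1.0 + safety_margin)
    )
    # Stranded capacity is what measurement shows is free but nameplate hides.
    return max(0.0, headroom_by_measurement - headroom_by_nameplate)


# Hypothetical circuit: "full" on paper, half used in practice
capacity = 20.0
nameplates = [5.0, 5.0, 5.0, 5.0]   # adds up to 20 kW, so no room on paper
measured = [2.0, 2.5, 3.0, 2.5]     # actual peaks add up to 10 kW

stranded = stranded_capacity_kw(capacity, nameplates, measured)
print(f"Stranded capacity: about {stranded:.1f} kW")
```

In this made-up case the circuit looks fully allocated by nameplate, yet measured peaks plus the margin leave roughly 8 kW available.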
Raise your temperature safely, save power, place assets more quickly, delay CapEx, and boost reliability. How’s that for an ROI?
I don’t even know where to start. What should we monitor?
Monitor everything you can, and build up incrementally. When it comes to data center monitoring, if you have the right tool, there’s no such thing as collecting TMI (too much information), but don’t feel like you need to boil the ocean all at once. Start with what you have and expand by adding sensors, and link in the various systems currently not being monitored. Monitor the temperature, the humidity, the fuel left in the back-up generator, the space left in your cabinets, the power — everything you need to know in order to manage and track the health of your facility. Monitor anything that can help you to make better business decisions now and enable better future planning. Just make sure when selecting a DCIM monitoring solution that it can handle all the data and is scalable enough to handle future growth.
Do you know how many different systems we have on our floor? How many different protocols they speak? How old some of that stuff is? One solution for all that? No way!
Yes way. Just choose a DCIM monitoring solution that isn’t tied to any specific vendor or equipment, and has built-in support for a variety of manufacturers, standards, and protocols. Then you’re good to go. Some vendors offer additional “drivers” or “connectors” for what they don’t yet support; that’s fundamental, but starting with the broadest possible existing set of protocols will give you a head start and save you money.
Also, select a DCIM that can help you determine which of those aging assets aren’t performing efficiently anymore, or are under-utilized, so you can retire or consolidate some of them.
How can this help me with reliability? That’s my main concern.
Two words: alerts and alarms. You need a system that not only tells you when something is going wrong, but that gives you a heads-up when something might go wrong, or is headed in that direction. A system that will alert you that your teamed circuits will be in trouble if one fails over onto the other. A system that sounds an alarm when there’s a drastic temperature change or a particular measurement has gone outside of normal parameters.
This helps avoid a potential disaster before it materializes. Having a little warning before something major goes wrong is much better than having to mop up afterwards.
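The teamed-circuit warning is a good example of an alert that fires before anything has actually failed. The Python sketch below is a minimal illustration: the function name, the amperage figures, and the rule of keeping continuous load at or below 80% of the breaker rating are assumptions chosen for the example, so substitute your own site’s limits.

```python
def failover_alerts(load_a_amps, load_b_amps, breaker_rating_amps,
                    max_continuous_fraction=0.8):
    """Warn if one circuit of a teamed pair could not carry both loads.

    max_continuous_fraction is an assumed limit on continuous load as a
    share of the breaker rating.
    """
    limit = breaker_rating_amps * max_continuous_fraction
    combined = load_a_amps + load_b_amps
    alerts = []
    if combined > limit:
        alerts.append(
            f"Failover risk: combined load {combined:.1f} A exceeds "
            f"{limit:.1f} A limit on a {breaker_rating_amps:.0f} A circuit"
        )
    return alerts


# Each circuit looks healthy alone (14 A and 12 A on 30 A breakers),
# but neither could carry the other's load after a failover.
for message in failover_alerts(14.0, 12.0, 30.0):
    print(message)
```

Neither circuit would trip an ordinary threshold alarm here, which is exactly why the check has to look at the pair together.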
Your monitoring system should both push and pull. It should receive information from devices that turn out data and alerts, and interrogate devices by polling.
This is a big investment. How do I know it’s going to work?
Well, you don’t know, so when shopping for a DCIM monitoring solution, ask for a “proof of concept” and for references to existing, successful installations. Make a vendor prove their solution to you. If they’re confident in their ability, they shouldn’t object to providing you with a proof of concept. Make sure it’s a good fit for your facility, that it meets your specific needs, and, more importantly, that it performs all the tasks the vendor claims. When you check the references, talk to the people who work with the solution every day, and get their perspective.
Finally, find out how long it usually takes to install, configure and go into operation, and what you have to provide (data, access, people, equipment, etc.) to make it happen. This will help you prepare in advance, and ensure a quick and smooth implementation.
Who in our organization can use this information?
“Who and what?” would be a better question.
Collecting the information is just the start of what a good DCIM monitoring tool does. It should analyze, sort, store, normalize, and present the data in a “single pane of glass” in ways that will be useful to all your stakeholders: from day to day users in IT, facilities and IT operations, up to senior management.
While some people use DCIM monitoring just in the facilities side of the house, it shines when it extends to the IT side.
Whether you need a high-level view or a drill-down into specific details, an ideal DCIM monitoring solution should provide it.
In addition to your people, other applications and systems should have access to the data, too. These include CRAC/CRAH control, power control, IT service management, virtual machine hypervisors, dynamic cooling, power capping, load shedding systems, analytic tools, capacity planning tools, financial applications, and others.
And it’s important that your solution can share both real-time data, which shows what’s going on right now and keeps you totally up to date, and historical data, which can be voluminous, so that you can look back, compare data, look at trends, learn from the past, and make sound business decisions for the future. A good DCIM will offer powerful APIs for accessing both real-time and historical data.
I haven’t got the budget for this.
It’s not whether you can afford to have a solution — it’s whether you can afford not to. Given the short payback periods and the potential benefits, a good business case can be presented to your CFO. And you may be able to purchase a DCIM on a monthly payment plan if cash flow is an issue. The up-front investment in money, as well as staff time, pays huge dividends long-term.
To decide whether you can do without a DCIM monitoring solution, determine the risks of downtime. Think about a DCIM monitoring solution as an investment in the future of your data center. Choose the right one and it’ll make your facility more efficient and more cost effective, and it might even prolong its useful life. See how quickly it will pay for itself.