The Right Time For DCIM
A key component of today’s data center infrastructure management (DCIM) systems is gathering and analyzing live data associated with the data center. This can represent thousands of points of information such as temperature, power, capacity, or status of any number of devices, meters, or sensors throughout the data center. The collected DCIM information can easily venture into the “Big Data” realm, which involves not only the collection of information but also the storage of millions of samples of historical values.
As an industry term, DCIM has been muddied over the years as multiple vendors use the same term to describe significantly different feature sets. While DCIM is taking a more defined shape, the term “real-time,” with regard to data collection, is in danger of falling into that same confusing realm for end users.
Our team recently heard an end user say that their DCIM provider delivered “real-time” information as one sample each day. There were hundreds of thousands of data points, and the software could accommodate only a single poll of each data point per day. Naturally, that end user was disappointed and discouraged, as their expectations of real-time data were far from what the vendor actually delivered.
Another term beginning to be heard across the industry is “near-time,” which is a more accurate description of what most DCIM systems provide. A third popular term, with a separate meaning, is “extended interval.” At Geist we have worked hard to define these three terms in the following way:
- Real-time: a continuous sampling of data sets with a refresh cycle of seconds.
- Near-time: a sampling of data sets separated by more than a minute but less than one hour.
- Extended interval: any sampling of data that is delivered less frequently than once per hour.
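The three definitions above can be sketched as a small classifier. This is only an illustration, not part of any DCIM product: the function name is hypothetical, and the definitions leave two boundaries open (intervals between "seconds" and one minute, and an interval of exactly one hour). The sketch assumes that anything up to 60 seconds counts as real-time and that exactly one hour is grouped with near-time.

```python
from datetime import timedelta


def classify_polling_interval(interval: timedelta) -> str:
    """Map a polling interval onto the three terms defined above.

    Hypothetical helper for illustration only.
    """
    seconds = interval.total_seconds()
    if seconds <= 0:
        raise ValueError("polling interval must be positive")
    if seconds <= 60:
        # Assumption: up to one minute still counts as "a refresh cycle of seconds".
        return "real-time"
    if seconds <= 3600:
        # Assumption: exactly one hour is grouped with near-time.
        return "near-time"
    # Anything delivered less frequently than once per hour.
    return "extended interval"


if __name__ == "__main__":
    for interval in (timedelta(seconds=5), timedelta(minutes=15), timedelta(days=1)):
        print(interval, "->", classify_polling_interval(interval))
```

Under these assumptions, a 5-second poll is classified as real-time, a 15-minute poll as near-time, and a once-a-day poll as extended interval.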
These three rates of refreshed information have their own distinct use cases, along with pros and cons for each. There isn’t a one-size-fits-all approach to collecting live data. The user’s need is the key driver to determine what data needs to be collected and at what rate.
THE BENEFITS OF REAL-TIME DATA
It might be best to illustrate the benefits of real-time data with a real-life case: a colocation provider that, prior to installing DCIM, had been manually logging its tenants’ power usage at extended intervals. Approximately four times per day, staff would take physical readings and record them in a spreadsheet, then evaluate the spreadsheet monthly to ensure that all tenants were staying within their power SLAs.
After deployment of a DCIM system, they captured data on a real-time basis and stored it for historical review. At the end of the first month, the reports derived from the real-time system were quite astonishing. The original extended interval logging had gaps large enough that there were significant differences between what had been reported the prior month and what was being reported through the new system. In the end, the colocation provider realized it had several customers that were exceeding their prescribed power capacities. As a result, the provider was able to renegotiate its service agreements, and the cost of the DCIM implementation was recouped in a matter of months. Who says DCIM doesn’t have a tangible ROI?
Beyond this short illustration, real-time data collection has many benefits.
- Warnings and alarms. With data refreshed within seconds, users can be alerted to threatening situations and react quickly. Real-time information may help them see issues before they become problematic, allowing the operator to move from reactive into a more predictive management state.
- Highest accuracy of data. With frequent polling comes the opportunity to store additional detailed historical information for use in data analysis. A high sample rate ensures that quick spikes and sags in readings are captured.
- Reporting and trend analysis. Real-time information provides an increased level of detail when it comes to reporting and identifying trends. The data center environment can change quickly and having a higher data refresh rate ensures that the user sees the entire picture.
- Validation of capacities. A database of devices and their anticipated power draw is included in most DCIM systems today. Real-time data gives the user the most precise measurements with which to validate nameplate or de-rated assumptions, helping ensure that available capacity is used to its fullest.
- Operational awareness. Data center operators can frequently be seen entering the critical environment to take readings, assess an audible alarm, or just generally evaluate the status of the site. Having real-time information accessible through their DCIM system gives them that information in a more convenient and holistic way, along with greater insight into many aspects of their operations.
THE DRAWBACKS TO REAL-TIME DATA
- Cost of implementation. It takes a significant amount of processing to collect and manage all of that real-time information, translating into higher implementation and system costs.
- Data overload. It is important that a real-time data collection tool has intelligent and simple ways to make sense of all of the collected information. Good user interfaces, graphical representations, and reporting engines are a must to avoid information overload.
- Extended network and processing resources. Big Data brings with it the challenge of passing vast amounts of information across LANs and WANs as well as processing and storing all the data collected. An efficient tool needs to be harnessed to ensure performance of the application remains high without degrading other systems in the process.
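To put rough numbers on the storage and network drawbacks above, the following sketch estimates how many samples per day a system would have to move and store at each polling rate. The point count of 10,000 and the example intervals (10 seconds, 15 minutes, 24 hours) are illustrative assumptions, not figures from any particular deployment.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_day(data_points: int, interval_seconds: int) -> int:
    """Samples collected per day when every data point is polled at a fixed interval."""
    if data_points < 0 or interval_seconds <= 0:
        raise ValueError("data_points must be >= 0 and interval_seconds must be > 0")
    polls_per_point = SECONDS_PER_DAY // interval_seconds
    return data_points * polls_per_point


POINT_COUNT = 10_000  # hypothetical number of monitored data points

EXAMPLE_RATES = [
    ("real-time (10 s)", 10),
    ("near-time (15 min)", 15 * 60),
    ("extended interval (24 h)", SECONDS_PER_DAY),
]

if __name__ == "__main__":
    for label, interval in EXAMPLE_RATES:
        print(f"{label}: {samples_per_day(POINT_COUNT, interval):,} samples/day")
```

With these assumed values, the real-time rate produces 86,400,000 samples per day, the near-time rate 960,000, and the daily rate 10,000. That gap is what drives the difference in processing, network, and storage requirements.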
WHEN NEAR-TIME DATA IS HELPFUL
Near-time data can be somewhat less taxing for a system to collect and manage and can provide a number of benefits to DCIM users.
- Validation of capacities. While it may not offer the same number of samples as real-time data, near-time data collected at reasonable intervals can provide valuable insight into actual readings and associated trends, which can be used to validate assumptions made in modeling capacities.
- Replacement of “sneaker reports.” We see many organizations that still use technicians to walk the data center floor and take manual readings at defined intervals. Because those types of reports are completed on a somewhat infrequent basis, near-time data can provide at least a one-for-one replacement and free up an employee’s time to work on more productive tasks.
- General planning and architecture. Near-time data can be adequate when high-frequency operational awareness is not required but general planning and visibility are sought. A lot of data can still be gleaned from a poll rate of 15 minutes that will provide “accurate enough” information to aid planning and data center growth decisions.
THE DIFFERENCE BETWEEN REAL-TIME AND NEAR-TIME
Real-time data collection and near-time data collection have many of the same benefits, but a near-time rate comes with certain operational limitations. These can include:
- Delayed warnings and alarms
- Failure to capture short bursts or periodic changes between polling cycles
- Not enough detail to fully examine an event
The main difference between the two polling rates is the effect on operational awareness. As an example, if the poll cycle is every 15 minutes and a 10-minute power outage occurs, it is simply not possible to collect information about how the load transferred and returned to normal or how the temperatures were affected, or to review the event as a whole.
When monitoring power specifically, a near-time polling cycle can easily miss spikes and sags or simple deviations in workloads that can change rapidly.
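The 15-minute example above can be checked with a short sketch that lists which polls land inside an event window. The function name and the timings are hypothetical; polls are assumed to start at time zero and repeat at a fixed interval, and the event window includes its start but not its end.

```python
from typing import List


def polls_during_event(poll_interval_s: int, event_start_s: int, event_duration_s: int) -> List[int]:
    """Return the poll times (seconds since the first poll at t=0) that fall inside the event."""
    if poll_interval_s <= 0:
        raise ValueError("poll_interval_s must be positive")
    if event_start_s < 0 or event_duration_s < 0:
        raise ValueError("event start and duration must be non-negative")
    event_end_s = event_start_s + event_duration_s
    # Ceiling division: index of the first poll at or after the event start.
    first_index = -(-event_start_s // poll_interval_s)
    poll_time = first_index * poll_interval_s
    hits = []
    while poll_time < event_end_s:
        hits.append(poll_time)
        poll_time += poll_interval_s
    return hits


if __name__ == "__main__":
    outage_start, outage_length = 2 * 60, 10 * 60  # hypothetical: starts 2 min after a poll, lasts 10 min
    print(len(polls_during_event(15 * 60, outage_start, outage_length)), "near-time samples")
    print(len(polls_during_event(10, outage_start, outage_length)), "real-time samples")
```

In this scenario the 15-minute cycle records no samples during the outage, while a 10-second cycle records 60. Even when the outage happens to straddle a poll, a 15-minute cycle captures only a single sample, which is not enough to reconstruct how the load transferred and recovered.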
If operational awareness and in-depth analysis of events are critical to the success of the DCIM system, near-time data collection is likely not the answer. Real-time polling provides the granularity of information needed by the technicians who are responsible for continuous equipment operation.
WHEN IS EXTENDED INTERVAL RIGHT FOR ME?
Extended interval polling is a very sporadic collection of information. This kind of data is more useful at a macro level. For instance, a daily sample can give useful information on rounded readings such as maximum megawatts utilized. Beyond that, there is too much variation in power readings during the course of a normal day to put much stock in a single sample.
A good use case for extended interval would be global capacity planning. An executive-level user could be tasked with determining when to build a new data center or when to consider colocating. A small number of infrequent samples could provide a “close enough” picture of the power footprint across an organization for the executive to start planning conversations.
Technicians, 24/7 staff, and even managers will be left wanting more information as they attend to their daily duties at an extended interval rate. In summary, extended interval is really only effective for high-level planning.
CONCLUSION: YOUR TIME IS THE RIGHT TIME
The point is that there is no single live data-polling rate that is best for everyone. However, there is a right polling rate for each job title group within the data center.
Technicians, operators, NOC staff, and those responsible for the daily operations of a data center will likely find a real-time data collection system most beneficial. It provides the highest degree of operational awareness and the ability to complete post-mortem analysis on past events. The other polling rates cannot come close to providing the level of information this group requires.
Near-time polling rates are great for those responsible for detailed planning and reporting. Generally, this responsibility resides with data center, IT, or facility managers who have a continuing need to analyze capacities when deploying new equipment and planning for future equipment. These managers may not need the level of operational awareness that comes with real-time polling, such as instantaneous alarming or power quality capture. However, near-time gives them a very nice window into how power flows throughout the day and the effect that has on their working environment.
Data center operators won’t have a lot of use for extended interval polling. There simply isn’t enough granularity to be of benefit to the reactive decisions and actions they must take. Extended interval is a reasonable fit for the executive-level group, which is more interested in generalities or data across many sites. Infrequent measurements still give executives enough data to make high-level decisions that can then be passed down to the managers for greater evaluation.
In the end, it is most important to establish the business needs first. Who is using the system? What are they using it for? What are the goals of the system? What data needs to be collected to accomplish those goals? If the right scope of work is defined at the outset of the project, obtaining a system that has the appropriate level of data polling will be simplified. There is a “right choice” for data acquisition frequency, and what that is depends on who is using DCIM.
As a co-creator of Geist’s DCiM solutions, Matt Lane has over 14 years of experience working in data center monitoring and product development. He brings a wide range of experience as an entrepreneur, business owner, and manager. He is currently the president of Geist’s DCiM division, which provides customized solutions for data center monitoring.