Data center infrastructure management (DCIM) software was all the rage a few years ago. Data centers need a means to accurately measure power, cooling, and critical infrastructure assets in order to better manage consumption. They also need a means to tie in the demand of the compute hardware to manage the overall data center ecosystem. That’s a rather tall order in some environments, and, while it sounds great, some DCIM solutions are sitting on a shelf, wasting space. Purchased and implemented are not the same. So why are these expensive software programs collecting dust?
The first thing to understand about DCIM is that there is time involved in setting up the software and hardware. Some solutions are software only, while some have interfaces to intelligent hardware. In-rack power distribution units (PDUs), or intelligent power strips, were some of the first facilities equipment to gain intelligence. At the start, that intelligence not only reported equipment power consumption but also helped guard against human error, such as plugging both primary and secondary power into the primary leg or, on a grander scale, creating a phase imbalance on three-phase systems.
Early software versions were specific to one vendor’s hardware and, in many cases, locked users into single-vendor hardware solutions, as there was no interface to open systems or other manufacturers’ hardware. Being vendor-specific also made integration with HVAC monitoring and equipment monitoring impossible without adding gateways, entering data manually, or finding some other means to convert the data. The HVAC, chillers, and environmental packages were not always on open systems either. So, oftentimes, the calculations between them were done by hand or kept on a spreadsheet.
When data centers are near or at capacity, it is a constant struggle to maintain balance across the floor. New assets change the numbers, and “gut feel” and a count of open ports simply don’t provide enough information to manage the capacity complexity. You can’t fix what you can’t measure, and, if you don’t measure, you can’t evaluate the effectiveness of the fix. Power usage effectiveness (PUE) became a number that data centers chased, but the way it is measured shows that the number is often misunderstood and, in some cases, outright abused. For any efficiency quotient to be accurate, the input must be accurate. Spreadsheets are not the best means to keep up with assets. Furthermore, any calculation is a snapshot in time. Data center managers need to be able to trend over time. To do that, they need statistical and historical information from compute hardware and peripherals.
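To make the PUE point concrete, here is a minimal sketch in Python of the standard ratio (total facility energy divided by IT equipment energy) and a simple trailing trend. The `Reading` type, its field names, and the sample figures are illustrative assumptions, not output from any particular DCIM package.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """One metering interval. Field names are hypothetical."""
    label: str
    facility_kwh: float  # everything the site drew: IT, cooling, power losses, lighting
    it_kwh: float        # energy delivered to the IT equipment only


def pue(reading: Reading) -> float:
    """PUE = total facility energy / IT equipment energy."""
    if reading.it_kwh <= 0:
        raise ValueError("IT energy must be positive to compute PUE")
    return reading.facility_kwh / reading.it_kwh


def rolling_pue(readings: list, window: int = 3) -> list:
    """Energy-weighted PUE over a trailing window of up to `window` intervals."""
    trend = []
    for i in range(len(readings)):
        chunk = readings[max(0, i - window + 1): i + 1]
        facility = sum(r.facility_kwh for r in chunk)
        it = sum(r.it_kwh for r in chunk)
        trend.append(facility / it)
    return trend


# Made-up sample data for illustration only.
readings = [
    Reading("week 1", 1800.0, 1000.0),
    Reading("week 2", 1700.0, 1000.0),
    Reading("week 3", 1600.0, 1000.0),
    Reading("week 4", 1500.0, 1000.0),
]

for r, trailing in zip(readings, rolling_pue(readings)):
    print(f"{r.label}: PUE {pue(r):.2f}, trailing PUE {trailing:.2f}")
```

A single reading is the snapshot; the trailing figure is the trend. Both are only as trustworthy as the metering behind `facility_kwh` and `it_kwh`, which is exactly where a hand-maintained spreadsheet tends to fall short.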
Fast-forward, and DCIM was born. There is certainly no shortage of DCIM packages available, and the functionality they provide varies greatly among vendors. But why are so many of these packages collecting dust on end users’ shelves after purchase?
One barrier to functional use is the amount of time it takes to load data center assets, triggers, and alarm parameters into the systems. Auto-discovery can only go so far in identifying assets. It will not tell you where on the floor an asset is located, and location is a critical piece of information for every asset. Plus, not all equipment is discoverable, meaning that a lot of manual input is needed.
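The gap between what auto-discovery finds and what a floor audit records can be sketched as a simple reconciliation. This is a hypothetical Python illustration: the `Location` type, the `reconcile` function, and the serial numbers are all assumptions made for the example and do not reflect any specific DCIM product’s data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Physical position on the floor. Field names are hypothetical."""
    row: str
    rack: str
    rack_unit: int


def reconcile(discovered: dict, audited: dict):
    """Compare auto-discovered assets with a physical floor audit.

    discovered: serial -> model, as a network scan might report it
    audited:    serial -> Location, as recorded during a floor walk
    Returns (located, needs_location, not_discovered).
    """
    # Assets the scan found and the audit placed on the floor.
    located = {s: audited[s] for s in discovered if s in audited}
    # Assets the scan found but nobody has placed yet: manual input needed.
    needs_location = sorted(s for s in discovered if s not in audited)
    # Assets on the floor that the scan never saw (not discoverable).
    not_discovered = sorted(s for s in audited if s not in discovered)
    return located, needs_location, not_discovered


# Made-up sample data for illustration only.
discovered = {"SN-100": "server", "SN-200": "switch", "SN-300": "storage"}
audited = {
    "SN-100": Location("A", "R01", 12),
    "SN-300": Location("B", "R07", 4),
    "SN-900": Location("B", "R07", 40),  # e.g. a device with no network agent
}

located, needs_location, not_discovered = reconcile(discovered, audited)
print("Located:", located)
print("Needs a location:", needs_location)
print("On the floor but not discovered:", not_discovered)
```

The two leftover lists are the manual workload: discovered assets that still need a location, and equipment on the floor that discovery never saw.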
In fact, there is an entire industry sector that provides services to populate DCIM software and set it up for use. This removes the need for personnel to split time between their regular 40-hour-a-week job and trying to input information into software. These companies can go even further, and many will provide audits of assets, their locations, and inventory functions concurrent with setup.
Documentation and input are only as good as the information provided, and the software is only as good as the information input. That said, the amount of intelligence these packages can provide across the data center floor is invaluable. This is true whether you are in your own data center or in a colocation facility that allows access to physical hardware resources needed for management. Some colocation facilities provide their own monitoring services for power and cooling but stop shy of servers, storage, switches, and other compute assets.
Once these packages are set up, equipment decisions take a new path: it becomes possible to determine CPU and power utilization and storage power effectiveness, and to orchestrate loads based on power rates. Quite simply, the packages tend to pay for themselves over time. You just need to be sure to budget either the resources or the time to get the initial data loaded and the system configured. To be valuable, the system needs to weed out the “noise” and provide meaningful, actionable data.
It pays to do some homework on the amount of time that will be required based on your environment. When in doubt, issue a request for information (RFI) to be sure that you are getting what you want out of the system. If you are in someone else’s space (colo), be sure that they allow this level of monitoring, and determine whether they have information that you can use. If they don’t allow you to capture data from their systems, you will be limited in what you can discern. Remember, it is never a bad thing to reach out to someone who can provide you with a time-saving road map.