As business enterprises become increasingly virtualized and cloud-based, silos are toppling, and the once-hard boundaries between the data center and the rest of the organization are blurring. The 451 Group recently outlined this phenomenon as a convergence of domains, capabilities, and portfolios of capacities: the domains of business, technology, and facilities are merging to create new service and delivery capabilities that rely on an array of owned, leased, or cloud-provided capacities.
- DCIM and the Modern Data Center
- Strategic Energy Management
- DCIM for the Top of the Stack
- The Value of Open-Platform DCIM
- Condition-Based Maintenance
- A Modular Approach to Automation
Central to this convergence within today’s digital enterprise is the need to monitor, anticipate, and control changes in the enterprise environment — both the physical facilities and data center operations. Increasingly, IT departments are depending on data center infrastructure management (DCIM) tools to ensure improved uptime for their systems and to find the best path toward energy savings and better utilization.
Just as the enterprise has become more sophisticated in its application of digital technology, DCIM has evolved as a critical element in both traditional and cloud-service data center operations. Until relatively recently, data center automation platforms had been an outgrowth of building management systems (BMS) and had encompassed a variety of unconnected point solutions, with devices and software designed to monitor a limited set of parameters. One solution might examine the facility’s temperature and humidity to begin to control those factors in the server room. A simple spreadsheet might be used to track server deployments in an effort to maximize the efficient use of space or power.
When businesses began building larger data centers, they tried using the same tools to monitor increasingly numerous and sophisticated instruments across mechanical and electrical subsystems, with larger numbers of more complex server systems, each running multiple virtual environments or distributed applications. As the number of sensors and control points started to grow, traditional BMS-style systems ran into trouble. When the prospect arose of looking at on-board server parameters (temperature, power, and CPU utilization), it became clear that more sophisticated monitoring and control systems would be required.
Systems for power management ran into similar barriers. Typically, a series of meters would be placed throughout the data center to measure main incoming power along with the number of kilowatts being drawn by equipment inside and outside the IT room. These gross measurements have been useful only to an extent. Traditional metrics like power usage effectiveness (PUE), which do not take CPU utilization into account, are gradually being displaced by true end-to-end metrics, such as cost per transaction. This shift requires a single converged system with visibility from the power grid connection and mechanical systems through the physical servers and networks, right up the IT stack to the application layer. This is the realm of the modern data center automation platform, incorporating DCIM and its associated automation platforms.
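To make the contrast between PUE and an end-to-end metric concrete, here is a minimal Python sketch. The PUE ratio (total facility energy divided by IT equipment energy) is the standard definition; the energy-cost-per-transaction formula, the function names, and the two sites' figures are illustrative assumptions, not values from any real facility.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power usage effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


def energy_cost_per_transaction(
    total_facility_kwh: float, price_per_kwh: float, transactions: int
) -> float:
    """Illustrative end-to-end metric: energy spend divided by work completed."""
    if transactions <= 0:
        raise ValueError("transaction count must be positive")
    return total_facility_kwh * price_per_kwh / transactions


# Hypothetical figures: both sites draw the same power, but site B's
# servers sit mostly idle and complete far fewer transactions.
PRICE_PER_KWH = 0.10
site_a = {"total_kwh": 1500.0, "it_kwh": 1000.0, "transactions": 3_000_000}
site_b = {"total_kwh": 1500.0, "it_kwh": 1000.0, "transactions": 600_000}

for name, site in (("A", site_a), ("B", site_b)):
    ratio = pue(site["total_kwh"], site["it_kwh"])
    cost = energy_cost_per_transaction(
        site["total_kwh"], PRICE_PER_KWH, site["transactions"]
    )
    print(f"site {name}: PUE={ratio:.2f}, energy cost per transaction=${cost:.6f}")
```

Both hypothetical sites report the same PUE, yet site B spends five times as much energy per transaction, which is the kind of difference a facility-only metric cannot reveal.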
The need for a more holistic approach to data center automation is reflected in a trend toward convergence of energy management and IT applications. The traditional view of energy management focuses on PUE, which compares the total energy entering the facility with the energy drawn by the IT equipment. Today’s digital enterprise, however, requires the adoption of strategic energy management techniques in which technology sees where every watt of power is going and how efficiently it is being consumed, along with where that power originates, whether from the grid or from local renewable energy sources. DCIM should be smart enough to take advantage of rebates and incentives offered by utility companies for modulating power usage during peak periods. Moreover, enterprises with photovoltaic, fuel cell, wind, or other renewable and emergency power generation sources have the ability to sell excess power back to the grid, and DCIM should help the data center capitalize on such energy to offset expenses. Often these local sources produce enough excess energy to power hundreds or thousands of homes and can actually forestall the need to build a new power plant.
With a comprehensive automation solution and strategic energy management, operators can schedule each of the enterprise’s alternative energy sources based on the data center’s needs, determining when power loads should be shifted to the grid to handle big demands and when the center can run on its own systems.
Of even greater importance to cloud architects, virtualization engineers, data center managers, and hosting providers is the ability to use data center automation further up the IT stack. Until now, DCIM largely has resided near the bottom of the stack, monitoring the capacity of physical servers and networks, and mechanical and electrical infrastructure. Today, comprehensive data center automation tools operate across the entire stack to help manage virtual machines, operating systems, and applications. This technology can, for example, see which applications are running on each VM, as well as which may have stopped, and alert system owners if a VM or application needs attention.
Initially, virtualization applied at most to individual or clustered machines and was difficult to extend to the entire data center. After Amazon’s creation of the Elastic Compute Cloud (EC2) with availability zones, however, applications could be configured in multiple zones so that if one data center went down, the applications could run seamlessly in another. This strategy worked well unless the data center that went down was the one housing key routers. A failure there required that the routing tables be updated manually to divert traffic to the still-active centers.
Google advanced the level of resilience in distributed execution software with its MapReduce approach, which separated every application into many fragments. Each of these fragments could be executed on any node in a large set of computing resources. Later implemented in the open-source Hadoop framework, this approach now allows distributed computing across a large fleet of servers and dynamically allocates jobs to each of them.
With such new strategic methodologies, shifting loads between data centers has become a real, highly valuable opportunity for enterprises that employ the latest automation technology. For example, when operators can forecast the compute load at each of their data centers, they can shift those loads to other data centers anywhere in the world to take advantage of lower energy prices in particular geographies or at particular times. Thus they can keep utilization and efficiency as high as possible. While DCIM is not the only tool required, it is a critical and central piece of the solution. When combined with run book automation, it helps managers optimize their compute load against energy profiles.
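As a rough illustration of price-aware load shifting, the sketch below fills the cheapest sites first until the forecast demand is placed. Everything here is an assumption made for illustration: the `Site` fields, the `place_load` helper, the site names, and the prices are hypothetical, and a real placement decision would also weigh latency, data residency, and migration cost.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    capacity_kw: float    # IT load the site can host in the planning window
    price_per_kwh: float  # forecast energy price for the same window


def place_load(sites: list[Site], demand_kw: float) -> dict[str, float]:
    """Greedy placement: assign demand to the cheapest sites first."""
    if demand_kw > sum(site.capacity_kw for site in sites):
        raise ValueError("forecast demand exceeds total capacity")
    plan: dict[str, float] = {}
    remaining = demand_kw
    for site in sorted(sites, key=lambda s: s.price_per_kwh):
        if remaining <= 0:
            break
        share = min(site.capacity_kw, remaining)
        plan[site.name] = share
        remaining -= share
    return plan


# Hypothetical fleet and forecast prices.
fleet = [
    Site("dublin", capacity_kw=400.0, price_per_kwh=0.12),
    Site("oregon", capacity_kw=500.0, price_per_kwh=0.06),
    Site("singapore", capacity_kw=300.0, price_per_kwh=0.15),
]
print(place_load(fleet, demand_kw=700.0))
# → {'oregon': 500.0, 'dublin': 200.0}
```

Rerunning the same function with each new price forecast is one simple way a run book automation step could decide where the next batch of work should go.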
With this capability, DCIM can eliminate the need for overprovisioning. Currently, many data centers have installed, for example, extra UPS units and standby generation systems (two times or more the capacity that they need for normal operation) because managers do not have enough faith in their data center’s mechanical and electrical systems to run critical infrastructure. Some have created an entire “spare” data center as a disaster recovery facility that runs 24 hours a day, just waiting for the one time that a fault occurs. Now, however, it is becoming easier for managers to simply move operations to whatever data centers are up and running. They no longer need to bear the considerable cost burden of doubled-up backup capacity.
The true value of DCIM and data center automation in general today is its ability to monitor what is occurring at an operational level within the entire stack, reporting on what is happening in each server. Unfortunately, however, many DCIM systems are designed to manage only a single vendor’s hardware. A cloud-computing data center is likely to house a diverse array of equipment from many different vendors, and it has been difficult until now to find one tool that can look at all the servers.
DCIM also plays a crucial role in data center maintenance, preventing incidents from becoming outages. By delivering the right information at the right time, DCIM helps ensure that those responding to problems with a server or uninterruptible power supply know what to do without making matters worse.
Rather than launching scores of alarms, advanced DCIM solutions identify the root-cause alarm, recommend what should be done, furnish emergency operations procedures to restore the system to a healthy state, and even enable data center operators to connect with a subject-matter expert to talk through the fix.
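One simple way to picture root-cause identification is to walk each alarming device up the power chain and suppress it if anything upstream is also alarming. The sketch below assumes a hypothetical `feeds_from` map (each device pointing to the device that supplies it) and invented device names; commercial DCIM products use richer topology models and correlation rules.

```python
def root_cause_alarms(
    alarming: set[str], feeds_from: dict[str, str]
) -> set[str]:
    """Return the alarming devices that have no alarming device upstream."""
    roots = set()
    for device in alarming:
        seen = {device}  # guards against cycles in a malformed topology
        upstream = feeds_from.get(device)
        suppressed = False
        while upstream is not None and upstream not in seen:
            if upstream in alarming:
                suppressed = True  # an upstream fault explains this alarm
                break
            seen.add(upstream)
            upstream = feeds_from.get(upstream)
        if not suppressed:
            roots.add(device)
    return roots


# Hypothetical power chain: utility feed -> UPS -> PDU -> racks.
feeds_from = {
    "ups-1": "utility-feed",
    "pdu-a": "ups-1",
    "rack-12": "pdu-a",
    "rack-13": "pdu-a",
    "crac-2": "utility-feed",
}
active = {"ups-1", "pdu-a", "rack-12", "rack-13"}
print(sorted(root_cause_alarms(active, feeds_from)))
# → ['ups-1']
```

Four alarms collapse into one actionable item, and an unrelated fault (for example on `crac-2`) would still surface as its own root cause.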
As a result, the most advanced DCIM technologies change the overall approach to maintenance. Instead of doing calendar-based maintenance periodically, DCIM enables condition-based maintenance, centered on monitoring of each system’s performance so that any emerging problems can be repaired before the system raises an alarm.
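Condition-based maintenance can be sketched as trend projection: fit a line to recent sensor readings and estimate how long until the value would reach its alarm threshold. The example below is a minimal illustration under stated assumptions; the function names, the readings, the 40-degree threshold, and the two-week planning horizon are all hypothetical, and production systems typically use more robust models than a straight-line fit.

```python
def hours_until_threshold(
    readings: list[tuple[float, float]], alarm_threshold: float
):
    """Project when a rising metric reaches its alarm threshold.

    readings: (hour, value) pairs in time order.
    Returns hours after the last reading, or None if the trend is not rising.
    """
    n = len(readings)
    if n < 2:
        raise ValueError("need at least two readings to fit a trend")
    mean_t = sum(t for t, _ in readings) / n
    mean_v = sum(v for _, v in readings) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in readings)
    if var_t == 0:
        raise ValueError("readings must span more than one point in time")
    # Ordinary least-squares slope and intercept.
    slope = sum((t - mean_t) * (v - mean_v) for t, v in readings) / var_t
    intercept = mean_v - slope * mean_t
    if slope <= 0:
        return None
    crossing_hour = (alarm_threshold - intercept) / slope
    return max(0.0, crossing_hour - readings[-1][0])


def needs_maintenance(readings, alarm_threshold, horizon_hours) -> bool:
    """Flag the asset if the projected crossing falls inside the horizon."""
    remaining = hours_until_threshold(readings, alarm_threshold)
    return remaining is not None and remaining <= horizon_hours


# Hypothetical UPS battery temperature, sampled once a day (hour, deg C).
battery_temp = [(0, 30.0), (24, 31.0), (48, 32.0), (72, 33.0)]
print(needs_maintenance(battery_temp, alarm_threshold=40.0, horizon_hours=336))
# → True
```

Here the temperature climbs about one degree per day, so the projected crossing is roughly a week away and the unit is flagged for service well before any alarm would fire.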
One of the important aspects of today’s most advanced data center automation technologies is their modularity. In the past, a data center manager who had installed, say, an asset management solution might later discover a need for power, energy, and IT monitoring as well. The point solution chosen at the start might not expand to cover those functions. If it’s not scalable or upgradable, the system inhibits the operation from moving up the data center maturity model. Choosing a closed DCIM system that works only with certain equipment and servers often limits later choices.
Automation technology, on the other hand, begins with a comprehensive basic solution to which modules can be added. An open platform can communicate and scale with technology from any manufacturer.
Within a converging cloud-based enterprise, the data center assumes a commanding role that leads not just technology adoption, but also facility functionality, geographic expansion, and overall business development. The right automation technology can help ensure that data center managers lead their organizations toward successful growth strategies and efficiencies.
This article was originally published as “Managing The Cloud-Based Enterprise With Data Center Automation” in Cloud Strategy Magazine.