'Ghost Servers' And Wasted Data Center Power & Cooling
Left invisible, data center energy consumption includes power and cooling for idle and inefficient servers and NAS devices.
Last month, I wrote an article1 about three different approaches for gaining visibility and control of energy consumption in the data center: manual metering, modeling, and real-time management. Today, let’s look at why it is imperative that data center managers understand these approaches as they consider adding an appropriate energy management solution to their data center infrastructure management (DCIM) methodologies.
The somewhat playful term “ghost servers” refers to compute and storage systems that are idle, yet still drawing power and taxing cooling systems. In reality, there is nothing amusing about ghost servers. They really should scare managers and finance teams because they are partly responsible for the nightmare of skyrocketing energy costs.
The scale of the problem is huge. Based on observing a large number of data centers, I can conservatively estimate that 10% to 15% of servers are idle at any point in time. Idle servers draw approximately 50% of their maximum required power. What does this mean in financial terms? For a 400 W server, the total energy costs are $800 or more per year. The total cost of wasted energy is therefore substantial, and puts a high price tag on energy that generates zero output.
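The arithmetic behind an estimate like this can be sketched in a few lines. This is a minimal illustration, not the method behind the figures above: the electricity price and the overhead factor for cooling and power distribution are assumptions, and the result depends heavily on them. The function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours


def annual_idle_cost(max_power_w, idle_fraction=0.5,
                     price_per_kwh=0.10, overhead_factor=2.0):
    """Yearly cost of keeping one idle server powered.

    idle_fraction   -- share of maximum power drawn while idle (article: ~50%)
    price_per_kwh   -- ASSUMED electricity price, not taken from the article
    overhead_factor -- ASSUMED multiplier covering cooling and distribution
    """
    idle_kw = max_power_w * idle_fraction / 1000.0
    return idle_kw * HOURS_PER_YEAR * price_per_kwh * overhead_factor


def fleet_idle_cost(server_count, ghost_share, max_power_w, **kwargs):
    """Yearly cost of the idle share of a fleet (article: 10% to 15% idle)."""
    ghost_count = server_count * ghost_share
    return ghost_count * annual_idle_cost(max_power_w, **kwargs)
```

Plugging in your own utility rate and measured overhead, rather than the placeholder defaults, is what makes the number meaningful for a specific facility.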
It is time to take ghost-busting measures, while simultaneously taking steps towards greater overall energy efficiency — and cost reduction — in the data center.
STEP 1: SHINE A LIGHT ON THE GHOSTS
You can’t eliminate waste that you don’t see. That means the first step in slashing up to 15% of your data center utility costs is all about gaining visibility. In my previous article, I explained how a holistic energy management solution can aggregate ongoing power and temperature data throughout your data center. It can give you a complete picture of the data center activity levels and how they correlate with, and impact, power consumption and temperature levels.
The resulting power and thermal maps are critical for gaining insights about the extremes in your data center. Besides identifying hot spots that can degrade reliability and lead to outages, an energy dashboard can put the spotlight on ghost servers. Whether viewed in real-time, or by extracting insights from logged data, ghost servers can be fully characterized over time and in relation to the various workloads and service levels in the data center.
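As a rough illustration of extracting ghost servers from logged data, the sketch below flags servers whose utilization never rises above a small threshold across the logged samples. The `Sample` record, its field names, and the 5% threshold are all hypothetical; a real energy management product will expose its own data model.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sample:
    server: str      # server identifier (hypothetical field)
    cpu_util: float  # utilization, 0.0 to 1.0
    power_w: float   # measured power draw in watts


def find_ghost_servers(samples, max_util=0.05, min_samples=3):
    """Return {server: average idle power} for servers that stayed idle.

    A server counts as a ghost only if every logged sample is at or below
    max_util and at least min_samples readings exist (ASSUMED criteria).
    """
    by_server = {}
    for s in samples:
        by_server.setdefault(s.server, []).append(s)

    ghosts = {}
    for name, rows in by_server.items():
        if len(rows) < min_samples:
            continue  # too little history to judge
        if max(r.cpu_util for r in rows) <= max_util:
            ghosts[name] = mean(r.power_w for r in rows)
    return ghosts
```

Requiring a minimum number of samples avoids labeling a server a ghost on the basis of one quiet reading; the logging window should cover periods of peak demand.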
STEP 2: MATCH ASSETS TO DEMAND
Superior energy management solutions collect real-time server inlet temperatures and power consumption data from rack servers and blade servers in addition to power-distribution units (PDUs) and uninterruptible power supplies (UPSs). Airflow is also monitored throughout the data center and at the computer room air handler (CRAH) equipment.
Armed with this fine-grained energy picture and historical data, data center team members can identify ghost servers and accurately define the optimal number of compute and storage servers. This information may lead to a reduction in the number of servers, but more typically it helps IT make adjustments and better use existing assets. For example:
• Some servers or storage devices can be powered down during periods of lower activity.
• Workloads can be shifted away from servers that are operating close to their maximums and driving up temperature in the data center as a result. Cooling can be reduced as hot spots are eliminated.
• Racks and rows can be reconfigured to more evenly distribute workloads and lower overall ambient temperature without taxing CRAH systems.
• Rack densities can be optimized for normal operation as well as operation during disaster recovery (DR). Maximizing rack densities allows for more energy-efficient cooling solutions.
STEP 3: ADJUST AMBIENT TEMPERATURE
After identifying ghost servers and making adjustments to align assets to demand, data center managers can use the same real-time energy and temperature data to make decisions about the possibility of increasing overall operating temperature in the data center. Proof of concept testing has shown that turning up the thermostat by just one degree — which lowers the demand on the cooling systems — can result in savings of 4% on the overall utility bill.
The industry buzz about high-temperature ambient (HTA) data centers includes headlines about mega-savings achieved by some of the biggest names in the technology industry. Google, Facebook, Yahoo!, and others are leading the heat wave, operating their data centers at temperatures up to 80°F.
Equipment vendors are specifying and warranting data center hardware for operation at temperatures up to 40°C (104°F).2 These higher temperatures are also reflected in current industry standards for data center efficiency, including those published by ASHRAE, Green Grid, Code of Conduct on Data Centres Energy Efficiency, and others.
Decisions about HTA operation must take into consideration your exact data center equipment, including legacy systems. Ultimately, it makes financial sense to follow these trendsetters. With accurate, real-time views of your data center temperature and energy patterns, IT and facilities can consider changes in ambient temperature, taking minimal risk.
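One way to check equipment against a proposed temperature increase is sketched below. It is a simplification under stated assumptions: it treats every server's inlet temperature as rising one-for-one with the setpoint, which real airflow will not do exactly, and the safety margin is an arbitrary placeholder. The function and parameter names are hypothetical.

```python
def servers_at_risk(peak_inlet_c, rated_max_c, setpoint_increase_c,
                    margin_c=5.0):
    """List servers whose projected inlet temperature gets too close
    to the vendor-rated maximum.

    peak_inlet_c        -- {server: highest logged inlet temperature, in C}
    rated_max_c         -- {server: vendor-rated maximum inlet temperature}
    setpoint_increase_c -- proposed rise in ambient setpoint
    margin_c            -- ASSUMED safety margin below the rated maximum

    ASSUMPTION: inlet temperature rises by the same amount as the setpoint.
    """
    at_risk = []
    for name, peak in peak_inlet_c.items():
        projected = peak + setpoint_increase_c
        if projected > rated_max_c[name] - margin_c:
            at_risk.append(name)
    return sorted(at_risk)
```

Legacy systems with lower rated maximums tend to surface first in a check like this, which is why they deserve explicit attention before any setpoint change.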
STEP 4: AUTOMATE THRESHOLD MANAGEMENT
The just-mentioned “minimal risk” deserves elaboration. Whether evaluating or validating some energy-related change or just trying to avoid conditions that could lead to outages, data center management teams should broadly employ threshold alerts. As they relate to ghost servers, threshold alerts can also be used to flag under-utilized assets.
Holistic energy management solutions automate threshold setting and detection features. The simplification of this protection and visibility feature frees IT to try out changes and evaluate them over adequate periods of time including days/hours of peak data center demand.
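A threshold check of this kind can be sketched as follows. The metric names and limit values are hypothetical and stand in for whatever a given energy management solution exposes; low limits catch under-utilized assets, while high limits catch conditions that could lead to outages.

```python
def check_thresholds(reading, limits):
    """Compare one set of readings against low/high limits.

    reading -- {metric: value}
    limits  -- {metric: (low, high)}; use None where no bound applies
    Returns a list of (metric, "low" | "high", value) tuples.
    """
    alerts = []
    for metric, (low, high) in limits.items():
        if metric not in reading:
            continue  # no data for this metric in the current reading
        value = reading[metric]
        if high is not None and value > high:
            alerts.append((metric, "high", value))
        elif low is not None and value < low:
            alerts.append((metric, "low", value))
    return alerts


# Example limits (ASSUMED values): alert on hot inlets and near-idle CPUs.
EXAMPLE_LIMITS = {
    "inlet_temp_c": (None, 32.0),
    "cpu_util": (0.05, None),
}
```

Calling `check_thresholds({"inlet_temp_c": 35.5, "cpu_util": 0.01}, EXAMPLE_LIMITS)` would report both a high inlet temperature and a low-utilization alert for the same server.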
STEP 5: ADJUST DISASTER RECOVERY RULES
The aftermath of a major outage or natural disaster could be when ghost servers are the biggest problem. Data center managers should include many of the previously mentioned steps for identifying and avoiding idle servers as part of their disaster recovery plans.
For example, during any outage, real-time monitoring should draw on previous logs of energy and temperature patterns to quickly identify any under-utilized server. Then IT can intelligently prioritize servers and resources, as well as reallocate workloads as needed to optimally use the available back-up power or the assets available in a colocation facility.
The same solutions that can help you avoid ghost servers also give you another DR tool: power capping. This feature lets you adjust power consumption as a function of server performance. Lower-priority applications can be shut down or configured to operate at lower performance levels, which in turn reduces the power draw. Power capping essentially extends the operating time of back-up power supplies by up to 25%, based on in-field measurements.
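The relationship between capped load and back-up runtime can be sketched with simple arithmetic. This is an idealized model, not the basis of the in-field figure above: it assumes the usable stored energy is fixed regardless of discharge rate, which real batteries only approximate. The function names are hypothetical.

```python
def backup_runtime_hours(usable_energy_wh, load_w):
    """Idealized runtime: stored energy divided by a constant load.

    ASSUMPTION: usable energy does not depend on how fast it is drawn.
    """
    if load_w <= 0:
        raise ValueError("load_w must be positive")
    return usable_energy_wh / load_w


def runtime_gain(load_w, capped_load_w):
    """Fractional increase in runtime when capping lowers the load."""
    if capped_load_w <= 0 or capped_load_w > load_w:
        raise ValueError("capped_load_w must be in (0, load_w]")
    return load_w / capped_load_w - 1.0
```

Under this simplification, capping a 5,000 W load to 4,000 W (a 20% reduction) yields 25% more runtime, because runtime scales with the inverse of the load.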
EXAMPLES OF ACHIEVABLE RESULTS
Energy management solutions have been broadly deployed over the past several years, and there is a wealth of published results relating to them. The best-in-class solutions are helping data center managers achieve 20% to 40% reductions in energy waste.
Here are some examples of published results achieved by data center teams trying to root out ghost servers and similar data center energy inefficiencies:
• EMC carried out a proof of concept3 for its Atmos cloud storage offering, including evaluating 13 different use cases for power management solutions. The capabilities EMC validated included monitoring node history and using it to maximize the number of nodes per rack for power and cooling efficiency. The EMC team also determined that power management technology will allow it to reduce idle power consumption from 50% down to 15% for server appliances.
• Korea Telecom carried out a proof of concept4 for an energy management solution with the goal of maximizing the number of servers within its data center constraints (space, power, and cooling). The POC proved that power consumption could be reduced by 15% with the introduction and monitoring of conservation policies. Furthermore, data center team members were able to identify under-utilized servers, and by putting those into a lower-power state, save an additional $2,000 per year per rack.
• BMW5 determined that it could use a combination of virtualization and energy management technologies to optimize CPU utilization for maximum energy efficiency. Tests showed power savings of approximately 18%.
Energy has become a significant operating cost, putting extreme pressure on both IT and facilities teams to increase efficiencies and demonstrate measurable results. Identifying waste, such as ghost servers that keep drawing power while producing no useful output, should be one of the first steps of an energy management initiative. The same technology that helps identify and trim waste also equips a data center team to further optimize power and cooling for all of the compute and storage servers in the facility.
Perhaps just as important as the monitoring and control capabilities are the historical logging and reporting features of energy management solutions. Having a knowledge base, extracted from real-time actual energy use, greatly improves energy-related decision making, data center planning, and justification of changes and investments related to energy. Look for a holistic energy management solution that lets you check off all of these boxes, and that includes features that elevate the transparency and accountability relating to energy behaviors in the data center.
1. Klaus, Jeff. “Myth Buster: Energy Management Models & the Quest for Power Efficiency.” Mission Critical Magazine. September/October, 2013.
3. EMC/Intel proof of concept. http://download-software.intel.com/sites/datacentermanager/atmospoweropt1_3c.pdf. 2010.
5. Joint Whitepaper. http://software.intel.com/sites/datacentermanager/node_manager_white_paper_bmw.pdf. 2009.