Headlines and stories abound with anecdotal examples of data center downtime caused by power quality (PQ)-related events. Even Justin Timberlake (see figure 1) tweeted in frustration after one well-known web hosting company went down: “… just went down completely. Gonna be a bit longer than 5 minutes.” Today, everyone recognizes the importance of access to information and websites. The mission is to be up 7x24; anything less can make news.
Given the high cost of energy, it’s understandable that data center operators are trying to trim recurring expenses. Incorporating alternative energy sources such as wind and solar makes for good corporate citizenship, but availability needs to remain the primary focus.
The bottom line is that energy savings are important, but don’t take your eye off the ball. Originally, data center operators focused on capacity and reliability, and then turned to energy reduction. The reasons to save energy are clear: energy-related expenses account for approximately 12 percent of overall operational costs and are the fastest-rising data center expense, according to Gartner.
To put this into perspective, the January 2010 issue of The AFCOM Communiqué revealed that “in the second half of 2008 as the U.S. economy entered a deep recession and companies were forced to find ways to reduce spending, IT organizations began to look seriously at energy efficiency in terms of cost savings, as well as environmental responsibility … now, on the cusp of this century’s second decade, the data center finds itself balancing efficiency and availability while computing demand and energy costs are increasing and IT budgets are contracting.”
A recent Data Center Users Group (DCUG) survey pointed out that, in just a few years, the focus of data center operators’ top five concerns shifted from availability to efficiency and back again to availability.
OPPORTUNITIES AND CHALLENGES
The survey results suggest that both energy efficiency and availability must be considered integral parts of data center operations and evolution. As data centers evolve, opportunities may exist outside the walls of the building to go beyond utility-supplied power and become more environmentally friendly, for example through distributed generation from alternative or renewable sources.
The Smart Grid represents another potential opportunity. The term can mean different things in different applications, but it may include better visibility into (real-time) utility energy costs and moving the source of power closer to the consumer, e.g., through alternative energy.
There are many challenges, including how to benefit from these new technologies and how to dovetail them with corporate objectives. From a top-down perspective, the key challenge is to maintain availability while adopting these new opportunities and technologies.
POWER MONITORING AND AVAILABILITY
PQ monitoring has been shown to improve availability, which depends on more than simply having enough capacity. Data center power systems are made up of complex electromechanical devices, such as:
- transfer switches
- UPS systems
These devices require maintenance and can fail. PQ monitoring can proactively identify problems and looming failures. At a minimum, it provides recorded data for determining the origin of a failure, which reduces both troubleshooting time and overall outage time.
Today’s focus on energy efficiency is reflected in the commonly used metric PUE (power usage effectiveness). PUE is the ratio of the total power entering a data center to the power used by the IT equipment, and efficiency improves as the ratio decreases toward 1.
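The sketch below computes PUE as the quotient of two simultaneous kW readings. It assumes one reading for total facility power and one for IT equipment power. The `MeterSnapshot` structure and the metering points named in its comments are illustrative assumptions, not a specific product’s data model.

```python
from dataclasses import dataclass


@dataclass
class MeterSnapshot:
    """Simultaneous power readings in kW (hypothetical structure)."""
    total_facility_kw: float  # e.g. metered at the utility service entrance
    it_equipment_kw: float    # e.g. metered at the UPS output


def pue(snapshot: MeterSnapshot) -> float:
    """Power usage effectiveness: total facility power divided by IT equipment power."""
    if snapshot.it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    if snapshot.total_facility_kw < snapshot.it_equipment_kw:
        raise ValueError("total facility power cannot be less than IT equipment power")
    return snapshot.total_facility_kw / snapshot.it_equipment_kw


# Illustrative numbers: 1,800 kW at the utility service, 1,200 kW delivered to IT loads
print(pue(MeterSnapshot(total_facility_kw=1800.0, it_equipment_kw=1200.0)))  # 1.5
```

PUE is often computed from energy (kWh) accumulated over the same period rather than from an instantaneous snapshot; the ratio is formed the same way. A result of 1.5 means that for every kilowatt delivered to IT equipment, another 0.5 kW goes to cooling, power conversion losses, lighting, and other facility overhead.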
PUE is a very important metric, but its “Achilles heel” is that it doesn’t factor in availability, which has a direct correlation to PQ. A UPS failure discovered by a PQ monitoring system exemplifies some of the commonly encountered problems in this area.
A voltage sag (reduction) lasting about 3.3 seconds on the system of Consolidated Edison, the local utility serving New York City and its suburbs, damaged elevator controls at a downstream customer, a NYC data center belonging to a major international bank.
A more serious problem, however, was looming with a UPS that carried the load as expected during the sag. The UPS reported no alarms and indicated no problem during the power event. But the PQ monitoring system detected a swell of about 10 percent on the critical supply bus coincident with the end of the sag (see figure 2). Such a condition can seriously damage downstream equipment. In this instance, the problem occurred on one of two identical, similarly loaded UPS systems from the same manufacturer. Thanks to the PQ monitoring system, the UPS service team corrected the condition and prevented a future problem.
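A swell arriving right as a sag ends can be flagged from a series of RMS voltage readings. The sketch below is illustrative only and rests on several assumptions:

- The readings are evenly spaced per-cycle RMS values.
- The sag and swell thresholds are the common 0.9 and 1.1 per unit, in the spirit of the IEEE 1159 definitions.
- Interruptions and duration limits are ignored.

The sample values and the `max_gap` parameter are hypothetical. Commercial PQ instruments apply their own, more rigorous event-detection logic.

```python
from typing import List, NamedTuple

SAG_PU = 0.9    # assumed sag threshold, per unit of nominal RMS voltage
SWELL_PU = 1.1  # assumed swell threshold


class Event(NamedTuple):
    kind: str          # "sag" or "swell"
    start: int         # index of the first out-of-band sample
    end: int           # index one past the last out-of-band sample
    extreme_pu: float  # lowest (sag) or highest (swell) per-unit value seen


def classify(v_pu: float) -> str:
    if v_pu < SAG_PU:
        return "sag"
    if v_pu > SWELL_PU:
        return "swell"
    return "normal"


def find_events(rms_volts: List[float], nominal: float) -> List[Event]:
    """Group consecutive out-of-band RMS samples into sag and swell events."""
    events: List[Event] = []
    current, start, extreme = "normal", 0, 1.0
    # A trailing nominal sample closes any event still open at the end of the data.
    for i, volts in enumerate(list(rms_volts) + [nominal]):
        pu = volts / nominal
        kind = classify(pu)
        if kind != current:
            if current != "normal":
                events.append(Event(current, start, i, extreme))
            current, start, extreme = kind, i, pu
        elif kind == "sag":
            extreme = min(extreme, pu)
        elif kind == "swell":
            extreme = max(extreme, pu)
    return events


def swell_follows_sag(events: List[Event], max_gap: int = 2) -> bool:
    """True if a swell starts within max_gap samples of a sag ending."""
    return any(
        a.kind == "sag" and b.kind == "swell" and b.start - a.end <= max_gap
        for a, b in zip(events, events[1:])
    )


# Illustrative per-cycle RMS values on a 480 V bus: a sag followed directly by a swell
samples = [480, 480, 380, 370, 375, 530, 532, 482, 480]
events = find_events(samples, nominal=480.0)
print([(e.kind, e.start, e.end) for e in events])  # [('sag', 2, 5), ('swell', 5, 7)]
print(swell_follows_sag(events))                   # True
```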
COMMON PQ ISSUES
PQ events manifest in a variety of ways, but the most commonly encountered types include:
- Transients: caused by lightning, switching, and deterioration, and resulting in equipment damage, breaker trips, lockups, and more.
- Sags/swells: caused by utilities, UPS problems, wiring, and breakers, and resulting in reboots, intermittent operation, and equipment damage.
- Harmonics: caused by nonlinear loads such as computer power supplies, and resulting in overheating, overloaded wiring, and lower efficiency.
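Harmonic distortion is commonly summarized as total harmonic distortion (THD). THD is the root-sum-square of the harmonic magnitudes above the fundamental, divided by the fundamental. The sketch below assumes the per-order RMS magnitudes have already been extracted; a PQ instrument would obtain them by spectral analysis of the waveform. The spectrum shown is made up.

```python
import math
from typing import Dict


def thd_percent(harmonics: Dict[int, float]) -> float:
    """Total harmonic distortion as a percentage of the fundamental.

    `harmonics` maps harmonic order (1 = fundamental) to RMS magnitude.
    """
    fundamental = harmonics.get(1, 0.0)
    if fundamental <= 0:
        raise ValueError("fundamental magnitude must be positive")
    # Root-sum-square of every component above the fundamental
    distortion = math.sqrt(sum(mag ** 2 for order, mag in harmonics.items() if order >= 2))
    return 100.0 * distortion / fundamental


# Illustrative current spectrum (amps): fundamental plus 3rd and 5th harmonics
print(thd_percent({1: 10.0, 3: 3.0, 5: 4.0}))  # 50.0
```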
In response to PQ concerns, a well-designed data center mitigates PQ issues and includes redundancy to cover failures. Even so, it is incumbent on the operator to manage and maintain the infrastructure properly from startup onward to sustain the mission.
PQ GREEN LIGHT
Energy and PQ monitoring address two separate, though intertwined, aspects of supply: quantity and quality. Energy issues include how much energy is used, when it is used, and where, with the end goal being energy management and/or reduction. Energy is typically measured at a resolution of seconds. High accuracy is required, especially if the monitoring is used for revenue or billing. Energy monitoring can involve multiple utilities, including electricity, gas, and water.
The flip side of that “coin” is PQ, which typically involves the compatibility of the power supply with the load, power system reliability, uptime, problem solving, and the economic impact of failures. PQ measurements are typically acquired at millisecond or microsecond resolution. High accuracy is important but not required, although many of today’s PQ instruments do provide revenue-grade accuracy.
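Energy monitoring turns power readings into accumulated energy. The sketch below illustrates the principle with trapezoidal integration of timestamped kW samples into kWh. It is a simplification: revenue-grade meters integrate internally at far higher sampling rates and to tighter accuracy.

```python
from typing import List, Tuple


def energy_kwh(samples: List[Tuple[float, float]]) -> float:
    """Integrate (timestamp_s, power_kw) samples into kWh with the trapezoidal rule."""
    kw_seconds = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        if t1 <= t0:
            raise ValueError("timestamps must be strictly increasing")
        # Average power over the interval times its length
        kw_seconds += 0.5 * (p0 + p1) * (t1 - t0)
    return kw_seconds / 3600.0  # 3,600 kW*s per kWh


# Illustrative readings: 100 kW rising to 200 kW over the first half hour, then flat
print(energy_kwh([(0.0, 100.0), (1800.0, 200.0), (3600.0, 200.0)]))  # 175.0
```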
ENERGY AND PQ
Green Grid Level 1 energy-monitoring locations have long been recommended by industry experts and manufacturers as points for proactive PQ monitoring. Monitoring total facility power (the utility service) gives a better understanding of the quality of the utility supply and its effects on facility systems. Another recommended PQ monitoring location is the UPS output (IT equipment power), the lifeline of the business. It is vitally important to monitor the critical buses for possible UPS and other critical-system problems. Not all IT problems are power related, yet power is usually the first to be blamed. An effective, facility-wide PQ program virtually eliminates such finger pointing (see figure 3 for recommendations developed in 2004). It’s no coincidence that the minimum monitoring locations recommended at that time match today’s Green Grid energy-monitoring recommendations.
ADDING A PQ GREEN LIGHT TO PUE
A major paperless health-care provider takes a two-pronged approach, reactive and proactive, to data center efficiency and availability. The reactive approach uses a PUE dashboard that trends electrical energy data and merges it into the provider’s existing building management system (BMS). The BMS monitors all UPS systems and emergency backup generators as well as the data center’s mechanical load.
The system also provides load profiling and a daily record of peak demand for data center loads. These elements give facility engineers the ability to optimize the efficiency of data center operations.
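A daily peak-demand record like this one can be derived from interval readings. The sketch below assumes clock-aligned 15-minute fixed blocks, a common demand interval. Utilities and BMS products may instead use other interval lengths or sliding windows. The readings are invented.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Tuple


def daily_peak_demand(
    readings: List[Tuple[datetime, float]], interval_minutes: int = 15
) -> Dict[str, float]:
    """Highest fixed-block average demand (kW) for each calendar day.

    Each (timestamp, kW) reading is assigned to the clock-aligned block that
    contains it (e.g. 00:00-00:15); readings in a block are averaged.
    """
    blocks: Dict[Tuple[str, int], List[float]] = defaultdict(list)
    for ts, kw in readings:
        block = (ts.hour * 60 + ts.minute) // interval_minutes
        blocks[(ts.date().isoformat(), block)].append(kw)

    peaks: Dict[str, float] = {}
    for (day, _block), values in blocks.items():
        average = sum(values) / len(values)
        peaks[day] = max(peaks.get(day, average), average)
    return peaks


readings = [
    (datetime(2024, 1, 1, 0, 0), 100.0),
    (datetime(2024, 1, 1, 0, 5), 110.0),
    (datetime(2024, 1, 1, 0, 10), 120.0),
    (datetime(2024, 1, 1, 0, 15), 150.0),
    (datetime(2024, 1, 1, 0, 20), 160.0),
    (datetime(2024, 1, 2, 8, 0), 90.0),
]
print(daily_peak_demand(readings))  # {'2024-01-01': 155.0, '2024-01-02': 90.0}
```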
The proactive approach is a comprehensive PQ and energy monitoring strategy that places energy-capable PQ instruments at critical points around the facility. Because these instruments measure both the quantity and the quality of power, they also feed energy data to the BMS for PUE and other parameters.
According to the facility’s chief engineer, “when we make a change or improvement, we have to back it up with data. The energy monitoring instruments give us the solid data we need to prove to management that the change we made to the system was effective, by showing the data before the improvement and the after-results that prove the change was warranted.”
He further noted, “in addition to PQ analysis, it’s a validation tool. If we get a report from the IT department about a problem, we can go back through our historical data and determine the root cause of the problem.”
Energy and PQ monitoring are complementary and work very well together to reduce energy costs while improving reliability and reducing maintenance costs. A comprehensive monitoring program makes good business sense because PQ instruments can also monitor energy. Installation costs for energy and PQ instruments are about the same, but they can be significant for a retrofit.
PQ instruments with energy capabilities can monitor power of IT equipment in Level 1 and Level 2 locations. For Level 3 and other monitoring locations, energy-only instruments are adequate. A web-enabled system that can acquire meter data automatically also enables remote monitoring capabilities, which can be extremely important (see figure 4).
• Level 1 and 2 monitoring tools. Instruments for measuring PQ, demand, and energy must provide the capability of capturing sags, swells, transients, and harmonics. The instruments themselves should offer IEC 61000-4-30 Class A compliance and provide high-accuracy measurement capability for W, VA, VAR, PF, kW, and kWh. Modbus TCP and Ethernet connectivity are required. Similarly, energy-only instruments must offer high accuracy and connectivity for Modbus TCP and Ethernet.
• Level 3 monitoring tools. Level 3 monitoring tools include branch circuit monitors and power strip monitors. Branch circuit monitors measure the current of each breaker, typically with 2 to 5 percent accuracy. They usually employ split-core current transformers (CTs) for retrofits and solid-core CTs for new installations. Communications are via Modbus RTU or TCP to a remote server. Power strip monitors are also available but are less mature and may not integrate into the larger power monitoring system.
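Both tiers of instruments above are expected to communicate over Modbus. The sketch below shows what a poll looks like on the wire by building and parsing a Modbus TCP “read holding registers” (function 0x03) exchange. It uses only Python’s `struct` module and performs no network I/O. Several details are hypothetical:

- the register address 3000,
- the unit ID,
- the high-word-first float encoding.

Real register maps and word order come from each instrument’s documentation. A production system would normally use a maintained Modbus client library rather than hand-built frames.

```python
import struct
from typing import List

READ_HOLDING_REGISTERS = 0x03


def build_read_request(transaction_id: int, unit_id: int, start: int, count: int) -> bytes:
    """Modbus TCP request: 7-byte MBAP header followed by a function 0x03 PDU."""
    pdu = struct.pack(">BHH", READ_HOLDING_REGISTERS, start, count)
    # MBAP: transaction id, protocol id (0 = Modbus), byte count of unit id + PDU, unit id
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


def parse_read_response(frame: bytes, transaction_id: int) -> List[int]:
    """Extract the 16-bit register values from a function 0x03 response."""
    tid, protocol, _length, _unit, function = struct.unpack(">HHHBB", frame[:8])
    if tid != transaction_id or protocol != 0:
        raise ValueError("unexpected transaction or protocol id")
    if function & 0x80:  # exception responses set the high bit of the function code
        raise ValueError(f"Modbus exception code {frame[8]}")
    if function != READ_HOLDING_REGISTERS:
        raise ValueError(f"unexpected function code {function}")
    byte_count = frame[8]
    data = frame[9:9 + byte_count]
    if len(data) != byte_count or byte_count % 2:
        raise ValueError("truncated or malformed register data")
    return list(struct.unpack(f">{byte_count // 2}H", data))


def registers_to_float(high: int, low: int) -> float:
    """Combine two registers into an IEEE 754 float32, high word first (assumed order)."""
    return struct.unpack(">f", struct.pack(">HH", high, low))[0]


# Hypothetical: real power (kW) as a float32 in holding registers 3000-3001 of unit 1
request = build_read_request(transaction_id=1, unit_id=1, start=3000, count=2)
print(request.hex())  # 00010000000601030bb80002
response = bytes.fromhex("000100000007010304449a5000")
print(registers_to_float(*parse_read_response(response, transaction_id=1)))  # 1234.5
```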
At this time, there are no industry-standard metrics for assessing PQ in data centers. Some industries do have such metrics, such as SEMI F47 for the semiconductor industry. However, the older metrics applied to data centers, such as the CBEMA and ITIC curves, are inadequate. Different types of equipment (servers, UPS, etc.), manufacturers, and facility designs have different operational and tolerance specifications for voltage regulation, distortion, and so on.
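Tolerance curves such as CBEMA and ITIC are envelopes of RMS voltage magnitude versus event duration. The sketch below checks an event against such an envelope. The breakpoints in `ITIC_LIKE` are an approximate, simplified rendering of the ITIC curve for RMS variations of 3 ms or longer; verify them against the published curve. `SERVER_X` is a hypothetical equipment-specific envelope. The envelope is a parameter because tolerances differ by equipment type, manufacturer, and design. An operator doing its own assessment can therefore substitute the limits that actually apply to its equipment.

```python
from typing import NamedTuple, Sequence


class Limit(NamedTuple):
    max_duration_s: float  # row applies to events lasting up to this long
    min_pct: float         # lowest tolerated RMS voltage, % of nominal
    max_pct: float         # highest tolerated RMS voltage, % of nominal


# Approximate ITIC-style breakpoints for RMS variations of 3 ms or longer; verify
# against the published curve. The sub-3 ms transient region is not modeled.
ITIC_LIKE: Sequence[Limit] = (
    Limit(0.02, 0.0, 120.0),           # up to 20 ms: complete dropout tolerated
    Limit(0.5, 70.0, 120.0),           # up to 0.5 s
    Limit(10.0, 80.0, 110.0),          # up to 10 s
    Limit(float("inf"), 90.0, 110.0),  # steady state
)


def within_envelope(pct_of_nominal: float, duration_s: float,
                    envelope: Sequence[Limit] = ITIC_LIKE) -> bool:
    """True if an RMS event of this magnitude and duration stays inside the envelope."""
    if duration_s < 0.003:
        raise ValueError("sub-3 ms transients are outside this simplified envelope")
    for limit in envelope:  # rows must be sorted by increasing max_duration_s
        if duration_s <= limit.max_duration_s:
            return limit.min_pct <= pct_of_nominal <= limit.max_pct
    return False  # longer than any row covers


# Hypothetical equipment-specific envelope, e.g. transcribed from a vendor datasheet
SERVER_X: Sequence[Limit] = (Limit(0.5, 50.0, 120.0), Limit(float("inf"), 90.0, 110.0))

print(within_envelope(75.0, 3.3))            # False: a 3.3 s sag to 75% is outside
print(within_envelope(60.0, 0.3))            # False under the ITIC-like envelope
print(within_envelope(60.0, 0.3, SERVER_X))  # True for the hypothetical server
```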
Meeting the challenge begins with determining what constitutes a PQ problem, based on how susceptible the equipment is to it. The near-term solution is to “keep your eye on the ball” and conduct your own individual assessment of PQ. The longer-term possibility is to develop PQ metrics for data centers. That work would include defining the characteristics of a data center PQ problem and developing metrics for benchmarks and comparisons.
PUE and energy monitoring are essential to the efficient operation of today’s data center. However, the PUE metric in its current form is not an indicator of PQ or its effects on availability. Given evidence that PQ monitoring can maintain and enhance availability, benchmarks are needed to evaluate data center quality of supply in an environment increasingly dependent on uptime.