Power & Management Best Practices For Enterprise Data Center Design
Meet your power and management goals throughout the life of the data center by adhering to best practices during the design phase.
The data center is one of the most dynamic and critical operations in any business. Complexity and criticality have only increased in recent years as data centers experienced steady growth in capacity and density, straining resources and increasing the consequences of poor performance. The Ponemon Institute’s “2013 Cost of Data Center Outages Study,” sponsored by Emerson Network Power, revealed that the average cost of any type of data center outage is more than $690K. The average cost of a partial data center shutdown is more than $350K, while a full shutdown costs a staggering $900K or more. The calculations include both direct and indirect costs, including but not limited to: lost productivity, equipment damage, the cost to detect and remediate systems and core business processes, and lost confidence among key stakeholders.
Because the cost of downtime is so high, availability of IT systems is generally the most important metric on which data centers are evaluated. However, data centers today must also operate efficiently, in terms of both energy and management resources, and be flexible enough to quickly and cost-effectively adapt to changes in business strategy and computing demand. The foundation for achieving these objectives is strong data center design. The following best practices represent proven approaches to employing power and management technologies to improve overall data center performance.
Best Practice 1: Select a power system optimized for either maximum protection or maximum efficiency.
There are many options to consider in the area of power system design that affect efficiency, availability, and scalability. In most cases, availability and scalability are the primary considerations. The data center is directly dependent on the critical power system, and electrical disturbances can have disastrous consequences in the form of increased downtime. In addition, a poorly designed system can limit expansion. Relative to other infrastructure systems, the power system consumes significantly less energy, and efficiency can be enhanced through new UPS economization modes of operation.
Data center professionals have long recognized that while every data center aspires to 100% availability, not every business is positioned to make the investments required to achieve that goal. The Uptime Institute defined four tiers of data center availability (which encompass the entire data center infrastructure of power and cooling) to help guide decisions in this area (Figure 1). Factors to consider related specifically to AC power include UPS design, module-level redundancy, and power distribution design.
MAXIMUM PROTECTION UPS DESIGN
For critical applications where maximizing availability is the top concern, a robust UPS design is required. Protecting your data center’s availability starts with ensuring you have the highest quality power at all times. As you explore UPS options that provide maximum protection, seek out a UPS solution that offers the following benefits and characteristics.
- Built-in electrical isolation for proper safety and performance.
- The ability to handle a stack-up of multiple adverse conditions at once without compromising the connected IT load.
- Online operation without transfer to bypass during DC ground fault conditions.
- A fuse-less continuous-duty bypass design that manages faults and allows fault coordination of the distribution system.
There are many options to consider in deploying the right maximum protection configuration for your environment. Transformer-based UPS systems are highly robust and excel at providing the highest capacities and levels of availability while simplifying external and internal voltage transformation, as well as fault management.
For high-power enterprise data centers and other critical applications, a state-of-the-art transformer-based UPS still provides an edge in availability. Transformers within the UPS provide integrated fault management and galvanic isolation as well as greater compatibility with critical power distribution system requirements that should be considered when designing a high-availability UPS system.
When additional capacity is needed, additional modules are added to the system in either centralized or distributed bypass architecture (Figure 2). In a centralized bypass architecture, there is a single system level bypass and no module level bypass. In a distributed bypass architecture, each UPS has its own bypass that must act in concert to take the system to bypass. Of the two, centralized bypass has higher MTBF, so is usually the configuration of choice in a maximum protection system.
MAXIMUM EFFICIENCY UPS DESIGN
If your organization is focused on data center efficiency, then energy costs, power density, and future growth should be the top concerns on your mind when it comes to UPS solutions. A UPS solution designed for high efficiency can reduce both operating and capital expenditures, helping drive a low total cost of ownership. As you search for a UPS solution that can maximize efficiency, look for an offering that provides the following benefits and characteristics.
- Optimized design for high efficiency to minimize operating costs
- Small footprint with high power density
- Lower costs for installation and deployment
- Performance tuned to today’s IT loads
- Scalable architecture to allow cost-effective growth
Distributed bypass architecture will typically have a smaller footprint, so it is usually the configuration of choice for a maximum efficiency system.
There are UPS operating modes to consider that can improve efficiency further. One such option is the use of active eco-mode, which accomplishes improved efficiency by powering the bulk of the critical load through the bypass path. When power problems are detected, the UPS automatically switches back to double-conversion mode.
In double-conversion UPS systems, the rectifier and inverter are designed to run continuously with the rectifier directly powering the inverter (Figure 3). The incoming AC is rectified to DC, which is then converted back to AC by the UPS inverter, resulting in a low-distortion, regulated stable AC output voltage waveform. This process can be up to 96% efficient.
The active eco-mode approach pushes efficiency above 98% in some cases. It keeps the rectifier and inverter in an active state, ready to accept the load immediately, so the transfer to the inverter can be accomplished almost seamlessly. When the UPS senses bypass power quality falling outside accepted standards, it opens the bypass and immediately transfers power back to the inverter until the bypass anomalies are corrected. Once the bypass power anomalies end, the critical load can be automatically returned to eco-mode operation.
Not all UPS economization modes are created equal. Due to technology limitations, some UPS systems must turn off the inverter before turning on the bypass, or turn off the bypass before turning on the inverter; this is commonly referred to as an interrupted transfer. In active eco-mode, the UPS controls the inverter so that zero current passes through it. From this zero-current state, it takes only one control instruction cycle for the inverter to begin producing current in voltage regulation mode, its normal mode of operation. This technique is what allows active eco-mode to be entered or exited without any interruption to the load.
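To put the 96% and 98% figures in perspective, a rough back-of-the-envelope calculation shows how a two-point efficiency gain translates into operating cost. The load size and electricity rate below are illustrative assumptions, not figures from the study.

```python
# Illustrative comparison of annual UPS conversion losses in
# double-conversion mode (~96% efficient) vs. active eco-mode (~98%).
# The 500 kW load and $0.10/kWh rate are hypothetical assumptions.

def annual_ups_loss_cost(it_load_kw, efficiency, rate_per_kwh=0.10):
    """Cost of the energy the UPS itself dissipates over one year."""
    input_kw = it_load_kw / efficiency      # power drawn to serve the load
    loss_kw = input_kw - it_load_kw         # dissipated in conversion
    return loss_kw * 8760 * rate_per_kwh    # 8,760 hours per year

load_kw = 500  # hypothetical enterprise IT load
double_conversion = annual_ups_loss_cost(load_kw, 0.96)
active_eco = annual_ups_loss_cost(load_kw, 0.98)

print(f"Double-conversion losses: ${double_conversion:,.0f}/yr")
print(f"Active eco-mode losses:   ${active_eco:,.0f}/yr")
print(f"Annual savings:           ${double_conversion - active_eco:,.0f}")
```

At these assumed numbers the efficiency gain roughly halves the UPS's own energy cost, which is why economization modes figure so heavily in maximum efficiency designs.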
Active eco-mode operation is similar to the operation used for transfer. During transfer of the load from bypass to inverter, the inverter matches the bypass voltage and phase angle. The inverter then increases its frequency to lead the bypass, effectively moving the load to the inverter. When the static switch is turned off, the load is isolated from the bypass source and the transfer is completed. Active eco-mode operation also connects both the bypass and the inverter simultaneously to the load. The difference is that the inverter matches the bypass voltage and frequency in a manner that allows the inverter to remain connected to the load.
Another newer function enabled by UPS controls is intelligent paralleling, which improves the efficiency of redundant UPS systems by deactivating UPS modules that are not required to support the load and taking advantage of the inherent efficiency improvement available at higher loads.
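The selection logic behind intelligent paralleling can be sketched as follows. The module capacity, loading target, and redundancy level are hypothetical assumptions; real UPS controls also weigh factors such as wear leveling and transfer timing that this sketch omits.

```python
# Sketch of the intelligent-paralleling idea: keep only as many UPS modules
# active as the load requires (plus redundancy), so the active modules run
# at a higher, more efficient load level. All sizing values are hypothetical.
import math

def modules_to_keep_active(load_kw, module_kw, max_loading=0.8, redundancy=1):
    """Fewest modules that carry the load at <= max_loading each, plus spares."""
    needed = math.ceil(load_kw / (module_kw * max_loading))
    return needed + redundancy

# Example: a 900 kW load on a parallel system of 400 kW modules.
active = modules_to_keep_active(900, 400)
print(f"Modules kept active: {active}")
```

With a 900 kW load, three modules carry 300 kW each (75% loading, near their efficiency sweet spot) and one redundant module remains, instead of spreading the load thinly across every installed module.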
UPS SYSTEM CONFIGURATIONS
A variety of UPS system configurations are available to achieve the higher levels of availability defined in the Uptime Institute classification of data center tiers.
Tier IV data centers generally use a minimum of two N + 1 systems that support a dual-bus architecture, to eliminate single points of failure across the entire power distribution system. This approach includes two or more independent UPS systems, each capable of carrying the entire load with N capacity after any single failure within the electrical infrastructure. Each system provides power to its own independent distribution network, allowing 100% concurrent maintenance and bringing power system redundancy to the IT equipment as close to the input terminals as possible. This approach achieves the highest availability but may compromise UPS efficiency at low loads and is more complex to scale than other configurations.
An emerging UPS system configuration is the reserve bus architecture, sometimes referred to as a “catcher” system (Figure 4). In a reserve bus system, multiple single buses can be loaded to near full load, while a reserve bus stands by at no load to “catch” any bus that goes offline. This configuration maintains two protected buses while pushing utilization rates up.
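The utilization advantage can be sketched with simple arithmetic, under the simplifying assumptions that all buses are identically sized and the reserve bus carries no load.

```python
# Rough utilization comparison (hypothetical, equal-sized buses): in a
# classic dual-bus design each bus must stay near 50% load so it can absorb
# the other bus's load; in a reserve-bus ("catcher") design, k primary
# buses run near full load and one idle reserve catches any failed bus.

def dual_bus_utilization():
    # each of two buses limited to roughly half capacity
    return 0.5

def reserve_bus_utilization(primary_buses):
    # fraction of installed bus capacity actually carrying load
    return primary_buses / (primary_buses + 1)

print(f"Dual bus:            {dual_bus_utilization():.0%}")
for k in (2, 3, 4):
    print(f"Reserve bus (k={k}):   {reserve_bus_utilization(k):.0%}")
```

Three primary buses plus one reserve put roughly 75% of installed capacity to work, versus about 50% for a conventional dual-bus system, which is the utilization gain the catcher architecture trades on.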
Best Practice 2: Design for flexibility using scalable architectures.
One of the most important challenges that must be addressed in any data center design project is configuring systems to meet current requirements, while ensuring the ability to adapt to future demands. In the past, this was accomplished by oversizing infrastructure systems and letting the data center grow into its infrastructure over time. This approach is no longer ideal because it is inefficient in terms of capital and energy costs. The new generation of infrastructure systems is designed for greater scalability, enabling systems to be right-sized during the design phase without risk.
Some UPS systems now enable modularity within the UPS core module itself (vertical), across modules (horizontal), and across systems (orthogonal). Building on these highly scalable designs allows a system to scale from individual 200 kW to 1,200 kW modules up to a multi-module system capable of supporting multi-MW loads. The power distribution system also plays a significant role in scalability. Legacy power distribution used an approach in which the UPS fed a required number of power distribution units (PDUs), which then distributed power directly to equipment in the rack. This was adequate when the number of racks and servers was relatively low, but today, with the number of devices that must be supported, breaker space would be expended long before system capacity is reached.
Two-stage power distribution creates the scalability and flexibility required (Figure 5). In this approach, distribution is compartmentalized between the UPS and the server to enable greater flexibility and scalability. The first stage of the two-stage system provides mid-level distribution. The mid-level distribution unit includes most of the components that exist in a traditional PDU, but with an optimized mix of circuit and branch-level distribution breakers. It typically receives 480V or 600V power from the UPS, but instead of doing direct load-level distribution, it feeds floor-mounted load-level distribution units. The floor-mounted remote panels provide the flexibility to add plug-in output breakers of different ratings as needed.
Rack-level flexibility should also be considered. Racks should be able to quickly adapt to changing equipment requirements and increasing densities. Rack PDUs increase power distribution flexibility within the rack and can also enable improved control by providing continuous measurement of volts, amps, and watts being delivered through each receptacle. This provides greater visibility into increased power utilization driven by virtualization and consolidation. It can also be used for charge backs, to identify unused rack equipment drawing power, and to help quantify data center efficiency.
Alternately, a busway can be used to support distribution to the rack. The busway runs across the top of the row or below the raised floor. When run above the rack, the busway gets power distribution cabling out from under the raised floor, eliminating obstacles to cold air distribution. The busway provides the flexibility to add or modify rack layouts and change receptacle requirements without risking power system downtime. While still relatively new to the data center, busway distribution has proven to be an effective option that makes it easy to reconfigure and add power for new equipment.
Best Practice 3: Enable data center infrastructure management and monitoring to improve capacity, efficiency, and availability.
Data center managers have sometimes been flying blind, lacking the visibility into system performance required to optimize efficiency, capacity, and availability. Availability monitoring and control has long been used by leading organizations, but managing the holistic operations of IT and facilities has lagged. This is changing as new data center management platforms emerge that bring together operating data from IT, power, and cooling systems to provide unparalleled real-time visibility into operations.
DCIM enables the data center manager to see what is going on in real time, make the right decisions on what to do and when to do it, and then validate performance (Figure 6). This see-decide-act framework allows capacity to be managed, IT inventory to be tracked, and changes planned. It provides the ability to analyze and calculate energy usage and optimize cooling and power equipment operation.
The foundation for data center infrastructure management requires establishing an instrumentation platform to enable monitoring and control of physical assets (Figure 7). Power and cooling systems should have integrated instrumentation, and these systems can be supplemented with sensors and controls to enable a centralized and comprehensive view of infrastructure systems.
At the UPS level, monitoring provides continuous visibility into system status, capacity, voltages, battery status, and service events. Power monitoring should also be deployed at the branch circuit, power distribution unit, and within the rack. Dedicated battery monitoring is particularly critical to preventing outages. According to Emerson Network Power’s Liebert Services business, battery failure is the number one cause of UPS system dropped loads. A dedicated battery monitoring system that continuously tracks internal resistance within each battery provides the ability to predict and report batteries approaching end-of-life, to enable proactive replacement prior to failure.
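The end-of-life prediction described above boils down to trending each battery's internal resistance against its baseline. The following is a minimal sketch with a hypothetical 25% rise threshold; real monitoring systems use vendor-validated thresholds and richer trend analysis.

```python
# Sketch of the battery-monitoring rule: flag any battery whose internal
# resistance has risen well above its commissioning baseline, a common
# early indicator of end-of-life. The 25% threshold and the readings
# below are hypothetical assumptions, not a vendor specification.

def failing_batteries(baseline_mohm, latest_mohm, rise_threshold=0.25):
    """Return indexes of batteries whose resistance rose past the threshold."""
    flagged = []
    for i, (base, now) in enumerate(zip(baseline_mohm, latest_mohm)):
        if (now - base) / base > rise_threshold:
            flagged.append(i)
    return flagged

baseline = [3.1, 3.0, 3.2, 3.1]   # milliohms at commissioning
latest   = [3.2, 4.1, 3.3, 3.1]   # latest scan: battery 1 is degrading
print(failing_batteries(baseline, latest))  # battery 1 exceeds the +25% rise
```

Flagging the degrading battery before it fails under load is what turns the string's weakest cell from a dropped-load risk into a routine scheduled replacement.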
Installing a network of temperature sensors across the data center can be a valuable supplement to the supply and return air temperature data supplied by cooling units. By sensing temperatures at multiple locations, the airflow and cooling capacity can be precisely controlled, resulting in more efficient operation.
Leak detection should also be considered as part of a comprehensive data center monitoring program. Using strategically located sensors, these systems provide early warning of potentially disastrous leaks across the data center from glycol pipes, humidification pipes, condensate pumps and drains, overhead piping, and unit and ceiling drip pans.
Communication with a management system or with other devices is provided through interfaces that deliver Ethernet connectivity and SNMP and telnet communications, as well as integration with building management systems through Modbus and BACnet. When infrastructure data is consolidated into a central management platform, real-time operating data for systems across the data center can drive improvements in data center performance:
• Improve availability: The ability to receive immediate notification of a failure, or of an event that could ultimately lead to a failure, allows faster, more effective response to system problems.
• Increase efficiency: Monitoring power at the facility, row, rack, and device level provides the ability to more efficiently load power supplies and dynamically manage cooling.
• Manage capacity: Effective demand forecasting and capacity planning have become critical to effective data center management.
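As a simple illustration of the notification idea behind these benefits, consolidated readings can be checked against operating limits. The metric names and thresholds below are hypothetical assumptions for the sketch, not parameters of any actual DCIM product.

```python
# Minimal sketch of threshold-based alerting over consolidated power and
# cooling readings. Metric names and limits are hypothetical; the inlet
# band loosely follows ASHRAE-style recommended ranges.

ALARM_LIMITS = {
    "ups_load_pct":        (0, 90),    # (min, max) acceptable range
    "battery_runtime_min": (10, None), # no upper bound on runtime
    "rack_inlet_temp_c":   (18, 27),   # recommended inlet temperature band
}

def check_readings(readings):
    """Return (metric, value) pairs that fall outside their limits."""
    alarms = []
    for metric, value in readings.items():
        low, high = ALARM_LIMITS.get(metric, (None, None))
        if low is not None and value < low:
            alarms.append((metric, value))
        elif high is not None and value > high:
            alarms.append((metric, value))
    return alarms

sample = {"ups_load_pct": 94, "battery_runtime_min": 12, "rack_inlet_temp_c": 29}
print(check_readings(sample))  # overloaded UPS and hot rack inlet are flagged
```

A real platform layers escalation, trending, and correlation on top of checks like these, but even this simple rule turns raw instrumentation data into an actionable notification.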
DCIM technologies are evolving rapidly. Next-generation systems will begin to provide a true unified view of data center operations that integrates data from IT and infrastructure systems. As this is accomplished, a true holistic data center can be achieved.
Best Practice 4: Utilize local design and service expertise to extend equipment life, reduce costs, and address your data center’s unique challenges.
While best practices in optimizing availability, efficiency, and capacity have emerged, there are significant differences in how these practices should be applied based on specific site conditions, budgets, and business requirements. A data center specialist can be instrumental in helping apply best practices and technologies in the way that makes the most sense for your business, and should be consulted on all new builds and major expansions/upgrades.
For established facilities, preventive maintenance has proven to increase system reliability while data center assessments can help identify vulnerabilities and inefficiencies resulting from constant change within the data center.
The last 10 years have been tumultuous within the data center industry. Facilities are expected to deliver more computing capacity while increasing efficiency, eliminating downtime, and adapting to constant change. Infrastructure technologies evolved throughout this period as they adapted to higher density equipment and the need for greater efficiency and control.
The rapid pace of change caused many data center managers to take a wait-and-see attitude toward new technologies and practices. That was a wise strategy several years ago, but those technologies have now matured, and the need for improved data center performance can no longer be ignored. Proper deployment of the practices discussed here can deliver immediate TCO improvements, from higher availability and capital savings to significant energy efficiency gains and easier adaptation to changing computing demands.
In the power system, high efficiency options work within proven system configurations to reduce operating costs while maintaining availability. Power distribution technologies provide increased flexibility to accommodate new equipment, while delivering the visibility into power consumption required to measure efficiency.
Most importantly, a new generation of infrastructure management technologies is emerging that bridges the gap between facilities and IT systems, and provides centralized control of the data center.
Working with data center design and service professionals to implement these best practices, and modify them based on changing conditions in the data center, creates the foundation for a data center in which availability, efficiency, and capacity can all be optimized in ways that simply weren’t possible five years ago.
For additional best practices representing proven approaches to employing thermal management technologies to improve data center performance, please check out this Emerson Network Power post on the Mission Critical blog.