We focus so much time and energy on the commissioning of mechanical and electrical systems in our data centers that we often overlook the quality and reliability of the interface points between the facility and the technology. In communications, we talk about the difficulty and expense of completing the “Last Mile,” moving data from the fiber backbone to our site, while neglecting the role of the rack- or cabinet-mounted PDU or power strip as a communications point and its importance as the last mile of the reliability infrastructure.
In an “A” and “B” power distribution environment, we can take the position that we could run on the remaining power source if a PDU fails or needs to be replaced. Yet every time I tell an IT department that I am going to turn off half their power, chills run down their managers’ backs before they dispatch technicians to verify that both cords of each device are in fact plugged into separate power strips from separate sources.
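The check those technicians perform can be sketched in a few lines. This is a minimal illustration, assuming a cord inventory is available as a list of records; the `Cord` type, the device and PDU names, and the `find_single_source_devices` helper are all hypothetical and do not come from any particular asset-management tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cord:
    device: str  # hypothetical device name
    pdu: str     # rack PDU (power strip) the cord is plugged into
    source: str  # upstream power source feeding that PDU, e.g. "A" or "B"


def find_single_source_devices(cords):
    """Return devices whose cords do not reach at least two distinct sources."""
    sources_by_device = {}
    for cord in cords:
        sources_by_device.setdefault(cord.device, set()).add(cord.source)
    return sorted(d for d, s in sources_by_device.items() if len(s) < 2)


# Illustrative inventory (made-up data).
cords = [
    Cord("web-01", "pdu-left", "A"),
    Cord("web-01", "pdu-right", "B"),
    Cord("db-01", "pdu-left", "A"),
    Cord("db-01", "pdu-left-2", "A"),     # two cords, but both fed from source A
    Cord("legacy-01", "pdu-right", "B"),  # single-cord device
]

print(find_single_source_devices(cords))  # ['db-01', 'legacy-01']
```

Note that a device with two cords can still fail the check if both strips trace back to the same source, which is exactly what a walk-through of the racks is meant to catch.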
Even after all has been verified, I’m told, “With only one power cord the server will work at a reduced speed,” and “We are not yet a 100 percent dual power cord shop.” The bottom line is that the industry has spent millions of dollars to create dual power cord environments that are never tested, and it seldom writes specifications that go beyond feature lists, so we never know how well the application works. As a result, everything we build relies on the operation of the PDU as the last point of connection.
There are several reasons why such dependence on rack-level PDUs is risky.
The first is the way they so often get specified and purchased. PDUs are seldom purchased as part of the design, construction, and commissioning process. Most data center builders see them as an accessory purchased and integrated as part of the rack (cabinet) purchase. The construction team is given the size and type of twist lock outlet to provide, and the IT team or rack provider supplies the power strip as part of the integrated rack system.
Second is the quality of manufacture. Almost all PDUs come with a UL label, but that means only that the units meet safety requirements for their UL-tested use. Some PDUs have fuses instead of PDU-mounted circuit breakers, which risks a single-phase outage if one fuse blows in a three-phase circuit. The quality of circuit breakers also varies greatly: some PDU-mounted circuit breakers trip when not fully loaded, and others fail to trip when needed.
Circuit breakers that sit inside the rack like on/off switches, where they can be turned off accidentally, add to the concern. Better-quality breakers now include a plastic cover to prevent accidental shutdowns.
Intelligent features, surprisingly, can be another problem. Gone are the days of ordering up a 20-amp, single-phase, ten-outlet power strip (PDU). Today PDUs are full-featured devices, complete with the ability to turn individual outlets on and off, monitor power at the full PDU or outlet level, and check other inputs to monitor temperature, humidity, and more. The more features, the greater the risk.
The on/off function is generally reserved for installation in remote sites where personnel are not immediately available to reboot a server, but some PDUs come with this feature whether you want it or not. This creates the opportunity for someone to take control either accidentally or surreptitiously. It poses a high operating risk in a primary data center that has no operating procedures for the function because no one believed it was available.
Many companies today use the power monitoring function to check for overload conditions and to calculate PUE; however, studies have shown that there is wide variation in the quality and accuracy of PDUs that monitor power.
Several years ago, Charles Hobart, manager of Business Development at VMC, a Seattle consulting firm, performed a detailed analysis of intelligent power strips for a client. Seven different companies provided a strip for testing. The setup included a calibrated Fluke meter to read current, voltage, etc., at the same time as the power strip. Hobart tested for accuracy and repeatability over a long series of power increases and decreases. “Some of the strips were over 18 percent in error per socket so if you depended on the readings to add another server that rack may go down,” Hobart said. On the other hand, one strip was accurate to the Fluke within 1.5 percent.
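To see why an error of that size matters, here is a rough sketch of the arithmetic. It is not Hobart's test procedure. The function names are hypothetical, the readings are made up, the error is applied to a strip-level reading for simplicity, and the 16-amp continuous limit on a 20-amp circuit is an assumption based on the common practice of loading a breaker to no more than 80 percent.

```python
def percent_error(pdu_reading, reference_reading):
    """Signed error of the PDU reading relative to the reference meter, in percent."""
    if reference_reading == 0:
        raise ValueError("reference reading must be non-zero")
    return (pdu_reading - reference_reading) / reference_reading * 100.0


def worst_case_amps(pdu_reading, max_error_pct):
    """Highest true current consistent with a reading that may be low by max_error_pct."""
    return pdu_reading / (1.0 - max_error_pct / 100.0)


CONTINUOUS_LIMIT_AMPS = 16.0  # assumption: 80 percent of a 20-amp breaker
displayed_amps = 13.0         # made-up reading shown by the strip
new_server_amps = 2.0         # made-up draw of the server to be added

for max_error_pct in (18.0, 1.5):
    projected = worst_case_amps(displayed_amps, max_error_pct) + new_server_amps
    verdict = "OVER limit" if projected > CONTINUOUS_LIMIT_AMPS else "within limit"
    print(f"{max_error_pct}% strip: worst-case load {projected:.2f} A, {verdict}")
```

In this example the display suggests 15 amps after the new server is added, comfortably under the limit on either strip. With a strip that can read 18 percent low, the true load could already be close to 18 amps.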
Bob Von Stein, owner of Revco Inc., a supplier of data center products, offers some solid advice: “Any manufacturer worth anything should be able to provide you with an evaluation unit. Set it up and test it out. In most cases, it will do what the manufacturer says it will do. But, the next step is to physically take it apart! A top quality manufacturer shouldn’t care and should even encourage you to do so! Look at the internal construction and compare each of the brands you are considering. In many cases, you will see a difference. You do get what you pay for. If a comparable PDU is much less than another, it probably has to do with that quality of construction. That translates into reliability. The biggest mistake you can make is to assume that a PDU is a PDU.”
When intelligence is added to a PDU, we need to think about how the intelligence circuit or display may fail and how that failure would affect the PDU. It may be as simple as asking: Can I replace the intelligence circuit and display without shutting down or replacing the entire PDU? Displays and intelligence circuits fail all the time, so ask yourself whether you want to risk having to shut down a rack of servers to replace a PDU just because the monitoring functions failed.
Finally, the proprietary software that vendors are hyping can backfire. The software supports useful functions such as monitoring, alarming, measuring, trending, aggregating, and more. Further, the software enables the use of SNMP, BACnet, or Modbus.
This all sounds great until you begin to implement the software and experience the shortcomings. It’s been said that “data is king,” but data are mostly worthless unless we aggregate them into meaningful analysis and reports. Can we trend each data point? Can we trend multiple user-defined groups of points? Can we export data into other databases or spreadsheets for further analysis? What about measuring consumption for group billing? Most importantly, does the system you bought (or are buying) have the capacity to compare year-over-year data both graphically and numerically? Anyone for a real-time PUE or carbon footprint?
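If the data can be exported, the calculations behind those last two questions are not complicated. The sketch below assumes monthly energy totals have already been pulled out of the vendor software into a simple dictionary; the function names and all of the figures are hypothetical. PUE is taken in its usual sense of total facility energy divided by IT equipment energy.

```python
def pue(total_facility_kwh, it_equipment_kwh):
    """Power usage effectiveness: total facility energy divided by IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


def year_over_year(monthly_kwh):
    """Percent change of each month against the same month one year earlier.

    monthly_kwh maps (year, month) to energy in kWh. Months with no
    prior-year value are skipped.
    """
    changes = {}
    for (year, month), kwh in monthly_kwh.items():
        previous = monthly_kwh.get((year - 1, month))
        if previous:
            changes[(year, month)] = (kwh - previous) / previous * 100.0
    return changes


# Made-up exported totals for one rack group, in kWh.
monthly_kwh = {
    (2023, 1): 1000.0,
    (2023, 2): 800.0,
    (2024, 1): 1250.0,
    (2024, 2): 600.0,
}

print(pue(total_facility_kwh=1500.0, it_equipment_kwh=1000.0))  # 1.5
print(year_over_year(monthly_kwh))  # {(2024, 1): 25.0, (2024, 2): -25.0}
```

Both results are only as good as the meter readings that feed them, so an inaccurate PDU yields an inaccurate PUE.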
For an industry that prides itself on reliability and information management, it appears we have a very weak link in the best practices chain. How will we strengthen that chain to make it as strong as the rest of our programs? Let’s hear from you.