This particular scenario also raises questions about the rate of temperature rise in failure mode and the survivability of high-density operations during a loss of cooling.
Site Description
Our scenario unfolds in a new data center with:
- A pre-action sprinkler system zoned by area
- A three-foot (ft) high raised floor
- 16 ft from the finished floor to a finished acoustical tile ceiling
- Standard CRAC units distributed throughout the room
The Event
Saturday morning: As a precaution before load bank testing the installed PDUs, the electrical contractor has shut the main sprinkler valves on the pre-action sprinkler system zone and electrically isolated the main fire pump. The load banks are set up inside the data center and connected to a 300-kVA PDU. The building controls contractor is working on the chiller controls, remote from the data center space. The chillers are not on-line, but the CRACs are running.
The electrician begins a full load test of a single PDU. Within minutes the data center begins heating up. The project manager goes to check on the cooling.
The combination smoke/heat detector soon goes into alarm, releasing the pre-action solenoid. Immediately, a 155°F sprinkler head releases, allowing water to flow.
The test stops, and all hands address the sprinkler system to minimize damage.
Analysis
My years of data center post mortems lead me to conclude that there is always a series of events that come together at a single point in time to cause failures. If any one of these events had not taken place when it did, then the failure scenario most likely would have been avoided.
Testing is the deliberate and intentional effort to learn the limitations of a facility before it goes live. Testing identifies the shortcomings of design and the errors of installation, and it highlights training needs for the operations staff, so that all of these can be addressed before the data center becomes operational.
In this scenario, testing achieved all of these goals.
The testing crew had good intentions when it isolated the pre-action system, but it neglected to isolate the jockey fire pump. The controls and electrical testing contractors failed to foresee the lack of chiller capacity at that point in time, and no one predicted that load bank testing a single 300-kVA PDU in a 20,000-square-foot room with a 16-ft clear height and a 3-ft high raised floor would be an issue.
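A first-order estimate of the heat-up rate in a room like this can be sketched in a few lines of Python. The sketch is illustrative only. It assumes the 300-kVA PDU is loaded at unity power factor (about 300 kW of heat), approximate sea-level air properties, no cooling at all, and no heat absorbed by the structure or equipment. The function names and the 4-ft ceiling layer depth are hypothetical, chosen to show how much faster a stratified layer warms than a well-mixed room.

```python
# Lumped-air estimate of temperature rise with no cooling.
# Assumptions (not taken from the incident itself): unity power factor,
# so 300 kVA is treated as 300 kW of heat; approximate sea-level air
# properties; all heat goes into the air volume being considered.

BTU_PER_HR_PER_KW = 3412.14   # 1 kW = 3,412.14 BTU/hr
AIR_DENSITY_LB_FT3 = 0.075    # approximate air density near 70 F
AIR_CP_BTU_LB_F = 0.24        # approximate specific heat of air


def rise_rate_f_per_min(load_kw, floor_area_ft2, air_depth_ft):
    """Return the temperature rise rate (F per minute) of an air volume."""
    volume_ft3 = floor_area_ft2 * air_depth_ft
    heat_capacity_btu_per_f = volume_ft3 * AIR_DENSITY_LB_FT3 * AIR_CP_BTU_LB_F
    heat_btu_per_min = load_kw * BTU_PER_HR_PER_KW / 60.0
    return heat_btu_per_min / heat_capacity_btu_per_f


def minutes_to_reach(start_f, target_f, rate_f_per_min):
    """Return minutes needed to climb from start_f to target_f at a constant rate."""
    return (target_f - start_f) / rate_f_per_min


if __name__ == "__main__":
    # 16 ft: whole room above the raised floor, perfectly mixed.
    # 4 ft: hypothetical stratified layer trapped at the ceiling.
    for depth_ft in (16.0, 4.0):
        rate = rise_rate_f_per_min(300.0, 20_000.0, depth_ft)
        print(f"{depth_ft:4.0f} ft of air: {rate:.1f} F/min")
```

Under these assumptions, a perfectly mixed 16-ft column of air warms at roughly 3°F per minute, and a 4-ft layer trapped at the ceiling warms about four times faster. A real room has thermal mass, leakage, and uneven mixing, so the figures are order-of-magnitude only.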
Result
The +300°F horizontal discharge output of the two in-room temporary load banks connected to the 300-kVA PDU stratified at the 16-ft high ceiling, causing the heat function of the combination smoke/heat detectors to activate, followed immediately by the melting of the 165°F sprinkler head, resulting in a water flow onto the floor.
Lessons Learned
- Use a fire system expert to isolate the fire system, not an electrical contractor.
- The design called for smoke detectors on the pre-action system, but the fire contractor furnished combination heat/smoke detectors, which were the latest "technologically advanced" detectors for the pre-action system. The contractor never programmed out the heat function from the pre-action alarm circuit.
- In a high-bay data center, heat will stratify at the ceiling when the CRACs are only drawing return air from 7 ft above the finished floor (AFF).
Data center mechanical systems, like any man-made system, will eventually experience a failure. Who is:
- Taking a second look at the pre-action sprinkler systems, given the rapid rate of temperature rise in high-density data centers?
- Calculating the time it takes from HVAC failure to melting of a sprinkler head?
- Evaluating the sequences of operation and validating the application of the new advanced "improved" designs?
Is this problem lurking as a surprise to operators as data centers are populated and approach their design capacities? Let's hear from you.
Also, I would like to hear from the design community as to how you calculate the rate of temperature rise in a failure mode.