There is no such thing as plug-and-play when critical infrastructure is deployed or existing systems are overhauled to support a company's changing business mission. Installing new equipment, or even building an entirely new data center, does not by itself guarantee reliability. A rigorous design, failure mode analysis, testing/commissioning process, and operations plan, each scaled to the facility's criticality level, are a necessity, not an option.
Of particular importance are the commissioning process itself and the development of a detailed operations plan. Because more than 50 percent of data center downtime can be traced to human error, a larger share of the budget should go to testing/commissioning, documentation, education/training, and operations/maintenance. Budgets should be adjusted accordingly.
Commissioning is a systematic process of ensuring, through documented verification, that all building systems perform according to the design intent and the future owner's operational needs. The goal is to provide the owner with a safe and reliable installation. A commissioning agent, who serves as the owner's representative, usually manages the process. The agent's role is to facilitate a highly interactive process of verifying that the project is installed correctly and operates as designed. This is achieved through coordination with the owner, design team, construction team, equipment vendors, and third-party commissioning provider throughout the phases of the project. ASHRAE Guideline 0-2005, The Commissioning Process, is a recognized model and a good resource that explains this process in detail; it can be applied to critical systems.
Prior to installation at the site, all equipment should undergo factory acceptance testing witnessed by an independent test engineer who is familiar with the equipment and the test procedures. Relying on the factory acceptance test alone, however, is not sufficient. Once the equipment has been delivered, set in place, and wired, and functional testing has been completed, integrated system testing begins. The integrated system test verifies and certifies that all components work together as a fully integrated system. This is the time to resolve all potential equipment problems. There is no "one size fits all" formula for this testing.
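One way to make the sequencing above concrete is a simple readiness gate that holds off integrated system testing until every piece of equipment has cleared each earlier stage. The Python below is a minimal sketch, not a standard commissioning tool: the stage names, the Equipment record, and the ist_blockers function are all hypothetical, and a real commissioning plan would track many more checks per asset.

```python
from dataclasses import dataclass, field

# Hypothetical pre-IST stages, listed in the order the text describes them.
STAGES = (
    "factory_acceptance_test_witnessed",
    "delivered",
    "set_in_place",
    "wired",
    "functional_test_passed",
)


@dataclass
class Equipment:
    tag: str  # hypothetical asset tag, e.g. "UPS-1"
    completed: set = field(default_factory=set)  # stages signed off so far


def ist_blockers(fleet):
    """Return (tag, missing_stage) pairs that must be cleared before
    integrated system testing may begin. An empty list means the gate is open."""
    blockers = []
    for item in fleet:
        for stage in STAGES:
            if stage not in item.completed:
                blockers.append((item.tag, stage))
    return blockers


# Example: the UPS has cleared every stage; the generator is not yet wired.
fleet = [
    Equipment("UPS-1", set(STAGES)),
    Equipment("GEN-1", {"factory_acceptance_test_witnessed", "delivered", "set_in_place"}),
]
print(ist_blockers(fleet))
# [('GEN-1', 'wired'), ('GEN-1', 'functional_test_passed')]
```

Treating integrated system testing as a gate, rather than a date on the schedule, reflects the point that it should only start once every component has individually passed its earlier tests.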
Before a new data center, or a renovation within an existing building, goes online, it is crucial to ensure that the systems are burned in and that failure scenarios are tested, regardless of schedule, milestones, or other pressures. You won't get a chance to do this phase over, so get it right the first time. A tremendous amount of coordination is required to fine-tune and calibrate each component. For example, critical circuit breakers must be tested and calibrated before they are exposed to any critical electrical load. After all tests are complete, the results must be compiled for all equipment and certified test reports prepared, establishing a benchmark for all future testing.
Scheduling time to educate staff during integrated systems testing is not considered part of the commissioning process, but it is extremely important for reducing human error. This activity can be considered part of the transition-to-operations process. Hands-on training is invaluable because it improves situational awareness and operator confidence, which in turn reduces human error. Training also exposes misplaced confidence: sometimes we believe we are ready for a task when we are not. Handing off the new infrastructure or facility to fully trained and prepared operations teams improves success and uptime throughout the facility lifecycle. When you couple the right design with the right operations plan (training programs, documentation such as methods of procedure (MOPs), and preventive maintenance), the entire organization will be much better prepared to manage critical events and the unexpected.
If proper training and preparation aren't done during the commissioning stage, building engineers will not become familiar with the facility's processes and procedures, and learning on the job increases operational risk. Knowing this up front, there is no reason not to commit the necessary budget for training, technical maintenance programs, accurate documentation and its storage, and some form of credible certification that is revisited regularly. Done correctly, this investment helps avoid mishaps and near misses, and the reduced risk acts like an annuity paying "Reliability/Availability" dividends.
Education and training offered through Mission Critical and Power Management Concepts can assist with this effort (http://missioncritical.powermanage.com/).