Minimizing Critical Facility Risk During COVID-19 Pandemic, Part 2
Questions and answers from Uptime Institute’s webinar
On March 31, Uptime Institute hosted the webinar COVID-19: Minimizing Critical Facility Risk. Much of the content was based on the Uptime Institute report of the same name. Below are the questions (Q) attendees submitted, organized by topic and answered (A) by Uptime Institute experts.
Q: What is your view on continuous, long shifts for four or seven days in the data center?
A: Moving operations to continuous shifts of four to seven days is not best practice, as fatigue and stress increase the human risk factor that can lead to abnormal incidents. Instead, we recommend assessing an extension of the shift length from 8 hours to 12 hours, limited to a maximum of two or three consecutive days. Any extended shifts should include regular, long breaks during each shift to avoid fatigue. There needs to be a careful balance between the increased risk of human fatigue and the reduced risk of virus spread. Managers should also ensure that the total hours worked per person do not increase, that overtime does not exceed 10%, and that shifts are arranged so that staff can rest adequately between them.
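The shift guidance above can be turned into a simple roster check. The following is a minimal Python sketch, not an Uptime Institute tool: the `Roster` class and `check_roster` function are hypothetical, the three-day limit is the upper end of the "two or three consecutive days" recommendation, and overtime is assumed to mean hours worked beyond each person's contracted hours for the same period. It does not check rest between shifts, because the guidance gives no specific figure.

```python
from dataclasses import dataclass
from typing import List

# Limits taken from the answer above; adjust to local policy.
MAX_SHIFT_HOURS = 12
MAX_CONSECUTIVE_DAYS = 3  # upper end of "two or three consecutive days"
MAX_OVERTIME_FRACTION = 0.10


@dataclass
class Roster:
    name: str
    daily_hours: List[float]  # hours scheduled on each consecutive day (0 = day off)
    contracted_hours: float  # hours normally worked over the same period (assumption)


def longest_run(daily_hours: List[float]) -> int:
    """Length of the longest run of consecutive days with a shift."""
    best = current = 0
    for hours in daily_hours:
        current = current + 1 if hours > 0 else 0
        best = max(best, current)
    return best


def check_roster(roster: Roster) -> List[str]:
    """Return the rules a person's roster breaks (an empty list means it passes)."""
    problems = []
    if any(hours > MAX_SHIFT_HOURS for hours in roster.daily_hours):
        problems.append(f"{roster.name}: shift longer than {MAX_SHIFT_HOURS} h")
    if longest_run(roster.daily_hours) > MAX_CONSECUTIVE_DAYS:
        problems.append(f"{roster.name}: more than {MAX_CONSECUTIVE_DAYS} consecutive days")
    overtime = sum(roster.daily_hours) - roster.contracted_hours
    if overtime > MAX_OVERTIME_FRACTION * roster.contracted_hours:
        problems.append(f"{roster.name}: overtime above {MAX_OVERTIME_FRACTION:.0%}")
    return problems
```

For example, a 12-hour pattern of three days on and four days off over two weeks totals 72 hours against 80 contracted hours and passes, whereas seven consecutive 12-hour shifts are flagged for the consecutive-day limit.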
Q: What procedures should we follow to identify an infected staff member?
A: As described in our report COVID-19: Minimizing Critical Facility Risk, we recommend contact tracing systems. Every day, register the health information and location of your organization's personnel, suppliers' personnel, and other related personnel to monitor possible exposure to the virus and/or any symptoms (including those of the common cold). We recommend prescreening all scheduled visitors before they arrive on-site, including emailing them a questionnaire 48 hours before their visit. Require completion of the questionnaire before the appointment is confirmed. On arrival, verify that all answers remain unchanged, and take temperatures with noncontact thermometers before allowing entry to the facility. For a confirmed COVID-19 case at the site, we recommend that cleaning personnel use biohazard suits, gloves, shoe coverings, etc., and that all personal protective equipment (PPE) be bagged and removed from the site once cleaning is complete. With or without a confirmed case at the site, ensure that PPE is available, including masks, gloves, and hazardous materials (hazmat) suits. Depending on the appropriate medical or management advice, workers should wear masks during shift turnover. Training pairs (e.g., a senior engineer and a trainee) must wear masks at all times. For further information, please refer to our report COVID-19: Minimizing Critical Facility Risk.
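The visitor prescreening steps can be expressed as two checks, one when the appointment is confirmed and one at the door. This Python sketch is illustrative only: `Visit`, `can_confirm`, and `admit_on_arrival` are hypothetical names, questionnaire answers are assumed to be yes/no risk questions where `True` indicates a risk factor, and the 38.0 °C fever threshold is an assumption to be replaced with your medical advisor's figure.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Optional

QUESTIONNAIRE_LEAD_TIME = timedelta(hours=48)  # sent at least 48 h before the visit
FEVER_THRESHOLD_C = 38.0  # assumption: replace with local medical guidance


@dataclass
class Visit:
    scheduled_at: datetime
    questionnaire_sent_at: datetime
    # None means the visitor has not completed the questionnaire yet.
    # Each answer is a yes/no risk question; True flags a risk factor (assumption).
    answers_at_booking: Optional[Dict[str, bool]]


def can_confirm(visit: Visit) -> bool:
    """Confirm only if the questionnaire went out early enough, was completed, and flags no risk."""
    sent_in_time = visit.scheduled_at - visit.questionnaire_sent_at >= QUESTIONNAIRE_LEAD_TIME
    answers = visit.answers_at_booking
    return sent_in_time and answers is not None and not any(answers.values())


def admit_on_arrival(visit: Visit, answers_on_arrival: Dict[str, bool], temperature_c: float) -> bool:
    """Admit only a confirmed visitor whose answers are unchanged and who passes the temperature check."""
    return (
        can_confirm(visit)
        and answers_on_arrival == visit.answers_at_booking
        and temperature_c < FEVER_THRESHOLD_C
    )
```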
Q: How feasible is it to move families to the data center?
A: Although housing staff on-site should be considered only as a last resort, regions could go into lockdown mid-shift, so you may need to prepare for that eventuality. There are disaster recovery plans that include providing accommodation for several family members for up to two weeks to avoid traveling to and from the data center. While the data center is perceived as a controlled-access space, it is not a safe space. Therefore, any organization considering this option should also consider offering a specialized training program for family members that includes awareness of the hazards and the associated risks, emergency evacuation procedures, etc.
Q: Is it always recommended to keep personnel 24/7 for Tier III and Tier IV data centers?
A: Yes. It is a required criterion of the Uptime Institute Tier Certification of Operational Sustainability that Tier III data centers have a minimum of one qualified full-time employee present per shift, 24 hours a day, 7 days a week, and that Tier IV data centers have a minimum of two full-time employees present per shift, 24 hours a day, 7 days a week.
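When staff are out sick or in quarantine, it can help to check each shift against these minimums. A minimal Python sketch follows; the `staffing_gaps` function and the shift names are hypothetical, and the counts are assumed to include only qualifying full-time staff physically on-site.

```python
from typing import Dict

# Minimum full-time staff present per shift, per the Tier answer above.
MIN_STAFF_PER_SHIFT = {"Tier III": 1, "Tier IV": 2}


def staffing_gaps(tier: str, on_site: Dict[str, int]) -> Dict[str, int]:
    """Map each understaffed shift to the number of additional staff it needs."""
    required = MIN_STAFF_PER_SHIFT[tier]
    return {shift: required - count for shift, count in on_site.items() if count < required}


# Hypothetical Tier IV roster after two people enter quarantine.
print(staffing_gaps("Tier IV", {"day": 2, "evening": 1, "night": 0}))
# {'evening': 1, 'night': 2}
```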
Q: We must not forget the following considerations for staff that may need to stay at the data center for 24 hours or more: the need to prepare food; a supply of canned food for more than 40 days, as well as alkaline water and the ability to purify it by reverse osmosis in case of water contamination; and cardiopulmonary resuscitation equipment for emergencies.
A: Correct. All of these measures are proactive and preventive. Uptime Institute's report COVID-19: Minimizing Critical Facility Risk provides additional information on the measures data center management should consider for the health and safety of staff and the protection of the site.
Q: Do you recommend interviewing all internal staff to determine their personal situation? Should this be done by a psychologist, particularly if staff are in the data center for long periods?
A: Organizations should maintain open and continuous communication with staff, customers, and relevant third parties daily, or even twice daily. Additional briefings may be appropriate as conditions change. We also recommend sharing news updates and links to public resources to keep staff informed of the current status of the pandemic and of best practices for maintaining a safe and healthy work environment. Emotional support should be provided, as appropriate for each case, to reduce stress. Special attention should be given to any move to continuous, long shifts, which could increase the risk of human error and lead to abnormal incidents.
Q: If an average of 50 people per day enter the data center, how often should filters be changed? What parameters should I use to decide when to change them?
A: Under normal operating conditions, filter changes are typically triggered by an increase in the pressure differential measured across the filter: as the filter clogs with debris and particulate, the pressure drop increases. Based on currently available information on how the virus spreads (although that information continues to change), there does not appear to be any reason to change the approach to filter changes during COVID-19, including what should trigger one.
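A pressure-differential trigger of this kind can be sketched as follows. This Python function is hypothetical: the change-out point should come from the filter manufacturer's rated final pressure drop, and the fallback of twice the clean-filter pressure drop is only an assumed rule of thumb, not an Uptime Institute figure.

```python
from typing import Optional


def filter_needs_change(
    clean_dp_pa: float,
    measured_dp_pa: float,
    final_dp_pa: Optional[float] = None,
    fallback_factor: float = 2.0,
) -> bool:
    """Return True when the measured pressure drop across the filter reaches the change-out point.

    clean_dp_pa: pressure drop (Pa) recorded when the filter was new.
    measured_dp_pa: current pressure drop (Pa) across the filter.
    final_dp_pa: manufacturer's recommended final pressure drop (Pa), if known.
    fallback_factor: assumed multiple of the clean pressure drop, used only when
        final_dp_pa is unknown.
    """
    threshold = final_dp_pa if final_dp_pa is not None else fallback_factor * clean_dp_pa
    return measured_dp_pa >= threshold
```

The number of daily visitors does not appear in the calculation: under this approach, heavier foot traffic only affects how quickly the pressure drop reaches the threshold.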
Q: What types of access controls or filters do you recommend implementing? Are there air conditioning filters on the market that limit the circulation of viruses in the data center?
A: Currently, filters are not believed to play a large part in mitigating the spread of the virus in data centers. Some research suggests that high-end filters, such as high-efficiency particulate air (HEPA) filters, can capture particles the size of the virus that causes COVID-19. However, based on current information on how the virus spreads, it is not generally believed to spread in a true airborne manner to the point where it enters ventilation systems; it spreads through proximity to infected people who sneeze, cough, etc. Data centers are unusual in that they have multiple air changes per minute, unlike other facility types. It is important to note that while filtering could theoretically reduce spread, the National Air Filtration Association does not believe this will happen in practice.
Q: My concern is that, in a closed-loop environment, if COVID-19 enters the data center, the virus will live and could infect more people.
A: That is an accurate statement. It is also why it is important to take every reasonable step to keep the virus out of the data center through strong site-access control requirements and checks, and why many data center operators are implementing regular disinfection so that, if the virus does enter the data center, its spread is mitigated. Uptime Institute is aware that many operators have engaged specialized data center cleaning companies that are capable of disinfecting sites in accordance with guidelines for this pandemic from the World Health Organization (WHO) and/or the U.S. Centers for Disease Control and Prevention (CDC).
Q: Because the air in a data center is cold, how long can airborne virus particles survive there?
A: According to the WHO: "It is not certain how long the virus that causes COVID-19 survives on surfaces, but it seems to behave like other coronaviruses. Studies suggest that coronaviruses (including preliminary information on the COVID-19 virus) may persist on surfaces for a few hours or up to several days. This may vary under different conditions (e.g., type of surface, temperature, or humidity of the environment)." We believe this is an area that is still being studied. There does not appear to be any consensus, other than that a number of factors, including temperature and humidity, can affect how long the virus survives.
Q: Given that the virus has an incubation period, could contamination be brought into the data center?
A: Based on the information presented by the WHO and CDC, it is likely that the virus can be introduced and that there can be contamination in the data center. The most likely vector for transmission is infected individuals who do not know they are infected. Please refer to the WHO, the CDC, and/or your local authority for more information.
Q: Is there any type of clothing, masks, or gloves that are recommended for access to the data center by customers or suppliers so as not to expose our staff? Is it more feasible for staff to carry this type of PPE or to demand it from the customers or suppliers?
A: The CDC is recommending the N95 mask. Other masks do not seal tightly enough around the nose and mouth to provide proper protection, and an N95 is fully effective only if it is fitted properly. Specialists receive annual training on how to fit these respirators around the nose, cheeks, and chin so that wearers don't breathe around the edges of the respirator. A properly fitted respirator also makes breathing harder, because the wearer has to draw each breath in and out through a thick filter material. All personnel accessing the data center should wear PPE in accordance with current policies related to the COVID-19 pandemic and future similar events. We are seeing companies implement their pandemic plans, which to a large extent include limiting site access by customers and employees. Pandemic plans vary between companies, but we are hearing of restrictions that include the use of masks, gloves, etc., primarily to follow CDC guidelines. Our suggestion is to follow CDC guidelines as well as your approved pandemic plan. Note that sanitizing may be a better focus than the use of masks. Also, while masks would provide an additional layer of protection, Uptime Institute is somewhat cautious about recommending that data center owners and operators stock up on masks until production ramps up. The “good” masks are in short supply and should be allocated to health care professionals until there is sufficient supply to go around.
Q: If we decide to sanitize [our data center], the fire system detectors can go off. What do you recommend?
A: It is common to put the fire system into bypass during various housekeeping operations. This is especially important if a very early smoke detection apparatus (VESDA) system is present, because these systems are specifically designed to be highly sensitive and can be triggered by disturbances of even very small particulate. We recommend putting the fire system in bypass while maintaining compliance with local jurisdictional requirements, which may require a fire watch or similar measures while the system is in bypass.
Q: Insecurity is likely to increase in various regions. Do you recommend increasing security?
A: Certainly, in facilities operating in severely affected areas, the level of security risk could increase. In these areas, management must adopt enhanced security policies, including prescreening all scheduled visitors before they arrive on-site; prohibiting all unscheduled visitors; and, if possible and applicable, creating a separate, secure entrance for everyone involved in essential on-site construction projects and establishing a policy that they (or any other visitors) may not interact with duty operations personnel.
Q: One of our clients does not want to do maintenance to avoid entering the data center. What do you recommend for this? Is it necessary to defer preventative maintenance on data center components?
A: Maintenance activities should be prioritized in order of criticality; at the very least, try to perform the most critical activities. Deferred maintenance brings higher risk, and for some equipment this risk is more serious than for others. If you are unable to perform even the critical activities, rotate running hours as much as possible between redundant components, and contact the manufacturers of components and equipment to better understand the impact of not performing maintenance on specific equipment. For more information, please see our report COVID-19: Minimizing Critical Facility Risk.
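Rotating running hours between redundant components can be as simple as running whichever units have accumulated the fewest hours. The Python sketch below assumes a hypothetical N+1 group of four cooling units of which three must run; the unit names, hour counts, and weekly rotation period are illustrative, and any real rotation must follow the site's operating procedures and the manufacturers' guidance.

```python
from typing import Dict, List


def units_to_run(run_hours: Dict[str, float], needed: int) -> List[str]:
    """Choose the `needed` units with the fewest accumulated run hours (ties broken by name)."""
    return sorted(run_hours, key=lambda unit: (run_hours[unit], unit))[:needed]


def rotate(run_hours: Dict[str, float], needed: int, period_hours: float) -> List[str]:
    """Run the least-used units for one period and add the period to their hour counters."""
    selected = units_to_run(run_hours, needed)
    for unit in selected:
        run_hours[unit] += period_hours
    return selected


# Hypothetical run-hour counters for an N+1 group where three of four units must run.
hours = {"CRAC-1": 1200.0, "CRAC-2": 1150.0, "CRAC-3": 1300.0, "CRAC-4": 900.0}
for week in range(1, 4):
    print(week, rotate(hours, needed=3, period_hours=168))
```

Because the least-used unit is always preferred, the unit left on standby changes from period to period and accumulated hours stay more evenly spread than with a fixed duty/standby assignment.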
Q: Will the learning from this pandemic be reflected in adjustments in the certification levels of each of the Tiers of the Uptime Institute? Will the Uptime Institute's standard for operations be updated due to COVID-19 to incorporate lessons learned from this situation? How relevant will data center infrastructure management (DCIM) systems become from this pandemic? Will there be an emphasis on DCIM at the Uptime Institute certification levels?
A: Yes, Uptime Institute is currently evaluating potential adjustments to the criteria of the Uptime Institute Tier Certification of Operational Sustainability to take into consideration pandemics and potential endemic issues that can affect the normal operation and sustainability of the data center. If the data center has implemented a DCIM system and a building management system (BMS), these systems should be used during a pandemic or other similar emergency to continually monitor, measure, and manage both IT and supporting infrastructure equipment, such as power and cooling systems. Particular emphasis should be placed on virtual private network (VPN) connections, which should be tested to ensure reliable access for remote data center monitoring.
Q: Will future operational [sustainability] or management and operations awards contemplate additional procedures associated with pandemic risks?
A: Yes, Uptime Institute is currently evaluating and planning modifications of the Uptime Institute Tier Certification of Operational Sustainability, which will result in a change in the evaluation of data centers' ability to mitigate various risks, including pandemics.
Q: What would be the "Tier IV measures" in a data center regarding COVID-19?
A: Uptime Institute Tier IV is a reference largely to data center topology design and installation. COVID-19 is mostly impacting data center operations. Therefore, COVID-19 would not impact the Tier IV compliance of a facility.
Q: Once this pandemic is overcome, it should accelerate the transfer of data center owners' IT platforms to large data centers or to the cloud. What is your vision with regard to this issue?
A: There are several dynamics at play here, and for this reason, it is premature to give any definitive guidance until the situation clarifies. But some observations:
- Many enterprises are likely to conclude that they want to reduce risk and complexity in the future, and they will not welcome the extra costs and processes associated with reducing the impact of future pandemics. For many, the obvious solution will be to move to the cloud or to colocation companies. But moving to the cloud, while strategic for many, will be the more disruptive option, may be more expensive, and may make the risks less visible.
- Our research already shows that the biggest single impact of the lockdown has been to delay data center and IT projects. This is likely to slow down any major cloud/colocation moves, as a backlog builds and new priorities come into play. Overall, there is likely to be a bias toward strengthening the status quo.
- As we move out of the pandemic, many enterprises will have cost-reduction programs in place as a result of lost business. Cloud has many advantages, but few large businesses have found it to be cheaper, especially where data centers are already depreciated. And almost all find that even where costs are not higher, there are temporary transition costs. Of course, every application, every service, and every company is different. Although it is speculative, we think it likely that colocation will prove a less disruptive and more cost-effective path than full-on cloud transformation. While the long-term trend toward cloud will continue, cloud operators may come under more pressure to take active steps to attract enterprise workloads.
Q: What do you see as the future focus, having now had a pandemic as a precedent?
A: In situations like this, data centers face particular challenges because key personnel may be unavailable due to illness or quarantine. We recommend that organizations develop a specific pandemic preparedness plan, similar to those for civil emergencies, that focuses on performance, efficiency, and reliability and includes contingency plans that can be adapted to the challenges of the current pandemic or of potential recurrent endemic events. Each organization's response will vary based on individual site environments and local government restrictions and mandates. Plans should consider situations in which staff may be unable to access or leave the site at short notice. Please refer to our report COVID-19: Minimizing Critical Facility Risk, which addresses this topic in detail.
Q: What time projection does Uptime Institute have for the COVID-19 crisis?
A: Unfortunately, there is insufficient information at this point in time to answer questions regarding the duration of the COVID-19 crisis.
Q: How can we minimize the issue of network saturation because we are all working from home?
A: We recommend that all remote workers follow established security policies set up by their IT departments, and that IT departments explore potential bottlenecks and recommend mitigations. Many employees working from a home office use their internet service provider (ISP) to access the cloud and the office local area network (LAN) over a VPN, where the read/write profile is very different from that of, say, a streaming Netflix movie (which is almost entirely downloading, a “read” profile). Add consumption by others in the home (such as family members also working or attending school remotely), and “read” traffic increases and often becomes the main culprit in bandwidth consumption. Bandwidth limits can then be reached, leading to packet loss or timeouts and a slow internet experience. With regard to VPNs, as part of their cybersecurity policies, many organizations and governments apply strict access policies to control LAN users working from the office. Remote workers use VPNs to build a tunnel through their ISP, linked to their office's secure access; however, this model is rigid and was not designed to accommodate the number of employees currently working remotely. This can create additional bottlenecks, with the same slow internet experience as at home. In some cases, while connected to the LAN, a remote worker's internet traffic passes through the company's firewall to reach a cloud-based service. This can seriously degrade service, because the remote worker's cloud services traffic is compounded by activity such as their video-conferencing traffic and by their family's consumption.
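To see why each direction of a home connection should be considered separately, the following Python sketch adds up the traffic of a hypothetical household and compares each direction with the link's capacity. All bandwidth figures, including the 20/10 Mbit/s link, are illustrative assumptions rather than measurements; real values depend on the applications in use and the ISP plan.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Flow:
    """One household activity and its assumed average bandwidth in Mbit/s."""
    name: str
    down_mbps: float  # "read" direction (ISP to home)
    up_mbps: float  # "write" direction (home to ISP)


def link_utilization(flows: List[Flow], down_capacity: float, up_capacity: float) -> Dict[str, float]:
    """Fraction of each direction's capacity in use; values >= 1.0 mean the link is saturated."""
    return {
        "down": sum(f.down_mbps for f in flows) / down_capacity,
        "up": sum(f.up_mbps for f in flows) / up_capacity,
    }


# Hypothetical household on a 20/10 Mbit/s connection.
household = [
    Flow("video conference", 2.0, 2.0),
    Flow("VPN session to office LAN", 5.0, 2.0),
    Flow("movie streaming", 5.0, 0.1),
    Flow("second movie stream", 5.0, 0.1),
    Flow("remote schooling", 3.0, 1.5),
]
usage = link_utilization(household, down_capacity=20, up_capacity=10)
for direction, fraction in usage.items():
    print(f"{direction}: {fraction:.0%}")  # prints "down: 100%" then "up: 57%"
```

In this made-up example, the download ("read") direction saturates first, which is when packet loss and timeouts start to appear even though the upload direction still has headroom.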
Q: Any recommendations for the security of data center information? The question relates to the importance of remote monitoring of mission critical systems and whether this would be done through cloud-hosted applications.
A: If the data center has implemented remote monitoring and a BMS, these systems should be used during a pandemic or other similar emergency to monitor, measure, and manage both IT equipment and supporting infrastructure such as power and cooling systems. All VPN connections should be tested to ensure that they provide reliable access for remote data center monitoring.
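A basic scripted check that monitoring endpoints are reachable over the VPN might look like the Python sketch below. It only confirms that a TCP connection can be opened; it does not validate the VPN configuration, authentication, or the monitoring application itself. The host names and ports in `ENDPOINTS` are hypothetical placeholders.

```python
import socket
from typing import Dict, Tuple


def reachable(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # covers DNS failures, refused connections, and timeouts
        return False


def check_endpoints(endpoints: Dict[str, Tuple[str, int]]) -> Dict[str, bool]:
    """Test every named endpoint and report which ones are reachable."""
    return {name: reachable(host, port) for name, (host, port) in endpoints.items()}


# Hypothetical monitoring endpoints that should only be reachable through the VPN.
ENDPOINTS = {
    "BMS web interface": ("bms.example.internal", 443),
    "DCIM server": ("dcim.example.internal", 443),
}
# Example: run check_endpoints(ENDPOINTS) on a schedule from a remote workstation
# and alert on any endpoint that reports False.
```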
Q: With suppliers working at reduced capacity, do you recommend purchasing a stock of spare parts for operating equipment, taking budget constraints and prioritization into account?
A: The potential for long-term disruption to the supply chain for critical spares and consumables should be considered. If service level agreements include spare-parts supply, establish communication with suppliers to ensure that key equipment parts are available and/or to agree on additional delivery time in case of failure or emergency.
More resources relating to maintaining data center operations and business continuity during this pandemic can be found here.