With yet another winter storm having recently inundated the Northeastern U.S., we’re reminded of the need for remote access to and management of our computer systems. Perhaps even more importantly, though, we’re reminded of what can happen when a data center site is threatened by a natural disaster.
It’s not just winter storms that pose a threat, though. We all remember the digital impact of Hurricane Sandy when it hit the New York area in 2012, and Hurricane Katrina likewise devastated countless IT installations along the Gulf Coast in 2005. Some might even remember the impact of Tropical Storm Allison on the Houston and Galveston areas in 2001. The list could go on, and it illustrates the range of emergencies data center professionals need to prepare for. There are several important things to consider when planning for remote management during a natural disaster or other emergency.
What Can Be Remotely Managed?
It’s important to be realistic about what can and can’t be effectively managed remotely. When an emergency is likely to affect only the systems themselves (for example, if a server crashes, a network goes offline, or a site loses power), remote management is usually a viable response. But sometimes you just have to be there! If a data center or its power generation source is physically threatened, as some were when located in building basements during the storms in New York City, New Orleans, and Houston, then managing flood waters from a beach in Miami or a couch at home just isn’t going to cut it. If a building structure has been breached by a hurricane, tornado, earthquake, or other natural disaster, the only people who are going to make an impact are those on site.
Objectives of Remote Emergency Response
Responding to IT emergencies via remote connection typically has one of two likely objectives:
1. To ensure computer systems are safely powered down, thus ensuring data and system integrity when they’re brought back online
2. To keep those systems online throughout the duration of the emergency
An effective disaster recovery/business continuity plan should identify which systems fall into which category, and appropriate management response capabilities should be developed based on those defined needs.
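As a rough illustration of how a plan might record these two objectives, the sketch below tags each system with one of them and lists the systems in each group. The system names, the `Objective` enum, and the `systems_for` helper are all hypothetical, made up for this example rather than taken from any particular tool.

```python
from enum import Enum


class Objective(Enum):
    SAFE_SHUTDOWN = "safe shutdown"   # objective 1: power down cleanly
    STAY_ONLINE = "stay online"       # objective 2: keep running throughout


# Hypothetical inventory; the names are illustrative only.
PLAN = {
    "build-server": Objective.SAFE_SHUTDOWN,
    "file-archive": Objective.SAFE_SHUTDOWN,
    "customer-portal": Objective.STAY_ONLINE,
}


def systems_for(objective, plan=PLAN):
    """Return the sorted names of systems assigned to the given objective."""
    return sorted(name for name, obj in plan.items() if obj is objective)


if __name__ == "__main__":
    for objective in Objective:
        print(objective.value, "->", systems_for(objective))
```

Keeping the mapping in one place makes it easier to see which response capability each system needs before an emergency rather than during one.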
Remote Management Technologies
By and large, there are two methods for remotely managing systems: in-band and out-of-band.
In-band management uses a primary network to connect to a system and then manage it using the primary interface. Obviously, in-band management requires a primary network to be online.
Out-of-band management, however, uses private connections, which are either serial or Ethernet. Typically an out-of-band connection is used when an individual server is not operational, but with proper planning it can also be useful when entire network infrastructures are offline, whether due to loss of Internet connectivity or a power failure.
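The choice between the two methods can be sketched as a simple decision function. This is a minimal sketch under stated assumptions: the three boolean inputs and the function name `choose_management_path` are hypothetical, and a real site would determine them through its own monitoring.

```python
def choose_management_path(primary_network_up: bool,
                           server_responsive: bool,
                           oob_reachable: bool) -> str:
    """Pick a management approach from three (assumed) status checks."""
    # In-band needs both the primary network and a server that still
    # answers on its primary interface.
    if primary_network_up and server_responsive:
        return "in-band"
    # Otherwise fall back to the private serial or Ethernet connection.
    if oob_reachable:
        return "out-of-band"
    # With neither path available, someone has to be on site.
    return "on-site"


if __name__ == "__main__":
    print(choose_management_path(True, False, True))
```

The ordering reflects the text above: in-band is used when it is available, and out-of-band covers both a single unresponsive server and a wider network outage.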
Preparing a Data Center for Remote Emergency Management
Setting up a data center to support remote emergency management requires three things:
1. The first is the implementation of an out-of-band management system within a data center. Using only in-band management will severely limit the remote management that can be performed during an emergency.
2. The second is ensuring that the out-of-band management network has an independent power source. If the objective is merely safe system shutdown, a dedicated uninterruptible power supply (UPS) to provide sufficient backup power to the out-of-band network will be enough. If, however, the objective is to maintain operations, then the out-of-band network needs to have its own independent power generation source.
3. The third is a secure inbound connection to the out-of-band network that is independent of the site’s primary connection. This can be done with dial-up modems, ISDN connections, alternate wired ISP connections, and even 4G wireless Internet. It might actually be beneficial to implement more than one connection method; for example, a 4G wireless connection as the primary channel, and a dial-up connection as a backup channel if the wireless circuits become jammed or go offline.
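The three requirements above can be sketched as a small readiness check, together with a helper that walks the inbound channels in priority order. Everything here is an assumption for illustration: the `SiteSetup` fields, the power labels ("none", "ups", "generator"), and the `is_up` callback stand in for whatever checks a real site would use.

```python
from dataclasses import dataclass, field


@dataclass
class SiteSetup:
    has_oob_network: bool
    oob_power: str  # assumed labels: "none", "ups", or "generator"
    inbound_channels: list = field(default_factory=list)  # priority order


def readiness_gaps(site, keep_online):
    """List which of the three requirements the site does not yet meet."""
    gaps = []
    if not site.has_oob_network:
        gaps.append("no out-of-band management network")
    # A UPS suffices for safe shutdown; staying online needs generation.
    acceptable = {"generator"} if keep_online else {"ups", "generator"}
    if site.oob_power not in acceptable:
        gaps.append("out-of-band power source is insufficient for the objective")
    if not site.inbound_channels:
        gaps.append("no independent inbound connection")
    return gaps


def pick_channel(channels, is_up):
    """Return the first channel, in priority order, that is_up accepts."""
    for channel in channels:
        if is_up(channel):
            return channel
    return None


if __name__ == "__main__":
    site = SiteSetup(True, "ups", ["4g", "dial-up"])
    print(readiness_gaps(site, keep_online=True))
    # Simulate the wireless channel being unavailable.
    print(pick_channel(site.inbound_channels, lambda c: c != "4g"))
```

Listing channels in priority order mirrors the example in the text, where a 4G connection is tried first and dial-up serves as the backup.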
In short, while no one wishes for an emergency to strike, recent disasters have shown that being unprepared for one is not a viable option. A reliable infrastructure combined with a dependable or redundant connection capability is needed to remotely manage systems from a safe location in the face of a catastrophe.