At the heart of every modern building is a building management system (BMS) that controls the HVAC, lighting, security, and fire suppression systems. Building managers rely on BMS consoles to streamline and optimize building operations and to keep the environment comfortable and safe for tenants.

A BMS needs to operate reliably whether it is in an on-premises data center or in a public cloud environment, such as AWS, Azure, or Google Cloud Platform (GCP). There are several approaches to protecting a BMS from downtime and disasters, including fault-tolerant (FT), high-availability (HA), and disaster recovery (DR) solutions, but determining which approach is best depends on multiple considerations.

Understanding fault tolerance, high availability, and disaster recovery

Let’s start with a fundamental consideration: What is the minimum level of system availability required? It’s important to match the level of protection with the application criticality. In other words, don't overspend on unnecessary protection. There is a misconception that every BMS requires FT levels of protection when, really, an HA solution may deliver a perfectly adequate level of protection for a fraction of the cost.

FT vendors will guarantee that the applications they protect will be available 99.999% of the time. Known as “five nines” of availability, this equates to no more than about five minutes and 15 seconds of total downtime in a year. That’s an impressive availability figure, and, for some applications, it may make sense. For many critical applications, though, it really is overkill. A far less expensive HA solution, in contrast, promises 99.99% availability. “Four nines” of availability means that the BMS application will be available for all but about 52 and a half minutes a year.
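The downtime figures follow directly from the availability percentage. As a quick check, the short Python sketch below converts an availability level into a yearly downtime budget. The function name is illustrative, and the calculation assumes a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def max_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an availability level."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for label, pct in [("five nines", 99.999), ("four nines", 99.99)]:
    total_seconds = round(max_downtime_minutes(pct) * 60)
    minutes, seconds = divmod(total_seconds, 60)
    print(f"{label} ({pct}%): {minutes} min {seconds} s of downtime per year")
```

Run as written, this reports 5 min 15 s for five nines and 52 min 34 s for four nines, a tenfold difference in the downtime budget.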

DR solutions are a related but separate approach to operational continuity. A DR solution is designed to ensure that operational data and infrastructure can survive a literal disaster — such as when the hurricanes hit Puerto Rico and virtually all the infrastructure on the island was compromised. A DR solution would replicate the BMS in a geographically distant region that is unlikely to be affected by whatever disaster takes down the primary BMS infrastructure. With a DR solution, facility managers could run the BMS from the remote region or restore it when the local infrastructure is online again. But DR solutions are not designed for rapid response. It might take an hour or more to bring a DR-based BMS solution online, and, even then, it may not have some of the data that had been present in the previously running BMS system.

The take-away? A DR solution provides a strong safety net that can protect operations if the primary infrastructure is running in a location that is vulnerable to natural disasters. But a DR solution should be built in addition to, not instead of, a solution designed to ensure the high availability of the BMS. Events such as software bugs, hardware failures, and human error happen far more frequently than the natural disasters for which a DR system is built, and an FT or HA configuration is designed to deal with these events rapidly and in real time.

On-premises or in the cloud?

From an availability perspective, does it matter whether the BMS is deployed on-premises or in the cloud? Not really. The sensors, controllers, and devices (such as surveillance cameras or card readers) that are deployed throughout the property will pass data to — and respond to instructions from — the BMS software regardless of whether it is running on-premises or in the cloud.

That said, there are practical matters to consider when it comes to protecting a BMS solution from downtime. The steps to protect an on-premises BMS can differ from the steps to ensure the same availability in the cloud.

On-premises considerations

If you’re planning to run your BMS solution on-premises using an HA infrastructure to guarantee availability, you’ll want to configure your BMS infrastructure as a failover cluster with at least two compute nodes, each running in a physically separate location, so that an event that takes down the infrastructure in one location does not affect both failover cluster nodes. The clustering software monitors the health of the BMS and, in the event of a failure, “fails over” to the instance of the BMS running on the secondary node. In a traditional on-premises cluster configuration, both nodes would be connected to shared storage (typically a SAN), so the instance of the BMS running on the secondary node would encounter no loss of data. The BMS on the secondary node would simply access the data on the shared SAN in the same way the BMS on the primary node had.
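To make the monitor-and-fail-over behavior concrete, here is a minimal Python sketch. It is not how any particular clustering product works: the Node and FailoverMonitor names, the health-probe callables, and the three-missed-checks threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    name: str
    is_healthy: Callable[[], bool]  # hypothetical health probe for the BMS on this node


class FailoverMonitor:
    """Tracks the active node and fails over after repeated missed health checks."""

    def __init__(self, nodes: List[Node], max_missed: int = 3):
        if len(nodes) < 2:
            raise ValueError("a failover cluster needs at least two nodes")
        self.nodes = nodes
        self.active = 0          # index of the node currently running the BMS
        self.max_missed = max_missed
        self.missed = 0          # consecutive failed checks on the active node

    def check(self) -> str:
        """Run one health check and return the name of the active node afterwards."""
        if self.nodes[self.active].is_healthy():
            self.missed = 0
        else:
            self.missed += 1
            if self.missed >= self.max_missed:
                self._fail_over()
        return self.nodes[self.active].name

    def _fail_over(self) -> None:
        # Promote the first healthy standby; keep the current node if none is healthy.
        for offset in range(1, len(self.nodes)):
            candidate = (self.active + offset) % len(self.nodes)
            if self.nodes[candidate].is_healthy():
                self.active = candidate
                self.missed = 0
                return


# Simulated two-site cluster: the primary site goes dark after the first check.
status = {"site-a": True, "site-b": True}
cluster = FailoverMonitor([
    Node("site-a", lambda: status["site-a"]),
    Node("site-b", lambda: status["site-b"]),
])
print(cluster.check())       # primary is healthy, so it stays active
status["site-a"] = False
for _ in range(3):
    active = cluster.check()
print(active)                # after three missed checks the standby takes over
```

Requiring several consecutive missed checks before failing over is a common way to avoid reacting to a single dropped heartbeat. A real deployment would tune that threshold against how long the building can tolerate an unresponsive BMS.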

The use of a SAN, though, does create a single point of failure. If the SAN fails, it doesn’t matter how many failover cluster nodes there are to support the BMS because the data required by the BMS isn’t available. To eliminate that vulnerability, you could create a SANless failover cluster. In a SANless cluster, each cluster node has its own local storage, and the SANless clustering software replicates data among the nodes in real time to keep the nodes in sync. If the BMS fails, even if it simply appears to go offline, the SANless clustering software causes the failover cluster to fail over to the node standing by in the second data center. Because the data in storage on the secondary node is identical to the data on the primary node, the BMS solution on the secondary node can take over and support building operations with minimal interruption.
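The idea behind SANless replication can be sketched in a few lines of Python. This toy model assumes synchronous mirroring, where a write completes only after every node has stored it. Real products may replicate asynchronously or at the block-device level, and the NodeStorage and ReplicatedVolume classes here are hypothetical, in-memory stand-ins.

```python
from typing import Dict, List


class NodeStorage:
    """Local storage on one cluster node (an in-memory stand-in for a real disk)."""

    def __init__(self, name: str):
        self.name = name
        self.blocks: Dict[int, bytes] = {}

    def write(self, block_id: int, data: bytes) -> None:
        self.blocks[block_id] = data


class ReplicatedVolume:
    """Mirrors every write to all replicas before the write is considered done."""

    def __init__(self, primary: NodeStorage, replicas: List[NodeStorage]):
        self.primary = primary
        self.replicas = replicas

    def write(self, block_id: int, data: bytes) -> None:
        self.primary.write(block_id, data)
        for replica in self.replicas:
            # In a real cluster this copy would travel over the network.
            replica.write(block_id, data)

    def in_sync(self) -> bool:
        """True when every replica holds exactly the same blocks as the primary."""
        return all(r.blocks == self.primary.blocks for r in self.replicas)


primary = NodeStorage("data-center-1")
secondary = NodeStorage("data-center-2")
volume = ReplicatedVolume(primary, [secondary])

volume.write(0, b"zone-3 setpoint=21.5C")
volume.write(1, b"door-12 badge event")
print(volume.in_sync())     # the secondary holds the same blocks as the primary
print(secondary.blocks[0])  # data a standby BMS instance would read after failover
```

Because each node keeps a full copy, losing any single storage device no longer takes the data offline, which is the property a shared SAN cannot offer on its own.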

Cloud considerations

From an application protection perspective, the cloud offers certain distinct advantages over an on-premises deployment. You don’t need to worry about acquiring the hardware, data center real estate, or support personnel to manage and administer the infrastructure. The cloud service provider takes care of all of that on your behalf, and their operations are optimized to perform those tasks much more cost-effectively than you could.

FT infrastructure in the cloud will still cost far more than HA infrastructure in the cloud, and it’s only available through private cloud offerings sponsored by FT hardware providers. Either way, though, it’s the service provider’s responsibility to spin up the systems that will support your BMS solution. Particularly if you’re using standard system configurations to design an HA infrastructure on AWS, Azure, or GCP, those services can spin up virtual machines to your specification in moments — and resize them instantly if your future needs demand a more powerful configuration.

Since cloud service providers operate their own data centers, it’s easy to configure an HA failover cluster with VMs in separate cloud availability zones. You also gain 24x7 administrative services at a far lower cost because you don’t have to hire your own teams of system administrators who can work 24x7. The teams supporting the cloud data centers are working 24x7, but because the cloud service providers can spread the cost of those teams across hundreds or thousands of clients, they can provide the same level of service at a far lower cost to each client.

One thing to think about when looking at the cloud, however, is that not all cloud providers can provide an HA configuration for your BMS that relies on shared storage. If they can’t, or if they can’t offer a shared storage option that meets your performance needs, that's OK. You can easily configure a cloud-based failover cluster using SANless clustering technologies, and that will ensure the HA of your BMS in any of the public cloud environments.

The bottom line

In the end, the decision whether to run your BMS in the cloud or on-premises, on infrastructure configured for HA or for FT, comes down to your budget and your availability needs. FT configurations can provide the highest levels of availability, but at the highest cost — and they still can’t protect you from software bugs or human errors, both of which can bring down a BMS. An HA configuration may provide an acceptable level of BMS availability at a far lower cost. 

On-premises BMS solutions, whether configured for FT or HA, can be more costly than cloud-based solutions. The on-premises deployments require deeper investments in hardware, software, and personnel; they also can be more costly to evolve over time if the system needs to be updated. Cloud-based deployments, particularly when built on standardized HA configurations, can be deployed rapidly, scaled easily, and maintained without the need to hire additional personnel. On-premises BMS solutions may provide high performance and the tactile satisfaction that accompanies infrastructure that an IT team can see and touch. A BMS solution running in the cloud can provide equally high performance and, from the perspective of a building administrator who may be interacting with the system on a day-to-day basis, behaves identically to one that is sitting on-premises. The only real difference is that the building administrator working on a cloud-based BMS can’t just walk into the next room and feel the server humming along when suddenly seized by an impulse to do so.