Monitoring Averts Catastrophic Computer Room Meltdown
by Bob Douglass
November 1, 2009
Salient moves from third-party vendor
For much of its 23-year existence, Salient Corporation, a software
development firm headquartered in Horseheads, NY, relied on a
third-party monitoring service to protect its computer test lab from
changes in temperature that could lead to costly damage and downtime.
But that service could not prevent an incident in which the air
conditioning system failed and the interior temperature rose to about
130 F, causing more than $25,000 in equipment damage. Afterward,
Salient realized it needed a more dependable in-house solution to
avoid similar losses in the future.
As part of its service, the monitoring provider would contact the IT staff
when conditions warranted, and Salient’s team would take action.
But the service’s capabilities were limited to monitoring
temperatures, not other environmental conditions, and it would only
contact Salient’s staff after an alarm had sounded. The service
alerted them to a high-temperature alarm, but it was unable to
pinpoint the current temperature. Although it was a simple
arrangement, in some instances Salient felt the information available
from its provider was not enough.
In early 2007, the failure of the air conditioning system exposed the
flaws of the monitoring service, and as a result, Salient faced a
significant repair bill.
Founded in 1986, Salient Corporation develops business management software for
its more than 300 corporate and government clients, including
multiple Fortune 500 companies. Salient has more than 35,000 users in
53 countries. The company’s headquarters houses a computer-testing
lab where developers load-test new software and its functionality.
The company’s mainline computer system is located near the testing
lab but in a different room that operates separately. Both of these
rooms require monitoring.
“Our backend engineers use the lab all the time to test our software applications
being developed,” said Rodney Hall, infrastructure manager for
Salient. “It’s a valuable piece of real estate for what we do.”
The lab holds more than 250 pieces of equipment, including more than 50
servers, which Hall refers to as the “backbone to the entire
testing system.” It is no wonder that Hall describes what
happened in early 2007 as “catastrophic.”
Late one Saturday, the testing lab’s dedicated air conditioner failed,
and the second air conditioning unit failed as well. But Hall and his
colleagues did not receive a call from the monitoring service until
the following day. By the time they arrived to respond, it was too
late.
“We came in on a Sunday and the room
temperature was well over 130 degrees,” Hall said. “The heat
damaged many machines, melted some of the machine casings, fan
shrouds, and even unsoldered chips on memory boards. It was boiling
hot.”
The room’s heat even melted the thermometer, leaving it permanently
displaying a temperature well above its 120 F limit.
Hall estimates the company lost more than $25,000 from that single incident. “That’s the
bare minimum. To this day, we’re still discovering components that
were damaged in some way. It was absolutely catastrophic.”
The company learned from that
experience and was able to avoid a similar outcome during recent
trouble with its air conditioning units and power supply. Now Salient
relies on a more effective remote monitoring system as its first line
of defense. Hall and three members of the IT staff supplemented the
outside service with the IMS-1000 infrastructure monitoring system
from Sensaphone.
Hall’s team first spotted the IMS-1000 at the 2007 Interop tradeshow in Las Vegas. “We wanted
more effective coverage, and even with its simplicity, this unit
gives us that,” he said. “A lot of the other options are for
major infrastructures. We run a lean organization and $100,000 is a
big investment to us. This unit is in our price range and does
exactly what we wanted it to.”
Because the IMS-1000 combines environmental monitoring, physical security,
network monitoring, and data logging into a single system, Hall can
monitor much more than temperature change. The stand-alone unit
features an internal battery backup system and a variety of alarm
delivery options that work independently from a computer network. The
IMS-1000 has an internal Web server and is also fully
SNMP-manageable. With the IMS, Hall doesn’t have to pay a monthly
fee.
The unit uses eight external
sensors to monitor the primary environmental culprits leading to
server malfunctions, including temperature, power, humidity, smoke,
fire, and water on the floor. It also provides TCP/IP port service
monitoring for up to 16 network devices or port services, sending ping
requests and verifying that services respond. When the unit finds an
unresponsive network port, or a sensor detects conditions outside its
set range, it starts an alarm notification process. IT departments can
also select built-in phone modem and voice communication options.
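To make the alarm logic concrete, here is a minimal Python sketch of the two kinds of checks described above: comparing a sensor reading against a set range, and testing whether a TCP port answers. It is an illustration only, not Sensaphone's firmware or API. The names `SensorRange`, `out_of_range`, `port_responds`, and `collect_alarms`, along with the temperature limits used in the example, are assumptions made for this sketch.

```python
import socket
from dataclasses import dataclass


@dataclass
class SensorRange:
    """Acceptable range for one sensor (names and limits are hypothetical)."""
    name: str
    low: float
    high: float


def out_of_range(reading: float, limits: SensorRange) -> bool:
    """Return True when a reading falls outside the configured range."""
    return reading < limits.low or reading > limits.high


def port_responds(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def collect_alarms(readings, ranges, services):
    """Build alarm messages from sensor readings and TCP port checks."""
    alarms = []
    for name, value in readings.items():
        limits = ranges[name]
        if out_of_range(value, limits):
            alarms.append(
                f"{name} out of range: {value} "
                f"(allowed {limits.low}-{limits.high})"
            )
    for host, port in services:
        if not port_responds(host, port):
            alarms.append(f"{host}:{port} not responding")
    return alarms
```

With a hypothetical allowed range of 60 to 85 F, a reading of 130 F like the one Salient's lab reached would produce a single alarm message, while a reading of 72 F would produce none.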
At Salient, the unit took Hall less than 45
minutes to install, and it almost immediately began paying
dividends. Hall installed the IMS in June, and it issued its first
alert over the July 4th weekend.
“It paid for itself right then and there,” Hall said. “On July 5th we lost
a compressor for an air conditioner in our main server room. The
temperature had gone up considerably, but I received a notification
in time. I came in and shut down a bunch of servers that we did not
need running, and that helped the room return to the proper
temperature. We had the problem taken care of before the monitoring
service even had to call us. The Sensaphone alerted us well in
advance of it becoming a major problem.”
In late July, a second incident occurred, this time a power outage. “We
kicked a circuit, and it shut down our entire infrastructure. The
switching went down and so did our Internet connection. It was 10:30
on a Monday night when I received an alarm notification. I ran back
to the office, reset the circuit, diagnosed the problem, and fixed it
to bring everything back up. We had about 20 minutes of downtime
versus coming in the next morning to discover it had been out all
night.”
Even setting his recent experiences
aside, Hall said the Sensaphone remote monitoring system outperforms
the existing outside monitoring service. The IMS-1000 calls the
contact numbers for an alarm condition, and it keeps calling until
someone responds. “The service calls everyone on the list, which
means several of us may respond to the same alarm, not exactly
efficient.”
Another advantage, Hall said,
is the ability to log in to the IMS-1000 unit and proactively monitor
the conditions. If a situation is serious, the system allows Hall to
log in and shut down systems remotely. “I have more
control over everything the Sensaphone monitors,” Hall
added.
He also said that the Sensaphone
IMS-1000 has helped the company recover more quickly from the
previous disaster, allowing senior executives to concentrate on
what’s most important: growing the business. Hall said he fully
expects to install the IMS system in a new computer room currently in
the planning stages, entrusting three rooms total to Sensaphone’s
care.
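The keep-calling-until-someone-responds behavior Hall describes can be sketched in a few lines of Python. This is a hedged illustration, not the IMS-1000's actual dial-out logic: it assumes contacts are tried one at a time and that calling stops at the first acknowledgment, neither of which the article spells out. The function `notify_until_acknowledged`, the `place_call` callback, and the `max_rounds` limit are all hypothetical.

```python
from typing import Callable, Optional, Sequence


def notify_until_acknowledged(
    contacts: Sequence[str],
    place_call: Callable[[str], bool],
    max_rounds: int = 3,
) -> Optional[str]:
    """Call contacts one at a time, repeating the list until one acknowledges.

    place_call is a hypothetical callback that dials a contact and returns
    True if that person acknowledged the alarm. Returns the contact who
    acknowledged, or None if max_rounds pass with no acknowledgment.
    """
    for _ in range(max_rounds):
        for contact in contacts:
            if place_call(contact):
                return contact  # stop here so nobody else is called
    return None
```

Stopping at the first acknowledgment is what avoids the duplicated response Hall mentions, where several staff members react to the same alarm.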