Battling The Zombie Servers
Zombie servers are a threat to data center security and productivity.
No longer relegated to B-movies, zombies are everywhere these days. Celebrated as antiheroes, they are no longer viewed as a menace to society. That is, except in the data center world, where zombies remain a threat to security and productivity.
Within our industry, the term “zombie servers” has two ominous meanings. In one sense, it refers to a server that has been compromised by outside attackers and used to harm other computers, for example in denial-of-service (DoS) attacks. In the other, it refers to a server that sits idle, serving no purpose yet still consuming data center space and power. Although the latter seems less threatening, it is in fact the most dangerous zombie of all. Hidden in plain sight, these zombies drain valuable resources from data centers across the globe. Those responsible for the efficiency of their company’s data center need to rise up and slay these idle servers.
There are multiple reasons why servers turn into zombies. But for the most part, they grow out of a simple oversight.
A new server was racked, cabled, and powered up for a project, but the project suddenly halted before the server was fully configured for its intended job.
A server was used once for an isolated project and, after the project ended, was left in the server cabinet waiting to be repurposed. Quite often, such a server is never repurposed and is simply abandoned.
The departments or individuals that once used a server have left the company (aka churn). With little or no documentation explaining what the server was used for, neither new employees nor seasoned veterans know what to do with these legacy servers; they may not even know the servers exist. As a result, no one dares to take action for fear of causing a customer outage.
These are just a few of the reasons zombie servers crop up, so it comes as no surprise that 20% to 30% of the servers in a given data center are sitting idle and serving no purpose.
Idle servers waste space, power, and cooling. This distorts a data center’s apparent requirements, wastes money, and can prompt unnecessary actions such as data center expansion. Exposing and removing zombie servers frees up space, extends the life of the existing data center, and can delay or even avoid a multimillion-dollar capital expenditure (CAPEX).
Space, power, and cooling are the most obvious reasons to get rid of zombie servers, but there are many others. Unnecessary weight is one of them: idle servers add weight not only to cabinets but also to the structural floor. In some data centers, weight, rather than space, power, or cooling, is the main concern. This is often the case when the building was originally designed for offices, with only a small space set aside for IT needs.
As a company’s IT needs grow, offices or conference rooms are often converted into server rooms. If these rooms are on the second floor or higher, weight may become a concern, because those floors were never designed to carry heavy IT equipment stacked in server cabinets. In these situations, removing all extraneous IT equipment may alleviate weight concerns.
Beyond space, power, and cooling, another concern is port and outlet capacity. Freeing up capacity by removing idle servers delays the need to purchase patch panels or switches, or to install new network infrastructure. As a server cabinet fills up, power cords are not always plugged in neatly, which makes it difficult to tell which cord belongs to which server. Removing zombie servers frees up outlets, so data center technicians can untangle crossed power cords and re-plug them into outlets near each server.
Freeing up ports and realigning power cords has another benefit: it creates an opportunity to fix poor cable management, a major contributor to poor airflow. Most of the cabling in a server cabinet runs along the back, which is where the fans in most IT equipment exhaust hot air. Clearing out the zombies and their data and power cables improves airflow and lets heat exit the cabinet properly, rather than being trapped inside, where it raises internal temperatures and the risk of equipment failure.
Cable cleanup should extend beyond the back of the cabinet. Clearing unwanted cable out of overhead cable trays removes excess weight that can cause the trays to sag. Removing piles of “dead” cable also makes it easier to trace existing “live” cables during troubleshooting. Under the raised floor, dead cable can obstruct airflow and keep IT equipment from receiving the cold air it needs to operate effectively, so removing underfloor cabling is just as critical. Too often, organizations assume there can’t be that much dead cable, and cleanout projects get overlooked. However, we have seen multiple barrels of dead cable hauled out of data centers time and time again.
The reasons for getting rid of zombie servers mentioned so far are facilities-related. Software licenses, however, are a significant IT-related concern that also needs to be resolved. As zombies are eradicated, their software licenses are freed up and may be reassigned to servers that are “alive and well.” Depending on the license cost per server, large amounts of money can be saved when particular licenses are no longer needed at all.
With these reasons in mind, the next step is to identify the zombies among the living. There are multiple ways to do this, and they are most effective when used in combination. Here are a few methods we have found effective for identifying zombie servers.
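To put a number on license savings, the following Python sketch totals the annual cost of licenses tied to retired servers and splits it into spend that could be reassigned to live servers and spend that could be cancelled outright. The `ServerLicense` record, its fields, and the dollar amounts in the usage example are hypothetical; real figures would come from your own license or asset management records.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ServerLicense:
    """One software license tied to a server (hypothetical record format)."""
    server: str
    product: str
    annual_cost: int  # dollars per year
    reusable: bool    # True if the license can be reassigned to a live server


def license_savings(retired: list[ServerLicense]) -> tuple[int, int]:
    """Return (reassignable, cancellable) annual spend for retired servers' licenses."""
    reassignable = sum(lic.annual_cost for lic in retired if lic.reusable)
    cancellable = sum(lic.annual_cost for lic in retired if not lic.reusable)
    return reassignable, cancellable


# Illustrative inputs only, not real pricing.
retired = [
    ServerLicense("zombie-01", "os", 800, reusable=True),
    ServerLicense("zombie-01", "backup-agent", 300, reusable=False),
    ServerLicense("zombie-02", "os", 800, reusable=True),
]
reassign, cancel = license_savings(retired)
print(f"Reassignable: ${reassign:,}/yr, cancellable: ${cancel:,}/yr")
# → Reassignable: $1,600/yr, cancellable: $300/yr
```

Reassignable spend avoids buying new licenses for live servers; cancellable spend is money that stops going out the door entirely.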
Walk-throughs. It may seem basic, but physically walking through the data center with a clipboard and pen in hand works. Open each cabinet and look for servers that are running with no cables connected. When you find one, write down the cabinet number, server name, and serial number (if visible). While searching cabinets, also look for servers whose network port lights are amber or completely dark. If no ports are lit green and there are no signs of data being sent or received, the server is most likely a zombie and should be added to the list.
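Once the clipboard notes are typed up, the walk-through rules can be applied consistently with a small Python sketch. The `WalkthroughNote` fields and the LED color labels are assumptions for illustration; link-light colors and their meanings vary by NIC and server vendor, so check your hardware documentation before relying on them.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class WalkthroughNote:
    """What a technician records for one server (hypothetical format)."""
    cabinet: str
    server_name: str
    serial: Optional[str]   # None if the serial number was not visible
    cables_connected: bool
    port_leds: list[str]    # observed link-light color per port: "green", "amber", or "off"
    activity_seen: bool     # any sign of data being sent or received


def is_zombie_candidate(note: WalkthroughNote) -> bool:
    """Apply the walk-through rules: no cables, or no green ports and no activity."""
    if not note.cables_connected:
        return True
    no_green_ports = all(color != "green" for color in note.port_leds)
    return no_green_ports and not note.activity_seen


def candidates(notes: list[WalkthroughNote]) -> list[tuple[str, str, Optional[str]]]:
    """List (cabinet, server name, serial) for every likely zombie."""
    return [(n.cabinet, n.server_name, n.serial) for n in notes if is_zombie_candidate(n)]


notes = [
    WalkthroughNote("C12", "srv-a", "SN123", cables_connected=False, port_leds=[], activity_seen=False),
    WalkthroughNote("C12", "srv-b", None, cables_connected=True, port_leds=["amber", "off"], activity_seen=False),
    WalkthroughNote("C14", "srv-c", "SN789", cables_connected=True, port_leds=["green", "off"], activity_seen=True),
]
print(candidates(notes))  # → [('C12', 'srv-a', 'SN123'), ('C12', 'srv-b', None)]
```

A flagged server is only a candidate; it still needs stakeholder confirmation before anything is shut down.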
Leverage existing monitoring tools. If CPU or power monitoring tools are available, use them to run reports of all servers that have used zero or near-zero percent of their maximum capacity over a period of time. Data center infrastructure management (DCIM) software can also consolidate data from multiple brands of power monitoring tools into one easy-to-read report. Servers that show up as idle in these reports are likely zombies that provide no value to the organization while wasting resources and money. Servers running at 15% of capacity or less also deserve closer scrutiny, because there is a strong chance they are underutilized and not reaching their full potential.
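The report logic can be sketched in Python as a classifier over utilization samples exported from a monitoring or DCIM tool. The 15% threshold comes from the text; the 2% “near zero” cutoff and the input format (a mapping from server name to percent-of-capacity samples) are assumptions, and real exports will need their own parsing.

```python
from __future__ import annotations

from statistics import fmean

NEAR_ZERO_PCT = 2.0   # assumption: what counts as "near zero"; tune for your fleet
SCRUTINY_PCT = 15.0   # from the text: 15% of capacity or less deserves a closer look


def classify(samples: dict[str, list[float]]) -> dict[str, str]:
    """Label each server from its utilization samples (percent of maximum capacity)."""
    labels: dict[str, str] = {}
    for server, values in samples.items():
        if not values:
            labels[server] = "no data"          # missing telemetry is worth investigating too
        elif max(values) <= NEAR_ZERO_PCT:
            labels[server] = "likely zombie"    # never rose above near zero in the window
        elif fmean(values) <= SCRUTINY_PCT:
            labels[server] = "needs scrutiny"   # possibly underutilized
        else:
            labels[server] = "active"
    return labels


report = {
    "db-07": [0.4, 0.8, 1.1],
    "batch-03": [0.5] * 9 + [60.0],   # idle most of the time, one busy batch run
    "web-01": [35.0, 60.0, 48.0],
}
print(classify(report))
# → {'db-07': 'likely zombie', 'batch-03': 'needs scrutiny', 'web-01': 'active'}
```

Using the peak rather than the average for the zombie flag is a deliberate choice in this sketch: a server like `batch-03`, which idles most of the time but runs an occasional batch job, has a low average yet is clearly still doing work, so it lands in the scrutiny bucket instead of the zombie list.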
Establish a shutdown and removal process. Once the list of zombie servers is complete and all stakeholders have confirmed that the servers are zombies, follow a predetermined process to shut them down and remove them properly. Typically, a company requires changes to the IT environment to be pre-approved through IT service management processes such as change management. This is a good practice to follow, because it requires the customer or support team for each server to acknowledge the server’s status and makes it crystal clear that key stakeholders agree to the shutdown. The change request should include shutdown specifics, such as how the server will be removed and whether it will be repurposed or recycled. This formal process is tedious and often time-consuming, but in the long run it eliminates the risk of customer-facing outages and provides the necessary record updates for the assets in question.
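As a rough illustration of the gate a change request provides, this Python sketch models a decommission request that is not ready for shutdown until every listed stakeholder has approved it and the removal method and disposition are recorded. The `DecommissionRequest` class and its fields are hypothetical and not modeled on any particular IT service management product.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Disposition(Enum):
    REPURPOSE = "repurpose"
    RECYCLE = "recycle"


@dataclass
class DecommissionRequest:
    """A change request to shut down and remove one zombie server (hypothetical schema)."""
    server: str
    cabinet: str
    removal_method: str          # e.g. "two-person lift" or "server lift"
    disposition: Disposition     # repurpose or recycle
    stakeholders: set[str]       # customers/support teams who must acknowledge the shutdown
    approvals: set[str] = field(default_factory=set)

    def approve(self, stakeholder: str) -> None:
        if stakeholder not in self.stakeholders:
            raise ValueError(f"{stakeholder!r} is not a listed stakeholder")
        self.approvals.add(stakeholder)

    def missing_approvals(self) -> set[str]:
        return self.stakeholders - self.approvals

    def ready_for_shutdown(self) -> bool:
        """True only when stakeholders exist, all have approved, and removal is specified."""
        return (
            bool(self.stakeholders)
            and not self.missing_approvals()
            and bool(self.removal_method.strip())
        )


req = DecommissionRequest(
    server="srv-b", cabinet="C12", removal_method="server lift",
    disposition=Disposition.RECYCLE, stakeholders={"app-support", "customer-x"},
)
req.approve("app-support")
print(req.ready_for_shutdown(), req.missing_approvals())  # → False {'customer-x'}
req.approve("customer-x")
print(req.ready_for_shutdown())  # → True
```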
When shutting down zombie servers, double-check that no ports have suddenly turned green before removing any data or power cables. When removing the servers from their cabinets, use two people or a server lift regardless of the server’s size and weight; even a fairly lightweight server should not be removed by one person alone. A dropped server can damage not only itself but also the IT equipment below it, and an extra set of hands or a server lift mitigates this risk.
Once a server is removed, use a cart to transport it to the recycling area or back to inventory, again to avoid the risk of dropping it. After removing the zombie servers, seal the remaining gaps in the cabinets with blanking panels; otherwise hot and cold air can mix, which decreases data center efficiency. If you are recycling servers, have them shredded by a reputable IT recycling company. These companies provide certificates of destruction proving that no data can be recovered from the devices. Skipping this step could put your company at risk of a data breach if a discarded device were found in a landfill.
Zombie servers may seem harmless as they quietly lurk in most data centers. However, they consume valuable resources (space, power, cooling, and ports), add excess weight, and block airflow, all of which threatens available capacity and reduces data center efficiency. The battle against zombie servers is a worthwhile effort that leads to a more efficient data center.
Zombie Server Math
Let’s assume you have 50 server cabinets, each holding an average of ten 2U zombie servers that waste around 250 watts apiece.
50 cabinets × 10 servers × 250 watts = 125 kW of wasted power. At, say, 8 cents per kWh, running that load around the clock (125 kW × 8,760 hours = 1,095,000 kWh per year) adds up to $87,600 of wasted money per year!
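The sidebar arithmetic can be reproduced with a short Python sketch so you can plug in your own cabinet count, server count, per-server draw, and electricity rate. Like the sidebar, it assumes the zombies draw a constant load 24 hours a day, 365 days a year, and it leaves out any cooling overhead.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; ignores leap years


def zombie_energy(cabinets: int, servers_per_cabinet: int,
                  watts_per_server: int, cents_per_kwh: int) -> tuple[float, float, float]:
    """Return (kW of continuous load, kWh per year, dollars per year)."""
    total_watts = cabinets * servers_per_cabinet * watts_per_server
    kw = total_watts / 1_000
    kwh_per_year = total_watts * HOURS_PER_YEAR / 1_000
    # W x h = Wh; / 1,000 -> kWh; x cents/kWh -> cents; / 100 -> dollars
    dollars_per_year = total_watts * HOURS_PER_YEAR * cents_per_kwh / 100_000
    return kw, kwh_per_year, dollars_per_year


kw, kwh, cost = zombie_energy(cabinets=50, servers_per_cabinet=10,
                              watts_per_server=250, cents_per_kwh=8)
print(f"{kw:.0f} kW, {kwh:,.0f} kWh/yr, ${cost:,.0f}/yr")
# → 125 kW, 1,095,000 kWh/yr, $87,600/yr
```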
Cooling that 125 kW heat load (roughly 35 tons, at 12,000 BTU/h per ton) would require more than one 30-ton CRAC unit. Of course, the additional CRAC unit would need its own power, which only adds to the total wasted energy. Now imagine that, in this 5,000-sq-ft data center, half of these zombie servers had both A and B copper connections: 250 servers × 2 connections would consume 500 ports on patch panels and/or switches.
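The cooling and port estimates can be sketched the same way. This Python example uses the standard conversions 1 W ≈ 3.412 BTU/h and 1 ton of refrigeration = 12,000 BTU/h. It sizes against nominal tonnage only, whereas real CRAC sizing also depends on sensible capacity, redundancy, and the rest of the room’s heat load. As in the sidebar, the port count covers only the A and B connections of the dual-connected half of the 500 zombies.

```python
import math

BTU_PER_HR_PER_WATT = 3.412   # 1 W is about 3.412 BTU/h
BTU_PER_HR_PER_TON = 12_000   # 1 ton of refrigeration = 12,000 BTU/h


def cooling_tons(heat_watts: float) -> float:
    """Nominal cooling load in tons for a given heat output in watts."""
    return heat_watts * BTU_PER_HR_PER_WATT / BTU_PER_HR_PER_TON


def crac_units_needed(heat_watts: float, unit_tons: float = 30) -> int:
    """Whole CRAC units of the given nominal size needed to cover the load."""
    return math.ceil(cooling_tons(heat_watts) / unit_tons)


def ports_consumed(servers: int, dual_connected_fraction: float, ports_each: int = 2) -> int:
    """Ports used by the servers that have both A and B connections."""
    return round(servers * dual_connected_fraction) * ports_each


zombies = 50 * 10                   # cabinets x servers per cabinet
heat_watts = zombies * 250          # 125,000 W
print(f"{cooling_tons(heat_watts):.1f} tons -> {crac_units_needed(heat_watts)} x 30-ton CRAC units")
print(ports_consumed(zombies, 0.5), "ports")
# → 35.5 tons -> 2 x 30-ton CRAC units
# → 500 ports
```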