Controller-Based Versus Controllerless
The debate over next-generation SDN solutions
Software-defined networking (SDN) has been around long enough that it faces a paradox: Everyone thinks they know what SDN is, but many don’t actually realize how far SDN has evolved from its origins and how varied SDN solutions have become.
International Data Corp. (IDC) gets to the heart of the matter in its 2018 IDC Innovators report, calling SDN “an architectural approach to data center networking in the cloud era … to support digital transformation, data center networks must become agile, both architecturally and operationally. They must possess the intelligent automation that will make them ‘cloud-like’ and increasingly autonomous.”
In other words, SDN is a set of principles tied to real-world goals, not a specific technology.
Despite a maturing market, there is no single path to the software-defined future of data center infrastructure. Instead, organizations face several decisions, perhaps none as critical as whether to adopt a controller-based or a controllerless architecture.
A controller acts as the brains of the software-defined network, serving as a strategic control point. A traditional controller runs on external servers (typically three, deployed for redundancy) and holds a centralized view of the entire state of the network. The controller typically connects to the packet forwarding nodes via an out-of-band (OOB) management channel. A controllerless approach also maintains a view of the entire network state but distributes that state and intelligence across all switches. In other words, each node in the network has a view of the network’s full state, not just of its neighbors.
Among the challenges that have inhibited widespread SDN adoption is the complexity associated with controllers and with protocols such as OpenFlow, which are billed as “open” yet require significant changes to network architecture and operation. In approaches built on the centralized controller model, some vendors’ implementations give the switch nodes no intelligence of their own: their forwarding tables are programmed by the controller over a protocol whose messages must traverse the OOB network to reach each switch.
One obvious challenge in these approaches is that the OOB network is a single point of failure. Moreover, most switches have only one OOB management port, which is another single point of failure. If the controller is far from the nodes, the added latency can slow reconvergence and the processing of new flows, factors that generally limit how far this sort of architecture can be stretched geographically.
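To make the single-point-of-failure argument concrete, the following minimal sketch models a hypothetical controller-based deployment as a graph in which the controller and every switch attach to one OOB management switch, each through a single management port. The topology, the node names (`controller`, `oob-mgmt`, `sw1` and so on), and the helper functions are illustrative assumptions, not any vendor's design. A breadth-first search from the controller shows which switches it can still program.

```python
from __future__ import annotations

from collections import deque


def reachable(adj: dict[str, set[str]], start: str,
              failed: frozenset[str] | set[str] = frozenset()) -> set[str]:
    """Breadth-first search from `start`, treating nodes in `failed` as down."""
    if start in failed:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in adj.get(node, ()):
            if nbr not in seen and nbr not in failed:
                seen.add(nbr)
                queue.append(nbr)
    return seen


def oob_star(num_switches: int) -> dict[str, set[str]]:
    """Hypothetical OOB network: controller and each switch have ONE link to oob-mgmt."""
    adj: dict[str, set[str]] = {"oob-mgmt": {"controller"}, "controller": {"oob-mgmt"}}
    for i in range(1, num_switches + 1):
        sw = f"sw{i}"
        adj[sw] = {"oob-mgmt"}  # a single management port per switch
        adj["oob-mgmt"].add(sw)
    return adj


def switches_under_control(adj: dict[str, set[str]],
                           failed: frozenset[str] | set[str] = frozenset()) -> list[str]:
    """Switches the controller can still reach (and therefore program) over OOB."""
    return sorted(n for n in reachable(adj, "controller", failed) if n.startswith("sw"))


if __name__ == "__main__":
    net = oob_star(4)
    print(switches_under_control(net))                      # ['sw1', 'sw2', 'sw3', 'sw4']
    print(switches_under_control(net, failed={"oob-mgmt"}))  # []
```

With `oob-mgmt` down, the controller reaches no switches at all even though every switch is healthy; losing one switch, by contrast, affects only that switch.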
Vendors that support in-band communication between the nodes and the controller propose an architecture with at least one controller per site, which makes the design complicated and expensive. Additionally, a central controller can manage only a limited number of nodes, so larger networks need multiple controllers, and costs mount quickly because each controller deployment requires three servers for redundancy. Finally, many controllers cannot federate with one another, so multicontroller deployments can end up as isolated network islands.
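The cost argument reduces to simple arithmetic. The sketch below counts controller servers under the two constraints described above: at least one controller per site, and a cap on how many switches one controller can manage, with three servers per controller deployment. The per-controller capacity is a hypothetical parameter; the text gives no figure, and real limits vary by product.

```python
import math


def controller_servers(switches_per_site: list[int],
                       max_switches_per_controller: int,
                       servers_per_controller: int = 3) -> int:
    """Servers needed when every site gets its own controller deployment.

    Each site needs ceil(switches / capacity) controllers, but never fewer
    than one; each controller deployment runs on `servers_per_controller` servers.
    """
    controllers = sum(max(1, math.ceil(n / max_switches_per_controller))
                      for n in switches_per_site)
    return controllers * servers_per_controller


if __name__ == "__main__":
    # Three sites with 40, 120 and 8 switches; assumed capacity of 100 switches
    # per controller -> 1 + 2 + 1 = 4 controllers -> 12 servers.
    print(controller_servers([40, 120, 8], max_switches_per_controller=100))  # 12
```

Note that the small eight-switch site still costs a full three-server controller deployment, which is the per-site overhead the paragraph describes.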
Controllerless SDN Architecture
A controllerless SDN architecture also enables a view of the full network state, but that full state information is replicated across all nodes in the switching fabric, even across geographically dispersed sites. In this model there are three layers to consider:
- The management layer, where all switches federate to build a management fabric, each sharing the full state of the network with every other switch. The entire fabric can be controlled from the CLI or via an API on any switch, and the management fabric automatically propagates any configuration or state change to all other switches in the fabric.
- The underlay, which provides physical connectivity and IP reachability and is based on standard Layer 2 and Layer 3 protocols. This allows the fabric to be built across leaf nodes only (top-of-rack DC switches) while using standard protocols to interoperate with existing spine switches (core DC switches) from any vendor, enabling a smooth migration to SDN in brownfield scenarios. In greenfield scenarios, both leaf and spine switches can be part of a controllerless fabric.
- The VXLAN overlay, which virtualizes the underlay to support any-to-any virtualized network connectivity with secure network segmentation between all nodes in the network.
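The management-layer behavior described in the first item (change the configuration through any switch, and every switch ends up holding the same full state) can be illustrated with a toy model. This is a minimal sketch that assumes synchronous, lossless replication; it deliberately ignores the replication protocol, ordering, and failure handling a real fabric needs, and the class and method names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Switch:
    name: str
    state: dict[str, str] = field(default_factory=dict)  # full replica of fabric state


class ManagementFabric:
    """Toy model: every member switch holds a full copy of the fabric state."""

    def __init__(self, names: list[str]) -> None:
        self.switches = {n: Switch(n) for n in names}

    def apply(self, via: str, key: str, value: str) -> None:
        """Apply a change through any member (its CLI/API) and replicate it to all."""
        if via not in self.switches:
            raise KeyError(f"{via} is not a fabric member")
        for sw in self.switches.values():
            sw.state[key] = value

    def consistent(self) -> bool:
        """True if every member holds an identical copy of the state."""
        states = [sw.state for sw in self.switches.values()]
        return all(s == states[0] for s in states)


if __name__ == "__main__":
    fabric = ManagementFabric(["leaf1", "leaf2", "leaf3"])
    fabric.apply(via="leaf3", key="vlan/100", value="tenant-a")
    print(fabric.switches["leaf1"].state)  # {'vlan/100': 'tenant-a'}
    print(fabric.consistent())             # True
```

The point of the model is the entry point: the change was made on `leaf3`, yet `leaf1` sees it, which is what lets the whole fabric be managed as one logical switch.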
While the control plane of a controllerless SDN network can use an OOB network connection, in a more typical deployment the control plane runs in-band over the same VXLAN tunnel overlay mesh as the data plane. That means every node has multiple paths and multiple ports through which to reach every other node, leaving no single point of failure. If one switch fails, all other switches can continue to operate, update network state, and quickly reconverge. Because there is no external controller running on three servers for every N switches, the cost and complexity of an external controller are eliminated. And because the full state is held in every switch and all services are distributed throughout the fabric, via objects such as an IPv4/IPv6 anycast gateway present in every switch, widely dispersed sites can be deployed without difficulty.
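A rough way to see why in-band control avoids a single point of failure is to count paths. The sketch below assumes a full mesh of VXLAN tunnels between leaf VTEPs, carried over a two-tier leaf-spine underlay in which every leaf has one uplink port to every spine. Both assumptions are illustrative (real fabrics may build tunnels on demand or use other topologies), and the function names are hypothetical. Under them, each tunnel rides on as many node-disjoint underlay paths as there are spines, so losing one spine or one uplink port removes a path, not the control channel.

```python
from __future__ import annotations

from itertools import combinations


def overlay_tunnels(leaves: list[str]) -> list[tuple[str, str]]:
    """Assumed full mesh: one VXLAN tunnel per unordered pair of leaf VTEPs."""
    return list(combinations(leaves, 2))


def underlay_paths(spines: list[str], src: str, dst: str,
                   failed: frozenset[str] | set[str] = frozenset()) -> list[tuple[str, str, str]]:
    """Leaf-to-leaf paths in an assumed two-tier leaf-spine underlay.

    Every leaf has one uplink to every spine, so each live spine provides a
    two-hop path src -> spine -> dst; no two paths share an intermediate node.
    """
    return [(src, spine, dst) for spine in spines if spine not in failed]


if __name__ == "__main__":
    leaves = [f"leaf{i}" for i in range(1, 5)]
    spines = [f"spine{i}" for i in range(1, 5)]
    print(len(overlay_tunnels(leaves)))                                     # 6
    print(len(underlay_paths(spines, "leaf1", "leaf4")))                    # 4
    print(len(underlay_paths(spines, "leaf1", "leaf4", failed={"spine2"})))  # 3
```

Compare this with the single OOB management port per switch in the controller-based model, where the equivalent path count is one.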
The overall advantages of a controllerless architecture fall into five categories.
- Operational Simplicity — Next-generation SDN delivers operational efficiencies by federating an organization’s array of switches to make them appear as one logical switch. As such, an organization with multiple switches deployed in a distributed environment — inside one data center, across a campus, across cities, or across the ocean — benefits from the ability to view and manage the fabric as one logical entity. This provides a level of operational simplicity that dramatically reduces both operating costs and the potential for human error.
- Enhanced Resiliency — In the distributed controllerless approach, a switch that goes out of service has very limited impact on the fabric control plane and no effect on data forwarding among the remaining switches, making it a very robust design. If a switch is out of service when a configuration change is requested, none of the switches will accept the change until the failed switch is either ejected from the fabric or recovers, ensuring consistent configurations across the fabric. During severe connectivity disruptions, areas of the fabric can run and be managed in isolation from the rest, then gracefully rejoin the whole when connectivity recovers, which makes the solution flexible in times of crisis. Finally, because the nodes still run standard protocols, decisions about topology changes are made rapidly, yielding very low reconvergence times, which is key in today’s demanding application environments.
- Deep Visibility — Fabricwide visibility into all attached devices, down to the level of individual TCP flows, simplifies troubleshooting. Operators can troubleshoot the entire fabric from any switch and can even go back in time to drill down on a particular flow between two endpoints, which accelerates troubleshooting, simplifies operations, and improves security.
- Network Slicing for Security and Services — The ability to slice the fabric for multitenancy, with complete isolation of the management, control, and data planes, enables an excellent security posture, especially given the rise in IoT traffic. Untrusted traffic, which can present a large attack surface on shared infrastructure, can be completely isolated from more valuable traffic. In multitenant virtualized environments, slicing can also give each tenant full control and rich services.
- White Box Economics — Ideally, this sort of solution runs on open networking hardware, whether brite box or white box switches. In addition to simplifying network management and increasing reliability, which lowers operational costs, this can reduce capital costs by roughly 40% to 50% compared with traditional switching solutions while preventing vendor lock-in. White box switches are also typically 1RU, enabling the same kind of scale-out fabric design used by the large hyperscalers.
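The consistency rule in the Enhanced Resiliency item (no switch accepts a configuration change while any fabric member is out of service, until that member is ejected or recovers) behaves like an all-or-nothing commit. The following is a simplified, hypothetical model of that rule rather than any product's actual protocol; the `Fabric`, `Member`, `commit`, and `eject` names are invented for illustration, and the model ignores the partitioned case in which isolated areas of the fabric keep running on their own.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class ChangeRejected(Exception):
    """Raised when a change cannot be applied on every fabric member."""


@dataclass
class Member:
    name: str
    online: bool = True
    config: dict[str, str] = field(default_factory=dict)


@dataclass
class Fabric:
    members: dict[str, Member]

    def commit(self, key: str, value: str) -> None:
        # Check every member first: if any is out of service, reject the change
        # before touching anything, so no member applies it.
        offline = [m.name for m in self.members.values() if not m.online]
        if offline:
            raise ChangeRejected(f"members out of service: {', '.join(offline)}")
        # All members are available: apply the change everywhere.
        for m in self.members.values():
            m.config[key] = value

    def eject(self, name: str) -> None:
        """Remove a failed member so the remaining fabric can accept changes again."""
        del self.members[name]


if __name__ == "__main__":
    fabric = Fabric({n: Member(n) for n in ("sw1", "sw2", "sw3")})
    fabric.commit("ntp-server", "10.0.0.1")
    fabric.members["sw2"].online = False
    try:
        fabric.commit("ntp-server", "10.0.0.2")
    except ChangeRejected as err:
        print("rejected:", err)  # rejected: members out of service: sw2
    fabric.eject("sw2")
    fabric.commit("ntp-server", "10.0.0.2")  # now accepted by sw1 and sw3
```

Rejecting up front, rather than applying the change to the reachable switches and reconciling later, is what keeps every member's configuration identical at all times.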
Why Controllerless SDN Works
A controllerless approach to next-generation SDN represents a beneficial way to transition to the software-defined data center of tomorrow. Organizations that make the move to SDN without a controller are experiencing a dramatic reduction in costs and complexity and gaining the operational flexibility that comes by federating a large number of switches as a single programmable entity.