Cloud computing has become a common paradigm for businesses of all types and sizes, but when most of them think of cloud, they think of public cloud providers like Amazon, Microsoft, or Google. While businesses can benefit greatly from cloud computing, many don’t want the cost, performance, and governance concerns that come with public cloud. That leaves them with the option of building an on-premises, or private, cloud. But building a private cloud has always been a complex, costly, and time-consuming process, and many companies can’t or don’t want to acquire the necessary cloud-building expertise. Now, cloud infrastructure vendors are beginning to use automation to offer self-driving clouds, which greatly reduce the overhead of deploying and operating a private cloud. In this article, we’ll look at the requirements for self-driving clouds.

From installation to long-term planning, many cloud management tasks can be automated to create a self-driving cloud.


Automatic Installation and Configuration

Simply installing a cloud can be a complex process. One must assemble the necessary servers, storage, and networking resources, and then install an operating system and cloud software. Wouldn’t it be nice if there were no integration work, and the idea of a “Day 0” went away? Self-driving cloud vendors are starting to ship cloud software pre-installed in the operating system image, so that once a server is deployed and powered on, the cloud comes up automatically, without IT administrators having to know anything about its various services and their persistent stores. The image software should pool the servers, storage, and networking resources together to create a highly resilient cloud. Ideally, a user should be able to install a cloud and have it up and running in less than 30 minutes.
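To make the pooling step concrete, here is a minimal Python sketch of what image software might do as each freshly powered-on server announces itself. The `Node` and `ResourcePool` names, the idempotent join, and the three-node minimum for resilience are illustrative assumptions, not any vendor's actual design.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical model of an auto-forming resource pool; not a real vendor API.
MIN_NODES_FOR_HA = 3  # assumption: three nodes let a majority quorum survive one failure


@dataclass
class Node:
    name: str
    cpu_cores: int
    memory_gb: int
    disk_tb: float


@dataclass
class ResourcePool:
    nodes: List[Node] = field(default_factory=list)

    def join(self, node: Node) -> None:
        """Called when a server booted from the pre-installed image comes up."""
        # Idempotent: a node that reboots and re-announces itself is not counted twice.
        if any(existing.name == node.name for existing in self.nodes):
            return
        self.nodes.append(node)

    def capacity(self) -> Dict[str, float]:
        """Aggregate compute, memory, and storage across every joined node."""
        return {
            "cpu_cores": sum(n.cpu_cores for n in self.nodes),
            "memory_gb": sum(n.memory_gb for n in self.nodes),
            "disk_tb": sum(n.disk_tb for n in self.nodes),
        }

    def is_resilient(self) -> bool:
        """True once enough nodes have joined to tolerate a single failure."""
        return len(self.nodes) >= MIN_NODES_FOR_HA


pool = ResourcePool()
for i in range(3):
    pool.join(Node(f"node{i}", cpu_cores=32, memory_gb=256, disk_tb=4.0))
pool.join(Node("node0", cpu_cores=32, memory_gb=256, disk_tb=4.0))  # reboot: ignored
print(pool.capacity(), pool.is_resilient())
```

The point of the sketch is that no administrator supplies a cluster definition: capacity is simply whatever has joined.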


Integration with Other Clouds and Internal Systems 

Clouds are not designed to work in isolation, so users should be able to quickly connect an on-premises cloud to existing virtualized infrastructure and to public clouds. Ideally, the cloud should allow workloads to migrate to and from public clouds so users can “cloudburst” onto public cloud when they need to scale quickly. Another form of cooperation with existing infrastructure is the ability to add existing storage systems to the cloud through open (e.g., RESTful) APIs. Similarly, most users want to integrate with AD/LDAP to have a single source of users and authentication.
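A cloudbursting decision can be as simple as a utilization threshold. The sketch below assumes a hypothetical policy in which a new workload stays on the private cloud unless accepting it would push private utilization above 85%; the function name and the threshold value are illustrative, and real platforms may weigh cost, data locality, and compliance as well.

```python
def choose_target(requested_cores: int, used_cores: int, total_cores: int,
                  burst_threshold: float = 0.85) -> str:
    """Decide where a new workload runs under a simple, hypothetical burst policy.

    The workload stays on the private cloud unless accepting it would push
    private utilization above burst_threshold; then it bursts to public cloud.
    """
    if total_cores <= 0:
        raise ValueError("total_cores must be positive")
    projected_utilization = (used_cores + requested_cores) / total_cores
    return "private" if projected_utilization <= burst_threshold else "public"


print(choose_target(requested_cores=8, used_cores=40, total_cores=100))
print(choose_target(requested_cores=20, used_cores=70, total_cores=100))
```

Basing the check on projected rather than current utilization keeps a large request from pushing the private cloud past the threshold in one step.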


Self-Service Application Deployment

The goal for any cloud is to let various teams access cloud resources themselves through a point-and-click interface. For example, developers could use this facility to access application development tools, support teams could bring up replicas of customer environments to troubleshoot support issues, sales could bring up quick proofs of concept (PoCs) for customer demos, and IT could bring up staging or production deployments of various applications. These steps need to be fully automated so that they can be repeated quickly. Any cloud solution should provide a self-service interface with pre-built application templates for quick deployment.
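One way to picture a pre-built application template is as a small, declarative spec that the self-service interface expands into concrete VMs. The `AppTemplate` schema, the `deploy` function, and the `crm-demo` example below are hypothetical; the sketch only shows why a template makes deployment repeatable.

```python
from dataclasses import dataclass
from string import Template
from typing import Dict, List


@dataclass(frozen=True)
class AppTemplate:
    """A pre-built application template (hypothetical schema)."""
    name: str
    image: str             # VM image the template boots from
    vm_count: int
    hostname_pattern: str  # string.Template syntax, e.g. "crm-$env-$index"


def deploy(template: AppTemplate, env: str) -> List[Dict[str, str]]:
    """Expand a template into concrete VM specs.

    The expansion is deterministic, so the same request can be repeated
    (e.g., a support team rebuilding a customer replica) with identical results.
    """
    pattern = Template(template.hostname_pattern)
    return [
        {"name": pattern.substitute(env=env, index=i), "image": template.image}
        for i in range(1, template.vm_count + 1)
    ]


crm_demo = AppTemplate("crm-demo", image="crm-3.2", vm_count=2,
                       hostname_pattern="crm-$env-$index")
print(deploy(crm_demo, env="sales"))
```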


Real-Time Monitoring 

To reveal the state of applications and what actions other users have performed, the cloud should provide events, statistics, and dashboards in real time. IT should be able to retrieve logs and audit the actions of all users. For example, if a service has been down since 10 p.m. last night, it is useful to know whether a user or script mistakenly shut down a VM that provides that service.
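The 10 p.m. example boils down to a filtered query over an audit log. This sketch assumes a hypothetical event schema (`AuditEvent`) and hypothetical action names (`vm.stop`, `vm.delete`); the sample log entries are invented purely for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass(frozen=True)
class AuditEvent:
    """One entry in a hypothetical audit log."""
    timestamp: datetime
    actor: str   # user name or script/service account
    action: str  # e.g., "vm.stop"
    target: str  # VM the action applied to


DISRUPTIVE_ACTIONS = ("vm.stop", "vm.delete")  # assumed action names


def disruptive_events(events: Iterable[AuditEvent], target: str,
                      since: datetime) -> List[AuditEvent]:
    """Return stop/delete actions on `target` at or after `since`, oldest first."""
    hits = [e for e in events
            if e.target == target and e.action in DISRUPTIVE_ACTIONS and e.timestamp >= since]
    return sorted(hits, key=lambda e: e.timestamp)


# Invented sample data for illustration only.
log = [
    AuditEvent(datetime(2024, 5, 1, 21, 30), "alice", "vm.start", "web-01"),
    AuditEvent(datetime(2024, 5, 1, 22, 4), "nightly-cleanup.sh", "vm.stop", "web-01"),
    AuditEvent(datetime(2024, 5, 1, 22, 10), "bob", "vm.stop", "test-07"),
]
for event in disruptive_events(log, target="web-01", since=datetime(2024, 5, 1, 22, 0)):
    print(event.timestamp, event.actor, event.action)
```

Recording scripts and service accounts as actors, not just people, is what lets this query distinguish a human mistake from an automation bug.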


Self-Healing

Any system as complex as a cloud needs to monitor its critical services and help keep track of the workloads running on it. Companies can spend considerable staff time doing this manually, but a self-driving cloud can monitor and heal itself. For example, if any hardware component or software service fails, the system should detect the failure and remedy it automatically. It can then alert the admin to which component failed, so the admin can take corrective action to restore the system's full capacity.
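The detect, fix, then alert sequence above can be sketched as one pass of a control loop. Here `heal_once` and its three hooks are hypothetical: `remediate` stands in for whatever automatic fix applies (restarting a service, failing over, evacuating VMs), and the component names are invented.

```python
from typing import Callable, Dict, List


def heal_once(health_checks: Dict[str, Callable[[], bool]],
              remediate: Callable[[str], bool],
              alert: Callable[[str], None]) -> List[str]:
    """Run one pass of a self-healing loop and return the names that failed.

    health_checks maps a component name to a probe that returns True if healthy.
    remediate attempts an automatic fix and returns True if it succeeded.
    """
    failed = []
    for name, is_healthy in health_checks.items():
        if is_healthy():
            continue
        failed.append(name)
        if remediate(name):
            # Service restored automatically, but the admin still learns what broke.
            alert(f"{name} failed and was recovered automatically")
        else:
            # E.g., a dead disk: software cannot fix it, so a human must restore capacity.
            alert(f"{name} failed; replace or repair it to restore capacity")
    return failed


alerts: List[str] = []
failed = heal_once(
    {"api-service": lambda: True, "disk-3": lambda: False, "metadata-db": lambda: False},
    remediate=lambda name: name == "metadata-db",  # pretend only the DB restart succeeds
    alert=alerts.append,
)
print(failed)
print(alerts)
```

In practice a loop like this would run on a timer; alerting even after a successful recovery matches the text's point that the admin still needs to restore lost capacity.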


Machine Learning for Long-Term Decision Making

The self-healing layer takes care of short-term decisions, but IT administrators need another layer of automation that can observe the cloud and its applications over a longer period to help optimize the cloud, improve efficiency, and plan for the future. A self-driving cloud platform collects telemetry and operational data and applies machine learning to it, giving data scientists a basis for developing algorithms that model the cloud's behavior. Those algorithms then help customers make decisions.

This machine learning layer should observe cloud usage to perform predictive capacity modeling, for example, recommending purchases of new servers. It should also determine what kind of servers to add in terms of their ratio of CPU, memory, and I/O. For instance, if the applications are CPU-intensive, one should order servers with more cores and less storage. Another area is helping to right-size VMs based on their utilization.
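Predictive capacity modeling can start far simpler than machine learning. The sketch below swaps in ordinary least-squares linear regression as a stand-in: it fits a trend line to daily CPU-core usage and estimates how many days remain before usage crosses 80% of capacity. The function name, the 80% threshold, and the sample numbers are assumptions for illustration.

```python
from typing import Optional, Sequence


def days_until_threshold(daily_used_cores: Sequence[float], total_cores: float,
                         threshold: float = 0.8) -> Optional[float]:
    """Estimate days until CPU usage crosses threshold * total_cores.

    Fits an ordinary least-squares line to daily samples (day 0, 1, 2, ...) and
    extrapolates from the last day. Returns 0.0 if the fitted usage is already
    past the limit, or None if the trend is flat or falling.
    """
    n = len(daily_used_cores)
    if n < 2:
        raise ValueError("need at least two daily samples")
    mean_x = (n - 1) / 2
    mean_y = sum(daily_used_cores) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_used_cores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    limit = threshold * total_cores
    fitted_today = intercept + slope * (n - 1)
    if fitted_today >= limit:
        return 0.0
    if slope <= 0:
        return None
    return (limit - fitted_today) / slope


# Usage grows by 10 cores/day on a 250-core cloud; 80% of capacity is 200 cores.
print(days_until_threshold([100, 110, 120, 130, 140], total_cores=250))
```

Comparing that estimate with the vendor's hardware lead time is one way such a model could turn into a purchase recommendation.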

A learning system can also help users detect anomalies in their environment. For example, a VM that has been compromised by a bot might suddenly start sending large amounts of data to external public IP addresses. A smart anomaly detection system can catch security risks like this. The list of machine learning-based algorithms can get long, but the key is to have a platform where new ones can easily be added over time.
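As a baseline for the compromised-VM example, a detector can compare a VM's latest outbound traffic with its own history. This sketch uses a plain z-score test as a stand-in for a learned model; the function name, the threshold of 4 standard deviations, and the traffic figures (MB per hour) are illustrative assumptions.

```python
from statistics import mean, pstdev
from typing import Sequence


def is_anomalous(history_mb: Sequence[float], latest_mb: float,
                 z_threshold: float = 4.0) -> bool:
    """Flag the latest outbound traffic if it sits far above the VM's own baseline."""
    if len(history_mb) < 2:
        return False  # not enough history to establish a baseline
    mu = mean(history_mb)
    sigma = pstdev(history_mb)
    if sigma == 0:
        # A perfectly flat baseline: treat any increase as suspicious.
        return latest_mb > mu
    return (latest_mb - mu) / sigma > z_threshold


hourly_outbound_mb = [50, 55, 48, 52, 51, 49, 53]  # invented sample data
print(is_anomalous(hourly_outbound_mb, 900))  # sudden exfiltration-sized spike
print(is_anomalous(hourly_outbound_mb, 54))   # ordinary fluctuation
```

Scoring each VM against its own history, not a fleet-wide average, avoids flagging VMs that are legitimately busy all the time.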


Hands-Free Upgrades

Upgrading a cloud is like changing the tires on a moving car. With a live cloud running a variety of workloads, it is critical that the upgrade process be handled entirely by an intelligent software layer, not by humans reading vendors' release notes to figure out the right upgrade path for their environment.
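Working out "the right path to upgrade" is essentially a shortest-path search. The sketch below assumes the vendor publishes which versions can upgrade directly to which (the `supported` map and the version numbers are hypothetical) and uses breadth-first search to find the shortest chain of supported hops.

```python
from collections import deque
from typing import Dict, List, Optional


def upgrade_path(current: str, target: str,
                 supported: Dict[str, List[str]]) -> Optional[List[str]]:
    """Shortest sequence of supported upgrades from current to target (BFS).

    `supported` maps a version to the versions it can upgrade to directly.
    Returns None if no supported path exists.
    """
    queue = deque([[current]])
    seen = {current}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in supported.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Hypothetical vendor compatibility data.
supported = {"4.0": ["4.1", "4.2"], "4.1": ["4.2"], "4.2": ["5.0"], "5.0": ["5.1"]}
print(upgrade_path("4.0", "5.1", supported))
```

Because BFS explores paths in order of length, the first path to reach the target uses the fewest intermediate upgrades, which keeps the number of disruptive steps on a live cloud to a minimum.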

By meeting the above criteria, vendors can create self-driving cloud platforms that are easy to install, configure, operate, and manage. As enterprises come to trust the artificial intelligence that enables self-driving clouds, they can implement private clouds that deliver the self-service benefits of cloud computing without the complexity and cost such clouds have traditionally required.