Cold Storage Heats Up The Market For Cloud Backup
Is this new storage option right for you?
Cloud is having a profound impact on all aspects of IT. Everything from how infrastructure is acquired and managed to how applications are written is changing after a generation of established practice. One clear killer app for cloud is data protection and disaster recovery (DR). The availability of abundant, cheap storage has made it possible for businesses to retain more data off-site than ever before. The elasticity and dynamic provisioning of cloud resources also bring true DR capabilities to small businesses that previously could never have afforded a secondary DR site.
Most cloud providers offer both block storage and object storage. For example, Amazon offers Amazon Elastic Block Store (EBS) and Amazon Simple Storage Service (S3). Block storage is typically used with traditional enterprise workloads that require persistent storage; applications consume it just as they would consume storage from a storage area network (SAN) device. Files are split into evenly sized blocks of data that carry no metadata.
Object storage stores data and files as discrete units without any hierarchy. Each object contains data, a unique address, and any metadata useful to the application using the object store. One advantage of object storage is that it is both highly scalable and highly durable. Most object storage systems replicate data across devices and locations to deliver up to 11 nines of durability.
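As a rough mental model, the difference can be sketched in a few lines of Python. This is a toy in-memory illustration, not any vendor’s API; the block size and helper names are invented for the example.

```python
# Toy illustration (not a real storage API): block storage splits data into
# fixed-size, anonymous blocks; object storage keeps each item whole,
# addressed by a key and carrying its own metadata.

BLOCK_SIZE = 4  # bytes, for illustration; real systems use far larger blocks


def to_blocks(data: bytes) -> list[bytes]:
    """Split data into evenly sized blocks with no attached metadata."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


def to_object(key: str, data: bytes, **metadata) -> dict:
    """Store data as a single addressable object bundled with metadata."""
    return {"key": key, "data": data, "metadata": metadata}


payload = b"hello cold storage"
blocks = to_blocks(payload)  # anonymous chunks; the application tracks layout
obj = to_object("backups/2015/march.tar", payload,
                content_type="application/x-tar", retention_years=7)

print(len(blocks))         # number of raw blocks
print(obj["metadata"])     # metadata travels with the object itself
```

The point of the sketch: with block storage the application (or file system) must remember how the blocks fit together, while an object is self-describing and retrieved in one piece by its key.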
Object storage can be used effectively for cost-efficient cloud backup, archiving, and DR, and data stored in it is easily and quickly accessible. The main challenge of using cloud storage for backup and recovery is the initial seeding of data into the cloud. A typical data center has many terabytes of data on premises, and moving this to the cloud can be a daunting task. Even over a high-bandwidth wide area network (WAN) connection, transferring that much data can take days.
Likewise, when restoring from the cloud, WAN bandwidth can be a limiting factor. As a result, many cloud services offer seeding programs that allow enterprises to ship physical disks or tapes to the cloud facility so the initial data set can be loaded directly. From that point on, only changed or new data is transferred over the WAN. This incremental-forever backup strategy sends a much smaller amount of data over limited WAN links.
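Some back-of-the-envelope arithmetic shows why over-the-wire seeding is painful while incremental-forever is workable. The figures below, a 50 TB data set, a 1 Gbps WAN link at 80 percent utilization, and a 2 percent daily change rate, are illustrative assumptions, not measurements from any provider.

```python
# Back-of-the-envelope WAN transfer math (all inputs are assumed figures).

def transfer_days(data_tb: float, link_gbps: float,
                  efficiency: float = 0.8) -> float:
    """Days to move data over a WAN link at a given utilization efficiency."""
    bits = data_tb * 1e12 * 8                      # decimal TB -> bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 86400


full_seed = transfer_days(50, 1.0)                  # initial 50 TB seed
daily_incremental = transfer_days(50 * 0.02, 1.0) * 24  # 2% change, in hours

print(f"Full seed over the wire: {full_seed:.1f} days")
print(f"Daily incremental:      {daily_incremental:.1f} hours")
```

Under these assumptions the initial seed takes nearly six days of sustained transfer, while the daily incremental fits comfortably into an overnight window, which is exactly why ship-the-disks seeding plus incremental-forever is the common pattern.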
Most data in a data center cools over time. It starts out “hot” and is accessed frequently, but over time most data is accessed less and cools to the point where it is cold: accessed very infrequently and kept mostly as archives for retention or compliance needs. In many enterprises, cold data must be retained for seven to 10 years for compliance and regulatory reasons, with the expectation that it is accessed extremely rarely. Traditionally this data has been saved on tapes and stored off-site. However, while individual tapes are inexpensive, tape is a much-maligned technology in IT. Tapes often require expensive hardware systems that must be maintained, and the process of storing data for long-term retention on tape is labor-intensive and potentially error-prone. Worst of all, restores from tape frequently fail as tapes age and bit rot occurs.
As a result of this data pattern, a new form of cloud storage is heating up. In 2012, Amazon introduced Amazon Glacier, which provides durable storage for data archiving and online backup for as little as one cent per gigabyte (GB) per month. Glacier was designed specifically for infrequently accessed data that can tolerate a retrieval time measured in hours, typically three to five, rather than the seconds or milliseconds typical of standard object storage.
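At the quoted rate, archive costs are easy to estimate. This sketch assumes decimal units (1 TB = 1,000 GB) and ignores retrieval and request fees, which vary with usage and are billed separately.

```python
# Rough storage-cost arithmetic at a quoted cents-per-GB-per-month rate.
# Assumes 1 TB = 1,000 GB; retrieval and request fees are out of scope.

def archive_cost(tb: float, years: float,
                 cents_per_gb_month: float = 1.0) -> float:
    """Total storage cost in dollars over the retention period."""
    gb = tb * 1000
    months = years * 12
    return gb * months * cents_per_gb_month / 100


# 10 TB retained for a 7-year compliance window at $0.01/GB/month:
print(f"${archive_cost(10, 7):,.0f}")  # $8,400
```

That works out to $100 per month for 10 TB, which is the kind of figure that makes tape libraries and off-site vaulting contracts look expensive by comparison.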
Google entered the cold storage market in March 2015 with a potentially disruptive solution of its own. Google Nearline Storage is also intended for infrequently accessed data and costs one cent per GB, less than half of what standard storage costs at Google. However, unlike Amazon Glacier, which delivers your data within a service level agreement (SLA) of three to five hours, Google promises a “time to first byte” of three seconds or less.
Cold storage can be a serious tape replacement for enterprises that no longer want to manage physical media for archival and retention. On its face, Google Nearline looks like an absolute slam-dunk against Amazon Glacier, offering faster retrieval times and a much lower cost than other cloud-based object storage options.
So what’s the catch? Google Nearline is still designed for cold storage. Google states that you should expect 4 megabytes per second (MB/s) of throughput per terabyte (TB) of data stored, although this throughput scales linearly as storage consumption grows. For example, 3 TB of data would guarantee 12 MB per second of throughput. Therefore, while Google Nearline is faster to first byte than Glacier, if you have a lot of data, Glacier may deliver the entire data set faster overall. Also, because Nearline is still in beta, it is not yet covered by any SLA or deprecation policy. Both Glacier and Nearline can incur additional costs as you access your data, depending on how frequently and how much of your data you need to access.
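The trade-off can be made concrete with a quick calculation. Nearline’s 4 MB/s-per-TB figure comes from the guidance above; the Glacier bulk throughput used here is an invented placeholder purely for comparison, since no such figure appears in the text.

```python
# Illustrative restore-time comparison. Nearline throughput scales with data
# stored (4 MB/s per TB); the Glacier rate below is an assumed placeholder.

def nearline_hours(data_tb: float) -> float:
    """Full-restore time at 4 MB/s per TB stored (1 TB = 1e6 MB)."""
    throughput_mb_s = 4 * data_tb
    return (data_tb * 1e6) / throughput_mb_s / 3600


def glacier_hours(data_tb: float, first_byte_h: float = 4.0,
                  assumed_mb_s: float = 100.0) -> float:
    """3-5 h to first byte (midpoint 4 h), then an assumed bulk rate."""
    return first_byte_h + (data_tb * 1e6) / assumed_mb_s / 3600


for tb in (1, 10, 100):
    print(f"{tb:>3} TB   Nearline {nearline_hours(tb):6.1f} h   "
          f"Glacier {glacier_hours(tb):6.1f} h")
```

Because Nearline’s throughput grows in lockstep with the data stored, a full restore takes roughly the same wall-clock time (about 69 hours under this model) no matter the data set size, so a service with a slow first byte but a higher bulk rate can finish a large restore sooner, which is the article’s point.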
Object storage and cold storage are both viable targets for your backups and archives. Choices abound from numerous hyper-scale cloud companies including Amazon, Google, Rackspace, IBM, Microsoft, and others. Your backup vendor may also provide its own cloud storage solution that is completely integrated out of the box, and therefore easy to use, and may include built-in DR options for spinning up virtual machines.
So which cloud storage solution is right for your backup and DR needs? The answer depends on several factors, including ease of use, cost per GB, SLA requirements, time to first byte, overall throughput expectations, frequency of data access, and initial seeding requirements, all weighed against your business’ needs. Your best option is to look for a backup and continuity provider that supports the broadest range of options available to you. Flexibility and choice will let you pick the most effective and cost-efficient method as this market evolves.