In a world where data has been compared to oil or gold, one might expect that the more you have of it, the better. That is where the analogy breaks down: we already have plenty of data, and more of it is not what we are looking for. Instead, we are searching for the precious data, much like mining for diamonds in coal. A better comparison, then, is data to diamonds, because today organizations are struggling to extract the “diamonds” from vast data centers that house petabytes of worthless rock.

Data lives among us in abundance, and IDC predicts that by 2025 we will churn out 175 zettabytes of it, a number that, much like our national debt ($22 trillion), is hard to even fathom. As we connect more IoT devices such as video surveillance cameras and autonomous cars, or “anything” that can churn out data 24/7, the resulting volume can cause bottlenecks and delays for organizations. From these huge data sets we are trying to extract only the diamonds (10%) from the coal (90%) of the data created each day, which is impossible to do with current technology.

And that is where non-volatile memory express (NVMe) has come in to provide a measure of relief in this data-heavy world. NVMe is a streamlined, flash-focused interface that operates at much higher performance than legacy storage interfaces. As such, NVMe removes existing storage protocol bottlenecks for platforms churning out terabytes and petabytes of data on a regular basis.

But is that enough? Take the example of a traditional solid-state drive (SSD) that plugs into a storage server slot: NVMe can support four lanes of traffic, which equates to roughly 3GB per second. The ability to move 3GB per second in and out of a drive offers a huge advantage over legacy interfaces such as SATA, which does not even move data at 1GB per second. However, as drive capacity grows, even 3GB per second can become “slow.”

Even though NVMe is substantially faster, storage architects have overlooked what can be changed to truly help manage their mines of data. Today, many organizations are choosing to deploy “NVMe over Fabrics” (NVMe-oF), which extends NVMe across storage networks for longer distances and scale-out environments that disaggregate storage from compute.

But the fact is, not even NVMe is fast enough by itself when petabytes of data must be analyzed in real time. For instance, reading a full 16TB SSD at 3GB per second takes roughly an hour and a half, and the challenge grows when there are many such drives per system. In that case, it will take hours to move, mine, or analyze all that data.
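The arithmetic behind that claim can be sketched in a few lines. This is a back-of-the-envelope estimate only: it assumes decimal units (1TB = 1,000GB) and a perfectly sustained 3GB per second, and the 24-drive system read one drive at a time through the same path is a hypothetical configuration chosen for illustration, not a figure from any vendor.

```python
def transfer_hours(capacity_tb: float, throughput_gb_per_s: float) -> float:
    """Hours needed to stream a full drive at a sustained throughput."""
    capacity_gb = capacity_tb * 1000  # decimal units: 1 TB = 1000 GB
    seconds = capacity_gb / throughput_gb_per_s
    return seconds / 3600


# One 16 TB SSD read end to end at 3 GB/s.
one_drive = transfer_hours(16, 3)

# Hypothetical system: 24 such drives, read one after another
# through the same 3 GB/s path (an assumption, not a measured setup).
DRIVES_PER_SYSTEM = 24
whole_system = one_drive * DRIVES_PER_SYSTEM

print(f"one 16 TB drive: {one_drive:.1f} h")                           # 1.5 h
print(f"{DRIVES_PER_SYSTEM} drives, serially: {whole_system:.1f} h")  # 35.6 h
```

Real systems read drives in parallel and rarely sustain peak throughput, so the numbers are only a sense of scale: the time grows linearly with capacity while the interface speed stays fixed.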

But there is an upside: NVMe drives take hours, not days like legacy serial ATA (SATA) solutions, which can be considered a big win. Even so, hours is too long, and the reason is that we still treat storage as just a place “to store bits” instead of as a value-added vehicle for extracting the important data that keeps trains, planes, and medical results moving and arriving on time.

 

Computational storage

Since NVMe by itself doesn’t solve the true bottleneck when a large amount of data sits behind it, what does? That is where computational storage comes in, solving the problem of data management and movement.

Imagine for a moment that your storage is a dynamic machine that intelligently organizes raw information into meaningful data. Computational storage is just that: it allows an organization to ingest as many bits as possible and to churn out just the right information, on command and in real time, at the storage level instead of in the CPU.

This approach is especially important for high-capacity NVMe SSDs, which need help managing data locality and storage compute needs. Adding value to the storage and allowing it to do the work itself is key to the future of our new most valuable resource: our data. Computational storage increases efficiency through in situ (in place) processing of massive datasets, which reduces the network bandwidth consumed and is ideal for hyperscale environments, edge processing, and AI/data applications.
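To make the bandwidth point concrete, here is a toy simulation, not a real computational storage API. It compares how many bytes cross the bus when the host CPU filters the data against how many cross when the drive filters in place. The record layout, the 1,000-byte record size, and the rule that marks a record as a “diamond” are all hypothetical; the 10 percent hit rate simply mirrors the diamonds-to-coal ratio used earlier.

```python
from dataclasses import dataclass

RECORD_BYTES = 1_000  # hypothetical fixed size of one stored record


@dataclass
class Record:
    sensor_id: int
    value: float


def make_records(n: int) -> list[Record]:
    # Deterministic test data: every 10th record carries a high value.
    return [Record(i, 100.0 if i % 10 == 0 else 1.0) for i in range(n)]


def is_diamond(record: Record) -> bool:
    # Hypothetical rule for "data worth keeping".
    return record.value > 50.0


def host_side_filter(records: list[Record]) -> tuple[list[Record], int]:
    """Conventional path: every record crosses the bus, then the CPU filters."""
    bytes_moved = len(records) * RECORD_BYTES
    hits = [r for r in records if is_diamond(r)]
    return hits, bytes_moved


def in_situ_filter(records: list[Record]) -> tuple[list[Record], int]:
    """Computational storage path: the drive filters, only hits cross the bus."""
    hits = [r for r in records if is_diamond(r)]
    bytes_moved = len(hits) * RECORD_BYTES
    return hits, bytes_moved


records = make_records(1_000)
host_hits, host_bytes = host_side_filter(records)
dev_hits, dev_bytes = in_situ_filter(records)

print(f"host-side filter moved {host_bytes:,} bytes")  # 1,000,000
print(f"in situ filter moved   {dev_bytes:,} bytes")   # 100,000
```

Both paths return the same records; the only difference is where the filtering runs, and therefore how much data has to travel to reach the answer.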

Now let’s take a look at what this means and put it into a real-life example.

Have you ever been on a plane that was “delayed for paperwork” or had “issues with weight and balance”? With current storage technology, passengers are left waiting while raw data is turned into something usable. One example is Embraer Air, which states it can take up to three hours to process the 6TB dataset per plane, and those are only small regional jets. A 787 Dreamliner must contend with 400TB per plane. How long might that take?
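For a rough sense of scale, the figures above can be extrapolated with simple arithmetic. The sketch below assumes processing time grows linearly with dataset size, which is purely an assumption made for illustration; real pipelines may scale better or worse, and no airline has published this number.

```python
def linear_estimate_hours(known_tb: float, known_hours: float, target_tb: float) -> float:
    """Scale a known processing time to a larger dataset, assuming linear growth."""
    rate_tb_per_hour = known_tb / known_hours
    return target_tb / rate_tb_per_hour


# 6 TB in up to 3 hours (regional jet) scaled to 400 TB (787 Dreamliner).
estimate = linear_estimate_hours(6, 3, 400)
print(f"{estimate:.0f} hours, or about {estimate / 24:.1f} days")  # 200 hours, about 8.3 days
```

Even if the real figure is several times smaller, the order of magnitude shows why moving every byte to the host CPU stops being practical at this scale.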

Storage Switzerland points out that without the aid of computational storage, IT teams often turn to direct-attached storage to avoid storage network latency and to increase throughput by spreading the data across many devices. But even with NVMe-oF, where these latencies have been reduced or managed, the storage is still going to be a bottleneck.

As such, computational storage adds processing power to aid each host CPU, allowing an organization to ingest all the data it can generate but pass along only what is necessary, keeping the “pipes” as clear as possible. When raw data is needed for analytics, organizations have the freedom to pull out only what is needed instead of having to deal with the entire data set.

Another benefit of computational storage is that it makes a shared storage environment more useful to organizations by feeding performance-hungry workloads. So even when it is deployed in an NVMe-oF, composable, or other architecture, the localized compute it offers is of great value. In addition, compute scales seamlessly: each computational storage device you add brings its own processing capacity.

Today’s storage architects must look at data throughput not just as physically moving data but as a matter of organizing it intelligently. For companies with large data sets, computational storage devices process the data in situ and solve issues in ways that avoid delays and bottlenecks. They mine the diamonds quickly and in real time, allowing the enterprise to shine with well-analyzed data while maximizing return and profitability.