PALO ALTO, Calif. — Brian Bulkowski, CTO at Yellowbrick Data, offers the mission critical industry five predictions about the data center market and what it will look like in 2020.
- Big data is well and truly dead, but the data lake looms large. Large-scale, feature-rich data warehouses, both in the cloud and on-premises, have improved radically and now reach multi-petabyte scale using massively parallel processing (MPP) architectures. That scale is possible because these systems push compute and data closer together and let SQL express joins and aggregations, which the database can then optimize. These factors have killed big data as we knew it, but one element of it lives on: the data lake.
- Best-of-breed cloud is coming, under the name of hybrid. Public cloud vendors charge extortionately high prices. The public cloud makes sense for small and medium-sized businesses, which lack the scale to amortize their own engineering spend. It does not make sense for technology companies, which have that scale. Bank of America, for example, has gone on record as saving $2 billion per year by not using the public cloud. A best-of-breed architecture divides the technical stack into building blocks, then selects each one not from a single cloud vendor but from a variety of service providers. The assumptions that a given cloud provider offers the lowest or best prices, or that networking between clouds is prohibitively expensive, are becoming less and less true.
- Data exchanges must evolve into data services. Industry-specific data exchanges have been around for a decade, yet data sets that can be loaded into a database at the click of a button using cloud methods still seem exciting and new. The industry needs lower-friction ways to exchange data, more standardization, and cryptographic tools (such as blockchain). Today’s data exchanges support none of these, and until they add them, data exchanges will languish.
- Database innovation will be linked to hardware improvements. The most innovative databases are exploiting new hardware to reach the next level of price and performance. The cloud enables this: cloud companies roll out new hardware on their own schedule, with no on-premises installation required, and users can easily trial that hardware and see what it delivers. You’ll be running your databases on more and more specialized hardware, but you’ll never even know it.
- AI is becoming a standard technique. With random forests, linear regression, and similar methods in routine use, AI has become as standard as any other analytic technique. Like standard numeric techniques, AI is best done with compute close to data. This means big data’s approach of separating compute from data is a poor choice for AI, just as it was for most analytics. Running AI as custom code, whether on a compute grid or inside your database, does not allow the kinds of optimizations that an AI framework or an AI-centric query system can provide. In five years, we’ll wonder why custom code lasted so long in the AI space.