7 trends that will drive data management and cloud computing in 2020

Organizations will focus on investing in the public cloud while maintaining on-prem Hadoop, the search for one ML framework to rule them all, and “Kubernetifying” the analytics stack.


In 2019, we saw the beginnings of the new data organization. Teams dedicated to supporting AI and analytic workloads became larger and more common, and as a result, the technologies that power those workloads have become more critical.

In the New Year, this new data organization will come to the forefront. It will focus on investing in the public cloud while maintaining on-prem Hadoop, searching for one ML framework to rule them all, and “Kubernetifying” the analytics stack, to name just a few trends.

Rise of the hybrid cloud (really)
We’ve been hearing people talk about the hybrid cloud for the past three years now. And for the most part, that’s all it’s been: talk. But 2020 is the year it will get real. Large enterprises are refusing to add on-premises capacity to their Hadoop deployments and are instead investing in the public cloud. But they are still unwilling to move their core enterprise data to the cloud. Data will stay on-premises, and compute will burst to the cloud, particularly for peak demand and unpredictable workloads. Technologies that provide optimal approaches to achieving this will drive the rise of the hybrid cloud.

One Machine Learning framework to rule them all
Machine learning has reached a turning point, with organizations of all sizes and at all stages moving toward operationalizing their model training efforts. While there are several popular frameworks for model training, no single leading technology has yet emerged. Just as Apache Spark is considered the leader for data transformation jobs and Presto is emerging as the leading technology for interactive querying, 2020 will be the year a frontrunner dominates the broader model training space, with PyTorch and TensorFlow as the leading contenders.
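Whichever framework wins, the core loop these tools compete to operationalize is the same: iteratively adjusting model parameters to reduce a loss. A minimal sketch in plain Python, with no framework and purely toy data (all names and numbers here are illustrative, not from any production system):

```python
# Minimal illustration of gradient-based model training:
# fit y = w*x + b to toy data with batch gradient descent.

def train(data, lr=0.01, epochs=5000):
    """Fit a line to (x, y) pairs by minimizing mean squared error."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y       # prediction error on one point
            grad_w += 2 * err * x / n   # d(MSE)/dw, averaged over the batch
            grad_b += 2 * err / n       # d(MSE)/db
        w -= lr * grad_w                # gradient descent step
        b -= lr * grad_b
    return w, b

# Toy data generated from the line y = 2x + 1
points = [(x, 2 * x + 1) for x in range(10)]
w, b = train(points)
```

Frameworks like PyTorch and TensorFlow automate exactly this pattern (plus differentiation, batching, and hardware acceleration) at scale, which is why operationalizing it well is where the competition lies.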

“Kubernetifying” the analytics stack
While containers and Kubernetes work exceptionally well for stateless applications like web servers, and for self-contained databases, we haven’t seen much container usage when it comes to advanced analytics and AI. In 2020, we’ll see a shift as AI and analytic workloads on Kubernetes become more mainstream. “Kubernetifying” the analytics stack will mean solving for data sharing and elasticity by moving data from remote data silos into K8s clusters for tighter data locality.
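One concrete shape this can take is running an analytics task as a Kubernetes Job with its input staged on a volume inside the cluster rather than read over the network from a remote silo. A hypothetical manifest sketch (the image name, paths, resource figures, and claim name are all illustrative assumptions, not from any real deployment):

```yaml
# Hypothetical Kubernetes Job for a containerized analytics task.
apiVersion: batch/v1
kind: Job
metadata:
  name: analytics-batch-job
spec:
  template:
    spec:
      containers:
      - name: analytics
        image: example.com/analytics-runner:latest   # placeholder image
        args: ["--input", "/data/events", "--output", "/data/results"]
        resources:
          limits:
            cpu: "4"
            memory: 8Gi
        volumeMounts:
        - name: local-data
          mountPath: /data    # data staged in-cluster for tighter locality
      volumes:
      - name: local-data
        persistentVolumeClaim:
          claimName: analytics-data-pvc   # claim assumed to exist
      restartPolicy: Never
```

The interesting engineering problem is the volume: keeping that in-cluster copy of the data fresh, shared, and elastic is exactly what “Kubernetifying” the analytics stack has to solve.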

Hadoop storage is dead; Hadoop compute (Spark) lives strong
There is a lot of talk about Hadoop being dead, but the Hadoop ecosystem has rising stars. Compute frameworks like Spark and Presto extract more value from data and have been adopted into the broader compute ecosystem. Hadoop storage (HDFS) is dead because of its complexity and cost, and because compute fundamentally cannot scale elastically if it stays tied to HDFS.


For real-time insights, users need immediate and elastic compute capacity that’s available in the cloud. Data in HDFS will move to the most optimal and cost-efficient system, whether that’s cloud storage or on-prem object storage. HDFS will gradually fade, but Hadoop compute will live on and live strong.

AI and analytics teams will merge into one
Yesterday’s Hadoop platform teams are today’s AI/analytics teams. Over time, a variety of ways to get insights from data have emerged, and AI is the next step beyond structured data analytics. What used to be statistical modeling has converged with computer science to become AI and ML.

Data, analytics, and AI teams need to collaborate to derive value from the same data they all use. And this will be done by building the right data stack: storage silos and compute engines, deployed on-prem, in the cloud, or in both, will be the norm. In 2020, we’ll see more organizations building dedicated teams around this data stack.

The talent gap will inhibit data technology adoption
Building the stacks that put data technology into practice is difficult, and this will only become more obvious in 2020. As companies recognize the importance of data to their organizations, they’ll need to hire data, AI, and cloud engineers to architect those stacks. However, there aren’t enough engineers with expertise in these technologies. The scarce “super-power” skill is the ability to understand data, both structured and unstructured, and to pick the right approach to analyze it. Until the knowledge gap closes, we’ll continue to see a shortage of such engineers, and many organizations will come up short on their promises of “data everywhere.”

China is moving to the cloud on a scale much larger than the U.S.
In the past five years, while enterprises in the U.S. have been moving in leaps and bounds to public clouds, enterprises in China have invested mostly in on-prem, data-driven platform infrastructure. 2020 will be the inflection point where this changes. China will leapfrog into the cloud at a scale much larger than the U.S., adopting the public cloud for new use cases, bursting to the cloud for peak loads, and moving existing workloads over time. Public cloud leaders in China will see dramatic growth that might outpace the growth of the current cloud giants.
