More than one recent big data study has labeled 2016 as the year of action, a time to actually act on the insights that data analytics can provide. That theme was echoed at the recent Strata & Hadoop World conference in San Jose, Calif.
David Weldon, editor of Information Management (a sister publication to Health Data Management), spoke with Ash Parikh, vice president of data integration, data security and big data at Informatica, for his take on what this new focus means.
What are the most common themes that you are hearing and how do those themes align with what you expected?
The Big Data space has been evolving gradually over the last few years, reflecting the level of maturity in real customer projects. A few years ago, if you were to attend any of the major Big Data events, road shows or conferences, you would walk away extremely excited about the buzz, the giveaways, the myriad technologies mushrooming by the minute, and the newness of the space in general.
We are starting to see a shift—there is more awareness in general that it is a nightmare to keep up with all the new technologies being introduced and ones that are fast becoming outdated in such a short time.
Additionally, there is more discussion about how to deliver value from all the investments around Big Data: how can I increase campaign effectiveness, how can I ensure improved healthcare outcomes or how can I reduce the risk of fraud? The fact that there are now more sessions, articles, blogs and Tweets about how not to turn a data lake into a data swamp is evidence in itself that companies are starting to ask some hard questions.
What are the most common challenges that organizations are facing in data management and data analytics?
Firstly, the audience is not even fully aware that there is a problem. According to Gartner and other leading industry analyst firms, over 70 percent of Big Data projects either fail entirely or struggle to go beyond experimentation because of a lack of upfront due diligence on data management.
It is generally felt that it is enough to simply spin up a Hadoop cluster, dump all types of data into it at scale, create a sandbox, and experiment, and then almost magically, those golden needles in the haystack (read that as new and unique insights) will reveal themselves. Typically, all this is done by bringing together a host of open-source technologies and throwing hand-coding at the problem.
If this effort needs to scale, and more importantly, deliver trusted and timely insights, customers typically translate this to more hand coding. Simply throwing bodies at the problem is not scalable and won't solve the complex data management issues with Big Data. There are serious data integration, data governance and data security issues that need to be handled at scale.
What are the most surprising things that you are hearing?
One of the most surprising things we have observed from our interactions with customers is that they don’t even know that they have a problem until it is too late. And when they do, they go into reactive mode and try to address complex Big Data management issues with hand coding.
The other issue that has sprung up over the last year or so is that there is a growing belief that a stand-alone data preparation tool is enough to handle these issues. Let’s put this down to hype as well. A few years ago, we saw something similar happen in the business intelligence space, where the new age self-service business intelligence tools promised to deliver their users from all kinds of data management challenges. But very quickly, people realized that these tools needed a solid data management platform underneath to truly deliver those insights to those beautiful dashboards.
It’s the same case with stand-alone data preparation tools, which can go only so far with addressing the serious data integration, data governance and data security issues that need to be handled at scale in big data projects.
What does Informatica view as the top data issues or challenges in 2016?
The world of data management and analytics is not new – but it is definitely not stagnant. There is constant change with regard to new technologies and new approaches being discovered every day, for delivering more business value and better business insights. This is indeed exciting, and fuels innovation. However, what a customer needs to guard against in such a dynamic environment is hype.
Today, more than ever, there is a need for level-headedness and pragmatic thinking. There is a need to step back, breathe and smile when an article or blog or webinar or Tweet boldly announces that “data warehouses are dead” or “Hadoop is all that you need” or “data preparation will solve all your big data management issues,” as nothing could be further from the truth.
Ask yourselves the following questions: Do you need to know what you sold the customer in the past? And do you also need to know what your customer might want to buy in the future?
If the answer to both those questions is a resounding yes, then you will need a data warehouse as well as a data lake to get a holistic answer, and get your hands on that golden needle in the haystack. And if that needle is indeed golden, then it is a safe bet that you will need to be proactive and upfront in doing all the data management due diligence around all your data – big or small.