The Big Data Revolution: Part 2


Last week’s blog introduced a new book that I like a lot, “Big Data: A Revolution That Will Transform How We Live, Work and Think.” Starting from the current obsession with datafication – “taking information about all things under the sun…and transforming it into a data format to make it quantified” – Big Data identifies three major developments as drivers of the revolution: incredibly large data sets, acceptance of messy data, and a tolerance for correlation in lieu of causation. From Big Data’s perspective, the business and social implications of these shifts are substantial.

The value of data is evolving. Historically seen as ancillary, “in the age of big data, all data will be regarded as valuable, in and of itself.” Indeed, “data’s value needs to be considered in terms of all possible ways it can be deployed in the future, not simply how it is used in the present …Ultimately, the value of data is what one can gain from all the possible ways it can be deployed.” The “option value” of data is the sum of those possibilities.

Big Data identifies three methods for releasing data’s option value. The first is basic reuse, illustrated by web-traffic measurement company Hitwise, a subsidiary of Experian. Hitwise provides an offering that lets marketers search its traffic to learn about consumer preferences. Years ago, AOL missed a great reuse opportunity, ceding to Amazon the business of running its e-commerce site – and along with it access to data on what users were reviewing and buying. AOL apparently didn’t see the secondary value of its operational data.

A second strategy for capturing data value is “recombinant” – combining multiple (and disparate) datasets. The Danish Cancer Society meshed cellphone transaction data with socioeconomic and cancer registry files to test the influence of cellphone use on cancer prevalence in Denmark. And Zillow links real estate transaction price information with property specification, neighborhood and map data to fuel its property value predictions.
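To make the recombinant idea concrete, here is a minimal Python sketch that joins two small datasets on a shared key and derives a figure neither holds alone. The records, field names and the `recombine` helper are all hypothetical, invented for illustration; this is not how Zillow or the Danish Cancer Society actually link their data.

```python
# Hypothetical toy records; none of these values come from the book or any real source.
sales = [
    {"property_id": 1, "sale_price": 300_000},
    {"property_id": 2, "sale_price": 450_000},
]
specs = [
    {"property_id": 1, "square_feet": 1500},
    {"property_id": 2, "square_feet": 1800},
    {"property_id": 3, "square_feet": 900},  # no matching sale, so it drops out of the join
]


def recombine(left, right, key):
    """Inner-join two lists of dicts on a shared key (illustrative helper)."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


combined = recombine(sales, specs, "property_id")

# A derived measure that exists only once the two datasets are combined.
price_per_sqft = {
    row["property_id"]: row["sale_price"] / row["square_feet"] for row in combined
}
```

The point of the sketch is simply that price per square foot is available in neither file on its own; the value appears only after the join.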

Finally, companies often find value in data exhaust. Google developed the world’s most comprehensive spell checker as an artifact of its work proposing corrections to errors in search engine queries.
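As a rough illustration of how exhaust can be turned into a product, the Python sketch below builds a correction table from a log of queries that users retyped. The log entries and the `build_corrections` function are hypothetical, and the approach is a deliberately simple stand-in, not a description of Google's actual method.

```python
from collections import Counter, defaultdict

# Hypothetical log of (query as typed, query the user retyped right afterwards).
query_log = [
    ("recieve", "receive"),
    ("recieve", "receive"),
    ("recieve", "relieve"),
    ("teh", "the"),
]


def build_corrections(log):
    """Map each mistyped query to the rewrite users chose most often."""
    counts = defaultdict(Counter)
    for typed, retyped in log:
        if typed != retyped:
            counts[typed][retyped] += 1
    return {typed: c.most_common(1)[0][0] for typed, c in counts.items()}


corrections = build_corrections(query_log)
```

Nothing in the log was collected for the purpose of spell checking; the corrections fall out of behavior users generated while doing something else.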

The authors articulate three types of big data business in the market today. The first consists of companies that own or have access to the valuable data. The second consists of firms such as consultancies, vendors and analytics providers that have big data skillsets and expertise, even if they don’t own the data or devise innovative big data business models. The third consists of firms with the big data mind-set to unlock new forms of data value. Big Data sees the value shifting from companies with the mind-set and expertise to those who control the data. Data science skills will become increasingly commonplace: “Data is the critical ingredient.”

There are, unfortunately, big potential risks that accompany the benefits of big data. Individual privacy is increasingly problematic in a big data world, especially as data take on secondary uses unimagined at first. As a consequence, current solutions such as consent, anonymization and “opt out” are increasingly ineffective. An even more ominous development is that with enhanced predictive power comes the specter foreshadowed in the film “Minority Report,” where citizens are imprisoned not for what they did, but for what they are foreseen to do. The authors warn: “But it’s a perilous path to take. If through big data we predict who may commit a future crime, we may not be content with simply preventing the crime from happening; we are likely to want to punish the probable perpetrator as well.”

The authors propose that to protect against the dictatorship of big data, those who profit from it must become accountable for their behavior. “In such a world, firms will formally assess a particular reuse of data based on the impact it has on individuals whose personal information is being processed.” New professionals akin to today’s auditors – algorithmists – with expertise in data science will emerge to review big data analysis and predictions. Businesses will employ internal algorithmists to monitor their data operations, supplementing that work with support from external algorithmist firms.

Big Data affirms my observation a few months back that data ubiquity is bringing change to the research methods of social sciences like economics, political science and sociology, with the “sampling studies and questionnaires” of the past giving way to N=all analysis now. In fact, an advanced degree in quantitative social science might be an ideal academic background for budding data scientists, especially those wishing to work for Mike Flowers, NYC director of analytics and the archetype of big data in government. Just be sure to tout programming competence in R/SAS rather than Stata/SPSS!

In the end, the authors are quite bullish on the future of big data. They caution, however, that there’s a key and uniquely human role in governing analytics that involves “intuition, common sense, and serendipity…the ‘what is not’: the empty space, the cracks in the sidewalk, the unspoken and not-yet-thought….It also suggests that we must use this tool with a generous degree of humility … and humanity.” Humans must indeed have the final say.

This blog originated at Information-Management, a sister publication of Health Data Management.
