I was recently asked to chair a symposium on the future of predictive analytics in healthcare. As someone who has been involved for decades in the business of predictive analytics, I am particularly aware of how difficult it can be to predict the future.

Nevertheless, I know without a doubt we are at the start of a technology revolution that will affect all aspects of healthcare and involve predictive analytics in hundreds, if not thousands, of healthcare applications and workflows. We tried to describe some of these applications in a recent book published by Linda Miner and several colleagues (including me), and the result was a 1,110-page manuscript with details describing over 20 examples ranging from automating more accurate diagnoses to population risk management, fraud detection, and many others. The applications are numerous, exciting, and transformative.

But if I were to provide a 15-minute summary of where I think we are, and where we are headed, here is how I see the big picture regarding opportunities, disruptors, enabling technologies, and hurdles. 


It is common knowledge that predictive, prescriptive, and automated analytics technologies, exploiting new data technologies and sources, will be the catalysts for bringing improved effectiveness to healthcare delivery. Effectiveness here of course means better healthcare outcomes at lower cost, in a personalized, patient-centered process that essentially treats each person as a valuable customer and consumer of these expensive services. Healthcare IT companies are making large investments to drive this vision, and are collaborating with physicians at academic medical centers to deliver analytics at the point of care. For example, Dr. John Crowell at the University of Iowa is using an analytic tool that incorporates real-time surgical data to predict risk of surgical site infections. The tool has helped surgeons reduce infections by an astounding 58 percent.


What is driving the revolution? Obviously, there is cost pressure, but that is not new. What is new is a set of disruptive technologies that are forcing changes:

New Ways of Storing Data, and Thinking about the Value of Data

Guy Harrison, director of R&D at Dell Software, created an illustration of the history of database technology that tells an important story:


In short, from roughly 1970 to the mid-2000s, structured relational SQL databases enforced a paradigm where organizations would first think through exactly what data needed to be collected to achieve specific insights and functionality. But in the last few years, numerous new NoSQL database technologies have emerged, supported by falling hardware and (cloud/on-premises) storage prices, enabling organizations to store "everything" without imposing a structure, just in case some information in those data may become relevant for future projects. This is an entirely different way of thinking about data, and why they are collected and stored.
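
The contrast between the two ways of thinking can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table name, the field names (patient_id, heart_rate, steps), and the helper read_field are all hypothetical, and the standard-library sqlite3 and json modules stand in for a relational database and a schema-less store.

```python
import json
import sqlite3

# Schema-on-write: the table structure is decided before any data arrives.
# (Table and column names are hypothetical.)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vitals (patient_id TEXT, heart_rate INTEGER)")
conn.execute("INSERT INTO vitals VALUES (?, ?)", ("p1", 72))

# A field nobody planned for has no column, so the insert is rejected.
try:
    conn.execute(
        "INSERT INTO vitals (patient_id, heart_rate, steps) VALUES (?, ?, ?)",
        ("p2", 80, 4200),
    )
    rejected = False
except sqlite3.OperationalError:
    rejected = True

# Schema-on-read: store each record as-is and impose structure only when
# a question is asked of the data.
raw_store = [
    json.dumps({"patient_id": "p1", "heart_rate": 72}),
    json.dumps({"patient_id": "p2", "heart_rate": 80, "steps": 4200}),
]


def read_field(store, field):
    """Return {patient_id: value} for every stored record that has `field`."""
    result = {}
    for line in store:
        record = json.loads(line)
        if field in record:
            result[record["patient_id"]] = record[field]
    return result


steps_by_patient = read_field(raw_store, "steps")
```

In the first half, the unplanned "steps" value cannot be stored until someone changes the schema. In the second half it is simply kept, and it becomes useful the moment a later project asks for it.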

New Data Sources, and the 360° View of Patients

At the same time, and enabled through new technologies, entirely new data sources have become available. Here is a quick view of the volume of data streaming through the internet every minute of every day, as of 2013.


These data can be leveraged to develop 360° views of patients, and patients-as-customers. For example, many if not most of the factors that contribute to good health or recovery from a health emergency are associated with patients' lifestyle, demographic characteristics, and so on. Understanding those factors enables more personalized health care and interventions, and better health outcomes. Population health management analyses, for instance, now routinely consider those variables when assessing risk.

Internet of Things (IoT) and Wearable Devices, and the Voice-of-Customer

Social data streaming through the Internet are not the only type of new data. The general value added to the global economy in 2019 through IoT (including hardware, software, services, etc.) was recently estimated at $1.7 trillion. New wearable devices and applications are announced almost daily, and many have relevance for facilitating better health care delivery. In short, key parameters describing physiology, lifestyle, and compliance with treatment regimens can now be monitored remotely, allowing not only better identification and prediction of impending problems or emergencies, but also giving patients a feedback channel to their medical teams, communicating back in real time what is really happening to them. Therefore, these technologies and data not only enable better health care from the provider's point of view, but also more personalized care and voice-of-customer from the patients' point of view.

Machine Learning and Pattern Recognition

Finally, to make sense of all these new data, new machine learning (pattern recognition) algorithms have greatly expanded the power of classical statistical approaches. The focus of machine learning is accurate prediction based on repeated patterns in the (big) data, rather than the fitting of statistical models or hypothesis testing. (Among the first to point out the importance of this change in perspective was Leo Breiman in the classic paper "Statistical Modeling: The Two Cultures," published in Statistical Science in 2001.) Distributed computational architectures have super-charged these algorithms, which are capable of sifting through huge amounts of data in a very short time. These pattern recognition methods are very robust, much less susceptible to outliers, can handle unstructured and bad data, and are generally easier to use. Machine learning algorithms allow extraction of actionable information from all data, and methods are also available to make such models transparent to support root cause analyses for prescriptive systems.


While the opportunities are clearly exciting, as is always the case, the real world imposes constraints and places obstacles before rapid progress.

Data Challenges and Cost

First, there is the common problem of data acquisition, integration, and curation. Those challenges are huge in health care because of the long tradition of proprietary clinical applications and electronic health record (EHR) systems, resulting in siloed data and sources that record similar parameters in different ways. These challenges are often not easy to solve, and require expensive and often labor-intensive solutions. The good news is that these problems are being addressed with new solutions that reduce the time, labor, and expense of integration.
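
To make the integration problem concrete, here is a minimal Python sketch of two sources recording the same parameter in different ways. Everything in it is an assumption made for illustration: the two record layouts, the field names (mrn, temp_f, patient, temperature), and the converter functions do not come from any real EHR system.

```python
# Hypothetical records from two systems that both store body temperature,
# but under different field names, layouts, and units.
system_a = [{"mrn": "1001", "temp_f": 98.6}]
system_b = [{"patient": "1002", "temperature": {"value": 38.5, "unit": "C"}}]


def fahrenheit_to_celsius(value):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (value - 32) * 5 / 9


def from_system_a(rec):
    """Map a system A record to the shared layout (temperature in Celsius)."""
    return {
        "patient_id": rec["mrn"],
        "temp_c": round(fahrenheit_to_celsius(rec["temp_f"]), 1),
    }


def from_system_b(rec):
    """Map a system B record to the shared layout (temperature in Celsius)."""
    reading = rec["temperature"]
    value = reading["value"]
    if reading["unit"] != "C":
        value = fahrenheit_to_celsius(value)
    return {"patient_id": rec["patient"], "temp_c": round(value, 1)}


# One mapping per source system, applied before any analysis is run.
unified = [from_system_a(r) for r in system_a] + [from_system_b(r) for r in system_b]
```

Each additional source needs its own mapping, written and validated by someone who understands both the system and the clinical meaning of the field, which is why this work is so labor-intensive at scale.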

Resource and Knowledge Constraints

A much bigger problem is the shortage of people with the skills required to derive actionable information from big and new data, and to deliver it to the endpoints where such information will generate the greatest value. There is an existing and quickly compounding shortage of "Big Data Scientists" with relevant healthcare domain knowledge.

Shown here is a recent picture of the trend for job openings related to "Data Science," "Big Data," and "Data Scientist."


Along with the exponential growth of big data, there is an exponential growth of job openings for big data scientists. Obviously, this trend is not sustainable.

Automated analytics. The only way to resolve this resource problem is to move towards greater automation. In short, the modeling process must be automated, yet controlled by the business user to achieve the specific desired outcomes and ROI. With respect to analytics maturity of an organization, this is the move from prescriptive analytics (which follows predictive analytics) to automated analytics, as recently advocated by Tom Davenport. In practice, this can be accomplished through the integration of machine learning algorithms and methods into specific automated workflows to enhance medical applications. For example, such a system was recently announced by AnesthesiaOS, in collaboration with Dell (and Dell Software's Statistica) as well as Microsoft.

Governance and Government Oversight

This is a topic that has not been discussed much, but in my opinion it looms large and will have a significant impact on the manner in which predictive analytics will be adopted for applications in the healthcare domain. When the results of predictive models "matter," greater scrutiny is inevitable. While it may be harmless when a marketing model malfunctions and I receive a catalogue with irrelevant offers, similar wrong predictions can have significant consequences in practically all applications in healthcare. Sometimes, those consequences may just be large financial losses (e.g., if fraud is not detected early enough), but in other cases, personal lives can be affected.

Harvesting big data carries with it the responsibility to do the right thing with those data. Data of any size and predictive models in healthcare must be correct, must be secure against unauthorized access and tampering, must not discriminate, and must generally do good and do no harm. To ensure that these criteria are met, there will be regulatory oversight around validation (similar to the regulations around validated deployments of analytics in pharmaceutical and medical device manufacturing) and privacy. For example, I have recently participated in briefings by Peter Swire, a recognized expert in privacy law and former "privacy czar" in the Clinton administration, who has written widely on the subject of big data, privacy, and fairness. It is clear that the policy implications of Big Data privacy and fairness are being actively discussed today, and will lead to regulatory guidance and frameworks very soon.

Regulatory oversight typically means a more conservative approach to innovation, thinking through all consequences, as well as intended and possibly unintended results. As a practical matter, and without doubt, all predictive analytics projects in the healthcare space must be entirely transparent, open to scrutiny, provide metrics around the quality of data and models, and be aware of the consequences and impact on all patients.


Predictive analytics will revolutionize healthcare delivery in this country and worldwide, of that I am certain. At the same time, it is clear that there are specific hurdles and considerations when it comes to the healthcare domain which will make the implementations different from what we are used to in the consumer and e-retailing markets and applications.

Nevertheless, the future is quickly approaching, and has in many ways already arrived, and the future looks bright.

Thomas Hill, Ph.D., is Executive Director Analytics at Dell

