Penn Health Sees Big Data as Life Saver

The University of Pennsylvania Health System, like many large health organizations, has poured enormous resources into building an enterprisewide data infrastructure. With the foundation in place, the health system known as Penn Medicine is embarking on a big data project to expand its information horizons and develop predictive analytics that identify patients at risk of deadly illnesses before those illnesses take hold.


Penn Medicine is a leading academic medical center. Based in Philadelphia, it consists of the Raymond and Ruth Perelman School of Medicine and the University of Pennsylvania Health System. The health system includes the Hospital of the University of Pennsylvania, Penn Presbyterian Medical Center, Chester County Hospital, Lancaster General Health, Penn Wissahickon Hospice and Pennsylvania Hospital, as well as a number of inpatient and other care services.

The open-source, big data initiative, called Penn Signals, is focused on building out an enterprise warehouse and enabling the data science team to create learning models from historical, at-rest data and then position those models into a real-time data stream, says Corey Chivers, a data scientist at Penn Medicine who is one of the leads on the project.

“Our goal is to build an infrastructure that can scale up to handle a huge variety of data sources within our system that contain information about the health of our patient population,” Chivers says. “We started with the obvious candidates, our electronic health records and labs, to try to develop predictive models for severe sepsis and heart failure. But we have plans to increase the use of predictive models on the Penn Signals platform and make them available outside the organization.”

The backbone is a homegrown enterprise data warehouse, called Penn Data Store. The warehouse contains data from clinical and administrative systems, including the three major clinical information systems at Penn Medicine: an outpatient EHR from Epic, which is used by 1,800 affiliated physicians; an inpatient EHR from Allscripts, which is used within Penn Medicine’s five hospitals; and an enterprise laboratory information system from Cerner. In all, the warehouse stores over 4 billion rows of clinical data, with 2 million being added each day.

For the big data effort, Chivers and Penn Medicine’s data science team used an ETL procedure to pull data from the enterprise warehouse into an open-source database from MongoDB that provides flexibility for the machine-learning applications the data science team uses. From there, observations made by the clinical staff are converted into time-series formats, or events, that can be analyzed by machine-learning applications, Chivers says.
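
The article does not describe the conversion step in detail, but the idea of turning clinical observations into per-patient time-series events can be sketched in a few lines of Python. This is a minimal illustration only: the record layout, the field names (patient_id, kind, value, taken_at) and the to_event_series helper are all assumptions made for the example, not Penn Medicine's schema or code.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical observation records, shaped like documents that might come
# out of a document database. All field names and values are made up.
observations = [
    {"patient_id": "p1", "kind": "heart_rate", "value": 112, "taken_at": "2015-06-01T08:30:00"},
    {"patient_id": "p2", "kind": "lactate", "value": 2.4, "taken_at": "2015-06-01T07:45:00"},
    {"patient_id": "p1", "kind": "temperature_c", "value": 38.6, "taken_at": "2015-06-01T08:00:00"},
]


def to_event_series(records):
    """Group records by patient and order each group by timestamp."""
    series = defaultdict(list)
    for rec in records:
        when = datetime.fromisoformat(rec["taken_at"])
        series[rec["patient_id"]].append((when, rec["kind"], rec["value"]))
    for events in series.values():
        events.sort(key=lambda event: event[0])  # oldest event first
    return dict(series)


series = to_event_series(observations)
for patient_id, events in series.items():
    for when, kind, value in events:
        print(patient_id, when.isoformat(), kind, value)
```

Ordering each patient's events by time is what lets a downstream model look at trends rather than isolated readings.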

The team uses the Python programming language, ZeroMQ messaging and the iPython Notebook computational environment to pull data sets and explore that data using dimensionality reduction and machine learning. They then can save the predictive models they’ve developed and ship them into the real-time data stream as operational models, Chivers explains.
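
The train-offline, score-online pattern Chivers describes can be illustrated with a deliberately tiny Python sketch. Everything here is an assumption made for the example: the nearest-centroid model stands in for whatever learning methods the team actually uses, the data are made up, JSON stands in for the team's model storage, and a plain generator stands in for the ZeroMQ message stream.

```python
import json


def train_centroids(rows, labels):
    """Fit a nearest-centroid model: one mean feature vector per label."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, x in enumerate(row):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(model, row):
    """Return the label whose centroid is closest to the row."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))


# Made-up historical, at-rest data: [heart_rate, temperature_c] per patient.
history = [[72, 36.8], [80, 37.0], [118, 38.9], [125, 39.2]]
outcomes = ["stable", "stable", "at_risk", "at_risk"]

saved = json.dumps(train_centroids(history, outcomes))  # save the fitted model
model = json.loads(saved)  # load it again in the serving process


def incoming_stream():
    """Stand-in for a real-time feed; a real system would read from a message queue."""
    yield {"patient_id": "p1", "features": [76, 36.9]}
    yield {"patient_id": "p2", "features": [121, 39.0]}


scored = [(msg["patient_id"], predict(model, msg["features"])) for msg in incoming_stream()]
print(scored)
```

The point of the split is that the expensive step, fitting the model on historical data, happens once, while the saved model can be applied cheaply to each new message as it arrives.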

For many health conditions, timing is everything. In the case of sepsis, every hour a patient goes undiagnosed increases the mortality rate by more than 7 percent, according to clinical studies, which also estimate that only 50 percent of septic shock patients receive effective therapy on time.

At Penn Medicine, the earlier algorithm for detecting when a patient was slipping into severe sepsis relied on threshold rules applied to six vital sign measurements and lab values. The Penn Signals predictive model, by contrast, takes into account more than 200 clinical variables. It has enabled Penn Medicine to detect 80 percent of severe sepsis cases within 30 hours of the typical onset of symptoms, Chivers says.
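
To make the contrast concrete, a threshold-rule screen of the kind described can be sketched as follows. The six measurements, the cutoff values and the two-trigger alert rule are illustrative assumptions chosen for this example; they are not Penn Medicine's rules and are not clinical guidance.

```python
# Illustrative thresholds only: assumptions for this sketch, not clinical
# guidance and not the rules Penn Medicine used.
RULES = {
    "temperature_c": lambda v: v > 38.0 or v < 36.0,
    "heart_rate": lambda v: v > 90,
    "respiratory_rate": lambda v: v > 20,
    "wbc_k_per_ul": lambda v: v > 12.0 or v < 4.0,
    "systolic_bp": lambda v: v < 90,
    "lactate_mmol_l": lambda v: v > 2.0,
}


def triggered_rules(measurements):
    """Return the sorted names of rules whose threshold is crossed."""
    return sorted(
        name
        for name, rule in RULES.items()
        if name in measurements and rule(measurements[name])
    )


def should_alert(measurements, min_triggers=2):
    """Alert when at least min_triggers rules fire (a hypothetical cutoff)."""
    return len(triggered_rules(measurements)) >= min_triggers


patient = {"temperature_c": 38.6, "heart_rate": 112, "respiratory_rate": 18, "systolic_bp": 118}
print(triggered_rules(patient), should_alert(patient))
```

A screen like this considers each measurement in isolation against a fixed cutoff, which is the limitation a model drawing on hundreds of variables is meant to address.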

The heart failure predictive algorithm is enabling Penn Medicine to detect 20 percent more patients who are trending toward cardiac failure and to identify a group of patients that is five times more likely to be readmitted after heart failure. Having that predictive information on hand means Penn Medicine clinicians can intervene earlier with at-risk groups and focus resources on those patients more likely to have ongoing heart issues.

Output from the predictive algorithms is communicated via text messages that alert specific clinical staff when a patient’s condition is heading in a dangerous direction. Penn Medicine also has developed a mobile app, called Caroline, which provides clinicians with a pared-down version of a patient’s electronic health record containing clinical data related to the alert.

The data science team also uses the online visualization site Plotly, as well as visualization tools within the iPython Notebook environment, to provide clinic department heads and clinical floor leaders with aggregate looks at the predictive data. “Our visualizations are under constant development because we want the clinicians to have a deep view of how these predictive models work and the information they utilize,” Chivers says. “We’ve worked closely with clinical staff to understand how we as data scientists can communicate with them.”

What's Next?

Penn Medicine now finds itself at a crossroads: it has built an open-source framework that can handle the influx of health data and apply predictive algorithms in real time, but the volume, velocity and variety of data are ramping up quickly, Chivers says. “We’re planning to utilize new data streams from wearable devices, telemetry devices and ICU monitors, and as we move toward that machine-generated data that’s coming in at a much higher rate, we have to focus on scalability.”

Penn Signals plans to take the infrastructure to that next level through an agreement with Intel to partner in the development of the company’s Trusted Analytics Platform, or TAP.

TAP is an open-source infrastructure built on a data layer that includes Apache Hadoop, Spark and other data components, as well as an analytics layer that includes a data science tool kit to simplify model development and an extensible framework to generate predictive approaches.

Penn Medicine plans to deploy a 100-terabyte data stack via the TAP framework, Chivers says. The health system also plans to market Penn Signals to other health care organizations, he adds. “We want to get it out to other providers and find out what’s most valuable to them—is it something they’d like to deploy themselves, or would it be more useful as a platform for service? We don’t have an answer to that, but we built the platform using open-source tools so that it could be utilized beyond Penn Medicine.”
