To get at meaningful data, purge the superfluous

With data collection easier than ever, knowing what to purge becomes much more important.


Early in my IT career, I became an expert in discarding data. Now, with data collection easier than ever, I wonder if it’s time to reconsider those early lessons.

My first healthcare IT job was installing the SMS Command database applications, which included report writing, medical records, infection control and utilization review. The database was divided into Active and Historical data files. Data elements that did not move from the online system to the Active file were purged according to user-defined parameters. Likewise, user-defined parameters determined how long data was retained in the Active and Historical files, based on a predefined purge schedule.

Right after leaving SMS to move out of state with a former spouse, I landed a leadership position at an HBOC client site. Soon after I started, the clinical systems were upgraded from Clinpro to Clinstar, a totally new application on a new platform.

My staff and I reviewed the system administration documentation and user-defined system profiles to understand and support the new system. I quickly discovered that the new clinical system offered no user-defined purge criteria. When I raised the lack of purge criteria with the system development lead at corporate headquarters, he was surprised that this was an issue. He assured me it was not a design oversight and wondered why I was concerned about purge criteria in the first place. Over time, clinical systems captured more data, and as hardware and storage got cheaper, smaller and faster, there was less reason to be concerned about purge criteria.

Today, as we determine what is meaningful use, we also are wrapping our heads around what is meaningful data. The definition will vary by organization. For example, a community hospital historically did not have the same data requirements as an academic research center that is publishing new clinical protocols.

Perhaps today we need to step back and ask ourselves what meaningful data is, much as we did with those older legacy systems, to determine what we keep, what we discard and when we discard it. We may begin by determining the questions we want to answer. What trends should we be looking to spot?

After we have a clearer understanding of what we need from data, we may be better able to determine what is meaningful. For instance, does the retention of every vital sign throughout a hospital stay provide value? On the other hand, every adverse event or unexpected outcome may provide longitudinal value.
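To make the idea concrete, here is a minimal sketch of how user-defined purge criteria might be expressed per data category. The category names, the 365-day window and the function name are all hypothetical, chosen only for illustration; real retention periods would come out of the organization's own governance and legal requirements.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention rules. None means "keep indefinitely".
RETENTION_DAYS = {
    "vital_sign": 365,      # illustrative value only, not a recommendation
    "adverse_event": None,  # kept for longitudinal analysis
}


@dataclass
class Record:
    category: str
    recorded_on: date


def should_purge(record: Record, today: date) -> bool:
    """Return True when a record is older than its category's retention window."""
    if record.category not in RETENTION_DAYS:
        # Categories without a rule are kept until someone defines one.
        return False
    days = RETENTION_DAYS[record.category]
    if days is None:
        return False
    return today - record.recorded_on > timedelta(days=days)
```

Keeping unknown categories by default is a deliberate choice in this sketch: nothing is discarded until someone has decided that it should be.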

I think of data retention much like addressing other types of accumulation. For example, rather than build a bigger room to store things, why not get rid of things? In other words, don’t invest in more storage—instead, empty the trash. Purge non-meaningful and non-actionable data and keep only what you need.

All purge decisions require interdisciplinary engagement and collaboration—call it data governance if you like. Moreover, while you are at it, determine the longitudinal “source of truth” system that will serve as the organization’s core storage reservoir, and purge data frequently in contributing source systems. Create a core reservoir, or if you prefer, a clinical decision database, populated by many other sources, but where data is uniformly defined and managed. For example, the organization would have only one length of stay value, consistently calculated in the same manner.
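As an illustration of "one value, consistently calculated," the sketch below defines length of stay in a single shared function that every report would call. The convention used here (count the midnights between admission and discharge, and treat a same-day stay as one day) is an assumption made for the example, not a standard; the point is only that the organization picks one rule and encodes it once.

```python
from datetime import date


def length_of_stay_days(admitted: date, discharged: date) -> int:
    """Single shared length-of-stay calculation (assumed convention).

    Counts the midnights between admission and discharge;
    a same-day stay counts as one day.
    """
    if discharged < admitted:
        raise ValueError("discharge date precedes admission date")
    return max((discharged - admitted).days, 1)
```

Any contributing source system that computes its own version of this number would then be corrected or ignored in favor of the shared definition.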

Creating and managing such a reservoir requires ongoing engagement from administrative, clinical, vendor and IT leadership. This upfront and ongoing collaboration helps ensure that data is accessible, trustworthy, meaningful and actionable.
