Sequoia Project Matching Exercise Underscores Patient ID Challenges

Commentary: The Sequoia Project, a not-for-profit organization that creates solutions for health data exchange, recently published “A Framework for Cross-Organizational Patient Identity Management.” The report features a case study of a patient matching pilot between large, sophisticated regional health organizations, including Intermountain Healthcare, a not-for-profit based in Salt Lake City that includes 22 hospitals and over 185 clinics.

Both the case study and Intermountain Healthcare’s willingness to share its experience are immensely valuable to any organization beginning a large-scale patient matching endeavor.

Initially, Intermountain Healthcare was only able to achieve a 10 percent match rate of patient records across organizations; this was attributed to its data being “fraught with data quality issues.” To elevate its match rate to 95 percent, Intermountain Healthcare had to gather missing data, correct inaccurate data and institute data entry and encoding controls across organizations, all of which required teams of data stewards.

It then had to perform complex statistical analyses of its data and methodically refine its matching algorithms based on these analyses. This, too, required significant time, labor and data work. To maintain its high match rate, Intermountain Healthcare will need to update and clean its data continuously, which will require more teams of dedicated data stewards.

Intermountain Healthcare is far from unique in having data quality issues; they are the norm, not the exception. And these extensive, laborious and costly data quality exercises are frequently necessary to achieve higher patient match rates. However, organizations beginning large-scale cross-organizational matching endeavors can use a new paradigm in matching, called referential matching, which has the potential to achieve matching success rates of as much as 95 percent, despite poor data quality and differing data governance standards. Referential matching can eliminate the need for some data quality activities, for rigorous statistical analyses and for continually keeping data up-to-date and correct.

Up until now, the state-of-the-art approach to patient matching has been to use pair-wise matching algorithms that compare records directly to each other. These algorithms use common attributes about patients’ identities, such as name, gender, birth date and address, to determine if two records match. They rely heavily on identity data being correct and up-to-date.
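To make the pair-wise approach concrete, here is a minimal sketch in Python. It is not taken from the Sequoia Project report or from any vendor's product; the record layout, the field weights and the 80-point threshold are all assumptions chosen for illustration. It simply awards points for each attribute that agrees exactly and declares a match when the total is high enough.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PatientRecord:
    # Hypothetical record layout; real systems carry many more attributes.
    first_name: str
    last_name: str
    gender: str
    birth_date: Optional[str]  # ISO "YYYY-MM-DD", or None when missing
    address: str


# Assumed weights (points out of 100) and threshold, for illustration only.
WEIGHTS = {
    "first_name": 20,
    "last_name": 30,
    "gender": 10,
    "birth_date": 30,
    "address": 10,
}
THRESHOLD = 80


def pairwise_score(a: PatientRecord, b: PatientRecord) -> int:
    """Add up the weights of the attributes on which the two records agree."""
    score = 0
    for field, weight in WEIGHTS.items():
        va, vb = getattr(a, field), getattr(b, field)
        # A missing value can never contribute evidence of agreement.
        if va is None or vb is None:
            continue
        if va.strip().lower() == vb.strip().lower():
            score += weight
    return score


def is_pairwise_match(a: PatientRecord, b: PatientRecord) -> bool:
    """Declare a match only when the direct comparison clears the threshold."""
    return pairwise_score(a, b) >= THRESHOLD
```

Because the score depends entirely on the two records agreeing with each other, a nickname, a changed surname or a missing birth date each removes points, which is why this style of matching is so sensitive to data quality.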

However, 30 to 40 percent of identity data in any given database is out-of-date, incorrect or incomplete, and attempting to clean identity data is usually a losing battle: data quality often deteriorates faster than data stewards can keep up. This is why Intermountain Healthcare achieved such low initial matching success rates: its pair-wise matching algorithms could not overcome the low-quality identity data they were using to match patient records.

Referential matching, on the other hand, uses a new and unique approach to matching patient records. Rather than comparing two records directly to each other, referential matching engines compare each record with an identity in a reference database. If both records match to the same identity in the reference database, then they match to each other.

The reference database used by a referential matching engine contains billions of commercially available data elements for millions of identities. The data is sourced from third-party data vendors, and the database is updated frequently (typically weekly or monthly). Each data element is constantly tested and rated for validity, source authenticity and consistency to determine what data is the most accurate and up-to-date. Sophisticated algorithms and linguistic rules see through perpetuated errors, cultural naming differences and familial relationships, further helping to ensure that the right data is associated with the right identity. Importantly, the data in the reference database spans decades and includes old and erroneous data as well as new, clean and accurate data.

Because of the comprehensive nature of the reference database, a referential matching engine can make matches that traditional matching tools could not make without prior data quality adjustments. For example, consider a patient who has recently been married. She decided to change her last name, and she and her spouse moved into a new house together. One record about her contains her old address, her old last name, a nickname instead of her first name, and a missing birth date. A second record has her new address, her new last name, her full first name, and a transposed month and day in her birth date.

Traditional matching engines would not match the two records unless data stewards first cleansed the data, and then performed lengthy analyses on which attributes should be used to make the match. But a referential matching engine will match both records to the same identity in the reference database, because the database contains a longitudinal record of all known identity attributes for every person, including incorrect and out-of-date attributes.
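The newlywed example can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a description of how any real referential matching engine works: the in-memory `REFERENCE_DB`, the identity IDs, the names and addresses, and the rule that three agreeing attributes are enough are all invented here. A production engine would use a far larger reference database and more sophisticated scoring.

```python
from typing import Optional

# Hypothetical reference database: each identity keeps every value ever seen
# for an attribute, including out-of-date and erroneous ones.
REFERENCE_DB = {
    "ID-001": {
        "first_name": {"katherine", "katie"},
        "last_name": {"smith", "jones"},
        "birth_date": {"1990-04-03", "1990-03-04"},  # true date and a transposed variant
        "address": {"12 oak st", "98 elm ave"},
    },
    "ID-002": {
        "first_name": {"katherine"},
        "last_name": {"jonas"},
        "birth_date": {"1985-11-20"},
        "address": {"7 pine rd"},
    },
}

MIN_AGREEING_FIELDS = 3  # assumed cutoff, for illustration only


def resolve_identity(record: dict) -> Optional[str]:
    """Return the reference identity that agrees with the record on the most attributes."""
    best_id, best_hits = None, 0
    for identity_id, known in REFERENCE_DB.items():
        hits = sum(
            1
            for field, value in record.items()
            if value is not None and value.strip().lower() in known.get(field, set())
        )
        if hits > best_hits:
            best_id, best_hits = identity_id, hits
    return best_id if best_hits >= MIN_AGREEING_FIELDS else None


def is_referential_match(a: dict, b: dict) -> bool:
    """Two records match when both resolve to the same reference identity."""
    id_a, id_b = resolve_identity(a), resolve_identity(b)
    return id_a is not None and id_a == id_b


# The two records from the example: old name and address, nickname, no birth date...
old_record = {"first_name": "Katie", "last_name": "Smith",
              "birth_date": None, "address": "12 Oak St"}
# ...versus new name and address, full first name, transposed month and day.
new_record = {"first_name": "Katherine", "last_name": "Jones",
              "birth_date": "1990-03-04", "address": "98 Elm Ave"}

print(is_referential_match(old_record, new_record))
```

The two records share no name, address or birth date with each other, yet each agrees with the same reference identity on at least three attributes, so they are linked through it rather than through direct comparison.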

To further illustrate the capabilities of referential matching, consider that the case study highlighted “street address” as being a less-than-ideal attribute with which to match two records together, even if used in conjunction with other attributes like name, gender and birth date. This is because street addresses offer poor stability (consistency through time), poor comparability (consistency of structure and format), and only moderate distinctiveness (ability to uniquely identify someone).

On the other hand, referential matching engines keep track of addresses through time; they have built-in rules that can see through format and structural inconsistencies; and they can use any data point, including address, to confirm or disconfirm the uniqueness of a patient.
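As a rough illustration of what seeing through format and structural inconsistencies can mean for addresses, the Python sketch below reduces two differently written addresses to a common form before comparing them. The abbreviation table and the `normalize_address` helper are hypothetical and deliberately tiny; real engines rely on much richer rule sets.

```python
import re

# Assumed, deliberately small abbreviation table for illustration.
ABBREVIATIONS = {
    "street": "st",
    "avenue": "ave",
    "road": "rd",
    "apartment": "apt",
    "north": "n",
    "south": "s",
}


def normalize_address(raw: str) -> str:
    """Lowercase, strip punctuation and apply standard abbreviations."""
    cleaned = re.sub(r"[^\w\s]", " ", raw.lower())
    tokens = [ABBREVIATIONS.get(token, token) for token in cleaned.split()]
    return " ".join(tokens)


first = normalize_address("12 Oak Street, Apt. 4")
second = normalize_address("12 OAK ST APT 4")
print(first == second)
```

Normalization of this kind addresses only comparability. Stability still depends on the reference database holding a person's earlier addresses.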

Thus, referential matching engines offer new possibilities for matching technology, and The Sequoia Project case study emphasizes how they can be invaluable in cross-organizational patient matching endeavors.

Mark LaRow is CEO of Verato, a McLean, Va.-based provider of identity solutions and services.
