HIT Think

Record-Matching Integrity: An Algorithm Primer


When hospitals share information internally or externally, accurately linking patient records from multiple disparate databases and information systems is critical. It ensures that clinicians base their decisions on the correct patient record, and it avoids creating system-clogging duplicates and overlays.

One patient can have multiple identifiers within a single organization, such as a medical record number, billing/patient account number, order number and requisition number. More identifiers come into play when an organization has multiple locations offering different types of services. When these identifiers flow into an HIE or ACO, strong algorithms must be in place to identify with pinpoint accuracy which records belong to which patient, so they can be linked into a single record with a single unique identifier for use across the initiative.

These algorithms are typically embedded in a hospital’s or information exchange’s system. However, they are not all the same. How well an algorithm performs depends upon which of the following three categories it falls into:

* Basic Algorithms: The simplest technique for matching records, basic algorithms make comparisons based on selected data elements, typically name, birth date, Social Security number and gender. They typically use exact-match or deterministic matching tools. Deterministic matching is slightly more sophisticated in that it may also accept partial matches or matches produced by phonetic encoding systems. Basic algorithms also deploy wild-card linking techniques, which return every record that matches the limited number of characters entered into a search string, along with any other data element specified to refine the search.

* Intermediate Algorithms: Intermediate algorithms combine “fuzzy logic” and arbitrary or subjective scoring systems with exact-match and deterministic tools. A field match weight is arbitrarily assigned to each identification attribute, and a pair of records must reach a minimum scoring threshold to qualify for consideration. Fuzzy logic uses nickname tables and rules to address transposed names, characters or digits and other typographical errors within the database. Intermediate algorithms may also include an automated frequency adjustment, which decreases the field match score across two records if the actual field value (e.g. a common last name or birth date) is present in a significant number of records.

* Advanced Algorithms: The most sophisticated set of record-matching tools, advanced algorithms rely on mathematical theory (bipartite graph theory, probabilistic theory and mathematical and statistical models) to determine the likelihood of a match. Advanced algorithms also include machine learning and neural networks, which use forms of artificial intelligence that simulate human problem solving. These systems “learn” as more data is processed and automatically redefine field weights based upon that learning.
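
The gap between the first two categories can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the field names, weights, nickname table, similarity floor and frequency cutoff are all hypothetical values chosen for the example, and the standard library's difflib similarity ratio stands in for whatever fuzzy comparison a production system would use.

```python
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical weights and nickname table (assumptions for illustration only).
WEIGHTS = {"last": 30, "first": 20, "dob": 30, "ssn": 40, "sex": 5}
NICKNAMES = {"bill": "william", "bob": "robert", "beth": "elizabeth"}
FIELDS = ("last", "first", "dob", "ssn", "sex")


def normalize(value):
    return value.strip().lower()


def exact_match(a, b):
    """Basic approach: link only if every compared field is identical."""
    return all(normalize(a[f]) == normalize(b[f]) for f in FIELDS)


def field_similarity(field, x, y):
    """Return a similarity between 0.0 and 1.0 for one field."""
    x, y = normalize(x), normalize(y)
    if field == "first":
        # Nickname table: map known nicknames to a canonical form.
        x, y = NICKNAMES.get(x, x), NICKNAMES.get(y, y)
    if x == y:
        return 1.0
    if field == "sex":
        return 0.0  # a one-character code either agrees or it does not
    # Character-level similarity tolerates typos and transpositions.
    return SequenceMatcher(None, x, y).ratio()


def weighted_score(a, b, last_name_counts, common_cutoff=100, floor=0.8):
    """Intermediate approach: sum field weights scaled by similarity."""
    score = 0.0
    for f in FIELDS:
        sim = field_similarity(f, a[f], b[f])
        if sim < floor:
            continue  # too dissimilar to earn any credit
        weight = WEIGHTS[f]
        # Frequency adjustment: a very common last name is weaker evidence.
        if f == "last" and last_name_counts[normalize(a[f])] >= common_cutoff:
            weight /= 2
        score += weight * sim
    return score


rec_a = {"last": "Smith", "first": "Bill", "dob": "1970-03-12",
         "ssn": "123-45-6789", "sex": "M"}
rec_b = {"last": "Smith", "first": "William", "dob": "1970-03-21",
         "ssn": "123-45-6789", "sex": "M"}
name_counts = Counter({"smith": 5000})  # hypothetical database frequencies

print(exact_match(rec_a, rec_b))  # False: nickname and transposed birth date
print(weighted_score(rec_a, rec_b, name_counts))
```

Here the exact-match rule rejects the pair because of a nickname and two transposed digits in the birth date, while the weighted score still gives substantial credit. Whether that score should qualify the pair for consideration depends on a threshold the organization has to choose and validate.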

Regardless of the strength of the algorithm used, false positives and false negatives will always occur. Even the most sophisticated algorithms cannot be solely relied upon to make record-matching decisions, as “auto-linking” routines can create errors. Two of the most common errors are linking closely related people who have similar names and birth dates and live near each other, and linking two individuals with the same name and birth date who share an address, as can happen in large apartment complexes or other multi-family residential buildings.
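
The shared-address case can be reproduced with a small probabilistic scorer. The sketch below follows the Fellegi-Sunter style of record linkage, one common formulation of probabilistic matching, which this article does not name or prescribe. The m and u probabilities, the auto-link threshold and the two sample records are invented for illustration.

```python
import math

# Hypothetical per-field probabilities (assumptions for illustration only):
# m = P(field agrees | same person), u = P(field agrees | different people).
FIELD_PROBS = {
    "name": (0.95, 0.01),
    "dob": (0.97, 0.003),
    "address": (0.90, 0.02),
    "ssn": (0.98, 0.0001),
}
AUTO_LINK_THRESHOLD = 10.0  # hypothetical cutoff, in bits


def match_weight(a, b):
    """Sum log2 likelihood ratios over fields present in both records."""
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        if a.get(field) is None or b.get(field) is None:
            continue  # a missing value counts neither for nor against
        if a[field] == b[field]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


# Two different people in the same building; one has no SSN on file.
tenant_1 = {"name": "maria garcia", "dob": "1985-06-01",
            "address": "100 main st", "ssn": "111-22-3333"}
tenant_2 = {"name": "maria garcia", "dob": "1985-06-01",
            "address": "100 main st", "ssn": None}

weight = match_weight(tenant_1, tenant_2)
would_auto_link = weight >= AUTO_LINK_THRESHOLD
print(round(weight, 1), would_auto_link)
```

Under these assumed numbers the pair clears the threshold on name, birth date and address alone, so an unattended auto-link routine would overlay two different patients.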

This is why results must always be verified using well-established record-matching validity procedures. Skipping this critical step could result in overlaid records, potentially violating privacy laws and, more significantly, compromising care coordination, quality and safety.

Beth Haenke Just (bjust@justassociates.com) is CEO and president of Just Associates, a data integrity consulting firm.


