HIT Think

Why Clinical Document Architecture doesn’t solve data quality issues

Clinical Document Architecture, for all its promises of better-organized, higher-quality data, is not perfect. Its weakness stems, in part, from the very thing it is designed to do: exchange data.

It is no secret that healthcare data has quality problems, technical and otherwise. From a developer’s perspective, health data is at the mercy of the people who treat patients and enter data into their records.

However, it is possible to give doctors feedback on the data they enter in order to improve its usefulness. Why do they need such feedback? A big reason is the loose definition of Clinical Document Architecture (CDA), which allows the same result to be placed in many different fields of a typical electronic medical record (EMR).

CDA is a standard for placing data in records, and it allows clinician notes to be transferred as free-text entries. For better or worse, that leaves developers at the whim of whatever a human has manually typed into an EMR program. It is difficult to tell a system what to do when the word “wheelchair” is entered in a field for the height of a patient, or the word “childhood” is used to record the date of a procedure in a surgical history questionnaire.

EMR vendors do not take the time to educate doctors and other medical staff on how to enter data properly, especially once systems are up and running. EMR systems typically do not run any analytics on the information entered; usually, they perform only edit checks to ensure the data is valid. Those checks do not ensure that the entered data is actually accurate and useful.

To extend the previous example, an entry such as “wheelchair” is technically valid for height, weight, blood pressure, allergy, favorite soft drink and smoking habits, among other things. This scope of variation is bad for data analytics efforts, because it mixes invalid data with valid data.
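The gap between "valid" and "useful" can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the centimetre unit and the plausibility range are hypothetical and are not taken from any EMR product.

```python
import re
from typing import Optional

# Hypothetical plausibility range for a recorded height, in centimetres.
HEIGHT_CM_RANGE = (20.0, 275.0)


def passes_edit_check(value: str) -> bool:
    """Mimic a loose edit check: any non-empty text is accepted."""
    return bool(value.strip())


def parse_height_cm(value: str) -> Optional[float]:
    """Return the height in cm if the entry is numeric and plausible, else None."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(?:cm)?\s*", value, flags=re.IGNORECASE)
    if match is None:
        return None  # free text such as "wheelchair" ends up here
    height = float(match.group(1))
    low, high = HEIGHT_CM_RANGE
    return height if low <= height <= high else None
```

Under these assumptions, "wheelchair" passes the edit check but is rejected by the plausibility check, which is the kind of entry that could be reported back to the person who typed it.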

An analytics mindset should aim to drive down healthcare costs by enabling better risk calculations while, at the same time, helping doctors and other medical staff provide better care by identifying which patients need treatment and how that treatment should be organized.

Physicians and other medical staff are increasingly receptive to feedback on how to improve data entry. In many cases, they just need the proper education. This means that with the proper analytics approach, better data entry can deliver tangible and actionable benefits for providers, patients and payers.

And this is where software architecture can help out. CDA is an XML-based standard, and the Continuity of Care Document (CCD) is one of the document types built on it. CCDs are used for tracking patient data. However, while both use XML as their document structure, how that XML is used for these purposes has not been stringently defined.

Here are examples of how two EHRs handle the "effective time" field:

<effectiveTime value="2015-06-03">


Those examples reflect the same patient data generated by two different EMR systems, and it is clear that the data is just different enough to be a programmatic nightmare. While the document structure is common between source systems, the location of values is not. In the example, the effective time is located somewhere in the 'effectiveTime' XML tag, but that is all the standard guarantees. Resolving this requires customization for each EMR, and potentially for each practice.

Furthermore, there is currently nothing in the third-party software market that could truly and properly parse a CCD for the information needed to resolve this kind of issue. Although plenty of products can create and send CCDs, almost nothing can parse them out into a model with which a data analyst can work.

One way around this is to parse CCDs into a tabular format so that they can be used in an analytics program. A good place to start is a custom CCD parser driven by a configuration file that tells it exactly where to find each piece of information.

To solve the problems posed by the example above, two configuration files were created for the parser. When parsing data from EMR1, the parser pulls in that EMR's configuration and knows that, to get the effective date, it needs to use the value attribute of the main 'effectiveTime' tag. For EMR2, the parser gets the effective date from the literal value of the child 'low' tag of the 'effectiveTime' tag.
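The two lookups described above can be sketched in Python with the standard library's xml.etree.ElementTree. This is an illustration under stated assumptions, not the parser the article describes: the CONFIGS layout, the function name and both sample fragments are hypothetical. The fragments are also simplified, omitting the XML namespace that real CDA documents declare, and the EMR2 fragment is inferred from the description above rather than copied from a real system.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-EMR configurations stating where the effective date lives.
CONFIGS = {
    "EMR1": {"path": ".//effectiveTime", "source": "attribute", "attribute": "value"},
    "EMR2": {"path": ".//effectiveTime/low", "source": "text"},
}

# Simplified, namespace-free fragments invented for this sketch.
EMR1_SAMPLE = '<observation><effectiveTime value="2015-06-03"/></observation>'
EMR2_SAMPLE = (
    "<observation><effectiveTime><low>2015-06-03</low></effectiveTime></observation>"
)


def extract_effective_date(xml_text, emr_name):
    """Look up the effective date using the configuration for the given EMR."""
    config = CONFIGS[emr_name]
    root = ET.fromstring(xml_text)
    node = root.find(config["path"])
    if node is None:
        return None
    if config["source"] == "attribute":
        # EMR1 style: the date is an attribute on the tag itself.
        return node.get(config["attribute"])
    # EMR2 style: the date is the text content of the child tag.
    return node.text.strip() if node.text else None
```

With the matching configuration, both fragments yield the same date string. Applying the EMR1 configuration to the EMR2 fragment yields None, which is why a single hard-coded lookup is not enough.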

The key is being able to rely on the fact that even though each EMR does it differently, a single EMR still does it consistently. That means for EMR1, one would always expect to find the effective date in that value attribute. The parser also accounts for different EMR versions, so if a specific EMR has multiple releases, a configuration can be created for each specific version. The parser also allows customization per medical practice, so if different practices use the same EMR version, but one adds more data than another, configurations can simply be added to pull that additional data.
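One way to picture the layering by EMR, version and practice is a registry in which more specific configurations are merged over more general ones. The Python sketch below is an assumption about how such a lookup could work. The registry keys, field names and location strings are all hypothetical.

```python
# Keys are (emr, version, practice); None means "applies to all".
# Field locations are placeholder strings invented for this sketch.
CONFIG_REGISTRY = {
    ("EMR1", None, None): {"effective_date": "effectiveTime/@value"},
    ("EMR1", "2.0", None): {"height": "vitalSigns/height"},
    ("EMR1", "2.0", "practice_a"): {"smoking_status": "socialHistory/smoking"},
}


def resolve_config(emr, version=None, practice=None):
    """Merge configurations from least to most specific; later layers win."""
    merged = {}
    layers = [(emr, None, None), (emr, version, None), (emr, version, practice)]
    for key in layers:
        merged.update(CONFIG_REGISTRY.get(key, {}))
    return merged
```

In this sketch a practice with no entry of its own falls back to the configuration for its EMR version, so extra fields only need to be declared where a practice actually records them.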

The payoff from this approach is the ability to collapse many different CCDs into a single set of tables on which analytics can be performed. The parsing solution was designed to be scalable. Clinics receive CCDs in real time, as patient encounter records are created. The documents are queued, parsed in bulk every night and then released to the data mart. However, the code could just as easily be repurposed to process queued records in real time as loads increase.
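The queue-then-parse flow can be sketched in Python as follows. This is a minimal illustration rather than the solution described: fake_parse is a hypothetical stand-in for the configuration-driven parser, the documents are plain dictionaries rather than CCDs, and the output is CSV text rather than data mart tables.

```python
import csv
import io
from collections import deque


def fake_parse(document):
    """Hypothetical stand-in for the configuration-driven CCD parser."""
    row = {"patient_id": document["id"]}
    if "date" in document:
        row["effective_date"] = document["date"]
    return row


def run_batch(queue, parse, columns):
    """Drain the queue, parse each document and return one CSV table as text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    while queue:
        row = parse(queue.popleft())
        # Every row gets the same columns; missing values become empty cells.
        writer.writerow({column: row.get(column, "") for column in columns})
    return buffer.getvalue()


# Documents accumulate during the day and are drained in one nightly run.
nightly_queue = deque([{"id": "p1", "date": "2015-06-03"}, {"id": "p2"}])
table = run_batch(nightly_queue, fake_parse, ["patient_id", "effective_date"])
```

Because run_batch only drains whatever is in the queue, the same function could be called as each document arrives instead of once a night, which is one way the shift toward real-time processing could be made.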

This is only one approach, albeit an elegantly effective one, to dealing with some of the inherent problems of Clinical Document Architecture running headlong into real-world data entry in the medical world. It is an approach that begins to resolve the discord between the theoretical and the practical, and for a developer, it does not get much better than that.
