Data Cleansing is a Life Saver

When it comes to data quality best practices, it’s often argued, and sometimes quite vehemently, that proactive defect prevention is far superior to reactive data cleansing. Advocates of defect prevention sometimes admit that data cleansing is a necessary evil. However, at least in my experience, most of the time they conveniently, and ironically, cleanse (i.e., drop) the word “necessary.”

Therefore, I thought I would share a story about how data cleansing saves lives, which I read about in the highly recommended book “Space Chronicles: Facing the Ultimate Frontier” by Neil deGrasse Tyson. “Soon after the Hubble Space Telescope was launched in April 1990, NASA engineers realized that the telescope’s primary mirror – which gathers and reflects the light from celestial objects into its cameras and spectrographs – had been ground to an incorrect shape. In other words, the two-billion dollar telescope was producing fuzzy images. That was bad. As if to make lemonade out of lemons, though, computer algorithms came to the rescue. Investigators at the Space Telescope Science Institute in Baltimore, Maryland, developed a range of clever and innovative image-processing techniques to compensate for some of Hubble’s shortcomings.”

In other words, since it would be three years before Hubble’s faulty optics could be repaired during a 1993 space shuttle mission, data cleansing allowed astrophysicists to make good use of Hubble despite the bad data quality of its early images.

So, data cleansing algorithms saved Hubble’s fuzzy images – but how did this data cleansing actually save lives?

“Turns out,” Tyson explained, “maximizing the amount of information that could be extracted from a blurry astronomical image is technically identical to maximizing the amount of information that can be extracted from a mammogram. Soon the new techniques came into common use for detecting early signs of breast cancer.”

“But that’s only part of the story. In 1997, for Hubble’s second servicing mission, shuttle astronauts swapped in a brand-new, high-resolution digital detector—designed to the demanding specifications of astrophysicists whose careers are based on being able to see small, dim things in the cosmos. That technology is now incorporated in a minimally invasive, low-cost system for doing breast biopsies, the next stage after mammograms in the early diagnosis of cancer.”

Even though defect prevention was eventually implemented to prevent data quality issues in Hubble’s images of outer space, those interim data cleansing algorithms are still being used today to help save countless human lives here on Earth.

So, at least in this particular instance, we have to admit that data cleansing is a necessary good.

Jim Harris is an independent consultant, speaker and freelance writer. He is blogger-in-chief at Obsessive-Compulsive Data Quality, a blog offering a vendor-neutral perspective on data quality and related disciplines.

This posting appeared on Information Management, a sister publication to Health Data Management.
