Data quality issues bog down use of analytics

Resolving problems wastes time, delays implementation of findings, survey shows.


Managing data flow performance is a key challenge for IT executives at organizations of all sizes, and many in the industry believe that resolving data problems wastes time and delays the use of data.

The main challenges for organizations range from keeping bad data out to keeping data flows operating effectively, according to a new survey by Dimensional Research.



The vast majority (87 percent) of the 300 data management professionals surveyed report that bad data has made its way into their data stores, while just 12 percent consider themselves good at the key aspects of data flow performance management.

The survey, sponsored by StreamSets, points to pervasive data pollution, which implies that analytic results may be wrong, leading to false insights that drive poor decisions.

In healthcare, substandard data quality is one of the key factors affecting the reliability of analytics results, and whether clinicians will trust findings enough to base treatment decisions on evidence that might be suspect.

Even if organizations can detect bad data, cleaning it after the fact wastes data scientists' time and delays use of the data, the study notes.

Ensuring data quality is the most common challenge in managing big data flows, cited by 68 percent of respondents. About three-quarters of the organizations said they currently have bad data in their stores, despite cleansing data throughout the data lifecycle.

While 69 percent of organizations consider the ability to detect diverging data values in data flows "valuable" or "very valuable," only 34 percent rated themselves "good" or "excellent" at detecting those changes.

And while detecting bad data is a critical aspect of data flow performance, the survey showed that enterprises' struggles are much broader. Only 12 percent of respondents rate themselves "good" or "excellent" at detecting a down pipeline, throughput degradation, error rate increases, data value divergence and personally identifiable information (PII) violations.
