Preventing Bad Data

Even the most powerful analytical tools are only as good as the data they crunch, and intelligence built on bad data can be worse than no analysis at all. Far too often, companies make large decisions or formulate strategic objectives with full knowledge – or at least a strong inkling – that their data is flawed or incomplete, simply because they can’t see past the status quo.

What can businesses do to avoid tragic business decisions or exposing the organization to increased risk? Implementing both data quality and governance measures is the key to getting data back on track, yet the idea of analyzing and scrubbing data is well-trodden ground for business intelligence managers. Rarely do BI executives discuss actionable steps for preventing poor data from entering the system in the first place. It may be an old topic, but few do it well.

One of the first means of preventing bad data is to examine the three most common problem areas and how they interact as data moves between them. These typically are:

  1. Business applications – customer relationship management (CRM) programs, enterprise resource planning (ERP), customer information systems, etc.
  2. Movement processes – extract, transform, load (ETL); see the sketch after this list
  3. Storage – the enterprise data warehouse, where data is integrated and analyzed
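
To make the movement step concrete, here is a minimal Python sketch of a transform that maps records from two hypothetical source systems onto a single target schema before loading. All field names and formats here are assumptions for illustration, not drawn from any particular product.

    from datetime import datetime

    # Two hypothetical source systems that describe the same customer
    # differently; field names and formats are illustrative assumptions.

    def from_crm(record: dict) -> dict:
        """The CRM stores dates as MM/DD/YYYY and the name in one field."""
        return {
            "customer_name": record["full_name"].strip().title(),
            "birth_date": datetime.strptime(record["dob"], "%m/%d/%Y").date(),
            "country": record["country_code"].upper(),
        }

    def from_erp(record: dict) -> dict:
        """The ERP stores dates as DD.MM.YYYY and splits the name in two."""
        return {
            "customer_name": (record["first"] + " " + record["last"]).strip().title(),
            "birth_date": datetime.strptime(record["birth"], "%d.%m.%Y").date(),
            "country": record["country"].upper(),
        }

    # The same person, encoded two ways, normalizes to one target record.
    # Loading both feeds "as is", with no explicit mapping per source,
    # would silently mix date conventions and name formats downstream.
    print(from_crm({"full_name": " ada lovelace ", "dob": "12/10/1815", "country_code": "gb"}))
    print(from_erp({"first": "Ada", "last": "Lovelace", "birth": "10.12.1815", "country": "gb"}))
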
Because each business application has different rules about how it captures, formats, identifies and sorts information, data can become flawed at several levels. As organizations begin integrating multiple applications, or connecting a business application to a data warehouse, problems surface immediately if the data is not properly normalized for the target system. If the data within these systems isn’t routinely checked for accuracy, an analysis might show, for example, that half of the organization’s customers are 113 years old, all of them born on 1/1/1900.

In addition, without enacting and enforcing uniform data entry standards, data that appears valid at entry can cause downstream damage if not corrected. It is often difficult for our colleagues to understand that data can be technically correct and incorrect at the same time; just ask any enterprise formatting dates between the U.S. and Europe, or recording time across different time zones, as the sketch below illustrates.

Small details can have broad implications, which underscores the need for organizations to create data governance committees that cooperatively set the standards for how data is used and consumed, as well as definitions of quality, both of which are unique to each organization, while remaining mindful of the regulations affecting their industry.
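
Here is a minimal Python sketch of the kinds of rules a data quality check might apply to birth dates: flagging the 1/1/1900 placeholder, implausible ages, and slash-delimited strings that read differently under U.S. and European conventions. The sentinel values, thresholds and helper names are assumptions for illustration.

    from datetime import date

    # Placeholder values that sneak in as data-entry defaults; an assumed
    # list for illustration. Each organization discovers its own.
    SENTINELS = {date(1900, 1, 1), date(1970, 1, 1)}

    def check_birth_date(d: date, as_of: date) -> list:
        """Return the data quality problems found for one birth date."""
        problems = []
        if d in SENTINELS:
            problems.append("placeholder date, likely a data-entry default")
        age = (as_of - d).days // 365
        if age > 110:
            problems.append("implausible age (%d years)" % age)
        if d > as_of:
            problems.append("birth date in the future")
        return problems

    def is_ambiguous(date_string: str) -> bool:
        """True when a slash-delimited date reads differently under U.S.
        and European conventions, e.g. 04/05/2013 is April 5 or May 4."""
        first, second, _ = date_string.split("/")
        return int(first) <= 12 and int(second) <= 12 and first != second

    print(check_birth_date(date(1900, 1, 1), date(2013, 6, 1)))
    # -> ['placeholder date, likely a data-entry default', 'implausible age (113 years)']
    print(is_ambiguous("04/05/2013"))  # -> True
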

Often organizations will implement data quality solutions to improve the data within their business applications. Unfortunately, using a point solution to fix these problems is like plugging a hole in the dike with one finger. A “fix” for a single application that runs at the end of a month or quarter is an isolated, temporary measure; it does little to prevent the flood of future errors, or to address the many that already exist in other systems. It is also important to consider the risk the business is exposed to between monthly or quarterly batch runs. A common mistake is treating the management of data as a one-off exercise, simply an IT task, or strictly a technology problem. Few acknowledge the significant role that humans play in entering or collecting data in the first place. Data decays at an alarming rate and requires regular, consistent improvement processes to maintain or increase its value. These processes are not just technical rules or automated commands; they also include the procedures for how data is used, stored, accessed and changed. True data governance is a combination of tools, business processes and, perhaps most importantly, people.
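
One way to move from periodic scrubbing toward prevention is to gate records at the point of entry, quarantining anything that fails the agreed rules before it ever reaches the warehouse. The Python sketch below shows the pattern; the specific rules and names are hypothetical.

    def validate(record: dict) -> list:
        """Entry rules; in practice these come from the governance committee."""
        errors = []
        if not record.get("customer_name"):
            errors.append("missing customer name")
        if record.get("birth_date") is None:
            errors.append("missing birth date")
        # Further rules would cover formats, ranges, cross-field consistency.
        return errors

    def load(records, warehouse, quarantine):
        """Gate every record on the way in, instead of scrubbing in batches."""
        for record in records:
            errors = validate(record)
            if errors:
                quarantine.append({"record": record, "errors": errors})
            else:
                warehouse.append(record)

    warehouse, quarantine = [], []
    load([{"customer_name": "Ada Lovelace", "birth_date": "1815-12-10"},
          {"customer_name": ""}], warehouse, quarantine)
    # One record reaches the warehouse; the other waits in quarantine,
    # with its reasons recorded, for a person to correct it.
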

This is why enterprises that handle large quantities of data should first consider forming empowered data governance committees. Ideally, such committees are composed of leaders from a broad swath of departments, not confined to IT, because data governance requires leadership and investment from across the enterprise. A comprehensive organizational commitment helps embed the importance of data quality into company operations and culture. The committee should be responsible for implementing business processes to measure and track data entry, and for establishing improvement targets and goals to which employees are held accountable.

With proactive measures, ongoing maintenance and continuous improvement, data governance evolves from a one-time fix into a front-and-center BI priority that routinely checks, corrects and augments data. Recently, I had the pleasure of speaking on a panel on data governance, and a fellow panelist may have put it best: “Billions of dollars of business investments are being made on flawed data.” With a full-fledged data governance program, BI managers can make predictions and support critical business decisions with confidence that their insights are built on a foundation of accurate and, dare I say, intelligent data.

Clarence Hempfield is director and principal product manager at Pitney Bowes Software, leading global product strategy in the data quality and data governance domains. He has over 15 years of experience in the high-tech industry, with extensive experience in product management, product marketing, sales and communications. He holds a BA in Political Science and Economics and an MBA, and is a certified information management professional and a certified industry analyst relations professional.