Demystifying Data Quality: Why Process Matters More Than You Think

In the day-to-day use of data in an organization, we often hear about “data quality” as the gold standard. But what exactly does it mean for data to be of high quality? Let’s break it down.

1. Data as the Artifact of a Process

Imagine a dataset as a treasure chest, but instead of gold coins, it holds bits of data. Every piece of data, whether it’s sales figures or customer demographics, is like an artifact unearthed from a complex archaeological site: the process. Just as artifacts reflect the culture and practices of ancient civilizations, data reflects the processes that produced it. Interestingly, much like those archaeological artifacts, the process itself also reflects the culture and practices in place at the time.

2. Blame the Process, Not the Data

Ever heard the phrase, “Don’t shoot the messenger”? In our case, don’t blame the data—it’s not wrong, but rather, it’s the process of collecting it that goes astray. For example, if a sales report shows unexpected figures, it’s likely due to flaws in how sales data was collected, not because the data itself is faulty.

In one case, a noted physician complained that the hemoglobin data coming back from the lab was wrong. He contended such results should be removed to avoid confusion. My contention was that the process had a problem; if the data were removed, that problem would never be measured. I posed the question, “If a patient had to have blood drawn an extra time, would you expect a higher or lower patient satisfaction score?” A wry smile came from my colleague, along with his agreement.

3. The Quest for Conforming Data

Data quality isn’t just about having a bunch of numbers—it’s about how well those numbers match our expectations. Think of it like baking a cake: you expect a fluffy cake when you follow the recipe exactly. Similarly, data quality should measure how closely the data conforms to our expectations and standards.

With this conformity issue in mind, actual measurements of quality may surprise the hardened skeptic. In a university setting, the organization had 1.3 million student records spanning 20 years. The common belief was that the student data was a mess, filled with duplication and inaccuracies. After analyzing the data, we determined the accuracy level to be 99.7%, with only about 4,000 student records not conforming. This finding led to excitement, and a movement was quickly afoot to resolve the 4,000 identified student records.
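The arithmetic behind that finding is worth making explicit. A minimal sketch, using the figures from the university example above (the function name is an illustrative choice, not part of any real system):

```python
def conformance_rate(total: int, nonconforming: int) -> float:
    """Share of records that meet expectations (conform to standards)."""
    return 1 - nonconforming / total

# Figures from the university example: 1.3 million records, ~4,000 nonconforming
rate = conformance_rate(1_300_000, 4_000)
print(f"{rate:.1%}")  # -> 99.7%
```

Measuring conformance this way turns a vague impression (“the data is a mess”) into a number that can be tracked as the process improves.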

4. Editing What You Know, Auditing What You Don’t

Here’s a practical approach: “Edit what you know, audit what you don’t.” It’s like proofreading a paper—you correct the parts you’re familiar with (like fixing typos in your own writing) and audit the unfamiliar (checking sources and references). In data terms, this means ensuring accurate entries (like converting kilograms to pounds correctly) and auditing anomalies to spot errors or deviations from the norm.

If you have ever seen the contents of the “state code” field in many databases, you can understand the concept of “edit what you know.” In one case, a state code of “GE” was found in the state field. We discovered this entry meant “Georgia”: the field was being truncated because the individual was typing “Georgia” in full rather than “GA.” Since the work was divided by state and Georgia was the only state this individual handled, every one of her entries was affected, while others in the group were using the correct abbreviation. Interestingly, the associate was set to retire in a few weeks. Had we waited, we would have been left with the question, “Did she think the abbreviation for Georgia was GE?”
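The “edit what you know” idea above can be sketched as a simple validation pass. This is a hypothetical check (the field names and the trimmed-down code list are illustrative assumptions): known-valid values pass the edit, and anything else is flagged for audit rather than silently accepted or deleted.

```python
# Subset of official two-letter abbreviations, for illustration only
VALID_STATE_CODES = {"GA", "FL", "AL", "SC", "NC", "TN"}

def audit_state_codes(rows):
    """Return the rows whose state code fails the edit check."""
    return [row for row in rows if row["state"] not in VALID_STATE_CODES]

rows = [
    {"id": 1, "state": "GA"},
    {"id": 2, "state": "GE"},  # the truncated "Georgia" from the story above
]
flagged = audit_state_codes(rows)
print(flagged)  # -> [{'id': 2, 'state': 'GE'}]
```

Note that the check only surfaces the anomaly; deciding what “GE” actually means still requires the audit, i.e., tracing the value back to the process that produced it.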

5. Data Ownership: Who’s in Charge Here?

Just like every artifact has its guardian, every piece of data should have its owner. Those who create and modify the data should also be responsible for its quality. Imagine if the archaeologist who discovers a rare artifact also ensures its preservation and accurate documentation—that’s the essence of data ownership. Improving the data will improve the process and make the process owner’s job easier. They have a vested interest and solid WIIFM (“what’s in it for me”). In short, “He who owns the process, owns the data.”

Putting It All Together

So, as you discuss data usage, remember: data quality isn’t a mystical concept—it’s about understanding the process generating the numbers. By focusing on improving processes and taking ownership of data quality, you’ll uncover valuable insights without getting lost in a sea of numbers.

And remember, even high-quality data has its quirks and surprises—just like uncovering buried treasures. Embrace the journey, and who knows? You might just find the insights that change the game.

Until next time!

Dr. Dave
