We all know that data is growing rapidly in size and complexity. So is the demand for faster and deeper insights. As we try to tame the growing data chaos while speeding up integration, processing, analysis, and access, data quality suffers. Let’s explore the ways we can have our cake and eat it too.
Data quality is contextual
Data quality means implementing the right processes and tools to achieve insights quickly while ensuring information is trustworthy and usable. How you define quality depends on how the data will be used. Thomas C. Redman notes in Data Driven that high-quality data is “fit for [its] intended uses in operations, decision making and planning.” As you work out your own definition, ask:
- What are the intended purposes for the data?
- What process do you have to identify and document those purposes and use cases?
- How do you ensure flexibility, as use cases shift and change depending on the needs of the business?
Put data quality under your data governance umbrella
Agile data governance provides the principles, policies, business workflows, and technology solutions to ensure your data work accelerates access and insights while also ensuring safety. Data quality is a key pillar of governance: unless you can trust the data and quickly and easily put it to work, you cannot achieve democratization, self-service, risk mitigation, or compliance.
Build a company culture that embraces data quality
Your culture must emphasize high-quality data and take responsibility for it. Otherwise, it’s hard to make progress on your data governance and broader data strategy initiatives, because they will get bogged down in finger-pointing and hand-waving.
- Do your leaders emphasize the importance of data quality and governance?
- Do you encourage both business and technical folks to take responsibility and accountability for data systems?
- Do you treat data as a top-level asset and priority?
- Do you have executive-level representation and accountability, such as a Chief Data Officer or Chief Analytics/AI Officer?
- Do you have a data literacy initiative that educates employees on the importance of data, data management best practices, and basic data and statistics concepts?
Now it’s time to size up the quality of your data
With your data catalog use cases, governance framework, and company culture embracing data quality, it’s time to measure impact and effectiveness. The “big 6” categories used to measure data quality include:
- Completeness – do you have all the data, and is it available for use?
- Uniqueness – how unique is the data, and could it be mistaken for other data?
- Consistency – is the data changing more or less than it should be?
- Validity – is the data in the right formats, types, etc.?
- Accuracy – is the data correct?
- Timeliness – does the data cover the timeframe needed, and is it refreshed often enough?
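As a rough illustration, several of these dimensions can be expressed as simple scores over a dataset. The records, field names, and freshness window below are hypothetical, and real checks would be tailored to the use case; this is just a minimal Python sketch:

```python
from datetime import datetime, timedelta

# Hypothetical customer records; field names and values are illustrative only.
records = [
    {"id": 1, "email": "a@example.com", "updated": datetime(2024, 1, 2)},
    {"id": 2, "email": None,            "updated": datetime(2024, 1, 1)},
    {"id": 2, "email": "b@example.com", "updated": datetime(2023, 6, 1)},
]

# Completeness: share of records with a populated email field.
completeness = sum(r["email"] is not None for r in records) / len(records)

# Uniqueness: share of distinct ids (duplicates lower the score).
uniqueness = len({r["id"] for r in records}) / len(records)

# Validity: crude format check -- emails must contain "@".
validity = sum("@" in (r["email"] or "") for r in records) / len(records)

# Timeliness: share of records refreshed within the last 90 days
# of an assumed "as of" date.
as_of = datetime(2024, 1, 15)
timeliness = sum(
    as_of - r["updated"] <= timedelta(days=90) for r in records
) / len(records)

print(completeness, uniqueness, validity, timeliness)
```

Accuracy is deliberately absent here: verifying that a value is correct usually requires comparing against a trusted reference source, which no in-dataset computation can supply.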
One of the biggest mistakes we see over and over again is trying to apply one-size-fits-all quantitative metrics. Companies will say “okay, the percent of nulls will be completeness… we’ll track percent change in values and anything over 5% change is a red flag… we’ll run data type validations to ensure strings are strings and integers are integers.” While these rules feel right in spirit to a Chief Data Officer, they very quickly lead to “boiling the ocean” and inevitably produce a high noise-to-signal ratio, where alerts that don’t matter go off constantly while true problems are missed.
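To see why a blanket threshold generates noise, consider that 5% rule applied to a metric that naturally fluctuates. The numbers below are made up for illustration:

```python
# A week of daily row counts for a table whose volume naturally
# swings with the day of the week (hypothetical numbers).
daily_rows = [10_000, 10_400, 9_800, 10_900, 10_200, 7_100, 6_900]

alerts = []
for yesterday, today in zip(daily_rows, daily_rows[1:]):
    pct_change = abs(today - yesterday) / yesterday
    if pct_change > 0.05:  # one-size-fits-all 5% "red flag"
        alerts.append(round(pct_change, 3))

# Ordinary weekly seasonality alone trips the alarm four times in
# six days, so on-call staff learn to ignore it -- and a genuine
# 6% silent data loss would look identical to the routine noise.
print(alerts)
```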
Start with an agile data governance approach
The challenges described above are why we strongly advocate focusing on the highest-value use cases and the related high-value datasets and data systems. This way you ensure context-aware, fit-for-purpose metrics and business processes are in place to concentrate quality efforts where you get the biggest return on investment first. You can focus on the minimum valuable metrics that emphasize trustworthiness and usability. This is what we call agile data quality.
Data cataloging and governance platforms can ensure decisions are driven by good data, power a use-case-driven approach, and support the metrics and business processes for agile data quality. Supporting catalog capabilities (mapped to the “big 6” categories) include:
- Configurable metadata – Data status, business metadata, technical metadata, and relationships to document the context and best practices around data and analysis (completeness)
- Data profiling – summary statistics such as nulls, data shape, and more to understand the data values (completeness, uniqueness)
- Data lineage – ability to trace how data and analysis systems connect in order to troubleshoot problems, run impact analysis, and monitor changes (consistency, validity, accuracy across systems and processing steps)
- Data quality inspections – checks, scores, and alerts powered by the catalog as well as integrations with other data quality tools (validity, accuracy within columns/fields and data types)
- Usage & governance reporting – reporting metrics and monitoring in support of tracking adoption and reinforcing proper documentation and business processes (validity, completeness, uniqueness)
- Review cycles – certification processes (approved, recommended, or bronze/silver/gold, etc.) and recurring reviews – more frequently for the most critical assets (completeness, accuracy)
- Request workflows – sync schedules for metadata and data integration, along with quick handling of data requests, use case requests, and data access requests, plus fast provisioning of data access, all vastly accelerated and facilitated by a federated virtualization platform (timeliness)
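Of the capabilities above, data profiling is the easiest to make concrete. A minimal sketch of profiling one column, computing the kinds of summary statistics mentioned earlier (nulls, distinct values, and a rough sense of the data’s shape); the column name and values are hypothetical:

```python
from collections import Counter

def profile_column(values):
    """Summary statistics for one column: null rate, distinct count,
    and the most common values (a rough 'shape' of the data)."""
    non_null = [v for v in values if v is not None]
    return {
        "count": len(values),
        "null_rate": (len(values) - len(non_null)) / len(values),
        "distinct": len(set(non_null)),
        "top_values": Counter(non_null).most_common(3),
    }

# Hypothetical "country" column from a customer table.
profile = profile_column(["US", "US", "DE", None, "US", "FR", None])
print(profile)
```

A catalog surfaces exactly this kind of profile next to the dataset’s documentation, so a consumer can judge completeness and uniqueness before ever running a query.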
You will be tempted to evaluate the quality of your data through blanket quantitative metrics, but this leads to boiling the ocean. Instead, focus on the data needed for your highest-value use cases. You can achieve data quality through a combination of technical features (such as data profiling, lineage, and inspections) and agile data governance processes.
Remember, data quality is a team sport, not just a technical issue. Your company culture must embrace data responsibility!