Quality, quality, quality. “We need high quality data!” It’s something you hear a lot in data circles.
But what does “high quality data” mean to you?
data.world’s Principal Scientist Juan Sequeda recently asked this question of his followers on LinkedIn and Twitter, and the responses were interesting. They also varied tremendously based on the commenter’s understood use case and data type.
Many responses focused on data being of a “high-enough” quality that its analysis doesn’t lead to poor or inaccurate decisions, but instead aids in sound, consistent results. Other responses focused on established definitions of “high-quality data,” and suggested only data meeting specific technical requirements be considered as such. And some argued that any data fit for its intended purpose — i.e. “good enough” — would qualify, particularly when a project’s priority is weighed against the time, cost, and risk of its outcome.
Below, we’ve collected some of our favorite responses to Juan’s initial post, grouped by data type and use case. Take a look, and share your opinion with Juan on LinkedIn and Twitter.
What Does “High-Quality Data” Mean to You?
Chris Welty, Research Scientist at Google: “To be completely honest, I think most people actually mean that they don’t want data that’s noticeably low quality, not that they actually need the quality to be ‘high.’ Unfortunately, high-quality data is more or less invisible, while low-quality data can be very prominent.
Low-quality data is data that causes a failure in some system, which is why it’s prominent or noticeable. For high-quality data, how high is enough? How much time/effort/$$$ do you spend before you have mitigated the risk of quality-related failure? And how do you know if you have over-invested?”
Christian Kaul, Data Structure Designer at Obaysch: “(High-quality data is) data that is organized around meaningful concepts, connections, and details, and paints a consistent, current-enough picture of these things.”
Ken Evans, Managing Director, The ORM Foundation: “According to Crosby, ‘quality’ is defined as ‘conformance to requirements.’ Crosby’s book describes the meaning of quality and the procedures for achieving it. For example, one requirement for data is that the schema be in 5NF.”
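Purely as a hypothetical sketch of what “conformance to requirements” can look like when requirements are made explicit and checkable (the record shape, field names, and rules below are invented for illustration, not anything Crosby or Evans prescribes):

```python
# Hypothetical sketch: "conformance to requirements" as explicit, checkable rules.
# The requirements below (non-empty id, known country codes, non-negative amounts)
# are invented for illustration; real requirements come from the business.
from dataclasses import dataclass

KNOWN_COUNTRIES = {"US", "GB", "DE", "NL"}  # assumed requirement, not a real reference list

@dataclass
class OrderRecord:
    order_id: str
    country: str
    amount: float

def violations(record: OrderRecord) -> list[str]:
    """Return the requirements this record violates (empty list = it conforms)."""
    problems = []
    if not record.order_id:
        problems.append("order_id must be non-empty")
    if record.country not in KNOWN_COUNTRIES:
        problems.append("country must be a known code")
    if record.amount < 0:
        problems.append("amount must be non-negative")
    return problems

for record in [OrderRecord("A-100", "US", 25.0), OrderRecord("", "XX", -5.0)]:
    print(record.order_id or "<missing id>", violations(record) or "conforms")
```

On this reading, “quality” is binary per requirement: a record either conforms or it doesn’t, which is what makes a Crosby-style definition measurable.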
Francesco De Cassai, Senior Manager, Accenture: “A ‘good enough’ approach is more reasonable. ‘Good enough’ meaning: are they understandable? Can they be used to support your decision without misleading? That is enough. It depends on the consumer of the data and its application… not a dogma or a 101% rate of success.”
Vipul Parmar, Global Head of Data Management at WPP: “(High-quality data is) data that is fit for purpose in its intended use(s).”
Sajjad Khan, COO Technology and Non-Executive Director, CLS Group: “‘Data quality’ is a meaningless concept. It’s like saying ‘communication quality.’ However, we make a distinction between remarks in a routine meeting and a critical board pitch. Data similarly needs to be categorized and linked to an organization’s mission, strategy, and risk appetite. Only after that can the necessary trade-offs be assessed around cost, risk, and time versus quality. Some data has to be platinum; other data can be bronze, and most can frankly be deleted.”
Alfredo Serafini, Knowledge (graphs) Engineer and Data Geek, Almaware: “I think it’s not a final goal, but a process. So if an organization takes care of its data and is always open to improving it in various scenarios, I think it is headed in the right direction. If, instead, the idea is to (make it perfect) once and for all, I think there is probably a misunderstanding. In my opinion, there are a lot of good practices that should be used most of the time, but there is no such thing as a universal high-quality standard for data.”
Artem Chigrinets, Product Manager, SqlDBM: “I would say ‘data quality’ is a buzzword that means ‘building a system you can rely on, and doing it from the very beginning.’”
Ajay Raina, Data Modernization Practice Head and Global Chief Architect – Lifesciences, Cognizant: “Traceability of data that impacts timely visibility into business metrics/KPIs, with audit, balance, and control, is true data quality. And being able to measure, monitor, and mitigate those issues to improve trust is equally important.”
Hassaan Akbar, Machine Learning Consultant, Systems Limited: “The requirement for quality changes with the use case. Mission-critical apps have very little error tolerance, and relying on wrong data can result in huge losses. On the other hand, innovation and exploration can work around some quality issues. In a broader sense, it’s more like a journey than a milestone. Any definition or standard of quality should account for what the data will be used for. Sharpening the knife is good, but when and how much is the key.”
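To make that trade-off concrete, here is a minimal, hypothetical sketch of the same error rate being tolerable for exploration but not for a mission-critical use; the thresholds and numbers are invented for illustration only:

```python
# Hypothetical sketch: one dataset, two verdicts, depending on the use case.
# Tolerances are invented for illustration, not industry standards.
ERROR_TOLERANCE = {
    "mission_critical": 0.001,  # roughly 0.1% bad rows tolerated
    "exploration": 0.05,        # roughly 5% bad rows tolerated
}

def fit_for_purpose(bad_rows: int, total_rows: int, use_case: str) -> bool:
    """Data is only 'good enough' relative to its intended use."""
    return bad_rows / total_rows <= ERROR_TOLERANCE[use_case]

# 2,000 bad rows out of 100,000 (a 2% error rate):
print(fit_for_purpose(2_000, 100_000, "exploration"))       # True: fine to explore with
print(fit_for_purpose(2_000, 100_000, "mission_critical"))  # False: too risky to rely on
```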
Marten van der Heijden, Data Architect, Tata Steel in Europe: “It all starts with a thorough description of your data. Data in an application or database is shorthand for fast communication and is extremely dependent on the description of its content and its context. Quality data should comply with the description. If the description is insufficient, the perceived data quality will be bad, even though the data may objectively comply. Underestimation of the importance of good descriptions lies at the core of most data quality issues.”
Jeremy Posner, VP Product and Solutions, RAW Labs: “Good data is about people and process (>80%) and tech (<20%). Until organizations realize this, it will be the same old story, just with new tools, i.e., garbage in, garbage out.”
Ergest Xheblati, Data Architect Consulting, Self Employed: “Consistent typing. Consistent meaning. Contextually relevant.”
What do you think? What does “high-quality data” actually mean? Join the conversation on LinkedIn and Twitter, and let Juan know.