Why is a data catalog important?
However, most companies are losing.
“Only 31.0% of companies say they are data-driven. This number has declined from 37.1% in 2017 and 32.4% in 2018. We are headed the wrong direction.”
- Clarity: Do your people understand data well enough to answer business questions?
- Accuracy: Do they believe in and rely on data’s accuracy when answering business questions?
- Speed: Do they answer business questions fast enough to matter?
|Created a Data Driven Organization|2017|2018|2019|
|Share of companies|37.1%|32.4%|31.0%|
You must be able to pass these tests to succeed, because you can’t afford to fail.
“For every 100 employees, finding data and reproducing analysis is a $1.7M problem.”
The problem: Your data is meaningless to most people with business questions.
As a result, complexity skyrockets, lakes flood with meaningless data, and your people still can’t answer business questions. Garbage in, garbage out.
The more money, misinformed solutions, and supposed silver bullets you throw at data, the faster the data’s meaning declines and the less use it has to the people with business questions.
This is our current, unfortunate status quo.
The solution: bridge the gap between data and meaning to create explainable data.
Explain your data to make it meaningful
“Forty-seven percent of respondents report untrustworthy or inaccurate insights from analytics due to poor data quality. Only 14 percent of stakeholders had a very good understanding of the data and that less than 60 percent of the data was well understood by stakeholders.”
In fact, there are two existing technologies that, when combined, create new explainable data and rescue meaningless data from uselessness.
The first technology is the data catalog.
What is a data catalog?
Clarity: Everything needed to understand data is kept and maintained, from the beginning. As people use their data catalog, the data’s context deepens and its meaning becomes clearer.
Accuracy: A wider array of people can validate, improve, and correct data and analysis when they use a premier data catalog solution.
Speed: People can find what they need faster when the catalog organizes data and analysis in discoverable, business-friendly ways, provides Google-like search, and keeps all context within reach.
“44% of data worker time is wasted every week because of unsuccessful activities. 51% of searching activity is wasted, and 47% of preparation work is wasted.”
The second technology is the knowledge graph.
What is a knowledge graph?
Clarity: Knowledge graphs enable explainable data, expressing it in consistent, familiar, and understandable business concepts. Data from knowledge graphs can be exported to user-preferred formats that are compatible with the tools they know.
Accuracy: Knowledge graphs map meaning to data regardless of how it’s structured and where it’s located. Graph architectures are well-known for their ability to reference and collect disparate data sources, earning their spot in Gartner’s Top 10 Data and Analytics Technology Trends for 2019. Once you understand the data, you’re better equipped to evaluate its accuracy and correct errors.
Speed: Don’t waste your time searching for the naming, relationships, business meaning, and quality of your data. Knowledge graphs provide one clear view of data from multiple sources, so anyone can find data-driven answers quickly by using concepts that make sense within their professional domain. Graphs are flexible by design: you can add new data with no sweat no matter how many changes your business goes through.
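To make the idea of mapping meaning to data more concrete, here is a minimal sketch in Python. It assumes a knowledge graph can be approximated as a list of (subject, predicate, object) triples. Every column name, predicate, and business concept below is hypothetical and chosen only for illustration; a production knowledge graph would typically use a dedicated graph store rather than an in-memory list.

```python
# Hypothetical triples: (subject, predicate, object). All names are illustrative.
TRIPLES = [
    ("crm.accounts.acct_nm", "represents", "Customer Name"),
    ("billing.invoices.cust_name", "represents", "Customer Name"),
    ("billing.invoices.amt_usd", "represents", "Invoice Amount"),
    ("Customer Name", "part_of", "Customer"),
    ("Invoice Amount", "part_of", "Invoice"),
    ("Invoice", "billed_to", "Customer"),
]


def build_index(triples):
    """Index subjects by (predicate, object) so lookups start from a business concept."""
    index = {}
    for subject, predicate, obj in triples:
        index.setdefault((predicate, obj), []).append(subject)
    return index


def columns_for_concept(index, concept):
    """Return the physical columns mapped to a concept, directly or through
    its parts (one level of 'part_of'), regardless of which source they live in."""
    columns = list(index.get(("represents", concept), []))
    for part in index.get(("part_of", concept), []):
        columns.extend(index.get(("represents", part), []))
    return sorted(columns)


index = build_index(TRIPLES)
print(columns_for_concept(index, "Customer"))
```

A person who asks about "Customer" gets back columns from both the CRM and billing sources without needing to know either naming scheme, which is the single view across sources described above.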
How knowledge graphs connect data with meaning within the enterprise
If knowledge graphs can power the most widely used search engine, they can rise to enterprise data challenges without disruption.
“Nine out of ten of the most value-creating companies in the world in 2018 were using knowledge graphs.”
Data catalogs powered by knowledge graphs are the future.
What to look for in a data catalog solution
“A data catalog maintains an inventory of data assets through the discovery, description, and organization of datasets. The catalog provides context to enable data analysts, data scientists, data stewards, and other data consumers to find and understand a relevant dataset for the purpose of extracting business value.”
Your data catalog must empower your workforce so they can get more information from your data investments and make smart decisions quickly. If your data catalog can’t do that, it’s not an enterprise-ready data catalog.
How will you know which one is which? Gartner identifies three distinct subclasses of data catalogs and how they differentiate themselves in the market.
Although data catalogs provide plenty of information to your data teams, they cannot help companies achieve self-service business intelligence on their own, which makes building a data-driven culture harder. Many technical people still picture a data catalog this way, but newer kinds of catalogs are now available.
Simply put, no one. Having one data catalog connected to all of your data sources, serving as a single source of truth, is a far better option. Don’t believe us? Eckerson Group says an enterprise data catalog should work with all of your other data investments:
“The value of a seamless user experience throughout the analytics lifecycle is evident, so the trend in [data] catalog evolution is toward convergence. Most tools will mature to become fully integrated solutions supporting all three capabilities – cataloging, preparation, and analysis. Convergence, however, does not eliminate the need for interoperability, as self-service analysts often want to make their own choices of preparation and analysis tools.”
An enterprise data catalog is truly the foundation of data empowerment. It’s not just a place to index all of your information, but it can also unify your people, data, and analysis so that it is easier to build a data-driven culture.
Just as Google revolutionized its search engine with the knowledge graph, you can supercharge your data culture with an enterprise data catalog.
Now that we’ve talked about the broad categories of data catalog tools, here’s how you should go about choosing one to adopt.
Data catalog tools are exciting because they can democratize data across an organization. However, data is only meaningful to business decision makers if it is enriched with context, which comes from people and metadata.
Connecting data to its context is the difference between making the right and wrong decisions with data. Consider imperial versus metric units: using the wrong unit to hang a shelf might not be a big problem, but at business scale the same gap between data and meaning is part of the reason the U.S. economy lost $3.1 trillion to bad data in 2016.
So what’s holding data & analytics leaders back from cracking the code and investing in a data catalog? After all, only a third of CDOs consider themselves successful at creating a data-driven culture despite their efforts.
“Early CDOs were focused on data governance, data quality, and regulatory drivers, but today’s data and analytics leaders are becoming impactful change agents who are spearheading data-driven transformation.”
From on-premises systems to the cloud, and from hard drives to home laptops, data lives almost everywhere. Reliable and useful data is the core of modern-day business; however, some data may not be completely accurate, and its sources may not be known.
Despite the use of analytics tools, analysis is largely a thought process that occurs in people’s minds. As a result, much of it never gets documented or reproduced. You can’t see the assumptions, data, or insights behind the discoveries that analytics generate. Since the analysis is not preserved, determining what data and what approach to use becomes tedious and has to be repeated for every project. To solve this issue, treat analysis like data: archive it, catalog it, and understand it.
Almost everyone in business works with data, but each person operates at a different level of data literacy. So to truly achieve a data-driven culture at your company, data must be accessible to everyone, not just to elite data practitioners.
Data and analytics leaders need to solve problems beyond just the technical ones. In fact, according to Gartner, “The top internal roadblock to the success of the office of the CDO is ‘culture challenges to accept change’.” Additionally, “93% of executives identify people and process issues” as the barrier to building a data-driven organization. The Harvard Business Review found that “the difficulty of cultural change has been dramatically underestimated in these leading companies — 40.3% identify lack of organization alignment and 24% cite cultural resistance as the leading factors contributing to this lack of business adoption.”
During your data culture transformation, no person can be left behind. Creating a data-driven culture requires convincing employees to adopt updated data practices, supporting cross-team collaboration, and empowering your people with data catalog products to help them work better, together. Most importantly, CDOs or other D&A leaders need more power to foster these changes.
However, CDOs and their counterparts don’t need just any enterprise data catalog; they need one that makes data easy to find, understand, and use to drive business change.
Part of driving adoption for your data catalog is choosing the right problem to solve with it at the beginning of your launch. Here are some examples of the kinds of challenges you could solve with the right data catalog.
Finding and understanding relevant information is laborious and can cause you to miss valuable opportunities or make uninformed business decisions. This is common for companies that don’t have a well-maintained, active inventory of data and analysis.
So help your business reduce the time and labor gap between asking a question and producing an answer by inventorying your data resources, enriching them with useful metadata (meaning) and validations, and connecting them to meaningful business concepts.
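As a rough illustration of inventorying, enriching, and connecting data resources, the following Python sketch models a tiny in-memory catalog. The `CatalogEntry` and `Catalog` classes, their fields, and the sample entries are all assumptions made for this example; they do not describe any particular product's data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One inventoried data resource plus the metadata that gives it meaning."""
    name: str
    source: str
    description: str
    concepts: list = field(default_factory=list)  # linked business concepts
    validated: bool = False                       # has someone vouched for it?


class Catalog:
    def __init__(self):
        self._entries = {}

    def register(self, entry):
        """Add or update an entry in the inventory."""
        self._entries[entry.name] = entry

    def search(self, term):
        """Match a term against names, descriptions, and business concepts.
        Validated entries are listed first, then alphabetical order."""
        term = term.lower()
        hits = [
            e for e in self._entries.values()
            if term in e.name.lower()
            or term in e.description.lower()
            or any(term in c.lower() for c in e.concepts)
        ]
        return sorted(hits, key=lambda e: (not e.validated, e.name))


# Hypothetical sample inventory.
catalog = Catalog()
catalog.register(CatalogEntry("q3_orders", "warehouse", "Order lines for Q3",
                              ["Revenue", "Customer"], validated=True))
catalog.register(CatalogEntry("churn_sheet", "shared spreadsheet",
                              "Monthly customer churn tracking", ["Customer"]))
catalog.register(CatalogEntry("web_logs", "data lake", "Raw clickstream events"))

print([e.name for e in catalog.search("customer")])
```

Because the search also looks at linked business concepts, `q3_orders` is found for "customer" even though neither its name nor its description mentions the word.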
When data is disconnected from its relevant business concepts and initiatives, its context is lost. As a result, you have to start from the ground up on new analysis without building upon previous work.
Searching for the right data for an analysis can feel like being lost in a forest with no compass. So think like a cartographer and create a map of your best data with your data catalog, because making your data assets accessible is the key to making them reusable.
This curated library of data sources can be anything from a slice of data from your data warehouse to your most popular, shared spreadsheets. Either way, the goal is to point the company to the 20% (or much less) of assets that provide 80% (or much more) of the value.
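One possible way to find that small, high-value set is sketched below in Python. It assumes that usage counts (for example, queries or downloads per asset) are an acceptable proxy for value, which may not hold in every organization. The function name, the asset names, and the counts are hypothetical.

```python
def top_assets_by_share(usage_counts, target_share=0.8):
    """Return the smallest set of most-used assets whose combined usage
    reaches target_share of total usage (ties broken by name)."""
    total = sum(usage_counts.values())
    if total == 0:
        return []
    ranked = sorted(usage_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    selected, covered = [], 0
    for name, count in ranked:
        if covered >= target_share * total:
            break  # the target is already met; skip the long tail
        selected.append(name)
        covered += count
    return selected


# Hypothetical usage counts per asset (illustrative only).
usage = {
    "sales_mart": 500,
    "customer_360": 300,
    "old_export": 50,
    "adhoc_a": 40,
    "adhoc_b": 30,
    "tmp_scratch": 20,
    "legacy_dump": 10,
}

curated = top_assets_by_share(usage)
print(curated, f"{len(curated)} of {len(usage)} assets")
```

In this made-up example, two of seven assets account for more than 80% of usage, so those two would be the first candidates for curation.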
Our work with data is meaningless if it doesn’t influence the decisions we make. This is why sharing information with stakeholders ineffectively or incompletely increases risk and slows productivity. Lost cycles may cost hundreds of thousands of dollars, but a bad decision can cost millions.
Therefore, we need to ensure IT, data stewards, data engineers, analysts, and business people are collaborating. With cross-functional collaboration, analyses can be documented and shared in a way that is agile, iterative, and easily consumable. Workflows can also be reused and reproduced easily to deliver more consistent answers.
- If your biggest problem is understanding what data assets the company has and what they mean, inventory your most impactful assets.
- If it’s figuring out which data assets are most accurate and reusable for any given situation, curate what’s useful.
- And finally, if you have a recurring analysis or business challenge, encourage your colleagues to analyze, share, and iterate their analyses to make them reusable.
Check out these real-life stories from data.world customers who successfully launched our cloud data catalog and saw value from these use cases.
One of the world’s largest software companies created a business glossary and dashboard catalog.
A multi-billion-dollar software as a service company, focused on financial and human capital management, uses data.world’s cloud data catalog to index and organize its data assets, connecting them and the people who use them through a common business glossary. By building a single source of truth, the company’s employees are able to find what they need easily, ask the right colleagues for help, and use reliable data in a consistent way. As a result, their data is more accessible, valuable, usable, and free of redundancy.
A global management consulting company enables their data to be found faster than ever before.
This is another customer who is renowned for being on the leading edge of innovation by applying data to help its clients answer their business questions. Any delay in answering a client’s question could mean lost revenue for the organization, so they aimed to streamline the process for finding, understanding, and utilizing data to produce analyses.
This company uses our data catalog to create a curated, user-friendly data portal. Consultants are able to find the right data faster and use it more often with the organization’s new portal, which contains owned, purchased, and derived analyses. Since data.world automatically gathers context and ongoing analysis and identifies relationships between datasets, projects, and teams, the firm’s employees are able to be more connected and efficient with data.
The Associated Press uses curated datasets to transform the way news is reported.
On any given day, more than half the world’s population sees local news from the Associated Press (AP) through local media runs and reports. However, delivering story-relevant data to the right hands in local newsrooms is a daunting task. Previously, data would be distributed to the wrong people at the wrong time, getting lost in inboxes far from those who could use it. The most time-consuming aspect of this work (estimated at 80 percent of total project time) was finding, vetting, and cleaning data. As a result, the barrier to entry for using data was high, and local newsrooms usually lacked the time, staff, and tools.
AP and data.world make data journalism accessible by transforming the way data reaches local newsrooms. Technical users can now create and share queries faster without leaving the platform or spinning up a database. Additionally, less technical users can slice data for their local news markets without any prior coding or data science knowledge. Now with the option of exporting results in common formats, anyone can dig in and get clean data faster. Newsrooms across the country now have actionable data that can be used to inform the public on how national events affect their local communities.
Mirum, a global digital experience agency, streamlines their data projects for thousands of people around the world.
Mirum has over 2,500 people in 25 countries. Data and data-literate people are the key to how Mirum creates unforgettable experiences for clients like Mazda and Qualcomm. With their already sophisticated approach to data analysis, Mirum wanted to take the next step and better package their data to make their expertise even more valuable.
data.world helped Mirum streamline their new data practices and roll out improved processes across projects and teams. Discussion between coworkers, between agencies, and with client stakeholders shifted from email to dedicated project comment threads. Now, the full data project lifecycle lives on a single platform, data.world. Teams at Mirum not only do the work through data.world, but also deliver it to their customers through the platform.
Aceable creates easily consumable, mobile- and digital-first content for defensive driving courses. To recognize more revenue, they needed a quick way to retrieve data without exhausting the resources of their business analysts. With data.world, a single person at Aceable can now consume, integrate, and query the data to calculate revenue recognition. Streamlining this workflow reduces analysts’ workloads and avoids the time-intensive analysis bottleneck. As a result, C-suite executives receive important business data more quickly.
Now it’s your turn. Prepare for your data catalog launch with these tips.
Consult and collaborate with your evaluation team
First, work with your evaluation team and executive sponsors on determining and tracking key performance metrics, so you can measure the impact of your data catalog tools. Don’t skip this step! You need to welcome differing perspectives from your colleagues and align everyone around the same goal from the get-go, or you could jeopardize the launch of your data catalog.
Most importantly, you want to track the impact of your data catalog use cases at every stage of the data lifecycle and for every role to see if it’s working. To do that, you need to understand how each of your teams currently works with data, what they want to improve, and how they envision that improvement showing up in their day-to-day work.
To do that, take these three steps while launching your enterprise data catalog:
Data catalogs become more valuable as more people use them, so creating hype and developing buy-in is your bridge to a data-driven culture.
These three critical components of your measurement plan will ensure that your whole organization benefits from your enterprise data catalog pilot.
What else can you do to ensure the success of your data catalog launch?
Track your data catalog’s impact beyond usage metrics
Looking beyond platform usage, be sure to also measure the impact of your data catalog on team productivity, organizational culture, and overall business results. If this seems unclear at first, don’t worry. This will be an ongoing process to refine as you grow.
Remember, you invested in a data catalog to bring people, data, and analysis together and to give employees clear, accurate, and fast answers to any business question. Design your measurement plan to reflect that.
Need advice on how to start? Try categorizing metrics into these four buckets as you build your measurement plan.
PRODUCTIVITY: Are you working faster and getting more done?
DATA-DRIVEN CULTURE: Are more people collaborating with data?
USAGE: Is the right data being used for the right projects?
BUSINESS: Do you have a clear way to measure impact in dollars and cents?
These categories should reflect your most important priorities as a data and analytics leader. Be sure to benchmark your current state before launching your data catalog. Productivity metrics are particularly great to record and measure from the start, since you can capture them while determining success goals and metrics with the evaluation team.
On the other hand, metrics such as usage will probably only be useful after you launch your enterprise data catalog software, so keep that in mind as you move forward.
Doing this brings you one step closer to making your organization truly data-driven. And that’s your goal, right?
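The four buckets and the pre-launch benchmark described above could be tracked with something as simple as the Python sketch below. The `Metric` structure, the metric names, and every number are hypothetical and exist only to show the shape of a measurement plan. Metrics without a pre-launch baseline, such as usage, are reported as `None` rather than given an invented comparison.

```python
from dataclasses import dataclass
from typing import Optional

BUCKETS = ("PRODUCTIVITY", "DATA-DRIVEN CULTURE", "USAGE", "BUSINESS")


@dataclass
class Metric:
    bucket: str
    name: str
    baseline: Optional[float]  # benchmark before launch; None if unavailable
    current: float
    higher_is_better: bool = True


def improvement_pct(metric):
    """Percent improvement over the baseline, or None if there is no usable baseline."""
    if metric.baseline is None or metric.baseline == 0:
        return None
    change = (metric.current - metric.baseline) / metric.baseline * 100
    return round(change if metric.higher_is_better else -change, 1)


def report(metrics):
    """Group improvement figures by bucket."""
    summary = {bucket: {} for bucket in BUCKETS}
    for m in metrics:
        summary[m.bucket][m.name] = improvement_pct(m)
    return summary


# Hypothetical figures for illustration only.
metrics = [
    Metric("PRODUCTIVITY", "hours to find a dataset", 8.0, 2.0, higher_is_better=False),
    Metric("DATA-DRIVEN CULTURE", "people contributing to shared analyses", 20, 30),
    Metric("USAGE", "weekly catalog searches", None, 400),
    Metric("BUSINESS", "analysis rework cost per quarter (USD)", 100000, 80000,
           higher_is_better=False),
]

for bucket, values in report(metrics).items():
    print(bucket, values)
```

Recording the baseline column before launch is what makes the later comparison possible, which is why benchmarking the current state comes first.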
Given today’s challenging times, you may expect the ROI from your data initiatives to be lower than in years past. But according to Gartner, companies that offer a “curated catalog of internal and external data to diverse users will realize twice the business value from their data and analytics investments.” In fact, data catalogs can provide outsized impact in times where data is increasingly important.
We are seeing all kinds of businesses – from banks to restaurants to tech companies – make abrupt and, in some cases, multi-million-dollar changes to their operations. For companies trying to forecast sales two quarters out or assess the stability of their supply chain, data catalogs are an incredibly effective tool for ensuring that the data and metadata supporting analysis and decision making are up to date, accessible, and understandable.
Want to see it for yourself? Get a demo of our enterprise data catalog and see what it’s like to begin connecting your data sources and building your datasets for collaboration!
We’ll show you how data.world makes it easy for everyone—not just the “data people”—to get clear, accurate, fast answers to any business question.
Our cloud-native data catalog maps your siloed, distributed data to familiar and consistent business concepts, creating a unified body of knowledge anyone can find, understand, and use. data.world is an Austin-based Certified B Corporation and public benefit corporation and home to the world’s largest collaborative open data community.