Why your Data Catalog Must be Powered by a Knowledge Graph

by | Jun 16, 2020 | Data catalogs

At data.world, we believe your data catalog must be powered by a knowledge graph. I’ll explain why in a moment, but first, let’s look at why data catalogs are quickly becoming a ‘must-have’ business platform. 

 

What is a data catalog and what can it do for you?

A data catalog is a metadata management tool that companies use to inventory and organize the data within their systems. The business goal of a data catalog is to empower your workforce so they can get more information from your data investments, gain better data insights as a whole, and make smart decisions quickly. 

To accomplish this goal, an enterprise data catalog needs to create and manage collections of data and the relationships among them, and to provide a unified view of the data landscape to both data producers (e.g. data engineers, data stewards) and data consumers (e.g. data scientists, data analysts). These collections include database tables and columns, business glossary terms, analyses, and reports from BI dashboards. A key takeaway is that managing relationships should be the bread and butter of data catalog tools. That is where knowledge graphs come in.

 

Why data catalogs should be powered by knowledge graphs

Knowledge graphs enable the integration of knowledge and data at a large scale in the form of a graph data model. A knowledge graph consists of nodes and edges representing real-world objects and the relationships between them. In a data catalog, the nodes represent tables, columns, dashboards, reports, business terms, users, etc. The edges represent their relationships: associated, related, derived from, owner of, etc. As Gartner states: “Graph data stores can efficiently model, explore and query data with complex interrelationships across data silos.”

Figure: Example of a data catalog knowledge graph
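
To make the node-and-edge idea concrete, here is a minimal sketch in plain Python. It is not how any particular catalog product stores its graph; the resource names (orders, revenue_dashboard, alice, and so on) and the relationship labels are made up for illustration. Each edge is a (subject, relationship, object) triple, and a small helper lists everything connected to a given resource.

```python
# Hypothetical catalog resources: every name below is illustrative only.
# Node kinds mirror the ones named in the text (tables, columns, dashboards,
# business terms, users).
node_kinds = {
    "orders": "table",
    "orders.customer_id": "column",
    "orders.total": "column",
    "revenue_dashboard": "dashboard",
    "Net Revenue": "business_term",
    "alice": "user",
}

# Each edge is a (subject, relationship, object) triple.
edges = [
    ("orders", "has_column", "orders.customer_id"),
    ("orders", "has_column", "orders.total"),
    ("revenue_dashboard", "derived_from", "orders"),
    ("Net Revenue", "associated_with", "orders.total"),
    ("alice", "owner_of", "orders"),
]


def neighbors(edges, node):
    """Return (relationship, other_node, direction) for every edge touching node."""
    found = []
    for subject, relationship, obj in edges:
        if subject == node:
            found.append((relationship, obj, "out"))
        elif obj == node:
            found.append((relationship, subject, "in"))
    return found


# Everything the catalog knows about the "orders" table, in one traversal.
for relationship, other, direction in neighbors(edges, "orders"):
    print(direction, relationship, other, node_kinds[other])
```

One lookup on the table surfaces its columns, the dashboard derived from it, and its owner, which is the kind of cross-resource question a catalog has to answer all the time.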

 

Extend your data catalog with a knowledge graph

You may be able to catalog the data that you are managing today, but how do you know that your data catalog can support the heterogeneous data formats of tomorrow? The last thing you want is your data environment and sources to outgrow your catalog’s capabilities. That’s where the knowledge graph comes in. Think of it as a way to future-proof your catalog investment. 

The knowledge graph data model is by definition flexible and agile. By building your data catalog software on a knowledge graph, you get the flexibility of extending that same graph model across any new sources of data that you acquire or spin up. And you can easily connect the data to your own business terms. A knowledge graph makes it easy to extend the model to represent concepts and relationships that may not have been defined before, without costly and time-consuming infrastructure changes. 
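
As a rough illustration of that flexibility, the sketch below uses a toy in-memory graph whose node kinds and relationship names are open-ended strings. The class name CatalogGraph, the "stream_topic" kind, and the "feeds" relationship are all hypothetical, chosen only for this example; a real catalog would sit on a graph database rather than a Python set. The point is that a new kind of source and a new kind of relationship are added as data, with no change to the structure that holds them.

```python
class CatalogGraph:
    """A toy in-memory graph; node kinds and relationship names are open-ended."""

    def __init__(self):
        self.kinds = {}     # node name -> kind, e.g. "table"
        self.edges = set()  # (subject, relationship, object) triples

    def add_node(self, name, kind):
        self.kinds[name] = kind

    def add_edge(self, subject, relationship, obj):
        # Both endpoints must already be cataloged.
        if subject not in self.kinds or obj not in self.kinds:
            raise KeyError("unknown node in edge")
        self.edges.add((subject, relationship, obj))

    def objects(self, subject, relationship):
        """Sorted targets of the given relationship from the given subject."""
        return sorted(
            o for s, r, o in self.edges if s == subject and r == relationship
        )


# Today's catalog: one table linked to one business term.
graph = CatalogGraph()
graph.add_node("orders", "table")
graph.add_node("Net Revenue", "business_term")
graph.add_edge("Net Revenue", "associated_with", "orders")

# Later, a new kind of source appears. The new node kind and the new
# relationship are just new values; the class above is unchanged.
graph.add_node("order_events", "stream_topic")
graph.add_edge("order_events", "feeds", "orders")
graph.add_edge("Net Revenue", "associated_with", "order_events")

print(graph.objects("Net Revenue", "associated_with"))
```

In a fixed relational design, the equivalent step would typically mean a new table or column plus a migration before the new source could be cataloged.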

The very nature of a knowledge graph makes it easy to extend your catalog alongside your growing data ecosystem. That’s why data-driven leaders like Airbnb, Lyft, and LinkedIn have built their catalogs on a knowledge graph. Here’s a great quote from a 2017 article titled “Democratizing Data at Airbnb”:

“A graph of the ecosystem has value far beyond tracking lineage and cross-functional information. Data is a proxy for the operations of a company. Analyzing the network helps to surface lines of communication and identify facets or disconnected information.”

And clearly other companies are taking note as well. Consider this prediction from Gartner: “The application of graph processing and graph DBMSs will grow at 100 percent annually through 2022 to continuously accelerate data preparation and enable more complex and adaptive data science.”

Data catalogs powered by traditional relational technology are rigid and inflexible. This means it can take months to support new types of data sources. How can you empower your data workers when they can’t find, access, or use new, critical sources of data? Companies building their data management solutions on relational architectures almost by definition can’t be data-driven.

If you’re reading this blog, my guess is your data landscape is evolving and you recognize traditional methods of managing data don’t work. If you’re thinking about a data catalog, ask yourself how you’ll protect that investment. 

A cloud data catalog powered by a knowledge graph is the answer.

 

Join the conversation.

We took a closer look at data catalogs and knowledge graphs in this episode of Catalog & Cocktails. Want to join the discussion? Register for Catalog & Cocktails and join us live every week.