3 Ways to Confirm Your Data Catalog Is Really Powered by a Knowledge Graph

by | Feb 4, 2022 | 2022, data architecture, Data catalogs, knowledge graph, POV

Why your data catalog needs to be powered by a knowledge graph

I’ve said it before, and I’ll say it again: your organization needs a data catalog.

Modern companies need data catalogs to empower their workers to get more information from their data investments, gain better data insights as a whole, and make smart decisions more quickly.

I’ve also been adamant that any data catalog you adopt needs to be powered by a knowledge graph. And I mean a real knowledge graph — we’ll get to what real means in a moment. 

A knowledge graph consists of nodes and edges representing real-world objects and the relationships between them. The nodes (bubbles) in the knowledge graph can represent tables, columns, dashboards, reports, business terms, users, domains… anything that exists within the data ecosystem of your organization. The edges (lines) represent their relationships, and how they’re associated, related, derived from, and so on.
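To make the nodes-and-edges idea concrete, here is a minimal sketch in plain Python. Every resource name and relationship label in it (orders_table, derived_from, alice, and so on) is invented for illustration and is not taken from any real catalog.

```python
from collections import defaultdict

# Each edge is a (subject, relationship, object) triple.
# All names below are hypothetical examples.
edges = [
    ("orders_table", "has_column", "orders_table.total"),
    ("revenue_dashboard", "derived_from", "orders_table"),
    ("revenue", "describes", "orders_table.total"),  # a business term
    ("alice", "owns", "revenue_dashboard"),          # a user
]

# Nodes are simply everything that appears at either end of an edge.
nodes = {s for s, _, _ in edges} | {o for _, _, o in edges}

# Index outgoing edges per node so relationships are easy to follow.
outgoing = defaultdict(list)
for subject, relationship, obj in edges:
    outgoing[subject].append((relationship, obj))


def neighbors(node):
    """Return the (relationship, target) pairs leaving a node."""
    return outgoing.get(node, [])


print(sorted(nodes))
print(neighbors("revenue_dashboard"))
```

Tables, columns, dashboards, business terms, and users all live in the same structure here, which is the point: the graph does not care what kind of thing a node is.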

Without a knowledge graph powering your data catalog, you can’t properly integrate knowledge and data across your organization. And beyond that, without a knowledge graph, your data environment and sources may eventually outgrow your catalog’s capabilities. Data catalogs powered by traditional relational technology and not by a knowledge graph are rigid and inflexible, meaning it can take months for them to support new types of data sources. And that’s not time any modern, data-driven business has to spare.

By building your data catalog on a knowledge graph model, you make it easy to quickly extend your catalog alongside your growing data ecosystem. That’s why data-driven leaders like Airbnb, Lyft, and LinkedIn have built their catalogs on a knowledge graph. And the industry has taken note.

Suddenly, everything’s a “knowledge graph”

The adoption of knowledge graph models by these industry behemoths has made plenty of noise in the world of data. Perhaps that’s why, all of a sudden, it seems every data catalog is powered by a knowledge graph, some seemingly magically switching from traditional relational technology overnight!

But just because you’re calling your model a “knowledge graph” doesn’t make it so. When you look beyond the marketing hype, you’ll see true knowledge graphs possess three distinct characteristics:

Three characteristics of a true knowledge graph

1. Show me the ontology

First of all, a real, legitimate knowledge graph has an ontology, which serves to create a formal representation of the entities in the graph and explain how they’re related. In short, it tells you what everything in your knowledge graph means.

The ontology is more than just a pretty picture of a schema. It should be machine readable. It should also reuse existing best practices and standards. The ontology underlying data.world’s knowledge graph consists of:

  • DCAT to represent a data catalog 
  • Dublin Core to represent metadata
  • SKOS to represent glossaries and thesauri 
  • PROV to represent provenance and data lineage
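As a rough illustration of what “machine readable” can mean, the sketch below describes a few catalog entries using terms from those four vocabularies and prints them as N-Triples-style lines. The namespace IRIs are the published ones for each standard; the `http://example.org/catalog/` resources and the helper functions are assumptions made up for this example, and none of this is data.world’s actual ontology.

```python
# Published namespace IRIs for the vocabularies listed above (plus rdf:type).
NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "prov": "http://www.w3.org/ns/prov#",
}

# Hypothetical base IRI for the example catalog entries.
EX = "http://example.org/catalog/"


def expand(term):
    """Expand a prefixed name such as 'dcat:Dataset' into a full IRI."""
    prefix, local = term.split(":", 1)
    return NAMESPACES[prefix] + local


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples expects."""
    return f"<{value}>"


def literal(value):
    """Quote a plain string literal, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


statements = [
    (iri(EX + "sales"), iri(expand("rdf:type")), iri(expand("dcat:Dataset"))),
    (iri(EX + "sales"), iri(expand("dct:title")), literal("Sales dataset")),
    (iri(EX + "sales"), iri(expand("prov:wasDerivedFrom")), iri(EX + "raw_orders")),
    (iri(EX + "revenue"), iri(expand("skos:prefLabel")), literal("Revenue")),
]

lines = [f"{s} {p} {o} ." for s, p, o in statements]
print("\n".join(lines))
```

Because every term resolves to a shared, published IRI, another tool can read these statements and know what “title” or “derived from” means without a human explaining the schema.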

So if someone tells you, “My data catalog is powered by a knowledge graph,” you should reply, “Sweet. Show me the ontology.” If they can’t show it to you, they don’t have a real knowledge graph.

2. Is your knowledge graph extensible and flexible?

Any data catalog truly powered by a knowledge graph should be able to add, integrate, and catalog any data resource to your ontology immediately. It should be completely extensible and flexible, allowing you to add new data resources and ontologies and support any kind of additional data you want, whenever you want, no ifs, ands, or buts. 

If someone claiming to run their data catalog on a knowledge graph tells you they can’t support adding resources of a certain type, that some data has to be siloed, or that it’ll take months to make the necessary changes in order to do so, they don’t have a real knowledge graph.
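Here is a toy sketch of why a graph model extends so easily. It assumes the catalog is stored as a set of (subject, predicate, object) triples; the `add_resource` helper and the resource names are hypothetical, and a production catalog would of course involve far more than this.

```python
# A graph is just a growing set of (subject, predicate, object) triples.
# All names below are hypothetical examples.
graph = {
    ("orders_table", "type", "Table"),
    ("revenue_dashboard", "type", "Dashboard"),
    ("revenue_dashboard", "derived_from", "orders_table"),
}


def add_resource(graph, name, resource_type, **relationships):
    """Add a resource of any type, plus its relationships, as new triples."""
    graph.add((name, "type", resource_type))
    for predicate, target in relationships.items():
        graph.add((name, predicate, target))


def types_in(graph):
    """Return every resource type currently present in the graph."""
    return {o for _, p, o in graph if p == "type"}


# A resource type the catalog has never seen before. No new table and no
# migration are needed, because the storage shape does not depend on the
# set of types that exist.
add_resource(graph, "churn_model", "MLModel", trained_on="orders_table")

print(sorted(types_in(graph)))
```

In a rigid relational design, supporting a new kind of resource would typically mean new tables or columns first; here it is just three more values in the same structure.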

3. Is everything in your knowledge graph queryable?

If a data catalog is powered by a true knowledge graph, you should be able to query all your metadata, everything that’s in there. This means your metadata resources should be represented in a standard graph format such as RDF (Resource Description Framework), your ontology should be built in OWL (Web Ontology Language), and you should be able to query all of it with SPARQL (SPARQL Protocol and RDF Query Language). You should effectively be able to write any query you want and ask anything, like you’re on Google. If you can’t query all the metadata within your data catalog, sorry, your data catalog isn’t powered by a knowledge graph.
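To give a feel for what “query everything” looks like, here is a small stand-in written in plain Python. It is a toy triple-pattern matcher, not SPARQL, and the data and the `match` helper are invented for this example; a real catalog would run the equivalent SPARQL query against its RDF store.

```python
# Hypothetical catalog metadata as (subject, predicate, object) triples.
triples = [
    ("orders_table", "type", "Table"),
    ("revenue_dashboard", "type", "Dashboard"),
    ("revenue_dashboard", "derived_from", "orders_table"),
    ("weekly_report", "derived_from", "orders_table"),
    ("alice", "owns", "revenue_dashboard"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for s, p, o in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# "What is derived from orders_table?"
# Roughly what SPARQL would express as:
#   SELECT ?s WHERE { ?s :derived_from :orders_table }
downstream = [
    s for s, _, _ in match(triples, predicate="derived_from", obj="orders_table")
]

# "Who owns anything downstream of orders_table?" (a two-step join)
owners = [
    s
    for resource in downstream
    for s, _, _ in match(triples, predicate="owns", obj=resource)
]

print(downstream)
print(owners)
```

The same matcher answers questions about tables, dashboards, and people, because all of the metadata sits in one graph rather than in separate silos with separate query paths.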

 

When it comes to knowledge graphs, you can’t beat the real thing

Building your data catalog on a knowledge graph allows you to properly integrate knowledge and data across your organization, and prepare for the addition of any data sources or ontologies you could possibly want. But to realize all a knowledge-graph-powered catalog can do for your organization, you need to be working with a real knowledge graph, one that has an ontology you can view, is extensible and flexible, and allows you to query all your metadata.

If you’re shown a “knowledge graph” that doesn’t meet those criteria, find one that does.

And whatever you may have been told, data.world is the only data catalog powered by an enterprise knowledge graph.

Learn more about data.world’s knowledge graph solution.