Data discovery is a primary use case of data catalogs, but your catalog can be more than just a searchable inventory of data assets and analyses. An enterprise data catalog should make you smarter and more efficient, and help you gain greater value from your data – the more automatic, the better.
But automation for the sake of automation is not the answer if it doesn’t help you truly operationalize your data and analytics environment. Some data catalogs leverage basic machine learning and flashy AI tricks to surface query results on the basis of popularity alone or do simple labeling or tagging of columns; others deliver complex workflows but fail to address the challenges of siloed data. What’s missing is the contextual, semantic relevance only a knowledge graph can provide.
The machine learning black box
Machine learning is often hyped as a silver bullet for data and analytics. Unfortunately, it doesn’t always extract the value it promises, nor does the “black box” approach supply a transparent model that can be used to justify outcomes.
In their 2019 Harvard Data Science Review article, Why Are We Using Black Box Models in AI When We Don't Need To? A Lesson From an Explainable AI Competition, Cynthia Rudin and Joanna Radin wrote this on the topic:
“In machine learning, these black box models are created directly from data by an algorithm, meaning that humans, even those who design them, cannot understand how variables are being combined to make predictions. Even if one has a list of the input variables, black box predictive models can be such complicated functions of the variables that no human can understand how the variables are jointly related to each other to reach a final prediction.”
Historically, machine learning was designed to assist in low-stakes decision-making like optimizing digital advertising campaigns. In those settings, "it just works" was an adequate answer to how the technology reached its conclusions. That answer doesn't hold up in the modern enterprise, where machine learning outputs can inform million-dollar decisions.
Without context, a singular machine-learning-based approach to automation is often cosmetic, incorrect, and incomplete. Rather than putting your faith in a black-box data catalog, trust the transparency of automation and answers powered by a knowledge graph.
How knowledge graphs unify data into a single, simple, and extensible model
Before diving into the specific capabilities of a knowledge-graph-powered data catalog, let’s take a look at why it is the preferred architecture for intelligent automation.
A data catalog underpinned by a knowledge graph provides a single, semantically organized view of your trusted data. Because it adheres to RDF standards, all data – unstructured, semi-structured, and structured – is easy to unify, find, and trust.
Knowledge graphs bridge the gap between how your organization's data consumers understand their business world and how the company stores its data. They are unique in that business terminology is represented as concepts and relationships that people and machines understand in exactly the same way.
By building your data catalog software on a knowledge graph, you get the flexibility of extending that same graph model across any new sources of data that you acquire or spin up. And you can easily connect the data to your own business terms. A knowledge graph makes it easy to extend the model to represent concepts and relationships that may not have been defined before, without costly and time-consuming infrastructure changes.
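To make this concrete, here is a minimal Turtle sketch of what "extending the model" can look like in practice. The `ex:` names are hypothetical placeholders for a catalog-specific model, not a real schema; `skos:` is the W3C SKOS vocabulary commonly used for business glossaries.

```turtle
@prefix ex:   <http://example.com/catalog/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

# An existing business concept in the glossary
ex:CustomerChurn a skos:Concept ;
    skos:prefLabel "Customer Churn" .

# A newly onboarded table: a single added triple links it to the
# business concept – no migration or schema change required.
ex:crm_churn_scores a ex:Table ;
    ex:describesConcept ex:CustomerChurn .
```

Because RDF graphs are schema-flexible, adding the new table and its relationship is just a matter of asserting new triples alongside the existing ones.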
Data catalog automation powered by knowledge graph and SPARQL
The beauty of querying a data catalog built on a knowledge graph is that most automation can be implemented using just a SPARQL query – no black box needed.
SPARQL is a recursive acronym for "SPARQL Protocol and RDF Query Language," described by a set of W3C specifications. It is a semantic query language used to retrieve and manipulate RDF data, and it is recognized as one of the key technologies of the semantic web for its flexibility, its ease of joining complex data structures, and its ability to detect intricate patterns in data.
With relational databases, users must know the structure of their data up front to make useful connections between datasets, which creates a barrier to entry for inexperienced data users or those exploring new data in real time. The RDF data model with SPARQL, by contrast, is a far more accessible, exploratory paradigm: it does not require knowledge of the data schema to make useful joins between datasets or to automate workflows and tasks.
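As an illustration, here is a sketch of an exploratory SPARQL query, assuming a catalog that describes datasets with the W3C DCAT and Dublin Core vocabularies (the exact graph layout will vary by catalog). Note that no foreknowledge of a schema is needed: the `OPTIONAL` clause simply returns whatever is present in the graph.

```sparql
# List datasets with their titles and, where recorded, their publishers.
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct:  <http://purl.org/dc/terms/>

SELECT ?dataset ?title ?publisher
WHERE {
  ?dataset a dcat:Dataset ;
           dct:title ?title .
  OPTIONAL { ?dataset dct:publisher ?publisher . }
}
LIMIT 25
```

A user can start with a query like this, inspect the results, and progressively refine it – discovering the shape of the data by querying it rather than by studying a schema first.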
Here are a few examples of business-first automations that can be achieved using SPARQL queries alone:
- Metadata update actions
- Usage/popularity analysis
- Sensitive data workflows
- Data certification
- Quality issue annotations
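As a sketch of one such automation – a sensitive data workflow – the SPARQL update below flags columns whose names suggest they hold social security numbers. The `ex:` terms stand in for a catalog-specific model and are assumptions for illustration, not a real schema.

```sparql
# Hypothetical automation: tag likely-SSN columns as PII.
PREFIX ex: <http://example.com/catalog/>

INSERT { ?column ex:sensitivity "PII" }
WHERE {
  ?column a ex:Column ;
          ex:name ?name .
  FILTER ( CONTAINS(LCASE(?name), "ssn") )
}
```

Run on a schedule or triggered by new-source onboarding, a query like this keeps sensitivity metadata current – and because it is plain SPARQL, anyone can read it and see exactly why a column was tagged.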
Learn more about knowledge graph capabilities
Configurable knowledge graph automation makes deploying and managing your data catalog faster, easier, and smarter. But automation is just one of many benefits a knowledge graph can deliver to your organization. Knowledge graphs make it possible to access data on your terms, even as applications evolve and the meaning of your data changes.
To learn more about how knowledge graphs can help you drive better business outcomes, download our white paper, Why Now is the Time for Knowledge Graph.