The Role of a Data Catalog in Data Mesh

Mar 31, 2022 | 2022, data architecture, Data catalogs, data mesh, Data-driven cultures

In our previous posts in this series, we focused on the people, processes, and cultural aspects of data mesh. Before we dive into the technical stuff, we should be clear about one thing: if someone tries to sell you a “data mesh platform,” run away as fast as possible! You CANNOT buy a data mesh.

Before even thinking about technology, you first need to understand how your business culture is going to treat data as a product and find the balance between centralization and decentralization. Technology is needed for support, but it is not the solution. And one crucial technology you’ll need to support a data mesh is a modern data catalog.

A modern data catalog plays a pivotal role in a data mesh

A modern data catalog must have two key attributes to support a data mesh:

  1. It must cater to both data producers and data consumers
  2. It must be powered by a knowledge graph

Here’s why:

  • Data producers and data consumers: A modern data catalog should offer different lenses for different personas. A data producer is a technical user on a data product development team who needs to understand the operational systems that have already been cataloged. They will use technical features such as data lineage, sensitive data scanning, and REST APIs. Once they have built a data product, they also need to register it in the data catalog. A data consumer uses the catalog to discover data products and understand the policies that apply to them. Each persona needs an experience tailored to that role.
  • Powered by a knowledge graph: Metadata is intrinsically connected and best represented as a knowledge graph. A knowledge graph makes it easier for data producers to represent the metadata, schemas, contracts, and policies. Additionally, a modern data catalog powered by a knowledge graph is powerful because it can be queried by any user.
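
To make the two lenses concrete, here is a minimal sketch in plain Python of metadata stored as subject-predicate-object statements, the basic shape of a knowledge graph. Everything in it is an assumption made for illustration: the `CatalogGraph` class, the predicate names (`ownedBy`, `derivedFrom`, `governedBy`), and the example data products are hypothetical and do not describe any particular catalog's API.

```python
from typing import Optional

Triple = tuple[str, str, str]


class CatalogGraph:
    """A tiny in-memory triple store standing in for a catalog's knowledge graph."""

    def __init__(self) -> None:
        self.triples: set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self.triples.add((subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> list[Triple]:
        """Return matching triples in sorted order; None acts as a wildcard."""
        return sorted(
            t
            for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


def register_data_product(
    graph: CatalogGraph, name: str, owner: str, source: str, policy: str
) -> None:
    """Producer lens: record a new data product and its key relationships."""
    graph.add(name, "type", "DataProduct")
    graph.add(name, "ownedBy", owner)
    graph.add(name, "derivedFrom", source)
    graph.add(name, "governedBy", policy)


def discover_data_products(graph: CatalogGraph) -> list[str]:
    """Consumer lens: list everything registered as a data product."""
    return [s for s, _, _ in graph.match(predicate="type", obj="DataProduct")]


# Hypothetical example data, invented for this sketch.
graph = CatalogGraph()
register_data_product(
    graph, "customer_360", "sales-domain-team", "crm.accounts", "pii-restricted"
)
register_data_product(
    graph, "order_history", "fulfillment-domain-team", "erp.orders", "internal-use"
)

print(discover_data_products(graph))
print(graph.match(subject="customer_360", predicate="governedBy"))
```

The same store serves both personas: the producer writes statements through `register_data_product`, and the consumer asks questions through `match` without needing to know how the metadata was produced.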

Relationships between data products are automatically manifested in the knowledge graph, which speeds up discovery for data consumers. This is why knowledge graphs power the search and recommendation engines used by Google and Amazon. Remember, treating data as a product implies that you should strive for the same experience you have when you search for and buy a product on your favorite e-commerce platform.

This is how Zhamak Dehghani defines a ‘knowledge-graph’ interface on the mesh experience plane: “Browse the mesh of related data products’ semantic models. Traverse their semantic relationship to identify the desired sources of data.”
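
As a rough illustration of what "traverse" can mean in practice, the sketch below walks relationships outward from one data product and reports how many hops away each related product is. It is a plain breadth-first search over a hand-written edge list. The product names, the relationship labels (`joinsWith`, `feeds`), and the `related_products` function are all hypothetical, and a real catalog would more likely answer this with a graph query than with application code.

```python
from collections import deque

# Hypothetical edges: (data product, relationship, related data product)
EDGES = [
    ("customer_360", "joinsWith", "order_history"),
    ("order_history", "feeds", "revenue_forecast"),
    ("customer_360", "feeds", "churn_scores"),
    ("web_sessions", "feeds", "churn_scores"),
]


def related_products(start, edges, max_hops=2):
    """Breadth-first traversal returning {product: hops} reachable from start."""
    neighbors = {}
    for src, _, dst in edges:
        # Treat every relationship as navigable in both directions for discovery.
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)

    hops = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if hops[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in sorted(neighbors.get(current, ())):
            if nxt not in hops:
                hops[nxt] = hops[current] + 1
                queue.append(nxt)

    del hops[start]  # the starting product is not "related" to itself
    return hops


print(related_products("customer_360", EDGES))
```

A consumer who starts at `customer_360` sees its direct neighbors first and then the products one step further out, which is the "customers who viewed this also viewed" experience applied to data products.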

Data as a product and finding the governance balance

Striking the right balance between decentralization and centralization is a crucial step towards meeting your business goals. To find this balance, follow the Data Product ABCs framework and use SQL, as well as established semantic web standards based on RDF, OWL, and SHACL, to ensure ease of adoption.
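
One way to picture that balance: a central team agrees on the minimum shape every data product must have, and each domain team fills in the details on its own. The sketch below is a plain-Python stand-in for the kind of constraint a shape language such as SHACL expresses. It is not SHACL syntax, and the required properties, the `validate` function, and both example descriptors are assumptions invented for illustration.

```python
# Hypothetical shape agreed centrally: the properties every data product must declare.
DATA_PRODUCT_SHAPE = {
    "owner": str,
    "description": str,
    "schema": dict,
    "policies": list,
}


def validate(descriptor, shape=DATA_PRODUCT_SHAPE):
    """Return a list of violations; an empty list means the descriptor conforms."""
    violations = []
    for prop, expected_type in shape.items():
        if prop not in descriptor:
            violations.append(f"missing required property '{prop}'")
        elif not isinstance(descriptor[prop], expected_type):
            violations.append(
                f"property '{prop}' should be of type {expected_type.__name__}"
            )
    return violations


# Descriptors authored independently by two hypothetical domain teams.
complete = {
    "owner": "sales-domain-team",
    "description": "Unified view of each customer account",
    "schema": {"account_id": "string", "lifetime_value": "decimal"},
    "policies": ["pii-restricted"],
}
incomplete = {
    "owner": "fulfillment-domain-team",
    "schema": {"order_id": "string"},
    "policies": "internal-use",  # should be a list
}

print(validate(complete))
print(validate(incomplete))
```

The centralized part is deliberately small (four required properties), while everything else about a data product stays with the team that owns it.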

As mentioned above: there will be friction. Embrace it! Friction is a sign of success.

Applying the Agile Data Governance approach to data mesh will result in ongoing iterative learning across your organization. But in order to do so, you need a data catalog that supports data producers and consumers, and is powered by a knowledge graph.

When you treat data as a product and you find the right balance of decentralization and centralization, data is transformed into knowledge and opportunities for everyone in your organization.

For more information about the role of a next-gen data catalog in data mesh, download our white paper, The Data Mesh Governance Framework You Can Implement Today.

Or schedule a demo to learn more about how data.world supports data mesh implementation.