Sanjeev Mohan is the Principal of SanjMo. He spoke at the data.world summit in spring of 2022.
The promise of metadata is enormous, and the recent hyper-growth of data catalogs reflects that promise.
Data catalogs unify how our data is created, transformed, and consumed, and they have been accepted as the gateway to modern analytics architectures. They empower us to finally eliminate data swamps and regain control of our data lakes, and they allow us to understand and expose our data flow.
These advantages are now well understood in the enterprise, and data catalogs are becoming more widely deployed, with some organizations reporting many thousands of active users of their catalogs on a daily or weekly basis.
But in spite of their growth, when it comes to data catalogs’ overall adoption, we’ve barely scratched the surface. Even at organizations that have invested in a data catalog, a handful of challenges often lead to minimal usage and poor adoption from their workforce at large.
3 Enterprise Challenges to Data Catalog Adoption
Most Data Catalogs Struggle With Unstructured Data
One challenge organizations frequently encounter is that while many data catalogs are built to serve structured data use cases, less than five percent of most organizations’ corporate data is actually structured. Most of the time, the vast majority of an organization’s data sits outside of any structured space. This means that, for a data catalog to realize its full value, it needs to be able to catalog unstructured data.
For example, many current data catalogs allow you to see which of your products are popular, and can detail your products’ relationships to one another; this is structured data. But if people are writing about your product on social media, that data is not structured, and most current catalogs can’t accommodate it. This leaves you with an incomplete view of your data.
Most Data Catalogs Are Too Complex
Another cause of poor adoption is catalog complexity. Today, many data catalog workflows are too complicated for most non-technical users to understand, which causes business users to give up on finding what they need. In these instances, a catalog is acquired… only to be put on a shelf, rarely used, and ultimately forgotten.
Most Data Catalogs Offer Incomplete functionality
A third reason for the current state of poor adoption is incomplete data catalog functionality. Businesses are more dynamic today than ever, constantly changing and constantly ingesting new data. A data catalog needs to both account for today’s needs and match the tenacious pace of changing modern business. This means the days of performing quarterly software updates on a monolithic, on-premises data catalog are over; data catalogs must live in the cloud as Saas products so they can be constantly updated and improved, and keep up with the speed of business.
The Future of Data Catalogs
So what does the data catalog of the future look like? I believe it will observe four key tenets.
(To be honest, I took a page out of Zhamak Dehghani's Data Mesh playbook. She has four principles, and I agree with them.)
Metadata Will Drive Action
If you ask someone, “What is metadata?” They will reply, “It's data about data.” But metadata is going to become more than that; metadata will go from merely describing data to prescribing actions.
Metadata will become an action driver. It will tell you that, based on the current state of your data, you need to start a workflow that improves data quality. Or, it will illuminate a need for improved data privacy, and so on. Metadata will become the central driving force for prescribing the next steps your team needs to take to drive greater value for your business.
Your Data Catalog Will Become Your System Of Record
Next, your data catalog will become your business' centralized system of record. As I mentioned above, this means it will catalog more than just structured data. In fact, it means it will catalog more than semi-structured or unstructured data. This means your data catalog will catalog all of your data. Events, reports, dashboards, logs, clickstreams…. All of it in the catalog, and the catalog will become the first place anyone in the business goes to start any type of project.
Your Data Catalog Will Live In Your Revenue Stream
Many of the original data catalogs were not intended to live in your revenue stream. They were intended to live in what I call the “fear stream,” adopted mostly to keep your data orderly and secure in case an auditor came knocking. For this reason, they were built to empower “cover-your-ass initiatives” rather than data initiatives like democratization and monetization. But future data catalogs will be responsible for revenue generation, liberating the data hidden within your organization, and improving efficiency, accuracy, and insight.
Your Data Catalog Will Have A Contextual User Interface
Lastly, the data catalog of the future will feature an easy-to-search, easy-to-understand contextual user interface.
Let’s use Google as an example of what that will look like: I'm new to a neighborhood, so I use Google to quickly and easily look up a bank. Then, Google shows me not only the location of the bank, but also the hours it’s open, the phone number, the times it’s likely to be busy, and much, much more. That is rich contextual information.
That is what future data catalogs will provide. They will be simple enough that everybody can use them, without special skills. They will not only make the data you want easy to find; the underlying engine will make connections that help you discover new information you didn't know existed. And it will all be shown in a user interface that encourages you to drill deeper and deeper into your data, until it becomes a world of discovery.
How will these future data catalogs be able to do all that? Just like Google Search, they’ll be able to do it because they’ll be powered by a knowledge graph, a knowledge graph that maps and links key concepts to uncover hidden relationships, speed search and discovery, and provide unlimited insight.