Connecting data to the people who need it is a lot like shipping, hooking up fiber internet or running a marathon – the last mile is always the hardest. Some data catalogs do solid work pounding the data-governance pavement for the first 25 miles of the race, but when it comes to crossing the finish line, actually providing safe, secure, and auditable access to data, they come up short. Other data catalogs start on the podium. That is, they get the glory of granting access to data, but you don’t know where it came from or how it got there.
To effectively democratize access to data, you need a data catalog that covers the last mile.
What we do in the shadows
Most data catalogs on the market today aren’t designed to both govern data and provision access to it – usually one use case is prioritized over the other.
If your data catalog prioritizes governance, you risk alienating data consumers who are beholden to complicated processes for accessing data. Rather than fill out paperwork, many consumers will break ranks and go directly to the source (i.e., your IT team). Not only does that mean email or Slack hell for all involved, it can also mean not knowing who has access to data or why.
But prioritizing access by just allowing tables to be queried directly in the catalog doesn’t help either. While it may enable exploration, it doesn’t ensure that the data is used efficiently or correctly. It doesn’t give your stewards or data product managers the ability to curate data and guide analysts and data scientists to the data best suited for the job. Compliance headaches? Yep. Rogue datasets? Check. Unverified users accessing data? You got it.
A data catalog and governance solution should be your front office for data and analytics, not a place that encourages black boxes and shadow IT. A shopping cart for data that doesn’t deliver the last mile doesn’t get the job done. But there is good news. One data catalog is rethinking how data governance is done.
Where metadata management meets data integration
At data.world, we believe the purpose of a data catalog is to connect people to data, and the only way to do that is through a combination of agile data governance processes and last-mile governance to enable exploration and access. Last-mile governance enables organizations to curate well-informed datasets and then share them for greater collaboration within the enterprise. Unlike other vendors who specialize in only metadata management or data integration, we do both.
Three keys to last-mile governance
For a data catalog to deliver on the promise of last-mile governance, it needs to offer three key capabilities. The first is the ability to curate your tables and sources into domain- or use-case-oriented datasets, regardless of where each data source physically resides. When data is organized around domains, analysts, data scientists, and other data consumers can quickly ascertain what data can and should be used for a particular use case without running afoul of compliance or governance rules. This starts to arrange your data as a knowledge graph organized by topic and is the first step in enabling a data mesh.
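To make the idea of domain-oriented curation concrete, here is a minimal sketch in Python. It is not data.world's data model: the `SourceTable` and `CuratedDataset` classes, the `datasets_for_use_case` helper, and every system, table, and use-case name are hypothetical, chosen only to show a logical dataset grouping tables that live in different systems.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceTable:
    """A physical table, identified by the system it lives in and its name."""
    system: str  # illustrative values such as "postgres" or "snowflake"
    name: str


@dataclass
class CuratedDataset:
    """A logical, domain-oriented grouping of tables from any number of systems."""
    domain: str
    tables: list[SourceTable]
    approved_use_cases: set[str] = field(default_factory=set)


def datasets_for_use_case(catalog: list[CuratedDataset], use_case: str) -> list[CuratedDataset]:
    """Return the datasets a steward has approved for the given use case."""
    return [d for d in catalog if use_case in d.approved_use_cases]


# Hypothetical catalog: the "customer" dataset spans two different systems.
catalog = [
    CuratedDataset(
        domain="customer",
        tables=[
            SourceTable("postgres", "crm.accounts"),
            SourceTable("snowflake", "billing.invoices"),
        ],
        approved_use_cases={"churn-analysis", "revenue-reporting"},
    ),
    CuratedDataset(
        domain="hr",
        tables=[SourceTable("postgres", "hr.employees")],
        approved_use_cases={"headcount-planning"},
    ),
]

matches = datasets_for_use_case(catalog, "churn-analysis")
print([d.domain for d in matches])
```

The point of the sketch is that a consumer asks about a use case, not about a database: where each table physically sits is a detail carried by the dataset rather than something the analyst has to know.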
The second is federated, virtualized query powered by that knowledge graph. Since you’re organizing and curating those data sources logically, you need to be able to query the data regardless of where it lives. This goes beyond basic virtualization, where your data catalog is simply a query workbench for a table. It allows your analysts to explore and join data regardless of where it comes from – keeping them focused on business results rather than technical wrangling.
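As a toy illustration of joining data across sources through a single query layer, the sketch below uses Python's built-in `sqlite3` module. Two attached in-memory SQLite databases stand in for physically separate systems; this is an analogy only, not how data.world's federated query works, and the `crm` and `billing` schemas, their tables, and the `revenue_by_customer` function are all made up for the example.

```python
import sqlite3

# One connection plays the role of the catalog's query layer.
conn = sqlite3.connect(":memory:")

# Each attached in-memory database stands in for a separate physical source.
conn.execute("ATTACH DATABASE ':memory:' AS crm")
conn.execute("ATTACH DATABASE ':memory:' AS billing")

conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE billing.invoices (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO crm.customers VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex")],
)
conn.executemany(
    "INSERT INTO billing.invoices VALUES (?, ?)",
    [(1, 100.0), (1, 50.0), (2, 75.0)],
)


def revenue_by_customer(connection: sqlite3.Connection) -> list:
    """Join the two 'sources' in one query and total invoices per customer."""
    return connection.execute(
        """
        SELECT c.name, SUM(i.amount)
        FROM crm.customers AS c
        JOIN billing.invoices AS i ON i.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
        """
    ).fetchall()


print(revenue_by_customer(conn))
```

The analyst writes one join and never handles two connections or an export step. A real federated engine also has to deal with differing SQL dialects, credentials, and pushing work down to each source, none of which this sketch attempts.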
The final key capability is a request access workflow built around these topical, domain-oriented datasets, which enables consumers to easily discover data, its related metadata, sample data, and more. Consumers can request access with information about their use case and then receive access directly and immediately. This lets them get right down to work but also allows data stewards and product managers to fully audit that access and every query that goes through it. This gives data product managers unprecedented visibility for improving the usability of data assets, and gives stewards everything they need for compliance.
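The request-and-audit flow described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not a real product API: the `AccessLedger` class, its method names, and the audit record fields are hypothetical, access is granted immediately as long as a use case is supplied, and queries are only logged, never executed.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AccessLedger:
    """Tracks who may query which dataset and records every access event."""
    grants: set = field(default_factory=set)
    audit_log: list = field(default_factory=list)

    def _record(self, event: str, user: str, dataset: str, detail: str) -> None:
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "event": event,
            "user": user,
            "dataset": dataset,
            "detail": detail,
        })

    def request_access(self, user: str, dataset: str, use_case: str) -> None:
        """Grant access immediately, but only when a use case is stated."""
        if not use_case.strip():
            raise ValueError("a use case is required to request access")
        self.grants.add((user, dataset))
        self._record("grant", user, dataset, use_case)

    def run_query(self, user: str, dataset: str, sql: str) -> None:
        """Log the query if the user holds a grant; otherwise log and refuse."""
        if (user, dataset) not in self.grants:
            self._record("denied", user, dataset, sql)
            raise PermissionError(f"{user} has no access to {dataset}")
        # A real system would execute the query here; this sketch only logs it.
        self._record("query", user, dataset, sql)
```

A short usage example, with made-up users and dataset names:

```python
ledger = AccessLedger()
ledger.request_access("ana", "customer", "churn analysis for Q3 planning")
ledger.run_query("ana", "customer", "SELECT COUNT(*) FROM accounts")
for entry in ledger.audit_log:
    print(entry["event"], entry["user"], entry["detail"])
```

Because the grant and every query pass through the same ledger, a steward can answer both "who has access and why" and "what did they run" from one log.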
Governance + access = better data management
For some of you reading this, thinking of access as a component of governance runs counter to the “lock-it-down” strategy that has pervaded the industry for years. We’re here to challenge that thinking.
By including access management in the scope of what a data catalog and governance solution should be, we’re making it easier to break down data silos and collaborate at scale. It’s not open access – it’s access on rails. We’ve made accessing data fully auditable and predictable so it’s easier to manage.