Enterprises adopt data catalogs for a variety of purposes. One of the most popular is data discovery. But in today’s governance-focused world, connecting data consumers with the assets and analysis needed to make informed business decisions requires more than a simple query workbench. Your data catalog must also be architected for last-mile governance.
What is last-mile governance?
Last-mile governance is a continuation of Agile Data Governance that enables organizations to curate well-informed datasets and then share them for greater collaboration within the enterprise. Data catalogs that deliver true last-mile governance offer both metadata management and data integration capabilities that align to five key concepts.
Here’s what that looks like in practice:
The first step to leveraging data assets for business decision making is cataloging all of your data sources – no small feat given how quickly data sources proliferate in modern data and analytics ecosystems. The key here is extensibility.
In the context of a data catalog, extensibility is the platform's ability to quickly and easily catalog new data sources without overhauling the underlying metadata model or configuration, which would force a redeployment of infrastructure. Your data catalog should be able to absorb new information about your data and analytics ecosystem, or represent new lines of business, without costly re-engineering.
The most extensible data catalogs are cloud-native, feature a flexible metadata model, and offer open APIs to simplify data integration.
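To make "flexible metadata model" concrete, here is a minimal sketch, assuming a catalog where each asset stores its attributes in a free-form property map and new source types are registered as data rather than as code or schema changes. The names `MetadataCatalog`, `register_type`, and the example source types are hypothetical, not any product's API.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Asset:
    """A catalog entry whose attributes live in a free-form dict (hypothetical model)."""
    asset_type: str
    name: str
    properties: dict[str, Any] = field(default_factory=dict)


class MetadataCatalog:
    """Hypothetical catalog with a runtime-extensible metadata model."""

    def __init__(self) -> None:
        # Maps each source type to the set of properties an asset of that type must carry.
        self._types: dict[str, set[str]] = {}
        self._assets: list[Asset] = []

    def register_type(self, asset_type: str, required: set[str]) -> None:
        # Adding a source type is a data change: no class or schema is rewritten.
        self._types[asset_type] = set(required)

    def add(self, asset_type: str, name: str, **properties: Any) -> Asset:
        if asset_type not in self._types:
            raise KeyError(f"unknown asset type: {asset_type}")
        missing = self._types[asset_type] - set(properties)
        if missing:
            raise ValueError(f"{name} is missing {sorted(missing)}")
        asset = Asset(asset_type, name, dict(properties))
        self._assets.append(asset)
        return asset

    def find(self, asset_type: str) -> list[Asset]:
        return [a for a in self._assets if a.asset_type == asset_type]


catalog = MetadataCatalog()
catalog.register_type("postgres_table", {"host", "schema"})
catalog.add("postgres_table", "orders", host="db1", schema="sales")

# Later, a new kind of source is cataloged without touching the classes above.
catalog.register_type("s3_object", {"bucket"})
catalog.add("s3_object", "clickstream.parquet", bucket="raw-events", format="parquet")
print([a.name for a in catalog.find("s3_object")])  # ['clickstream.parquet']
```

The trade-off is that validation moves from a fixed schema into per-type rules, which is what lets the model grow without redeployment.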
In their book, Winning with Data, Tomasz Tunguz and Frank Bien describe five main challenges companies must overcome to create data-driven cultures. Data obscurity and a lack of understanding of the data is one of them. Because this is primarily a documentation problem, it can be addressed, at least in part, by adding a data dictionary and a business glossary.
In a blog post for Stanford University, Stephanie Winningham defined a business glossary as “a central repository that contains key business terms whose names and definitions have been agreed upon by cross-functional subject matter experts.” It is designed for use by non-technical users.
A data dictionary, on the other hand, “allows a group to describe data regarding the physical data structure, type, format, and length, as they exist within a data schema.” Its primary purpose is to help database administrators and architects document how and where data is stored and how it must be referenced to be consumed.
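The difference between the two artifacts is easiest to see in their shape. The sketch below assumes a simple model in which a glossary term carries an agreed business definition and its approvers, while a dictionary entry carries the physical type, format, and length of a column and may optionally link to a glossary term. All class names, tables, columns, and the sample definition are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GlossaryTerm:
    """Business glossary entry: an agreed-upon definition for non-technical users."""
    name: str
    definition: str
    approved_by: tuple[str, ...]  # cross-functional subject matter experts


@dataclass(frozen=True)
class DictionaryEntry:
    """Data dictionary entry: the physical structure of one column."""
    table: str
    column: str
    data_type: str
    fmt: str
    length: Optional[int]
    term: Optional[str] = None  # optional link to a glossary term by name


# Hypothetical sample content.
glossary = {
    "Active Customer": GlossaryTerm(
        "Active Customer",
        "A customer with at least one purchase in the last 90 days.",
        ("Sales", "Finance"),
    ),
}
dictionary = [
    DictionaryEntry("crm.customers", "is_active", "BOOLEAN", "true/false", None, "Active Customer"),
    DictionaryEntry("crm.customers", "customer_id", "VARCHAR", "UUID", 36),
]


def explain(table: str, column: str) -> str:
    """Combine how a column is stored with what it means to the business, if known."""
    entry = next(e for e in dictionary if (e.table, e.column) == (table, column))
    physical = f"{entry.table}.{entry.column}: {entry.data_type} ({entry.fmt})"
    if entry.term and entry.term in glossary:
        return f"{physical} -> {glossary[entry.term].definition}"
    return physical


print(explain("crm.customers", "customer_id"))  # crm.customers.customer_id: VARCHAR (UUID)
```

Linking dictionary entries to glossary terms is one way a catalog can serve both audiences from the same metadata.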
Top-down data governance has pervaded data and analytics management for a number of years now, but that approach has not yielded the results many adherents expected when they first adopted it. That's because data has often been locked down to the point where it is inaccessible, and therefore unusable. As a result, many organizations have turned to Agile Data Governance.
Unlike top-down data governance strategies that seek to control and parametrize every aspect of data access, Agile Data Governance empowers all stakeholders to participate in an inclusive data and analytics process. But in order to practice Agile Data Governance (and by extension last-mile governance), your data catalog must have a request access workflow.
The request access workflow should be built around topical, domain-oriented datasets, so that consumers can easily discover data, its related metadata, sample data, and other relevant information. Consumers can request access, describe their use case, and then receive access directly and immediately.
This not only makes it easy to connect analysts with data; it also allows data stewards and product managers to fully audit that access and every query that goes through it.
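A minimal sketch of such a workflow, assuming every access request, grant, and query passes through one component that appends to an audit log. The `AccessWorkflow` class, its methods, and the dataset name are hypothetical; whether a grant is automatic or reviewed by a steward is a policy choice, shown here as a separate `grant` call.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AccessRequest:
    user: str
    dataset: str
    use_case: str  # the consumer explains why they need the data
    granted: bool = False


@dataclass
class AccessWorkflow:
    """Hypothetical request-access workflow with a built-in audit trail."""
    grants: set[tuple[str, str]] = field(default_factory=set)
    audit_log: list[dict] = field(default_factory=list)

    def _audit(self, event: str, user: str, dataset: str, detail: str) -> None:
        # Every event is timestamped so stewards can reconstruct who did what, when.
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "event": event, "user": user, "dataset": dataset, "detail": detail,
        })

    def request(self, user: str, dataset: str, use_case: str) -> AccessRequest:
        req = AccessRequest(user, dataset, use_case)
        self._audit("request", user, dataset, use_case)
        return req

    def grant(self, req: AccessRequest) -> None:
        req.granted = True
        self.grants.add((req.user, req.dataset))
        self._audit("grant", req.user, req.dataset, req.use_case)

    def query(self, user: str, dataset: str, sql: str) -> None:
        # Queries are checked against grants and logged whether or not they succeed.
        if (user, dataset) not in self.grants:
            self._audit("denied", user, dataset, sql)
            raise PermissionError(f"{user} has no access to {dataset}")
        self._audit("query", user, dataset, sql)


wf = AccessWorkflow()
req = wf.request("ana", "sales.orders_by_region", "Q3 churn analysis")
wf.grant(req)
wf.query("ana", "sales.orders_by_region", "SELECT region, SUM(total) FROM orders GROUP BY region")
print([e["event"] for e in wf.audit_log])  # ['request', 'grant', 'query']
```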
The ability to curate your tables and sources into domain- or use-case-oriented datasets, regardless of where the underlying data physically resides, is another piece of the last-mile governance puzzle.
Knowledge graphs play a critical role in data curation because they bring together people, context, and connections in a single semantically organized view of your data. With a knowledge graph you can:
- Connect to any solution in your data ecosystem including data quality, data lineage, data prep, and other metadata tools
- Logically organize data and metadata in a machine-readable format, speeding search and discovery
- Map data assets to key enterprise concepts to make them discoverable and accessible for greater user self-service
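Knowledge graphs are commonly modeled as subject-predicate-object triples. The toy sketch below, assuming an in-memory triple index rather than a real graph database, shows two of the capabilities listed above: mapping assets to an enterprise concept and pulling together the people and tools connected to an asset. The `KnowledgeGraph` class, predicates such as `represents`, and all node names are hypothetical.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Toy in-memory triple index (a stand-in for a real graph store)."""

    def __init__(self) -> None:
        self.out = defaultdict(set)  # subject -> {(predicate, object)}
        self.inc = defaultdict(set)  # object -> {(predicate, subject)}

    def add(self, s: str, p: str, o: str) -> None:
        self.out[s].add((p, o))
        self.inc[o].add((p, s))

    def subjects(self, p: str, o: str) -> set[str]:
        """All subjects linked to o by predicate p, e.g. assets representing a concept."""
        return {s for (pred, s) in self.inc[o] if pred == p}

    def context(self, node: str) -> set[tuple[str, str, str]]:
        """Every triple touching a node: its concepts, people, and connected tools."""
        return ({(node, p, o) for p, o in self.out[node]}
                | {(s, p, node) for p, s in self.inc[node]})


kg = KnowledgeGraph()
for triple in [
    ("crm.customers", "represents", "Customer"),
    ("warehouse.dim_customer", "represents", "Customer"),
    ("erp.invoices", "represents", "Revenue"),
    ("maria", "stewards", "crm.customers"),
    ("dq_check_7", "validates", "crm.customers"),
]:
    kg.add(*triple)

# Find every asset mapped to the "Customer" concept, wherever it lives.
print(sorted(kg.subjects("represents", "Customer")))
# ['crm.customers', 'warehouse.dim_customer']
```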
Most importantly, though, knowledge graphs enable federated, virtualized queries. This goes beyond simple virtualization, in which the data catalog is merely a query workbench for a single table: it lets your analysts explore and join data where it resides, keeping them focused on business results rather than technical wrangling.
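As a rough analogy for joining data where it resides, the sketch below uses SQLite's `ATTACH DATABASE` to run one query across two separately stored databases without copying rows between them. This is only an illustration under that assumption: a catalog's federated query would span different systems, and the table names and figures here are hypothetical.

```python
import sqlite3

# Two separate in-memory databases stand in for two independent sources.
conn = sqlite3.connect(":memory:")                       # "crm" source (main schema)
conn.execute("ATTACH DATABASE ':memory:' AS billing")    # separate "billing" source

conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "EMEA"), (2, "AMER"), (3, "EMEA")])

conn.execute("CREATE TABLE billing.invoices (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO billing.invoices VALUES (?, ?)",
                 [(1, 100.0), (2, 250.0), (3, 50.0), (1, 25.0)])

# A single query joins across both databases in place.
rows = conn.execute("""
    SELECT c.region, SUM(i.amount) AS revenue
    FROM customers AS c
    JOIN billing.invoices AS i ON i.customer_id = c.id
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()
print(rows)  # [('AMER', 250.0), ('EMEA', 175.0)]
```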
The final concept of last-mile governance, impact, is where data assets are transformed into insights. Insights are informed by embedded collaboration capabilities. At minimum, your data catalog should serve up:
- Project-relevant data, analysis, and questions so you can build on prior work
- Tailored feeds that show team, collaborator, and project progress as soon as you log on
- Proactive alerts that share what's new, when it happened, and how you can contribute
These features, combined with the aforementioned concepts, enable true last-mile governance.
Empower your workforce with last-mile governance
The business goal of a data catalog is to empower your workforce so they can get more information from your data investments, gain better data insights as a whole, and make smart decisions quickly. Last-mile governance is central to this mission.
For more information on how a data catalog can help responsibly govern access to your data, schedule a demo with our team.