Two years ago, after closely partnering with customers and prospects, we created our data governance solution. We listened to their needs, iterated based on their feedback, and defined what has become known as "Agile Data Governance." Our focus on simple workflows that support a wide variety of governance scenarios has been well received by the market. Many of our customers, like Penguin Random House and The City of Rochester, have found success with this simple, effective approach.
But we didn't stop there. We continued to dig in, iterate, and learn, and it became clear we had more work to do. As we worked with larger enterprises in regulated industries such as banking and healthcare, we realized that a more flexible solution was required to meet their goals for data governance. Simple approvals and discussion threads needed to be augmented with deeper "Governance," including multi-step workflows and third-party integrations.
The challenge before us was to provide these capabilities without compromising the simplicity our users have come to expect from our Data Catalog Platform.
What is Data Governance?
At its highest level, data governance is about managing change over time. A Data Catalog Platform must grant access to resources based on roles and responsibilities. When issues are discovered, or changes suggested, the data governance team needs to be alerted and appropriately tasked to effect the correct changes in the system. While many activities can be automated, bringing humans into the mix is essential for a successful data governance program.
What's more, humans spend the majority of their working hours in systems outside of the data catalog. It's important to ensure that the catalog is integrated with these platforms. When users visit a report in Tableau or PowerBI, they should be alerted to issues with their data in real time and inline. When data access is approved, or sensitive data discovered, the catalog should interact with third-party systems. Sometimes this means creating tickets in systems like Jira and ServiceNow. Other times, it means reaching out to Snowflake and applying custom tags and policies to control access to sensitive data. The platform must be flexible, extensible, and integrated with the systems that the organization depends on.
However, this sort of flexibility often comes at the cost of simplicity. Complex configurations and fully custom workflows are typically hard to define and challenging to maintain over time. We often hear from our customers, "We know there must be common scenarios that all organizations face; please help us understand the best way." This is a noble goal, but the reality is far more complex. Each organization is different, with different sources of data and different policies. The catalog must support many different configuration scenarios. How do we reduce the complexity and still provide a flexible solution that addresses the unique needs of our customers?
Finally, and perhaps most importantly, change must be auditable. Each and every user interaction, each change, each approval, each addition, must be logged in a durable way. It is critically important not only that changes be tracked, but also that they be easily queryable. Catalog teams must be able to see and report on change, which means the audit event stream itself must be available to them to query.
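To make the idea of a logged, queryable change history concrete, here is a minimal sketch in Python. It is illustrative only and does not describe data.world's implementation: the `AuditEvent` and `AuditLog` names and their fields are hypothetical, and the log is held in memory where a real system would use durable storage.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterator, List, Optional


@dataclass(frozen=True)
class AuditEvent:
    """One immutable record of a change (all field names are hypothetical)."""
    timestamp: datetime
    actor: str
    action: str    # e.g. "approve", "edit", "add"
    resource: str


class AuditLog:
    """Append-only event stream. In-memory here; assume durable storage in practice."""

    def __init__(self) -> None:
        self._events: List[AuditEvent] = []

    def record(self, actor: str, action: str, resource: str,
               timestamp: Optional[datetime] = None) -> AuditEvent:
        # Events are only ever appended, never updated or deleted.
        event = AuditEvent(timestamp or datetime.now(timezone.utc),
                           actor, action, resource)
        self._events.append(event)
        return event

    def query(self, *, actor: Optional[str] = None,
              action: Optional[str] = None,
              resource: Optional[str] = None) -> Iterator[AuditEvent]:
        # Yield events matching every filter that was supplied, in log order.
        for event in self._events:
            if actor is not None and event.actor != actor:
                continue
            if action is not None and event.action != action:
                continue
            if resource is not None and event.resource != resource:
                continue
            yield event
```

With a structure like this, a catalog team could ask, for example, for every approval made by one user or every edit to one resource by calling `log.query(actor=...)` or `log.query(action="edit", resource=...)`.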
Automations
After considering the needs of our diverse customer base and the market, we focused on creating a solution that is both flexible and simple. We've taken our learnings from years of building and deploying data catalogs in increasingly complex scenarios. It's clear that while there are some consistent patterns that organizations face, each organization is also unique. In order to support this broad assortment of use cases, we're introducing Automations, which we call Eureka Bots.
These automations are simple, pluggable, parameterized bundles of templated functionality, provided in a familiar "App Store" experience. Catalog admins can learn about the capabilities of a specific automation and then configure it for the organization's needs. Automations and configurations are versioned and can be upgraded on your schedule as new features and functionality are added. Teams are not locked into data.world's upgrade cycle but maintain control over the fundamental configuration of the catalog.
The automations are constructed from a powerful set of building blocks that allow the Data Catalog Platform to be extended in a variety of ways, depending on the needs of the organization. Different automations mix and match these capabilities to define powerful multi-step workflows that can act not only on the catalog and its users but on third-party systems as well.
Ontologies: Inject source-, industry-, and customer-specific data types, attributes, and relationships to extend the catalog to support the concepts most important to your organization.
Reporting & enrichment: Powerful query-based reporting and enrichment capabilities ensure catalog data stays clean and consistent. For example, automatically deprecate and alert on data resources that have not recently been reviewed.
Triggers: Time, activity, and user-initiated operations enable your catalog to respond to the needs of the business in real time. Triggers can kick off user-oriented approval workflows, build reports, and enrich catalog data.
Workflows: Powerful, BPMN-based workflows ensure humans and robots work together in tight coordination. The all-new "task management center" enables users to track requests, provide additional data, and approve requests. Deep integration with third-party systems allows for ticket creation (e.g., Jira, ServiceNow) and for writing tags and masking policies into your warehouse on the Snowflake Data Cloud.
Of course, these capabilities will only be added to the catalog when templates are configured and "enabled." Templates can also be disabled to remove associated types, triggers, and workflows.
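The "Reporting & enrichment" example above, deprecating and alerting on resources that have not recently been reviewed, can be sketched as a simple rule. This is an illustrative Python sketch under stated assumptions, not the product's actual logic: the `Resource` type, the `deprecate_stale` function, and the 180-day default threshold are all hypothetical.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Resource:
    """A catalog resource (hypothetical shape)."""
    name: str
    last_reviewed: datetime
    status: str = "active"


def deprecate_stale(
    resources: Sequence[Resource],
    now: datetime,
    max_age: timedelta = timedelta(days=180),  # assumed threshold
) -> Tuple[List[Resource], List[str]]:
    """Return updated resources plus one alert message per newly deprecated one."""
    updated: List[Resource] = []
    alerts: List[str] = []
    for resource in resources:
        is_stale = now - resource.last_reviewed > max_age
        if resource.status == "active" and is_stale:
            # Mark the resource deprecated without mutating the original.
            updated.append(replace(resource, status="deprecated"))
            alerts.append(
                f"{resource.name} deprecated: "
                f"last reviewed {resource.last_reviewed:%Y-%m-%d}"
            )
        else:
            updated.append(resource)
    return updated, alerts
```

In terms of the building blocks above, a time-based trigger would run a rule like this on a schedule, and the resulting alerts would feed a workflow that tasks a steward with re-reviewing each resource.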
Out of the gate, we're providing a number of the automations most requested by our key partners, but this list will grow quickly. So stay tuned!
The automations cover many of the most common manual tasks that data governance teams face:
Data Access: Simplify data access controls and remove friction for data consumers
Metadata Enrichment: Streamline enrichment and free domain experts to focus on business value
Metadata Completeness: Monitor metadata quality with automated scoring and reporting
Metadata Freshness: Ensure definitions and glossaries are current with automated evaluations
Ownership Assignment: Prioritize business-critical data by assigning dedicated stewards
Query-Based Actions: Easily automate governance workflows with custom action-based rules
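As one illustration of what "automated scoring" for metadata completeness could look like, the Python sketch below computes the fraction of required fields that are filled in for a resource. The field names and the `completeness_score` function are hypothetical assumptions for this example; the scoring used by the actual automation may differ.

```python
from typing import Dict, Iterable

# Hypothetical set of fields a governance team might require.
REQUIRED_FIELDS = ("description", "owner", "tags")


def completeness_score(
    metadata: Dict[str, object],
    required: Iterable[str] = REQUIRED_FIELDS,
) -> float:
    """Return the fraction (0.0 to 1.0) of required fields that are populated."""
    required = tuple(required)
    if not required:
        return 1.0  # nothing is required, so the record is trivially complete
    # A field counts as populated if it is present and not None or empty.
    filled = sum(
        1 for field in required
        if metadata.get(field) not in (None, "", [], ())
    )
    return filled / len(required)
```

A report could then rank resources by score so that stewards see the least complete entries first.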
Data Catalog Platform
Of course, Automations are made possible by data.world's feature-rich Data Catalog Platform.
Flexibility, federation, and semantics powered by our knowledge-graph core
Advanced domain and collection oriented access control
Extensive audit data delivered as both in-app datasets and Snowflake Secure Data Shares.
Powerful built-in query workbench, including advanced data virtualization capabilities using both SQL (relational data) and SPARQL (graph data). Query "live" tables from your data warehouse on Snowflake directly from data.world.
Beautiful and interactive lineage views
Extensive library of metadata collectors capable of pulling metadata from a growing variety of sources
It's just another example of the power that comes from an open and extensible architecture, based on knowledge graph technologies. If you're interested in learning more about data.world and our data governance solutions, schedule a demo with us today.