The Data Mesh Balancing Act: Centralized vs. Decentralized Governance

by | Mar 29, 2022 | 2022, data architecture, Data catalogs, data mesh, Data-driven cultures

In the second post in our data mesh blog series, we introduced the Data Product ABCs framework.

Here, we’ll explore how to find the right balance between centralized and decentralized federated data governance to cash in on the full value of your data mesh.

Top-down centralized vs. bottom-up decentralized

As you build your data mesh, it’s imperative that your organization establish a documented method of federated data governance that balances centralization and decentralization. Let’s first review the two ends of the spectrum.

“Top-down centralized governance” puts the responsibility for establishing and enforcing principles on senior-level employees who determine the form the organization’s data governance should take. There are benefits to this approach, but the drawbacks include:

  • Front-line data employees feeling excluded from the process
  • Leadership enacting policies that don’t work for data teams
  • A lack of understanding from senior management who are several steps removed from the data itself
  • Bottlenecks around data access and use that delay the creation of crucial business insights

By contrast, “bottom-up decentralized governance” depends on solutions developed by the people and teams working most closely with the data. A bottom-up approach helps focus on concrete improvements to practical, day-to-day processes. However, it can lose the connection to the business stakes understood by senior management. It also increases the risk of inconsistent governance across domains, which can undermine governance efforts entirely and reinforce silos between groups rather than breaking them down.

Data mesh disrupts the status quo of delegating ownership of all your organization’s data to one team of highly specialized people who often struggle to understand its business value. Instead, decentralization gives ownership of the data to the domain team that knows and understands it best.

Finding the right governance mix 

[Figure: data mesh data governance scale]

But let’s be honest. We can’t swing the pendulum from a centralized world to a decentralized world and expect that everything will work perfectly. It all depends on the current process and culture within your organization and the ones you aspire to have.

We believe the solution lies somewhere in the middle: conjunctive governance. This means encouraging input from the experts who work hands-on with your organization’s data and developing global, inclusive, and resilient policies from the bottom up. Execution of the process should be distributed across the domains, removing bottlenecks and breaking down silos.

Defining minimally viable governance

Domain decentralization adds business value only if every domain in the organization is governed according to documented, global interoperability standards. That sounds like a big IF, but don’t worry: we’re not asking you to “boil the ocean.” Instead, start by defining the minimal set of core, essential criteria that every data product must meet across all decentralized domains.

These criteria should start small and improve incrementally to ensure safety and interoperability. The best ideas graduate from individual domains to become broadly established standards. Without this process, there is a significant risk of miscommunication and misalignment in how your organization works with its data, which will likely lead to more and more data silos.
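To make “start small” concrete, here is a minimal Python sketch of a global check that every domain’s data product could be run against. The descriptor fields, `REQUIRED_FIELDS`, and `missing_criteria` are hypothetical; each organization would choose its own minimal criteria and grow the list as good ideas graduate from the domains.

```python
from dataclasses import dataclass, field

# Hypothetical minimal, organization-wide criteria every data product must meet.
# The field names are illustrative assumptions, not a prescribed standard.
REQUIRED_FIELDS = ("name", "owner_domain", "owner_contact", "schema", "description")


@dataclass
class DataProductDescriptor:
    """Metadata a domain publishes alongside its data product (hypothetical)."""
    name: str = ""
    owner_domain: str = ""
    owner_contact: str = ""
    schema: dict = field(default_factory=dict)
    description: str = ""
    # Domains may add their own extra metadata; the global check ignores it.
    extras: dict = field(default_factory=dict)


def missing_criteria(product: DataProductDescriptor) -> list[str]:
    """Return the required fields that are empty; an empty list means compliant."""
    return [name for name in REQUIRED_FIELDS if not getattr(product, name)]


orders = DataProductDescriptor(
    name="orders",
    owner_domain="sales",
    owner_contact="sales-data@example.com",
    schema={"order_id": "string", "amount": "decimal"},
    description="One row per confirmed customer order.",
)
draft = DataProductDescriptor(name="web_sessions", owner_domain="marketing")

print(missing_criteria(orders))  # []
print(missing_criteria(draft))   # ['owner_contact', 'schema', 'description']
```

Because the check only looks at the shared fields, domains stay free to manage everything else about their products, which is the balance this section argues for.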

Putting the Data Product ABCs framework to the test

Following the Data Product ABCs framework, we believe that A, B, and D should be managed by each domain, while C and E should be informed by global standards to ensure agreement on semantics, syntax, contracts, policies, and access to data products.

The goal is to establish a standardized way of defining policies, contracts, and schemas rather than imposing them. This is done by making sure that everything is computable, i.e., expressed as code. We believe in using SQL and well-established semantic web standards such as RDF, OWL, and SHACL. Of course, there are scenarios where imposition is necessary for security or regulatory reasons, as with GDPR.
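As an illustration of what “computable” can mean for a policy, the Python sketch below encodes a hypothetical rule of the kind a regulation such as GDPR might motivate: columns tagged as personal data are withheld unless the consumer declares an approved purpose. The `personal` tag, the approved purposes, and `allowed_columns` are assumptions made for this example; in practice the same rule might equally be written in SQL or another policy language.

```python
# A hypothetical global policy expressed as code: columns tagged "personal"
# may only be read by consumers that declare an approved purpose.
# Tag names and purposes are illustrative assumptions.
APPROVED_PURPOSES_FOR_PERSONAL_DATA = {"billing", "customer_support"}


def allowed_columns(column_tags: dict[str, set[str]], purpose: str) -> list[str]:
    """Return the columns a consumer with the given purpose may read."""
    allowed = []
    for column, tags in column_tags.items():
        if "personal" in tags and purpose not in APPROVED_PURPOSES_FOR_PERSONAL_DATA:
            continue  # personal data is withheld unless the purpose is approved
        allowed.append(column)
    return allowed


customer_columns = {
    "customer_id": set(),
    "email": {"personal"},
    "country": set(),
}

print(allowed_columns(customer_columns, "analytics"))  # ['customer_id', 'country']
print(allowed_columns(customer_columns, "billing"))    # ['customer_id', 'email', 'country']
```

Since the rule is code, every domain can apply and test it the same way, instead of each interpreting a written policy on its own.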

Here are a few examples of what global standards can look like when applied to C and E in the Data Product ABCs framework:

Contracts and Expectations

  • Select a language for defining all contracts and expectations, such as SHACL, the Data Quality Vocabulary (DQV), or Great Expectations
  • Choose a standard architectural style (for example, star schemas or Data Vault)
  • Define contracts (e.g., a telephone number must come with permission to send voicemails and text messages)
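The telephone-number contract can be sketched in Python as a check that runs on every record a domain publishes. A real deployment would more likely express it in one of the languages listed above (SHACL, DQV, or Great Expectations); the plain-Python version and the field names `phone_number`, `can_send_voicemail`, and `can_send_text` are assumptions chosen to keep the example self-contained.

```python
# Hypothetical contract for the telephone-number example: any record that carries
# a phone number must also carry permission to send voicemail and text messages.
# Field names are illustrative assumptions.
REQUIRED_PERMISSIONS = ("can_send_voicemail", "can_send_text")


def contract_violations(records: list[dict]) -> list[tuple[int, str]]:
    """Return (row index, reason) pairs for records that break the contract."""
    violations = []
    for i, record in enumerate(records):
        if not record.get("phone_number"):
            continue  # the contract only applies to records with a phone number
        for permission in REQUIRED_PERMISSIONS:
            if record.get(permission) is not True:
                violations.append((i, f"missing {permission}"))
    return violations


contacts = [
    {"phone_number": "+1-555-0100", "can_send_voicemail": True, "can_send_text": True},
    {"phone_number": "+1-555-0101", "can_send_voicemail": True},
    {"email": "someone@example.com"},
]

print(contract_violations(contacts))  # [(1, 'missing can_send_text')]
```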

Explicit Knowledge

  • Have a language for defining schemas, such as OWL, JSON Schema, or annotated or templated SQL DDL. Schemas for core business concepts may appear in multiple domains, so defining them once, globally, ensures that different domains won’t define the same thing differently. Each domain can extend these schemas if required.
  • Separate data products for core business concepts from data products for metrics. The latter are combinations of the former: User data product + Activity data product = Metric data product. Each domain can have its own definition of a metric (e.g., a user may have both mandatory and optional attributes).
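Here is a minimal Python sketch of that separation, under stated assumptions: `User` stands for a globally defined core schema, `MarketingUser` is one domain’s extension of it, and `active_users_by_country` is one possible metric data product that combines the User and Activity products. All names, and the definition of “active,” are hypothetical; another domain could define the metric differently.

```python
from dataclasses import dataclass


# Hypothetical globally defined core concept, shared by every domain.
@dataclass
class User:
    user_id: str       # mandatory attribute
    country: str = ""  # optional attribute


# A domain extends the core schema instead of redefining "user" from scratch.
@dataclass
class MarketingUser(User):
    newsletter_opt_in: bool = False


@dataclass
class Activity:
    user_id: str
    event: str


def active_users_by_country(users: list[User], activity: list[Activity]) -> dict[str, int]:
    """Metric data product: users with at least one event, counted per country."""
    active_ids = {a.user_id for a in activity}
    counts: dict[str, int] = {}
    for user in users:
        if user.user_id in active_ids:
            key = user.country or "unknown"
            counts[key] = counts.get(key, 0) + 1
    return counts


users = [User("u1", "DE"), MarketingUser("u2", "FR", newsletter_opt_in=True), User("u3")]
activity = [Activity("u1", "login"), Activity("u3", "purchase")]

print(active_users_by_country(users, activity))  # {'DE': 1, 'unknown': 1}
```

Because `MarketingUser` inherits from `User`, the metric works on any domain’s extension without knowing about its extra attributes.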

A domain-driven design that strikes the right balance between decentralization and centralization brings improvements in several areas:

  1. Scalability – By distributing ownership of domains, you empower multiple teams to “own” specific functional areas and smaller data products. Once your organization reaches a certain scale and is producing and working with large amounts of data, a single team (or, *gulp*, a single person) holding the keys to all of it can be overwhelmed by the data’s volume or complexity. Teams that each work within a specific domain are more nimble and much more likely to fully understand the data within their purview.
  2. Efficiency – You can quickly combine data products to create new insights, which can then lead to new data products. Say yes to reusing data and no to more bottlenecks.
  3. Resilience – A domain-driven approach means every domain can manage changes to its own operational systems and update its data products accordingly, without depending on a central structure.
  4. Accountability – When a domain owns its data, its team takes responsibility and responds to incentives. They naturally want their data products to be the most used and valued by the rest of the organization.
  5. Reliability – Similarly, when a team or person is accountable for the quality of a data set, particularly when they’re the expert, that data is much more likely to be well organized, accurate, and error-free.
  6. Usability – When your data is organized, accurate, error-free, and “owned” by people who understand it completely, it’s far more likely to be easy to use. And if you do have questions, you know exactly which domain owners have the answers.

In short, centralize the things that are core to your business. Start small and iterate. Push everything else to the domains.

In our next post, we’ll be talking about the technology you need to support the people, process, and culture aspects of data mesh we’ve already covered.

For more information about finding the balance between centralized and decentralized federated data governance for your organization, download our white paper, The Data Mesh Governance Framework You Can Implement Today.