This is Part Two of a four-part series about Agile Data Governance. In Part One, we covered lessons learned from software development history and how they should guide us in today's data challenges.
Let's quickly recall the definition of Agile Data Governance before we move on to what it looks like in practice.
Agile Data Governance is the process of creating and improving data assets by iteratively capturing knowledge as data producers and consumers work together so that everyone can benefit. It adapts the deeply proven best practices of Agile and Open software development to data and analytics.
In generating data assets, many companies have accrued what I call “knowledge debt”. That’s when data and analysis aren't documented, have no metadata, and aren't comprehensible. We can all understand why the many people tasked with creating data-driven cultures try to pay down this debt with a silver bullet. Yet a healthy data-driven culture minimizes knowledge debt as part of the process of doing the work. Capturing metadata and documentation in the flow of normal work fuels reproducibility and reuse. Adding roles like data stewards, data product managers, and knowledge scientists makes this process easier because they act as scrum masters or product owners would in software development, except that they’re building data assets. As with Agile Software Development, Agile Data Governance needs tools that respect, and promote, the agile process and these roles. According to Gartner:
Effective data management and governance are people-driven practices. They require consistent and high-quality interaction between a variety of roles, and these roles have grown more diverse and distributed over time. Maintaining communication and collaboration is even more critical in the current conditions, creating an opportunity for data and analytics teams to add value by furthering the adoption of new types of tools and approaches.
Agile Data Governance starts by identifying a business problem, then gathering stakeholders who know about the problem and are trying (or have tried) to solve it.
Stakeholders include:
- Data producers: data stewards, data engineers, data product managers
- Data consumers: business decision-makers, analysts, data scientists
- Domain experts: others with deep knowledge of the problem
Stakeholders should think about classes of questions they’d like to answer with data in order to progress toward solving the business problem. Then treat each class of questions as a potential data asset and each question within a class as a user story.
From there, choose a question. Now, data consumers, data producers, and domain experts collaborate to find the answer.
Once the group establishes the hypothesis to test or question to answer, data producers gather data for data consumers to use. That curation has to happen in a durable place so its fruits can be preserved for others in the future. Each iteration creates and captures new knowledge and reusable assets. Keep questions, clarifications, and modifications close to the data and the work, and make sure they’re easy to access so the next person with a related problem can find them.
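For readers who think in code, the loop described above can be sketched as a small data model. This is only an illustration, not a prescribed tool or schema: the class names `DataAsset` and `UserStory`, their fields, and the churn example are all hypothetical. The sketch shows each class of questions becoming a data asset, each question becoming a user story, and the notes captured during the work staying attached to the question they belong to.

```python
from dataclasses import dataclass, field


@dataclass
class UserStory:
    """One question within a class of questions (hypothetical model)."""
    question: str
    answered: bool = False
    # Clarifications and modifications captured while doing the work
    notes: list[str] = field(default_factory=list)


@dataclass
class DataAsset:
    """A class of related questions, treated as a potential data asset."""
    name: str
    stories: list[UserStory] = field(default_factory=list)

    def add_story(self, question: str) -> UserStory:
        """Record a new question as a user story on this asset."""
        story = UserStory(question)
        self.stories.append(story)
        return story

    def open_stories(self) -> list[UserStory]:
        """Return the questions that still need an answer."""
        return [s for s in self.stories if not s.answered]


# Made-up example: one asset, two questions, one iteration completed
asset = DataAsset("customer churn")
story = asset.add_story("Which plan tiers lose the most customers each quarter?")
story.notes.append("Producer: cancellations before 2021 lack a plan tier field.")
story.answered = True
asset.add_story("Does support ticket volume predict cancellation?")
```

In practice this record would live in a data catalog rather than in a script. The point is that the note about the missing field stays with the question, so the next person can find it.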
Within these cycles, data producers will quickly learn what’s working and what isn't in the data sources they’re curating, and they can make improvements in real time. Doing analytics with a living, evolving data asset focuses stakeholders and provides valuable insights at high frequency. That’s why Agile Data Governance practitioners see ROI in days instead of months.
By cataloging the work as it happens in your data catalog, and not only the “finished” analysis, teams continuously learn from each other and elevate their data literacy. That’s because people learn data skills and domain knowledge faster by doing the work and seeing their peers solve real problems.
Transparency and iteration lead to progressively higher quality as teams refine analysis and data sources one step at a time. The completed reproducible output now gives people a jumping-off point. When people document their analysis as part of the workflow—not as an afterthought, which is today’s unfortunate norm—their coworkers can find, understand, reuse, and adapt it. As with software development (and everything else in life), it’s easier to start a data project if you've got something to build on.
As teams build data assets together and watch each other solve real business problems, the community of data producers, data consumers, and domain experts within the organization grows. Useful, creative, once-rare data practices will spread from team to team and become true, widespread best practices. Anyone who wants to make data-driven decisions will finally find what they need without friction or fear.
Above all, there’s one reason Agile Data Governance is the fastest, most reliable path to data-driven culture: your people multiply your data’s value, and their own power, just by doing their jobs.
For more on this subject, check out the third post in this series, How Agile Data Governance advances data-driven cultures.