Your disorganized data lake can feel more like an ocean. But you can take control. Set boundaries before you start your data initiative. Without them, the problem can seem never-ending. Break your data solution up into bite-sized steps to achieve incremental (and growing) benefits.
A long-term future for your organization requires a solution today that combines data, technology, and people. We can divide your journey to solving your data problem into three distinct stages. And yes, this is a journey – your decades-old problems can’t be solved overnight.
Crawl: set the scope and focus
Turn that boundless ocean into a pitcher of water. Narrow in on what data you really need. We often hear from clients who want to catalog, clean up, and enrich all their data. And that’s a great ambition – of course we want to modernize and clean up our entire data ecosystem.
But here’s the reality: you need only a select few datasets and databases for your current projects. Focus on cleaning up and organizing those. They will be the most worthwhile and have the most immediate impact.
And the good news? You’ve already done half the work. You know what you’re using on a daily, weekly, and monthly basis. All you need to do is document it, and use it as a guide for the next stage…
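Documenting what you already use can be as simple as a small inventory. The sketch below is one possible shape for it, not a prescribed format: the `DatasetUse` fields, the `build_scope` helper, and the example dataset names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Cadences named in the text: what you use daily, weekly, and monthly.
CADENCES = ("daily", "weekly", "monthly")


@dataclass(frozen=True)
class DatasetUse:
    """One documented dataset (all fields are illustrative assumptions)."""
    name: str      # hypothetical dataset name
    owner: str     # who to ask about it
    cadence: str   # one of CADENCES
    project: str   # the current project that depends on it


def build_scope(uses):
    """Group documented dataset names by how often they are used."""
    scope = {cadence: set() for cadence in CADENCES}
    for use in uses:
        if use.cadence not in scope:
            raise ValueError(f"unknown cadence: {use.cadence!r}")
        scope[use.cadence].add(use.name)
    # Sorted lists make the scope easy to read and to compare over time.
    return {cadence: sorted(names) for cadence, names in scope.items()}


inventory = [
    DatasetUse("sales_orders", "dana", "daily", "revenue dashboard"),
    DatasetUse("web_sessions", "lee", "daily", "revenue dashboard"),
    DatasetUse("churn_scores", "sam", "weekly", "retention review"),
    DatasetUse("budget_actuals", "dana", "monthly", "finance close"),
]

scope = build_scope(inventory)
for cadence in CADENCES:
    print(cadence, scope[cadence])
```

The resulting scope is the short list to carry into the next stage; anything that does not appear in it can wait.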
Walk: access and work with the data
You might be thinking: “everything I’ve thought about in the ‘crawl’ section is data I can access, so what’s the deal?” But access is not about getting hold of data no matter what.
You’ve been through and tolerated data access problems before. Unable to access data because your point of contact is on vacation. No clear ownership of data. Having to email many people to figure out a small thing. No idea whether a particular dataset even exists. Finding a dataset, only to have it emailed as an Excel attachment and become outdated hours later.
These are some of the unsustainable practices of working with data. And we can change them. Google Drive serves as a model for what the reality could be. Shared folders as a home for files that other teams use. Permissions that are both simple and precise. Workflows for requesting access that are self-explanatory.
It doesn’t look any different in a data catalog solution: discovery, permissioning, access, data masking, and requesting should be intuitive. Data stewards and consumers in dialogue, rather than a collection of distinct transactional requests. Living documents, instead of static spreadsheets. Transparent catalog browsing, not opaque.
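To make the permissioning, masking, and requesting ideas concrete, here is a minimal sketch of a single catalog entry. It is not the API of any real catalog product: the `CatalogEntry` class, its methods, and the `"***"` mask value are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """A hypothetical catalog entry with a steward, readers, and masking."""
    name: str
    steward: str                                      # clear ownership
    readers: set = field(default_factory=set)         # users granted access
    masked_columns: set = field(default_factory=set)  # hidden from readers
    pending_requests: list = field(default_factory=list)

    def request_access(self, user, reason):
        """Record a request where the steward can see it, instead of an email."""
        self.pending_requests.append((user, reason))

    def approve(self, user):
        """Steward grants access and clears that user's pending requests."""
        self.pending_requests = [r for r in self.pending_requests if r[0] != user]
        self.readers.add(user)

    def read(self, user, rows):
        """Return rows for permitted users, masking sensitive columns."""
        if user == self.steward:
            return [dict(row) for row in rows]
        if user not in self.readers:
            raise PermissionError(f"{user} has no access to {self.name}")
        return [
            {key: ("***" if key in self.masked_columns else value)
             for key, value in row.items()}
            for row in rows
        ]


entry = CatalogEntry("customers", steward="dana", masked_columns={"email"})
rows = [{"id": 1, "email": "a@example.com"}]

entry.request_access("sam", "churn analysis")
entry.approve("sam")
print(entry.read("sam", rows))   # reader view, email masked
print(entry.read("dana", rows))  # steward view, unmasked
```

The point of the sketch is that ownership, the request trail, and the masking rule all live with the dataset itself rather than in someone's inbox.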
Google Drive has shaped how enterprises work together cross-functionally with its straightforward workflows. Subject matter experts can validate work quickly through real-time collaboration, comments, and suggestions.
Data collaboration should be the same. Comments and discussions alongside data mean enhanced data analysis. Instead of endless email threads, transparent feedback means experts from many teams can add value to the project.
At this point, you’ve crawled and walked. You have a foundational data catalog with discoverable and usable datasets. Now, your daily analysis work is more streamlined and enriched by cross-functional input. You’ve boiled a pitcher of water.
Run: connect your data
From pitchers to lakes and beyond, it’s time to zoom out further and look to your broader data fabric. But it’s not quite the same ocean we talked about in the beginning. You likely have some large databases that will never need to be actively cataloged. That’s because those datasets exist only because they’ve always been there – apart from a few fringe requests, they go untouched by the broader business.
As you zoom out, continue to focus on building an interconnected, self-service library for all members of your organization. Catalog the data they need and use. Create an environment where data, metadata, and meaning can co-exist to provide greater context. Only then can you realize the full value of your organization’s knowledge and turn it into business results.