Scaling enterprise infrastructure. A big project for some, and a horrifying thought for many. Issues that were once small and manageable become a colossal tidal wave. It's a rude awakening as your technology and resources grow.
Let's head off these challenges before your data engine stalls. The good news: healthy data catalog maintenance routines aren't as daunting as they sound.
But don't get too complacent: as a data engineer, you must set good habits early. So, let's get you in the driver's seat of your supercharged vehicle and power up your data catalog.
Tip #1: Tune-ups
Hold on a minute. You might be wondering: what does a regular tune-up have to do with supercharging? Isn't that the complete antithesis?
Not exactly. Regular tune-ups are critical. Without them, your vehicle degrades, breaks down, or stops running at its best.
The same is true for an enterprise data catalog. A good maintenance routine ensures:
- data stays fresh and doesn't spoil
- the highest data standards and best practices are maintained
- documentation is relevant, timely, and topical
Invest in tune-ups by setting regular data review cycles, meeting with your core data team, and more. Consider strategies such as the Friday Afternoon Measurement (FAM) or similar frameworks to get started.
Check in often and take responsibility so that your data catalog tools run smoothly.
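A review cycle is easier to keep up when something tells you what is overdue. The sketch below is one minimal way to do that, assuming each catalog entry records an owner and a last-reviewed date; `CatalogEntry`, `stale_entries`, and the 30-day cycle are hypothetical names and values, not features of any particular catalog product.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class CatalogEntry:
    # Hypothetical record; a real catalog exposes its own metadata fields.
    name: str
    owner: str
    last_reviewed: date


def stale_entries(entries, today, max_age_days=30):
    """Return entries not reviewed within the last max_age_days, oldest first."""
    cutoff = today - timedelta(days=max_age_days)
    overdue = [e for e in entries if e.last_reviewed < cutoff]
    return sorted(overdue, key=lambda e: e.last_reviewed)


if __name__ == "__main__":
    entries = [
        CatalogEntry("orders", "sales-data", date(2024, 1, 5)),
        CatalogEntry("customers", "crm-team", date(2024, 3, 1)),
        CatalogEntry("inventory", "ops-data", date(2023, 11, 20)),
    ]
    # Print a short agenda for the next review meeting.
    for e in stale_entries(entries, today=date(2024, 3, 15)):
        print(f"{e.name}: last reviewed {e.last_reviewed}, ping {e.owner}")
```

Running a check like this before each review meeting turns "check in often" into a concrete agenda: the oldest entries come first, each paired with the person to ask.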
Tip #2: Understand metrics
But events do happen. Things do go wrong, and the alarm bells start ringing.
When the "check engine" light on your car's dashboard comes on, do you ignore it? Probably not, because you know that neglecting it will cause more damage later on.
The metrics related to your data catalog are your warning lights and your indicators. Use these quantitative measurements to objectively gauge the health of your data initiatives.
Metrics in your data catalog allow you to learn more about:
- what's being used and what's not
- whether your data catalog is driving value for your organization
- incident alerts and management
These help you make better, more informed decisions about what's working (and not) in your data strategy. Make sure you stay aligned with your company's KPIs as you track them.
There's one caveat: metrics are only useful if there is an action associated with them (otherwise, why bother collecting them?). Don't just look at the blinking light on the dashboard; address it.
Be strategic about identifying how individual metrics align with your broader use cases and the problems you're trying to solve.
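One way to enforce "no metric without an action" is to track a metric only if it comes with a rule and a response. The sketch below assumes that setup; the metric names, thresholds, and actions are hypothetical placeholders to be replaced with ones tied to your own KPIs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class MetricRule:
    # Hypothetical rule: a metric is only worth tracking if it maps to an action.
    metric: str
    is_alarming: Callable[[float], bool]
    action: str


RULES = [
    MetricRule("assets_never_queried_pct", lambda v: v > 25.0,
               "Review unused assets for deprecation"),
    MetricRule("undocumented_assets_pct", lambda v: v > 10.0,
               "Schedule a documentation session with asset owners"),
    MetricRule("open_data_incidents", lambda v: v > 0,
               "Triage incidents at the next data team check-in"),
]


def actions_for(readings: Dict[str, float], rules=RULES) -> List[str]:
    """Return the action for every rule whose metric reading is alarming.

    Readings with no matching rule are ignored; rules with no reading are skipped.
    """
    return [r.action for r in rules
            if r.metric in readings and r.is_alarming(readings[r.metric])]


if __name__ == "__main__":
    readings = {"assets_never_queried_pct": 40.0,
                "undocumented_assets_pct": 5.0,
                "open_data_incidents": 2}
    for action in actions_for(readings):
        print(action)
```

Readings with no rule (page views, say) simply produce nothing, which makes it obvious which numbers you are collecting without a plan for them.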
Tip #3: Crowdsource
At this stage, you've been successful at maintaining your vehicle. Now it's time to gain an edge over the competition.
Consider a race car, and the team behind each vehicle. It's not just the driver who's racing, but also the crew at the pit stop, the coaches on the sidelines, and the analysts in the control room.
Each person on the team is good at what they do; they collaborate in real time, and they achieve greater things together.
Have you heard the expression "data science is a team sport"? It's commonly attributed to DJ Patil, one of our advisors at data.world and the former US Chief Data Scientist. And he's absolutely right: like a race team, a data team wins or loses together. Without your data team, data initiatives fall apart.
A best-in-class user experience is not optional with a cloud data catalog. You need to enable all your data producers and consumers to contribute their expertise and build a scalable data community.
Tip #4: Agile data governance
Getting this collective data workforce to work together is central to agile data governance.
Consider the classic phrases about incremental progress:
- crawl, walk, and run
- don't boil the ocean
- iterate, iterate, iterate
If you want everyone in your business to use data, you need to give them access. Traditional data governance strategies are failing left and right.
In their place, data strategies have emerged that let your team move fast and slow down safely. Much like your car on the race track, the fastest acceleration is only as good as the quality of the brakes. If you have one without the other, your car will crash off the track.
Agile data governance strategies that allow you to move fast without sacrificing security include:
- documenting data assets and making them reproducible
- encouraging comments and discussions alongside data
- using data testing, profiling, and quality tools for monitoring
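To make "data testing" concrete, here is a minimal sketch of three common checks (not-null, uniqueness, and freshness) written in plain Python over a list of rows. The column names `order_id`, `customer_id`, and `updated_at`, and the 24-hour freshness window, are hypothetical; dedicated data quality tools offer richer versions of the same idea.

```python
from datetime import datetime, timedelta


def check_not_null(rows, column):
    """Return indexes of rows where column is missing or None."""
    return [i for i, row in enumerate(rows) if row.get(column) is None]


def check_unique(rows, column):
    """Return the non-null values of column that appear more than once."""
    seen, dupes = set(), set()
    for row in rows:
        value = row.get(column)
        if value is None:
            continue  # nulls are the not-null check's job
        if value in seen:
            dupes.add(value)
        seen.add(value)
    return sorted(dupes)


def check_fresh(rows, column, now, max_age):
    """Return True if the newest timestamp in column is within max_age of now."""
    timestamps = [row[column] for row in rows if row.get(column) is not None]
    return bool(timestamps) and now - max(timestamps) <= max_age


def run_checks(rows, now):
    """Run all checks on a hypothetical orders table and list any failures."""
    failures = []
    if nulls := check_not_null(rows, "customer_id"):
        failures.append(f"customer_id is null in rows {nulls}")
    if dupes := check_unique(rows, "order_id"):
        failures.append(f"duplicate order_id values: {dupes}")
    if not check_fresh(rows, "updated_at", now, timedelta(hours=24)):
        failures.append("updated_at is older than 24 hours")
    return failures


if __name__ == "__main__":
    rows = [
        {"order_id": 1, "customer_id": "c1", "updated_at": datetime(2024, 3, 14, 9)},
        {"order_id": 2, "customer_id": None, "updated_at": datetime(2024, 3, 14, 10)},
        {"order_id": 2, "customer_id": "c3", "updated_at": datetime(2024, 3, 13, 8)},
    ]
    for failure in run_checks(rows, now=datetime(2024, 3, 15, 9)):
        print(failure)
```

Because each check returns the offending rows or values rather than just pass/fail, the output can be posted alongside the data as a comment, which ties testing back to the discussion point above.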
Tip #5: People + technology
Without people, your technology will fail. People are by far your most valuable resource, but also the most easily forgotten.
A data catalog powered by a knowledge graph benefits from more people using the platform over time. It should give quick access to data, and to the people who understand it, even before you know you need them.
Like racing analysts who study replays and other data, your people provide something that technology can't: context. They can run tests, form hypotheses, build experiments, and better understand data.
Technology? It can augment human-led initiatives, scale them up, and capture knowledge. But this growth, scale, and knowledge only matter if we can understand what meaning to extract, and what to do with that information. That's where your people and your data community matter most.
One more thing...
If you don't know the word "ontology," find someone who does. Data catalog software is like your public library: it will only go so far without a librarian who can manage it.
When you put all these tips together, you're creating a central data and knowledge hub for your entire business, for your entire workforce. As a data engineer, you have an incredible opportunity to make a big impact.
It all starts with those regular, overlooked tune-ups.