Five critical data challenges your enterprise needs to solve

by | Jan 28, 2021 | 2021, Data catalogs

As a Chief Data Officer (CDO), you might be inclined to focus on the needs of your data analysts, scientists, and engineers. But the scope of the CDO office is much larger. Indeed, data impacts every aspect of your business.

The largest team in your organization is unquestionably your data team. Everyone works with data, and all of them need access to your organization’s collective knowledge to do their job with clarity, accuracy, and speed.

Let’s break down your big data problems into five key categories. Do these statements sound familiar? Maybe they even hit a little too close to home.

  • “Do we have that data? Where is it stored?” (visibility)
  • “These tables make no sense, and I’m getting different answers when I try to clarify.” (understanding)
  • “Who’s the owner of this data? Do I just email them to be able to see it?” (access)
  • “How do we know this data is accurate? What’s affected if we make changes to this data?” (curation)
  • “If I go on holiday, the entire process becomes paralyzed.” (business impact)

We’ve all been trying to solve these problems for years. As technology evolves, new strategies have emerged to varying levels of success. 

One thing is clear, though. The waterfall approach to managing and using data is unsustainable. That’s where a data catalog becomes critical to your strategic data initiatives, especially one that natively supports agile data governance processes and workflows. Let’s map each of these data problems to functionality in an enterprise data catalog and how it can address these five challenges.

 

 

Visibility: where your data lives

Search and discovery are central to this. Your team needs to be able to naturally search as they would on Google to find the datasets or projects they need for their work. A strong solution requires a discovery engine that takes into account metadata, data, people, business glossary, data lineage, and more.

Not all search engines are on par. You know that feeling when you type up a search, but the search engine completely misunderstands you. Avoid those, and make sure you use data catalog software that shares search results by relevance. They combine data, metadata, and context to unearth what you’re looking for, fast.

 

Understanding: what your data means

Consider this story of miscommunication at an inopportune time: a prospect will only sign your three-year contract if you can definitively say you’ve fulfilled over 100,000 orders in the last year. Minutes before the meeting, you realize that your company has only completed around 70,000 orders.

What went wrong? Not the data, but the meaning and context. Does order mean how many transactions were made, how many items were sold, or something else? 

Raw data is meaningless without context. Columns, rows, and cells by themselves are not usable and cannot drive business value. It only becomes meaningful when you know what those numbers mean. That’s where a business glossary comes into the equation. It connects data to an internal wiki of concepts, and documents methodologies.

 

Access: who owns the data

Sending emails to people you aren’t sure are even relevant to answering your question is tedious. You end up wasting your time and theirs. And when you finally get to the data source, you end up with a static spreadsheet file emailed to you.

That’s an outdated yet all too common way of working today. Data catalog tools are a critical middle ground for facilitating access. Think of it as Google Drive for your data.

All the metadata (who owns it, what it’s about, last modified, etc.) lives alongside the data. You can discover this data by searching. And when you find what you need, self-serve within the platform to get access to the canonical or derived dataset. This wouldn’t be any different from requesting to view a Google Doc file – fill out a form, sent directly to the owner.

Remove confusion about where data lives, who owns it, and whether it’s relevant. It’s all in one place.

 

"The effort of us collectively is really our superpower" - DJ Patil speaking at data.world summit 2020

 

Curation: how accurate the data is

Uncertainty about data cleanliness and usability discourages your team from working with it. They need to be confident what the data they use is accurate and approved before using it for their work. 

The good news is that the accuracy of data is distinguishable by placing it side-by-side with context. Combine the raw data with metadata, and intuitive governance workflows.

Make those governance features people-centered: crowdsourced suggestions, usage data, comments, data request access workflows, and more. See how many users have used a dataset or have a steward certify a dataset to provide a layer of human curation. Then, combine that with data lineage, impact analysis, and historical versions to get a 360-degree view of your data. That’s agile data governance.

By combining human and technological quality checks, your team can confidently use data to generate valuable insights.

 

Business impact: the high stakes for your performance

If you can’t find, understand, or trust your data, you can’t really use it. And if you can’t use your data, what’s the point of collecting it? Leading innovative companies today have democratized data at the heart of their strategy.

Data is not optional. Your team needs to be able to autonomously use data, and it can’t be gatekept by a specific team or individual. Otherwise, you end up losing out to those who have a healthy culture around data use who have all team members encouraged to engage with data.

The way we handle and manage data is shifting fast. And more of your team must be able to work with it to succeed. The solution you need must be flexible and agile, easy to implement for all users, and solves your practical business challenges.