Throttle up your data catalog with machine learning

Mar 4, 2021 | Data catalogs

How do you get from northeast Maine to Southern California? By walking, over 40 days. By cycling, nearly a dozen days. By driving, over 40 hours. By plane, about 7 hours. 

Working with data manually is like walking. You’ll get there in the end, but it takes drastically more time and effort. Traveling by air is the sweet spot: it’s practical, realistic, and achievable. And it’s technology that’s accessible today.

Here’s the barrier: air travel requires planes. Planes cost money up front, and they take time to build. But once these planes are on the runway, they’ll be flying for decades. Case in point: Canadian airline Nolinor continues to operate a Boeing 737 that was first put into service in the mid-1970s, one of the oldest aircraft still transporting passengers today. The longer it flies, the further its value travels.

 

Most of us are still cycling or driving

We have some help from technology as we work with data today: Google Analytics for website traffic, Snowflake for your data warehouse, or Salesforce for customer data. Yet 45% of a data scientist’s time is still spent on manual, tedious data prep tasks. It’s clear we don’t have a fleet of planes ready to take to the skies yet.

Machine learning (ML) and a knowledge graph are long-term investments but, like planes, once they’re in place, they’re there for the long haul. Let’s break it down. What do these initial investments look like?

  • Do it early: the technology iterates and tailors the experience to each individual the more they use the platform
  • Build your schema: load your metadata, then connect it to the relevant disciplines in your business to build your catalog’s backbone
  • Add context: link data up with business terms and connect meaning directly to insight
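As a rough illustration of the “build your schema” and “add context” steps, here is a minimal Python sketch that links column metadata to business terms. The glossary, the table and column names, and the keyword-matching rule are all hypothetical and chosen for illustration; this is not how data.world or any specific catalog implements it.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    """One piece of technical metadata: a column in a table."""
    table: str
    name: str
    terms: set = field(default_factory=set)  # linked business terms


# Hypothetical business glossary: business term -> keywords that suggest it.
GLOSSARY = {
    "Customer": {"customer", "client", "account"},
    "Revenue": {"revenue", "amount", "price"},
}


def link_terms(columns, glossary=GLOSSARY):
    """Attach a business term to every column whose name contains a keyword."""
    for col in columns:
        tokens = set(col.name.lower().split("_"))
        for term, keywords in glossary.items():
            if tokens & keywords:
                col.terms.add(term)
    return columns


def columns_for(term, columns):
    """Answer a business question: where does this concept live in the data?"""
    return [f"{c.table}.{c.name}" for c in columns if term in c.terms]


catalog = link_terms([
    Column("orders", "customer_id"),
    Column("orders", "total_amount"),
    Column("crm_contacts", "client_name"),
    Column("web_sessions", "page_url"),
])

print(columns_for("Customer", catalog))
```

A real catalog would likely learn these links from usage and curation rather than from a fixed keyword list; the point is only that once metadata and business terms are connected, a question like “where is our customer data?” becomes a simple lookup.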

Machine learning, Harvard Business Review

Take flight

It might seem daunting, but you don’t need to turn your sedan into an Airbus A380 overnight. Take it piece by piece and the value of your data will grow over time. ML technology is agile and iterative at its core. It’s a continuous process that evolves over time, and you don’t need to get it perfect the first time. Be agile, iterate fast, and innovate quickly.

And once you’re in the air, you’ll begin to work with data clearly, accurately, and quickly. ML is there to augment your work, not replace you. Here’s how you can combine human expertise with ML scalability in an enterprise data catalog:

  • Tagging: detect and tag related assets in bulk, while you manage the definition and context
  • Recommendations: surface related content that you need, based on several metadata dimensions and usage patterns
  • Security: entity matching flags unclean and duplicated data, and automatically masks PII
  • Discovery: search results focus on relevance, not quantity of matched keywords, with support for NLP
  • Graph: works together with ML to expose meaningful relationships and context between data and concepts
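To make the “Security” item above more concrete, here is a small Python sketch of the two ideas it names: flagging likely duplicate entities and masking PII. It uses simple string similarity and a regular expression as stand-ins for a trained ML model; the function names, the 0.85 threshold, and the sample records are assumptions for illustration, not part of any product’s API.

```python
import re
from difflib import SequenceMatcher

# Simplified email pattern; real PII detection covers many more data types.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def normalize(name):
    """Lowercase and strip punctuation so cosmetic differences don't matter."""
    return re.sub(r"[^a-z0-9 ]", "", name.lower()).strip()


def likely_duplicates(names, threshold=0.85):
    """Return pairs of names whose normalized forms are highly similar."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            score = SequenceMatcher(
                None, normalize(names[i]), normalize(names[j])
            ).ratio()
            if score >= threshold:
                pairs.append((names[i], names[j]))
    return pairs


def mask_email(text):
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL_RE.sub("[EMAIL]", text)


companies = ["Acme Corp.", "ACME Corp", "Globex Inc"]
print(likely_duplicates(companies))
print(mask_email("Contact jane.doe@example.com for access"))
```

In this sketch the machine proposes candidate matches and masks obvious patterns, while a person still decides which flagged pairs really are the same entity. That division of labor mirrors the human-plus-ML pairing described above.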

Automatic entity matching and contextualized information in data.world

The same way that planes cannot function without pilots, your ML-powered data catalog needs people to succeed. Automations mean you’re able to analyze at scale and reach business decisions faster.

Don’t get buried in the weeds of tedious janitorial data work. Let ML handle what would otherwise be hours of data cleaning, so you can focus on your job. Re-prioritize your team to focus on more critical work: generating knowledge that fuels actionable insights. Pretty soon, it’ll be like activating autopilot and cruising at breathtaking speeds.

ML lets you sit in the pilot’s seat and reach your objectives faster. And the longer that plane is up in the air and getting used, the more value you get out of it.

Learn more about key capabilities and why data.world was evaluated as the top-scoring current product offering in Forrester Research’s recent report, 2020 Forrester Wave: Machine Learning Data Catalogs. Download your complimentary copy of the report here.