Repair Broken Data Pipelines with Lineage

by | Aug 22, 2022 | 2022, data architecture, Data catalogs, data value, Data-driven cultures

Data lineage empowers data team members at every level of your business to understand and trust data pipelines. It enables faster data-driven decision-making by providing full visibility into where data is sourced, how it’s aggregated, and what transformations it undergoes along its journey.

As I explained in my recent blog post, Build Trust with Data Lineage, there are two primary types of lineage, each serving a distinct purpose and solving a different set of problems. Business lineage provides a summary view of how data flows from its source to where it is consumed. Technical lineage is much more granular and gives data engineers and other technical users a zoomed-in view of infrastructure and data transformations.
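To make the difference in granularity concrete, here is a minimal sketch, assuming column-level (technical) lineage is stored as a list of upstream/downstream edges with hypothetical names of the form system.table.column. Business lineage is then approximated by collapsing those edges to the system level and dropping edges that stay inside one system. The asset names and the collapsing rule are illustrative assumptions, not how any particular lineage product works.

```python
# Hypothetical column-level (technical) lineage edges: (upstream, downstream).
# Each node is "system.table.column"; all names are illustrative only.
TECHNICAL_LINEAGE = [
    ("crm.orders.revenue", "warehouse.sales.revenue"),
    ("crm.orders.cost", "warehouse.sales.cost"),
    ("warehouse.sales.revenue", "bi.customers_dashboard.profit_ratio"),
    ("warehouse.sales.cost", "bi.customers_dashboard.profit_ratio"),
]


def to_business_lineage(edges):
    """Collapse column-level edges into a system-level summary view.

    Each column is mapped to its first dotted component (the system), and
    edges whose endpoints fall in the same system are dropped.
    """
    summary = set()
    for upstream, downstream in edges:
        src_system = upstream.split(".")[0]
        dst_system = downstream.split(".")[0]
        if src_system != dst_system:
            summary.add((src_system, dst_system))
    return sorted(summary)


print(to_business_lineage(TECHNICAL_LINEAGE))
# [('crm', 'warehouse'), ('warehouse', 'bi')]
```

Four column-level edges reduce to two system-level hops, which is the kind of summary a business user needs, while the original edge list keeps the detail an engineer needs.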

And then there’s the semantic layer that connects the two. 

Knowledge Graph Lineage

It’s important to note that only a data lineage solution powered by a knowledge graph provides insight into the relationships between key business concepts and technical lineage.

As detailed in my previous post, knowledge graphs are inherently semantic. Each one has an ontology: a formal representation of the entities in the graph and how they relate to one another. In short, it tells you what everything in your knowledge graph means, making it easier to understand how data is connected.
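As a rough illustration of how an ontology gives meaning to the links between a business concept and technical assets, here is a minimal sketch using plain Python tuples in place of a real RDF store. The ontology, class names, predicates (measuredBy, belongsTo), and entity names are all hypothetical; a production knowledge graph would typically express the same ideas in a standard such as RDF/OWL and query them with a graph query language.

```python
# A hypothetical, minimal ontology: each relationship (predicate) declares
# which class of entity it links from (domain) and to (range).
ONTOLOGY = {
    "measuredBy": ("BusinessConcept", "Column"),
    "belongsTo": ("Column", "Table"),
}

# Entity -> class assignments (the "type" statements of a real graph).
TYPES = {
    "Profit Ratio": "BusinessConcept",
    "customers_dashboard.profit_ratio": "Column",
    "customers_dashboard": "Table",
}

# Subject-predicate-object triples; all names are illustrative.
TRIPLES = [
    ("Profit Ratio", "measuredBy", "customers_dashboard.profit_ratio"),
    ("customers_dashboard.profit_ratio", "belongsTo", "customers_dashboard"),
]


def validate(triples):
    """Return triples whose predicate or subject/object classes break the ontology."""
    bad = []
    for s, p, o in triples:
        if p not in ONTOLOGY:
            bad.append((s, p, o))
            continue
        domain, rng = ONTOLOGY[p]
        if TYPES.get(s) != domain or TYPES.get(o) != rng:
            bad.append((s, p, o))
    return bad


def objects(subject, predicate, triples):
    """Follow one relationship from a subject, as a graph query would."""
    return [o for s, p, o in triples if s == subject and p == predicate]


print(validate(TRIPLES))
# []
print(objects("Profit Ratio", "measuredBy", TRIPLES))
# ['customers_dashboard.profit_ratio']
```

Because every relationship is typed by the ontology, a query can move from the business concept "Profit Ratio" to the technical column that measures it, and a malformed link (for example, a table claiming to measure a concept) is caught by validation.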

All of this is essential for operationalizing your data lineage and delivering on the promise of faster, more efficient data-driven decision-making.

A Common Use Case

Now that you know why it is important for your data lineage solution to be powered by a knowledge graph, let’s explore the second of three common use cases it can help solve today. (We’ll cover the remaining use case in the final post in this series.)

Root Cause Analysis: Troubleshoot Broken Data Pipelines

According to a 2021 study sponsored by data.world and DataKitchen, 97% of data engineers are burnt out, and 70% are likely to leave their jobs within the next 12 months. The top two reasons cited were:

  1. Spending too much time finding and fixing errors
  2. Spending too much time maintaining data pipelines and/or manual processes

An automated knowledge-graph-powered lineage solution that addresses data troubleshooting can help alleviate this stress on data engineering teams.

For example, if your organization relies on a crucial data model in order to make real-time decisions and that model fails, lineage gives your data engineers the ability to quickly trace data flow and identify the root cause of the failure. (It’s worth noting that only knowledge-graph-powered lineage allows you to query the lineage itself; in the context of this use case, that means actually searching for and finding the upstream error.)
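The idea of querying lineage to find an upstream error can be sketched as a graph traversal. This is a minimal illustration, assuming lineage is available as a hypothetical map from each asset to its direct upstream assets, plus a hypothetical health status per asset from pipeline monitoring. It is not data.world's API; in a knowledge graph the same walk would usually be expressed as a transitive graph query.

```python
from collections import deque

# Hypothetical lineage: downstream asset -> list of its direct upstream assets.
UPSTREAM = {
    "customers_dashboard.profit_ratio": ["warehouse.sales.revenue", "warehouse.sales.cost"],
    "warehouse.sales.revenue": ["crm.orders.revenue"],
    "warehouse.sales.cost": ["erp.ledger.cost"],
    "crm.orders.revenue": [],
    "erp.ledger.cost": [],
}

# Hypothetical health status for each asset, e.g. from pipeline monitoring.
STATUS = {
    "customers_dashboard.profit_ratio": "failed",
    "warehouse.sales.revenue": "ok",
    "warehouse.sales.cost": "failed",
    "crm.orders.revenue": "ok",
    "erp.ledger.cost": "failed",
}


def root_causes(asset, upstream, status):
    """Breadth-first search upstream from a failed asset.

    Only failed parents are followed. A root cause is a failed asset with no
    failed direct upstream assets, i.e. the earliest failure on that path.
    """
    causes, seen, queue = [], {asset}, deque([asset])
    while queue:
        node = queue.popleft()
        parents = upstream.get(node, [])
        failed_parents = [p for p in parents if status.get(p) == "failed"]
        if status.get(node) == "failed" and not failed_parents:
            causes.append(node)
        for parent in failed_parents:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return causes


print(root_causes("customers_dashboard.profit_ratio", UPSTREAM, STATUS))
# ['erp.ledger.cost']
```

The design choice here is to prune healthy branches: the revenue path is never explored because its first hop is healthy, so the search follows the failure back through the cost column to the ledger source. This assumes failures propagate downstream; a real pipeline may need additional signals, such as freshness or schema checks, to locate faults that do not surface as a failed status.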

Figure: Root cause analysis with data.world Eureka Explorer Lineage, investigating the code preview for PROFIT RATIO in a CUSTOMERS dashboard

Explore More Using data.world with Eureka™

data.world with Eureka Explorer™ is a map of data and relationships, powered by a knowledge graph, that simplifies the analysis of relationships between data, people, and insights. It bridges the gap between semantic business concepts and the column-level technical lineage of the modern data stack with easy-to-navigate graph visualizations.

To learn more about Explorer Lineage and the advanced use cases it can help you solve, download the white paper 3 Data Challenges Knowledge-Graph-Powered Lineage Solves.