ETL (Extract Transform and Load) was the standard operating procedure for data integration for 25+ years. A decade ago the introduction of data lakes pushed transformation to the end of the process and into tools like Snowflake, BigQuery, and Redshift. Now the latest chatter in the data management industry is Reverse ETL. Shouldn’t we call this LTE?
Catalog & Cocktails hosts, Tim and Juan were joined by special guest, Tejas Manohar, CEO of Hightouch for a conversation about Reverse ETL and why the heck it matters now. Below are a few questions excerpted and lightly edited from the podcast.
“WTF? What is reverse ETL?”
Reverse ETL really is a specialized form of ETL. It’s about moving data from the warehouse into different operational systems around the company. So, imagine you have all your data in your warehouse, and use a typical ETL provider like a Fivetran to get it in there. We do the opposite. So, you can take the data from the data warehouse or data lake or anything that can run SQL, and move it into different operational systems. It’s the reverse of a typical ETL process.
It is definitely just like ETL in a lot of ways, but the interface is just so much different than the data replication tools that you see in the market today, like a Fivetran or Stitch or anything like that, that are really focused on just getting data into the warehouse. It can be there in any format as long as it’s getting there reliably. You can query it in SQL, and that’s all you really care about; does the user have something like one of those services, where the reverse ETL is a very different interface to use it? Customers need a lot of control on how the data appears in a system like Salesforce or Facebook Ads or Google Ads. Those systems can look completely different from each other, whereas a database or data warehouse is always the same.
So, there’s a different technology paradigm. There’s a different persona paradigm. You started to get a little bit into why people want to do reverse ETL…
What is the business value of Reverse ETL?
At a high level, the idea is it’s solving a pretty age-old problem. People need data about what customers are doing in their different line of business tools. So, if you look at a tool like Salesforce, it’s only as useful as the information inside of it. If I’m a B-to-B SaaS company, let’s say I use a tool like Salesforce to talk to my customers. Now if I’m a sales rep at Plaid and I’m trying to look for opportunities to upsell customers on new features, sell them new bigger plans of credits and different things like that, and I have no idea from Salesforce any information except when’s the last time I contacted this customer, how much are they paying me, et cetera, that’s not very useful. But if I can extend my Salesforce [capability by means of reverse ETL], you can 10X the value of these tools.
Do we really need something like this? I mean, I can imagine that you can just go do this directly and use an existing ETL tool, or use existing APIs to go talk to your warehouse.
Why do we really need something else?
I honestly think reverse ETL is going to be one of the biggest categories people think of when they think about data warehouses. I’m obviously biased, but the reason I think is when you buy a data warehouse today, you need to present the information to your users. BI is the most common way to do so today. You show a report in Tableau, you show a report in Looker, but really, users are becoming more and more savvy, and they want more and more capabilities with this data that’s verticalized to their unit they’re in. So, if you think about it, a marketer is living in a marketing platform. A salesperson is living in a sales platform. A financial person’s living in a finance platform like SAP or NetSuite, and they need good data in there to actually operate effectively, and that’s what reverse ETL is all about.
It’s about making the data analyst team able to actually impact the true business workflows of each of these teams around the company, and that problem only grows as the scale of the company grows. If you’re an engineer, it’s okay to go modify Python’s script to change how data flows into a tool like Salesforce, but if you’re a data analyst that only knows SQL, or if you’re on the sales ops team or something like that, it’s completely unfathomable to be able to go into something like Airflow or Python and change those pipelines. You kind of need a platform that has a UI to change how the data from the warehouse is actually going into one of these systems. But I think, really, if we look at analytics projects at large, most of them I think fail due to companies not being able to figure out the right way to empower their business users with the actual information, to make the information useful them, and reverse ETL is honestly one of the newer solutions in this space other than any existing BI tools to do this.
Usually in data warehouses these days you have your raw tables and you’ve got your process tables, and then you might have your aggregates and things like that.
Do you kind of have to get your house in order a little bit ahead of time before you start using reverse ETL, or is it on a use-case-by-use-case basis?
So, I guess two things. First, things never happen in a perfect order. Things are always changing, and things are always being incrementally improved in any growing organization. That being said, what we typically see, let’s say a company buys a new data warehouse like Snowflake. The first thing they need to do is figure out how to get data into it. So, they need to adopt a solution for ETL, like a Fivetran or a Stitch. The second thing they need to do is throw up a BI tool or something so they can actually visualize the data, report on it, show it to other users, run quarries, save them, and this is when they buy something like a Looker, a Tableau, Mode, et cetera. Reverse ETL never comes before those steps, for sure. It definitely only comes after those steps, but typically, we even see another step before that, which is transformation, so adopting a tool like DBT.
Once users adopt a tool like DBT and they start building these views and models that are in their data warehouse and nowhere else, it almost becomes obvious that they need to start being able to move that into the different systems that people have around the company. Since the data warehouse becomes a unique source of truth that has information that’s really nowhere else, it’s almost a data silo in some ways, and that’s where people start getting the need for reverse ETL and start adding it right on top. We obviously have users that don’t have transformations inside of their warehouse and don’t have normalized data, and they do use more complex SQL queries in Hightouch, but what we’re finding is a majority of our users are coming to us right after they do the transformation stuff.
Will self-service reverse ETL into applications be actually more common in the future than self-service BI?
Yeah. My perspective is that self-service BI is mostly a failure, and most users won’t be able to learn how to use business intelligence tools, so they’ll use vertical systems that already exist instead of, say, vertical BI tools. So, it actually is the systems that already exist that they live out of, whether that’s sales users using Salesforce, marketing users using Marketo, et cetera, and the real connective glue between the data and those users will be a reverse ETL platform. It’s interesting. In some ways, you’re creating a new category. In some ways, you’re competing with some usage of existing categories, like BI tools.
- Reverse ETL is more for “business people,” while “normal ETL” is for data engineers.
- Governance is still important to get the most value out of processes
- Reverse ETL – moving data from the data warehouse back to your applications
Visit Catalog & Cocktails to listen to the full episode with Tejas. And check out other episodes you might have missed.