About this episode

ETL (extract, transform, and load) was the standard operating procedure for data integration for 25+ years. A decade ago, the introduction of data lakes pushed transformation to the end of the process and into tools like Snowflake, BigQuery, and Redshift. Now the latest chatter in the data management industry is Reverse ETL. Shouldn’t we call this LTE?

Join Tim, Juan, and special guest Tejas Manohar, co-CEO of Hightouch, for a conversation about Reverse ETL and why it matters now.

Special Guests:

Tejas Manohar

Co-CEO, Hightouch

This episode features
  • The evolution of data integration pipelines 
  • Use cases for Reverse ETL
  • What acronyms make you SMH (shake my head)?
Key takeaways
  • Reverse ETL is more for “business people,” while “normal ETL” is for data engineers.
  • Governance is still essential to getting the most value out of reverse ETL
  • Reverse ETL – moving data from the data warehouse back to your applications

Episode Transcript

Tim Gasper:
It’s time, once again. It’s Wednesday. It’s time for Catalog & Cocktails. It’s your honest, no BS, non-salesy conversation about enterprise data management, with tasty beverages in hand. I’m Tim Gasper, long-term data nerd and product guy at Data.World and joined…

Juan Sequeda:
Hey. I am Juan Sequeda. I’m the principal scientist here at Data.World, and as always, it is a pleasure to be here in the middle of the week, or at the end of Wednesday, and just chat about data. Today, as always, we have some awesome topics and guests, and today we have Tejas Manohar, who is the co-CEO of Hightouch, the company that is really pioneering this new area of reverse ETL, which is going to be our topic today. Tejas, how are you doing?

Tejas Manohar:
Hey, doing well. I’m tuning in from New York City right now. I’m usually in San Francisco, but I’m over on the East Coast for a week drinking my fifth or so cup of coffee of the day, probably, just to enjoy this show.

Juan Sequeda:
Awesome.

Tim Gasper:
Awesome.

Juan Sequeda:
Thanks for joining us. So, let’s dive directly into our Tell & Toast segment. What are we drinking, and what are we toasting to? You’re drinking your nth coffee of the day. How about you, Tim?

Tim Gasper:
I am drinking a maple rye old fashioned. I’m testing out some rye. I don’t usually make rye cocktails. I usually do bourbon and scotch, so I’m trying out some rye, and it’s pretty tasty. I’m going to actually cheers to the beginning of the NFL season. As some of you may know who have been listening, I am a Cleveland Browns fan, and my team is actually good this year, so I look forward to that.

Juan Sequeda:
Well, I’m actually… Let me start with my toast. I am in Europe right now. I’m in Amsterdam, and I’m toasting to actually being able to start traveling again and visiting customers and conferences, hybrid. It is fantastic to be back. It’s still a little weird in this era of COVID, but that’s going to be the new normal. I told the hotel bar what I was up to, and I want to give a shout out to Jill at Hotel [inaudible 00:02:06] here in Amsterdam. She prepared, first of all… This is a cocktail called Porn Star. It is passion fruit liqueur, vanilla vodka, lime juice, and egg white. But she said I should have a second drink too, and it had to be very, very Dutch, so this is what she called the Jenever Ale: jenever, a traditional Dutch spirit flavored with juniper berries, with ginger ale. So, I actually have two drinks here, so this will be a fun episode.

Tim Gasper:
Thank you so much for joining from Europe, and it’s probably not even Wednesday anymore for you, right?

Juan Sequeda:
No, it’s still Wednesday.

Tim Gasper:
It’s probably creeping into Thursday, right?

Juan Sequeda:
It’s 11:00 p.m., and I’m literally ending my day with this, so this is a great [crosstalk 00:02:45].

Tim Gasper:
Nice. Well, very excited to have you here, Tejas. This is going to be a great conversation, and glad we can bring all sorts of geographies here today.

Tejas Manohar:
Yeah, around the whole world and back.

Juan Sequeda:
So, our warmup question today is what acronyms make you SMH, shake my head? That’s what it means, right?

Tejas Manohar:
Yeah, fair enough. So, I actually like reverse ETL. I think a lot of people don’t, so I’ll throw another acronym under the bus: CDP. CDP is one that I don’t like, customer data platform, probably because it makes it hard to make anything in this space. I think everything’s a customer data platform. What isn’t these days, if you really think about it?

Tim Gasper:
That’s a good answer.

Juan Sequeda:
That’s what we always start off with. The question is, hey, what is a customer? Customers are all over the place. How about you, Tim?

Tim Gasper:
A serious answer is MDM is kind of a confusing one, master data management-

Tejas Manohar:
Yeah, that’s a-

Tim Gasper:
… metadata management. Especially master data management, I’m always kind of weirded out by. Master data? Interesting. A goofy one, though, is ADIDAS, all day I dream about soccer. Is that actually what it means, or is that fake? I don’t know.

Tejas Manohar:
Wait, is that real?

Tim Gasper:
Was I told a story when I was younger? What’s that?

Tejas Manohar:
I didn’t know that it actually meant something. Interesting. I had no idea.

Juan Sequeda:
Neither did I.

Tim Gasper:
Yeah, I’m not sure if that’s a real thing or not, but maybe it is.

Juan Sequeda:
That fits with mine. Mine is TIL, today I learned. Today I learned that ADIDAS means all day I dream about soccer.

Tejas Manohar:
Same.

Juan Sequeda:
But the other one is ROFL, ROFL, because it should be ROTFL, rolling on the floor laughing, but anyway, with that-

Tim Gasper:
That’s got to be one of our weirder segments.

Juan Sequeda:
All right. Let’s get serious. Let’s get serious. Okay. So, honest, no BS, Tejas, here’s something super interesting: in the last six hours, I’ve had three different conversations where reverse ETL has come up.

Tejas Manohar:
Wow.

Juan Sequeda:
All of them have been, “WTF? What is reverse ETL?” So, honest, no BS, what the heck is reverse ETL? Because we all know that ETL stands for extract, transform, and load. Shouldn’t it be LTE or whatever? But anyways, honest, no BS, what is reverse ETL?

Tejas Manohar:
Yeah. So, reverse ETL, honestly, with terms, you just got to follow the one that’s sticking, and what reverse ETL really is, is a specialized form of ETL. It’s about moving data from the warehouse into different operational systems around the company. So, imagine you have all your data in your warehouse, and you use a typical ETL provider like a Fivetran to get it in there. We do the opposite. You can take the data from the data warehouse or data lake or anything that can run SQL, and move it into different operational systems. It’s the reverse of a typical ETL process. You’re probably wondering, is that just ETL? Yes, it is just ETL.

Juan Sequeda:
Yeah. That’s the next question I have right now.

Tejas Manohar:
It is definitely just ETL in a lot of ways, but the interface is so much different from the data replication tools you see in the market today, like a Fivetran or Stitch, that are really focused on just getting data into the warehouse. With those services, the data can land in any format as long as it gets there reliably; you can query it in SQL, and that’s all you really care about as a user. Reverse ETL needs a very different interface. Customers need a lot of control over how the data appears in a system like Salesforce or Facebook Ads or Google Ads. Those systems can look completely different from each other, whereas a database or data warehouse always looks the same.

Tejas Manohar:
Then a lot of different personas can come into your reverse ETL platform to actually configure this thing. So, sales ops can manage how the fields are going into Salesforce. A marketing ops person can manage how those fields are going into the ad tool. A finance person can manage how the fields are going into SAP or NetSuite. So, reverse ETL is really just opening up ETL and saying… As the data warehouse becomes the source of truth for data around the company, everyone actually wants to ETL stuff into the tools they actually use, and reverse ETL platforms like Hightouch are just trying to make all those capabilities super simple and accessible without coding.
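To make the “different personas manage different fields” idea concrete, here is a minimal Python sketch, assuming a simple per-destination mapping from warehouse columns to destination field names. The destination keys and field names (`Plan__c`, `productPlan`, and so on) are hypothetical, and this is not how Hightouch represents mappings internally; it only illustrates that the same warehouse row can look different in each tool.

```python
from typing import Dict

# Hypothetical mapping config: warehouse column -> field name in each destination.
# Each team (sales ops, marketing ops, finance) could own its own entry.
FIELD_MAPPINGS: Dict[str, Dict[str, str]] = {
    "salesforce": {"plan": "Plan__c", "api_calls_30d": "API_Calls_30d__c"},
    "marketo": {"plan": "productPlan"},
}


def apply_mapping(row: Dict[str, object], mapping: Dict[str, str]) -> Dict[str, object]:
    """Rename warehouse columns to destination fields; drop columns with no mapping."""
    return {dest_field: row[col] for col, dest_field in mapping.items() if col in row}


row = {"domain": "acme.com", "plan": "growth", "api_calls_30d": 120000}
salesforce_fields = apply_mapping(row, FIELD_MAPPINGS["salesforce"])
marketo_fields = apply_mapping(row, FIELD_MAPPINGS["marketo"])
print(salesforce_fields)
print(marketo_fields)
```

Keeping the mapping as data rather than code is what would let a non-engineer change it through a UI instead of editing a pipeline script.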

Tim Gasper:
So, there’s a different technology paradigm. There’s a different persona paradigm. You started to get a little bit into why people want to do reverse ETL. What is the business value there, and can you give maybe an example, a popular example of maybe something that people try to do with reverse ETL?

Tejas Manohar:
Yeah, for sure. So, why reverse ETL? At a high level, the idea is it’s solving a pretty age-old problem. People need data about what customers are doing in their different line-of-business tools. So, if you look at a tool like Salesforce, it’s only as useful as the information inside of it. Let’s say I’m a B-to-B SaaS company like Plaid, one of our customers, and I use a tool like Salesforce to talk to my customers. If I’m a sales rep at Plaid and I’m trying to look for opportunities to upsell customers on new features, sell them bigger plans of credits and different things like that, and I have no information in Salesforce except when I last contacted this customer, how much they’re paying me, et cetera, that’s not very useful. But if I can extend my Salesforce [inaudible 00:08:00] Salesforce, you can 10X the value of these tools. Yeah. It’s a super common use case for B-to-B companies.
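The Salesforce example above boils down to two steps: run a query against the warehouse, then write each row onto the matching record in the operational tool. The sketch below assumes an in-memory SQLite database standing in for the warehouse and a hypothetical `FakeCRM` class standing in for a tool like Salesforce; the table, columns, and `reverse_etl` helper are illustrative only, not any vendor’s API.

```python
import sqlite3
from typing import Dict


class FakeCRM:
    """Hypothetical stand-in for an operational tool such as a CRM."""

    def __init__(self) -> None:
        self.records: Dict[str, Dict[str, object]] = {}

    def upsert(self, external_id: str, fields: Dict[str, object]) -> None:
        # Create the record if it is missing, otherwise update its fields in place.
        self.records.setdefault(external_id, {}).update(fields)


def reverse_etl(conn: sqlite3.Connection, query: str, key: str, dest: FakeCRM) -> int:
    """Run a SQL query against the warehouse and push each row to the destination."""
    cursor = conn.execute(query)
    columns = [d[0] for d in cursor.description]
    count = 0
    for row in cursor:
        record = dict(zip(columns, row))
        external_id = str(record.pop(key))  # column that identifies the record downstream
        dest.upsert(external_id, record)
        count += 1
    return count


# A tiny "warehouse" table with product usage per customer account.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (domain TEXT, plan TEXT, api_calls_30d INTEGER)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [("acme.com", "growth", 120000), ("globex.com", "starter", 900)],
)

crm = FakeCRM()
synced = reverse_etl(conn, "SELECT domain, plan, api_calls_30d FROM accounts", "domain", crm)
print(synced, crm.records["acme.com"])
```

A real destination would add authentication, batching, rate limits, and error handling; the sketch only shows the direction of the data flow, from warehouse to the tool a sales rep actually uses.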

Juan Sequeda:
So, one of the things I, I’ll be honest, complain about a lot, especially on this show, is that there are all these different boxes. I mean, you go through… Again, I’ve talked about this: the Andreessen Horowitz architectures, the Bessemer Ventures, the modern data stack, and they have 15, 20 different boxes, and reverse ETL, I was looking at it today, sits under data loading. They call it ingestion, and they call it reverse, I think.

Juan Sequeda:
So, we have so much stuff out there, and what really scares me is that people are thinking, “I need to go buy 15 things, and we’re going to have to go figure out how to connect all these 15 things together.” Do we really need something like this? I mean, I can imagine that you could just do this directly and use an existing ETL tool, or use existing APIs to talk to your warehouse. Why do we really need something else? Honestly, I just feel that we’re adding more stuff into the mix, and this is not going in the right direction.

Tejas Manohar:
Yeah, yeah. I mean, really, I honestly think reverse ETL is going to be one of the biggest categories people think of when they think about data warehouses. I’m obviously biased, but the reason I think this is that when you buy a data warehouse today, you need to present the information to your users. BI is the most common way to do so today. You show a report in Tableau, you show a report in Looker, but really, users are becoming more and more savvy, and they want more and more capabilities with this data, verticalized to the unit they’re in. So, if you think about it, a marketer is living in a marketing platform. A salesperson is living in a sales platform. A finance person’s living in a finance platform like SAP or NetSuite, and they need good data in there to actually operate effectively, and that’s what reverse ETL is all about.

Tejas Manohar:
It’s about making the data analyst team able to actually impact the true business workflows of each of these teams around the company, and that problem only grows as the scale of the company grows. You can do it in house writing your own scripts, writing your own APIs, but then you really lose two things. One, a bunch of time because it’s tons of data engineering work to go figure out the APIs of all these foreign systems, how to plug into them, all that stuff, and then two, you just can’t democratize different parts of the organization to be able to modify those scripts.

Tejas Manohar:
If you’re an engineer, it’s okay to go modify a Python script to change how data flows into a tool like Salesforce, but if you’re a data analyst who only knows SQL, or if you’re on the sales ops team or something like that, it’s completely unfathomable to go into something like Airflow or Python and change those pipelines. You kind of need a platform that has a UI to change how the data from the warehouse is actually going into one of these systems. But really, if we look at analytics projects at large, I think most of them fail because companies can’t figure out the right way to empower their business users with the actual information, to make the information useful to them, and reverse ETL is honestly one of the newer solutions for doing this, beyond existing BI tools.

Tim Gasper:
Yeah. We’ve got an interesting question here in our chat that extends a little bit on some of what you were talking about. There’s this longstanding problem with data warehouses around the lack of standards for moving data into the repository, and Jeff mentions he’s personally skeptical that trying to do so when extracting data from the warehouse to ancillary systems is going to offer economies of scale or justifiable benefits. Why not standardize closer to the source? Do you have any thoughts on that? Does the fact that there’s a lack of standardization, and that data warehouses can be pretty messy, affect the value of reverse ETL tools, or does the reverse ETL paradigm take this into account pretty well?

Tejas Manohar:
Yeah. So, from what we see, basically every company in cloud data warehousing is starting to invest in an ELT architecture, where they’re building transformations with a solution like DBT inside their warehouse to create normalized views of what a user is, how much revenue they have, resolving the entities of the different accounts they’re serving, things like this, and reverse ETL just benefits from that. So, we don’t solve that problem in a whole new way. We do one specific thing, which is helping companies take the data they’ve already normalized using solutions like DBT and move it to the various downstream systems.

Tejas Manohar:
So, Jeff asked why you can’t do this closer to the source system. So, the reason you can’t do some of this plumbing within systems like Salesforce and stuff like that is, honestly, you don’t have… With the source systems, you just don’t have the capability to join all these various different datasets together, because one, you don’t have them in there. So, if you look at a system like Salesforce, it’s only as valuable as the information that’s actually in the system, and the truth of the matter is across the board, the only place where companies have all their information is becoming data warehouses and data lakes, and that’s a really convenient place since people have the power of SQL to join the data together and build their own normalized views however they think this information should be represented.
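The division of labor Tejas describes (transform inside the warehouse, then sync the result) can be sketched in a few lines. This assumes SQLite as a stand-in for a cloud warehouse and a plain SQL view standing in for a DBT model; the `raw_api_events` table and `account_usage` view are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Raw, unmodeled data as an ingestion tool might land it.
conn.execute("CREATE TABLE raw_api_events (account_domain TEXT, event_type TEXT)")
conn.executemany(
    "INSERT INTO raw_api_events VALUES (?, ?)",
    [
        ("acme.com", "api_call"),
        ("acme.com", "api_call"),
        ("globex.com", "api_call"),
        ("acme.com", "login"),
    ],
)

# A normalized model built inside the warehouse (the kind of thing a DBT model produces).
conn.execute(
    """
    CREATE VIEW account_usage AS
    SELECT account_domain AS domain, COUNT(*) AS api_calls
    FROM raw_api_events
    WHERE event_type = 'api_call'
    GROUP BY account_domain
    """
)

# The reverse ETL step only needs a simple query against the model, not the raw logic.
rows = conn.execute("SELECT domain, api_calls FROM account_usage ORDER BY domain").fetchall()
print(rows)
```

Because the filtering and aggregation live in the model, the sync itself only needs a one-line `SELECT`, and every downstream tool receives the same definition of “API calls.”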

Juan Sequeda:
So, I’ll be very honest. I’m taking the position that I find this reverse ETL thing, with all due respect, a lot of BS. I’m an honest guy. That’s why we do this. But let me flip it around and give you the benefit of the doubt. Something that’s coming to mind is… Let’s talk about personas. ETL tools, traditionally, like old-school Informatica, and now Fivetran, all this stuff, are tools for technical personas [crosstalk 00:13:41].

Tejas Manohar:
Yeah.

Juan Sequeda:
But then what you are seeing is, “Heck, I work in the marketing department. I work in the finance department. I just need to get my work done, and it is so frustrating that I have to go work with IT, the data warehouse, data engineers. Make my life easier, please.” I’m more of a businessperson, a data analyst in my domain of marketing or finance, and I’m probably technically savvy enough to know the basics of SQL. Reverse ETL, or this mindset, is more for the data analyst persona that says, “Give me enough, or let me use enough of my SQL savviness to do stuff, and then I can push the data to the application I need to solve my problem.” So, all of this to say that traditional ETL is for data engineers, more technical folks, while reverse ETL is going to be more for, let’s call them, business-focused personas. That’s what’s going through my head right now. What do you think about that? Am I going in the right direction, or [crosstalk 00:14:45]?

Tejas Manohar:
Yeah. It’s not wrong. Basically, what reverse ETL is actually allowing is the business teams to make use of the data in the data warehouse in their actual workflows. I think a lot of companies are like, “Oh, if you want to use data, you need to learn SQL or you need to look at our reports to access them,” but the truth of the matter is most of the work of these teams gets done in their own vertical tool for their unit, not a tool like Tableau or Looker, and reverse ETL is about allowing them to take anything in those reporting tools and make it show up in their tool so that they can utilize it for marketing campaigns or in their sales processes, et cetera.

Tejas Manohar:
Yeah. I would say it’s not like we shy away from the data team. Data teams are often tasked to solve this problem. They’re often tasked, “Hey, can you get this metric in Salesforce? Hey, can you get this in Marketo? Hey, can you sync the SQL query into Facebook as a CSV so I can target these users?” We help data teams solve those tasks all the time, but we also have an interface that’s easy enough that if someone’s written a SQL query or created a Looker report, a business user can come in and just make that data available in the systems they actually use, and Hightouch will just continue syncing it on a schedule.
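Tejas mentions that data keeps syncing on a schedule. One common way to keep repeated syncs cheap, offered here as an assumption rather than a description of Hightouch’s internals, is to compare each run’s query results with the previous run’s snapshot and send only what changed. A minimal sketch, keyed by a hypothetical account domain:

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def diff_rows(previous: Dict[str, Row], current: Dict[str, Row]) -> Tuple[List[str], List[str]]:
    """Return (changed_or_new_keys, removed_keys) between two sync snapshots."""
    changed = [k for k, row in current.items() if previous.get(k) != row]
    removed = [k for k in previous if k not in current]
    return sorted(changed), sorted(removed)


# Query results from the previous scheduled run and from this run.
last_run = {
    "acme.com": {"plan": "growth"},
    "globex.com": {"plan": "starter"},
    "hooli.com": {"plan": "starter"},
}
this_run = {
    "acme.com": {"plan": "enterprise"},
    "globex.com": {"plan": "starter"},
    "initech.com": {"plan": "starter"},
}

changed, removed = diff_rows(last_run, this_run)
print(changed, removed)
```

A scheduler such as cron would call this on each run, push the changed keys, handle removals however the destination requires, and then store `this_run` as the new snapshot.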

Tim Gasper:
So, imagine I’m a marketer and I have a reverse ETL tool in my company, and I’ve got a data warehouse. I know that companies like Tableau and Looker, they have some of these functionalities that are like set up an alert, or set up a… You can do a little bit of action-based stuff out of those platforms. But is this new paradigm one in which the marketer might actually go and look at the data warehouse, or hey, maybe they’re using a data catalog or something like that, and go look and say, “Hey, actually, I want to key some data on this into my Marketo or my,” whatever tool it is, and that they might actually go from seeing a piece of data to say, “Oh, wow. If I could get that in here, then I would do an automated workflow on marketing campaigns that would use this as an attribute,” or something like that. They could go from finding it all the way to using it in one process themselves, like a self-service approach?

Tejas Manohar:
Yeah. So, the idea is that they can go all the way from finding it to actually having it show up in their systems. If they find that data inside of a Looker BI report, they can select that report and the column from it they want, and get that into any system like Salesforce, Marketo, Facebook, whatever it is. If they have a preexisting query that someone’s built, they can filter down the query to take a subset of the results by certain criteria inside the Hightouch platform, in what we call a segmentation engine. So, they can segment the results in any way they want, and then sync those segments of results into different systems. The idea is to make the data in the warehouse, and the analysis effort that people are doing in BI tools, way more useful by allowing different users across the business to actually use it in their day-to-day processes, whether it’s running marketing campaigns, selling to customers, looking for prospects, et cetera.
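A segmentation step like the one described can be thought of as filtering query results by a set of conditions before syncing them. This is a minimal sketch under that assumption; the `segment` helper, the column names, and the upsell criteria are hypothetical and not the Hightouch segmentation engine.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def segment(rows: List[Row], *conditions: Callable[[Row], bool]) -> List[Row]:
    """Keep only the rows that satisfy every condition (a simple AND filter)."""
    return [row for row in rows if all(cond(row) for cond in conditions)]


# Results of a preexisting query that someone else wrote.
query_results = [
    {"email": "a@acme.com", "plan": "starter", "api_calls_30d": 95000},
    {"email": "b@globex.com", "plan": "starter", "api_calls_30d": 300},
    {"email": "c@initech.com", "plan": "growth", "api_calls_30d": 150000},
]

# Hypothetical upsell audience: starter-plan users with heavy recent usage.
upsell_audience = segment(
    query_results,
    lambda r: r["plan"] == "starter",
    lambda r: r["api_calls_30d"] > 50000,
)
print([r["email"] for r in upsell_audience])
```

The underlying query stays untouched; each team can layer its own conditions on top and sync the resulting segment to its own tool.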

Juan Sequeda:
So, we have a comment here from a LinkedIn user: moving data all over the place is not a bad thing if it’s governed and a golden source is respected. That’s a fair point. I mean, I think the whole idea is we do all this work, it lands in a data warehouse like Snowflake, you’ve done your DBT stuff there, so it’s mine. Right? I should be able to push that data wherever an application needs it. That application is going to consume the data from the warehouse, but there is some logic in how the application perceives it. So, somebody needs to write that logic, which is in SQL. They’re going to have to write it for the Salesforce app, then for the Marketo app, and so forth.

Juan Sequeda:
But at some point, you’re going to have more logic, and you just say, “Well, shouldn’t that logic be pushed down to the data warehouse, or shouldn’t we keep track of that logic?” And then you realize, “Wait, wait. I need to start cataloging all these different applications, and this reverse ETL is something I need to catalog itself,” because that’s lineage. Somebody’s going to go to the Salesforce app and say, “Hey, that number, what is that number? I don’t trust that number. We need to trace it back to the source,” and it goes back to the warehouse, and then we need to know where its lineage goes all the way back. Okay. I’m just ranting here, but it seems to me that this is a… I mean, I’m seeing it as a problem, but at the same time, it’s like, what else do we have to do?

Juan Sequeda:
We probably have to live with the situation, and all we really need to do is make sure that we get the best out of it. Let’s make sure that we have the best-governed data warehouse, one where we know where the data comes from. If we’re going to do reverse ETL, basically pushing data from the warehouse to the applications, let’s make sure that we’re governing that very well: we know what these mappings are, who did them, and when they were done. I mean, that’s really important. It’s the whole governance space of this. Again, I’m just talking out loud about it. I guess [crosstalk 00:19:47]-

Tejas Manohar:
Totally. Yeah. I think, basically-

Juan Sequeda:
… governance and reverse ETL. These things need to go hand-in-hand to make sure that you’re doing something successful. Otherwise, it’s going to be the wild, wild West.
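The governance point (knowing what the mappings are, who set them up, and when) can be sketched as an audit record kept for every sync, which also lets someone trace a number in a downstream tool back to its warehouse column. The record fields, model name, and `trace_field` helper below are hypothetical illustrations, not a feature of any specific product.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List


@dataclass(frozen=True)
class SyncAuditRecord:
    """One entry in a hypothetical audit log describing a reverse ETL sync."""

    source_model: str               # warehouse table or view the data came from
    destination: str                # operational tool the data was pushed to
    field_mapping: Dict[str, str]   # warehouse column -> destination field
    changed_by: str                 # who configured the mapping
    changed_at: datetime            # when it was configured


audit_log: List[SyncAuditRecord] = [
    SyncAuditRecord(
        source_model="analytics.account_usage",
        destination="salesforce",
        field_mapping={"api_calls": "API_Calls__c"},
        changed_by="sales-ops@example.com",
        changed_at=datetime.now(timezone.utc),
    )
]


def trace_field(log: List[SyncAuditRecord], destination: str, dest_field: str) -> List[str]:
    """Answer 'where did this number come from?' by walking back to source columns."""
    return [
        f"{rec.source_model}.{col}"
        for rec in log
        if rec.destination == destination
        for col, field_name in rec.field_mapping.items()
        if field_name == dest_field
    ]


print(trace_field(audit_log, "salesforce", "API_Calls__c"))
```

In practice this kind of record would live in a data catalog alongside the warehouse lineage, so the trail continues from the warehouse model back to the original sources.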

Tejas Manohar:
Yeah. I totally agree. I think Jeff just pointed out in the comments again, there’s no problem in moving data. It’s really moving data that hasn’t been standardized, that hasn’t been normalized, hasn’t been documented and cataloged, and those are all things that reverse ETL alone can’t solve. It has to be solved in conjunction with a modern data stack. That’s where metadata solutions come in. That’s where DBT and different transformation solutions come in, and-

Tim Gasper:
Is there kind of… Oh, sorry. Go ahead.

Tejas Manohar:
No, no, no. [inaudible 00:20:24].

Tim Gasper:
I was going to say, is there a sort of order in which you need to go about these things? Usually in data warehouses these days, you have your raw tables and your processed tables, and then you might have your aggregates and things like that. Do you have to get your house in order a little bit ahead of time before you start using reverse ETL, or is it on a use-case-by-use-case basis? As long as that one use case is mature and using well-governed data, are you in good shape for that use case? How do you think about that?

Tejas Manohar:
Yeah. So, I guess two things. First, things never happen in a perfect order. Right?

Tim Gasper:
Right.

Tejas Manohar:
Sometimes the sales data might be at a certain maturity, and this new product data might not be because your company just acquired another company. So, things are never perfect. Things are always changing, and things are always being incrementally improved in any growing organization. That being said, here’s what we typically see. Let’s say a company buys a new data warehouse like Snowflake. The first thing they need to do is figure out how to get data into it. So, they need to adopt a solution for ETL, like a Fivetran or a Stitch. The second thing they need to do is throw up a BI tool or something so they can actually visualize the data, report on it, show it to other users, run queries, and save them, and this is when they buy something like a Looker, a Tableau, Mode, et cetera. Reverse ETL never comes before those steps, for sure. It definitely only comes after those steps, but typically, we even see another step before that, which is transformation, so adopting a tool like DBT.

Tejas Manohar:
Once users adopt a tool like DBT and they start building these views and models that are in their data warehouse and nowhere else, it almost becomes obvious that they need to start moving that into the different systems that people have around the company. Since the data warehouse becomes a unique source of truth that has information that’s really nowhere else, it’s almost a data silo in some ways, and that’s where people start getting the need for reverse ETL and start adding it right on top. We obviously have users that don’t have transformations inside of their warehouse and don’t have normalized data, and they do use more complex SQL queries in Hightouch, but what we’re finding is that a majority of our users are coming to us right after they do the transformation stuff.

Juan Sequeda:
So, part of the transformations and getting all this data right is about modeling, and that’s actually-

Tejas Manohar:
Totally.

Juan Sequeda:
… how we got connected, right, you and I?

Tejas Manohar:
Yeah.

Juan Sequeda:
We were just on Twitter. We had never connected before. I saw your comment on Twitter, and it was all about how if you get the modeling right in the data warehouse, then life is going to be easier, and people know that I talk about this all the time: we don’t think about modeling a lot. All right? I’m really frustrated that you can become a data scientist doing all this work, and nobody even takes courses or learns how to do data modeling. What are your recommendations, or how do you present this to your folks and customers and prospects, so they understand that modeling is such an important thing, that if you don’t get the models right, you’re just going to put in more weird garbage and spaghetti code, and so forth?

Tejas Manohar:
Yeah. I mean, the way I look at it is… I think this was the topic of the tweet, but it’s basically: the goal of every tool in the data stack is to let you focus only on data modeling as a company. The reason I say this is that there’s a bunch of nonunique work in data engineering that people often get caught up in. Pulling data in from different sources into the warehouse, visualizing data and sharing reports around the company, pushing data out to different systems: all of these are problems that every single business faces. Data modeling is the one thing that no tool is going to be able to generically solve. There’s no tool that’s going to be able to say, “We do data modeling for your business or any business.”

Tejas Manohar:
It’s a very unique problem, and it requires a human to actually look at it very differently for each business because there’s a bunch of semantics that comes into play. So, I think the goal for any data engineering team or data analyst team that’s on a modern stack should be to use SaaS tools in a way so that they can avoid thinking about any problem except data modeling, and they can just focus all their energy on making the best data models inside of their data warehouse for their business, and that’s what it really comes down to.

Juan Sequeda:
So, I’ve asked this before. Out of all the modern data stack tools, companies, everything, there is no modern data modeling tool out there?

Tejas Manohar:
Well, I think a transformation tool is basically-

Juan Sequeda:
No, but a transformation tool is a mapping. I mean, that’s moving from one model, which is your source, to a model which is the target. How do you define the target schema, the SQL DDL that’s going to go in there? I mean, there were all the UML tools to do that, if we’re thinking about relational. I’m not going to be salesy here, but I could be; there’s stuff that we’ve done too. I’m just bringing that up, but I think there’s a lack of modeling tools, and I think this is just a cultural thing. We just focus on data, data, data, while modeling is more about the knowledge. You’ve got to go talk to people about this stuff, and I think that’s the part we’re missing. There should be a modern data modeling tool, and I think that’s the one we need to go fund. I want to see more companies doing that. So, people listening out there, [crosstalk 00:25:33]-

Tejas Manohar:
What do you think that would look like? What do you think that would look like in comparison to, say, a tool like DBT?

Juan Sequeda:
I would literally want a tool that is… Imagine Google Docs combined with drag-and-drop bubbles and lines that generates your schema, something you can share with people, in real time, and write comments on. That’s the modeling approach we need. We don’t have any of that stuff. How do you define your SQL DDL, your schemas for your warehouse? You write code for it. That’s not modern.

Tim Gasper:
I think architects for this kind of stuff have fallen out of the mainstream, and data engineers, data scientists, and data analysts have now ended up on the front lines of designing these types of models, and in many cases, they’re not really thinking about models or a knowledge-first approach. I agree with you, Tejas, DBT is maybe the closest thing in the modern data stack to modeling today-

Tejas Manohar:
Totally.

Tim Gasper:
… but it is more efficiency-focused. It’s, I got to get from point A to point B. Here’s the transformation that gets me there, but what is the business definition it ties to? What is the greater model for the business that it ties to? It just isn’t a big focus today, I feel like, for most companies.

Tejas Manohar:
I’m not sure that’s wrong, and it actually feels like solutions like DBT allow you to incrementally chip away at problems. I think a lot of people overemphasize data modeling best practices or philosophies or things like that, when I feel like the most important thing is just to make it easier and easier to query the data and make it make more and more sense, and that’s the mindset a lot of analysts or analytics engineers are taking with tools like DBT today. I have SQL queries that require [inaudible 00:27:23] today. These are the attributes I’m always looking at.

Tim Gasper:
Yeah.

Tejas Manohar:
Let me pull those up and make a view for it.

Tim Gasper:
Maybe that’s the future. Maybe it’s building on DBT. Maybe, Juan, the future of modeling is to take something like DBT and integrate it with the catalog, integrate it with the graph. I don’t know.

Juan Sequeda:
I think we need to have another topic just on pure modeling. I think that’s going to be a good one after this discussion [inaudible 00:27:50]. But talking about consolidation, you just mentioned something I’m very curious about. I mean, the stuff that your company is doing, and there are other companies doing it, which leads to my other question-

Tejas Manohar:
For sure.

Juan Sequeda:
… what differentiates all these companies? I mean, again, don't take this the wrong way, but it's honest, no BS. This seems to be just a nice API wrapper layer around SQL queries. That's how I define this. So, what's the differentiator between the different companies competing in this space, in this category, which you're arguing is going to be a very big category, something I'm extremely skeptical about? That's one thing, and second, I could imagine that any existing company out… I mean, a Snowflake, a DBT, they're going to go do it too. So, what differentiates the companies in this category, and what makes you really think that this category is actually going to exist and won't just be eaten up by a data warehouse or existing integration tools?

Tejas Manohar:
Yeah. So, I think if I was a data warehouse or a DBT or something like that, this isn't the first category I would get into. I think if I was a DBT, I'd probably consider things a bit closer to lineage or [crosstalk 00:29:08] or something like that. Or if I was Snowflake, I'd be thinking, how do I get platform value out of this, and how do I keep my edge over the clouds that are [inaudible 00:29:17], for example? So, for us, what we really see as our long-term differentiation is the way we build integrations and our philosophy behind that, which is to treat every single one of them as a product, where we're trying to appeal to the users of that platform. So, our Marketo integration should be usable by people who love Marketo. Our Salesforce integration should be [inaudible 00:29:39] by the Salesforce advocates. Facebook Ads should be usable by people who live and breathe in Facebook Ads, and that's a very complex problem, and a moat that we're building with a lot of deep integrations that don't all look and feel the same.

Tejas Manohar:
I think really, what we see in the competition and the other players in the [inaudible 00:29:56] space is… They seem to have a very generic view of the world: pushing a field from here to there. When you're pushing data to a Facebook audience list with their tools, it looks like you have to map the list ID over as a field, whereas if you actually use Facebook Ads, you know that you want the product to automatically go create a list in Facebook and sync data to it, because that's the workflow that marketers actually have when it comes to Facebook Ads. So, I think the big differentiator is going to be really understanding all of these tools and the users behind them, and building a product that abstracts over all of this so that the users of each tool in each department can actually use the platform to take data from the warehouse and solve different business problems across the stack.

Tejas Manohar:
The other thing we focus on at the same time is building a primitive, foundational platform that engineering teams want at the core of their data stack. That means really good visibility, a debugger, version control integration so you can actually manage this stuff when you have tons and tons of syncs, alerting, and everything else you would expect if you were to build the best reverse ETL platform in-house.

Juan Sequeda:
Yeah. I can see that as a way of differentiating companies: who is the user you're really focusing on, and what types of problems and use cases… But Tim, I'm going to pass it on to you.

Tim Gasper:
Yeah. So, we’re getting to our lightning round now, so I think it would be good for us to go ahead and talk about some quick questions. It’ll be a yes-or-no answer, and if you want to add a sentence or two for additional context, feel free, but we’ll see if we can squeeze in a little extra here, some interesting-

Tejas Manohar:
Let’s do it.

Tim Gasper:
… and fun topics. So, we've actually got four here, and a couple came up during the conversation, which is going to be fun. So, first one, Juan, why don't you start us off?

Juan Sequeda:
Yeah. So, will ETL and reverse ETL merge in the future?

Tejas Manohar:
So, my perspective is no. The reason is that I think reverse ETL just has fundamentally different personas than ETL. You can see this in our product. Last week, we just launched [inaudible 00:32:01] product audiences. We realized that the common workflow marketers need when syncing data to different tools is to build different audiences from that dataset using an audience builder or a segmentation feature, so you get the users who fit this criteria or that criteria. It looks like a marketing UI, but it all runs in the data warehouse and lets you easily sync the data to different tools after you've done that step. I think what we're really finding is that in reverse ETL, you need to think of every integration as a product, and that's the difference from an ETL tool, where just getting the data in is good enough.

Tim Gasper:
I love that response because it goes back to what we started this off with, which was you mentioning that if reverse ETL is the term that takes off, then that's great, but it's kind of interesting how it's really not just reverse ETL. It's actually a very different category. That's interesting. So, next question here: will there be more no-code data engineers in the future than coding data engineers?

Tejas Manohar:
Totally. My perspective is that companies shouldn't be focusing on writing all these integrations and doing all these things that require custom code. They should really, really, really be focused on data modeling, and the rest of the problems, whether it's [inaudible 00:33:14] streaming systems, moving data into the warehouse, moving data out of the warehouse, making reports, or putting stuff in your app, should really be solved by SaaS platforms. So, my perspective is that a lot of the data engineering plumbing work as you see it today will transition into data modeling work, making the data represent your business.

Juan Sequeda:
I love that. I can't wait to be able to do a little bit more data engineering, because I'll admit my Python's not great.

Tejas Manohar:
Fair enough.

Tim Gasper:
Yes. All right. So, is ETL plus warehouse plus reverse ETL the new master data management?

Tejas Manohar:
Yes. That’s actually correct, I think. ETL plus warehouse plus DBT plus reverse ETL is [crosstalk 00:33:58] management-

Juan Sequeda:
Oh, okay. Okay, good. [crosstalk 00:34:01]-

Tejas Manohar:
… and of course, CDP. Yeah.

Juan Sequeda:
Sorry, and what?

Tejas Manohar:
Or CDP. I think MDM and CDP are both equally confusing terms. Basically, what they both do is move a bunch of data into one place, transform it into some sort of model, and move it out to other places [crosstalk 00:34:14] spreading it.

Juan Sequeda:
So, technically, when you say DBT is the T of ETL, it's already in there, so I think we'll say the ETL includes [crosstalk 00:34:22]-

Tejas Manohar:
Fair enough.

Juan Sequeda:
… warehouse plus reverse ETL is the new MDM slash CDP. Wow. There's a lot of TLAs right there. Tim, [crosstalk 00:34:30]-

Tim Gasper:
That was a lot of acronyms. All right. That's interesting. I love that. Okay. Last lightning round question here: will self-service reverse ETL into applications actually be more common in the future than self-service BI?

Tejas Manohar:
Yeah. My perspective is that self-service BI is mostly a failure, and most applications will not… Most users across the company won't be able to learn how to use business intelligence tools, so they'll use vertical systems that already exist instead of, say, vertical BI tools. So, it's actually the systems that already exist, the ones they live out of, whether that's sales users in Salesforce, marketing users in Marketo, et cetera, and the real connective glue between the data and those users will be a reverse ETL platform. It's interesting. In some ways, you're creating a new category. In some ways, you're competing with some usage of existing categories, like BI tools.

Tim Gasper:
This is one of the more compelling aspects of the future that this conversation brings up for me, and one I'm excited about, because I think so many people think, "Oh, let's put the power of machine learning in the hands of people, oh, here's your DataRobot," or "Hey, let's put the power of visualization in the hands of everybody. Here's your Looker. Here's your Tableau." But in the end, most people are spending their time in Salesforce and Marketo and all these different domain-specific tools, and I see the data mesh pushing more towards the domain. Very interesting things are afoot in the data space.

Tejas Manohar:
Exactly. Yeah. I think in a lot of ways, that's why I actually think reverse ETL is going to be up there with BI in terms of use cases for the warehouse. At the end of the day, people are looking for better ways to use the data and the definitions they're creating, and reverse ETL is really one of the only new solutions after decades of being stuck with good old BI in new flavors.

Juan Sequeda:
All right. Well, TTT time. Tim, take us away with your takeaways.

Tim Gasper:
All right. Let's do some takeaways. So, first of all, I appreciated your definition of reverse ETL: it's about moving the data from the data warehouse back to your applications. Pretty straightforward, and I wrote down a bit about when we talked about the why. You're trying to take action on the data and have better context in the applications, specifically within those different domain applications. If you're a salesperson, a finance person, a marketing person, or an HR person, you have the apps that you spend most of your time in. I thought that was a great way to think very simply about what reverse ETL is.

Tim Gasper:
Then I liked your response to the lightning round question about ETL, warehouse, DBT (good addition), and reverse ETL being sort of the new MDM, the new CDP. I think that's interesting because I've never been that skeptical of CDP, since I see what it's trying to do, but I've been very skeptical of MDM, because I haven't heard of enough success cases in the world of master data management versus the amount of-

Tejas Manohar:
Totally.

Tim Gasper:
… money actually being invested in master data management, and so I’m excited for what the modern data stack might bring in this area.

Juan Sequeda:
It reminds me of an old episode we have on MDM, one of our early episodes where I asked you, Tim-

Tim Gasper:
Good episode, yeah.

Juan Sequeda:
… “What is MDM?” and you’re like, “That’s fancy data integration.”

Tim Gasper:
Fancy data integration.

Juan Sequeda:
Yeah.

Tim Gasper:
[crosstalk 00:37:44] what are your big takeaways?

Juan Sequeda:
So, I’m skeptical about this whole reverse ETL, but one thing that really got [crosstalk 00:37:52]-

Tim Gasper:
Are you skeptical? Are you sure you’re skeptical?

Juan Sequeda:
One thing that got to me is, hey, something that makes sense is that normal, traditional ETL is for more technical data engineers, and you can see reverse ETL being more connected to understanding the personas, the users, and their specific business problems, so you're really connecting more with those business users, and you can go create these tools [inaudible 00:38:17] business users. So, I think that's a big aha moment for me. I really appreciate getting that out of this conversation.

Juan Sequeda:
Second one is, kind of not surprising, but the realization that governance is still so important to make sure we get the most value out of the whole process, that governance and reverse ETL need to go hand-in-hand, and part of that is also the modeling. I think we need to do more modeling, and we need modern data modeling tools. Following up on Tim's point about ETL and the warehouse, what I love is all these TLAs: ETL plus DBT plus DW plus reverse ETL equals MDM or CDP. [crosstalk 00:39:03]. Yeah. The ETL [inaudible 00:39:06]. Okay. So, Tejas, back to you. Final two questions. One, what's your advice about data, about life, whatever, and second, who should we invite next?

Tejas Manohar:
My advice is to try to write as little code as possible. Use SaaS services. And as for who you should invite next… Have you had Materialize on the show, Arjun from Materialize? I think he would be great to talk about the future of data warehouses and streaming SQL.

Tim Gasper:
Isn’t Frank McSherry part of-

Tejas Manohar:
Yeah, yeah. He's a co-founder.

Tim Gasper:
Yeah. Okay. You also recommended who?

Tejas Manohar:
Arjun. He’s the other co-founder of Materialize.

Tim Gasper:
Okay.

Tejas Manohar:
Yeah.

Tim Gasper:
All right.

Tejas Manohar:
But Materialize, I can see [inaudible 00:39:52].

Juan Sequeda:
Yeah. I met Frank once at a conference. He's an academic, so we met at the [inaudible 00:39:59] database conference, so yeah, that's a great one.

Tim Gasper:
[crosstalk 00:40:02], right? [inaudible 00:40:03].

Tejas Manohar:
Perfect.

Juan Sequeda:
Tejas, thank you so much. This was a fantastic conversation. I came in very, very skeptical. I’m leaving skeptical, but with some really good insights about this, and there is [crosstalk 00:40:17]-

Tejas Manohar:
I always appreciate a good skeptic.

Juan Sequeda:
Well, actually… So, here's the funny thing: earlier today, we were talking with Eric [Bernhardenson 00:40:26]. I know that you guys are meeting soon.

Tejas Manohar:
Yeah, I’m meeting him tomorrow.

Juan Sequeda:
Because he's going to be on our show, I think… I'm looking it up right now. On November 3rd, and he said-

Tejas Manohar:
Great.

Juan Sequeda:
… "Two weeks ago, I had no idea what reverse ETL was," so that's where we had an earlier discussion with him. So, this is really cool that we had this conversation, and hopefully he's listening or will listen to this later. I think we need to have more conversations on these topics, so thank you so much. Appreciate it.

Tejas Manohar:
Awesome. Thanks, guys.

Juan Sequeda:
Have a great one.

Tim Gasper:
This has been great.

Tejas Manohar:
See you.

Tim Gasper:
Cheers, Tejas.