Modern Data Stack: Technology, Methodology, or both? w/ Nick Schrock

About this episode

The modern data stack is often defined by the types of technologies that exist within it: cloud-based, open source, low/no-code tools, ELT, and reverse ETL. But surely there’s more to it… isn’t there?

What holds the modern data stack together and makes it the architecture of choice for so many data-driven enterprises? Join Tim, Juan, and special guest Nick Schrock, founder of Elementl and creator of Dagster and GraphQL, to chat about all things MDS.

Special Guests:

Nick Schrock

Founder, Elementl

This episode features
  • Is modern data stack a methodology or a set of disparate cloud technologies?
  • Thoughts on consolidation among MDS tools
  • Describe your reaction upon glancing at Matt Turck’s latest data landscape diagram
Key takeaways
  • Assembling MDS tech into a platform.
  • All you need is a simple MDS: cloud data warehouse, dbt, ingest, analytics.
  • Work together for operational robustness

Episode Transcript

Tim Gasper:
Hello, everyone, we’re live. It’s time for Catalog and Cocktails. It’s your honest, no-BS, non-salesy conversation about enterprise data management. I’m Tim Gasper, product guy and data nerd. Joined by Juan Sequeda.

Juan Sequeda:
Hey, Tim. I’m Juan Sequeda. I’m the principal scientist here at Data.world, and it’s Wednesday, middle of the week, end of the day, close to end of the day, and always a pleasure. I am so lucky that we get to take our pause and chat about data. I’m lucky that Data.world lets us keep doing this as part of our day job, so thank you, Data.world. And-

Tim:
Data and drinks.

Juan:
Data and drinks. And today we have a guest, a very special guest who has, I would say, impacted the lives of all developers nowadays, because I hear about some of the stuff that he has created probably multiple times every day, and that is Nick Schrock, who’s the founder of Elementl, creator of Dagster, and also GraphQL. Nick, it’s a pleasure to have you here. Thank you so much for joining us.

Nick Schrock:
Thanks for having me. Thanks for the kind words.

Juan:
Yeah. So, let’s get into some of the important stuff first. What are we drinking? What are we toasting to?

Nick:
So, I’m a simple man and I have a beer. It is a Melvin Juicy Theorem IPA. That’s not sponsorship. That’s just a passion product. No, I’m in a hazy IPA phase, so here we are. Yeah. And I guess I’m just toasting to the health of my newborn son and my wife. I used to scoff at people who did this before I had kids, but now I’m one of those people. We had a challenging pregnancy and childbirth. He was born early, only five pounds, seven ounces, and now he’s a big, chunky boy. He’s huge now. So, it’s been quite the reversal. So, I’m toasting to his health and… Yeah. Very grateful for that.

Juan:
That is awesome. Congratulations. I’m glad everything’s going well and he’s a big, chunky baby. Hopefully keeps growing and growing.

Nick:
Yeah.

Juan:
How about you, Tim? What do you-

Tim:
Well, I’ll tell you what I’m drinking and I’ll tell you what I’m cheersing to. So, I’m drinking actually the first of what is the, I don’t know if that’s going to show up well on the camera here with the glare, the 25 days of whiskey, which is a little fun tradition that we do at Data.world. The first whiskey is Angel’s Envy, so I’m going to give that a try. And I want to cheers, first of all, to family health. That is always key, and so it’s really great to hear, Nick, that things are going well for you all there. And I will also actually cheers to Dagster. And I have also not been paid to say that, but we’re actually implementing Dagster over at Data.world and I know that we love it so far. There’s a ton of excitement in the community around Dagster. So, thanks for bringing that capability to market. Very, very cool stuff.

Juan:
Well, we’re supposed to be non-salesy here, but I think we’re all [inaudible 00:03:03].

Tim:
It’s community, though. It’s community.

Juan:
[inaudible 00:03:06]. That’s true, though.

Nick:
Non-sales is community [crosstalk 00:03:09].

Juan:
There we go. There we go. [inaudible 00:03:11] Data.world, too, also has a big community around it. I’m drinking… I’m Mexican. I’m calling it a Mexican old fashioned. I had some agave and there was some mezcal, and I had some red bitters and that’s why it’s kind of reddish in here. So, it’s actually really, really nice. And I’m going to go with you. Cheers to family. Coming out of this holiday weekend, I spent time with my whole family and my brother who has a newborn, and it was just cool to get everybody together under the same roof. So, cheers to family.

Tim:
Cheers.

Nick:
Cheers.

Juan:
So, we’ve got our warmup question here, which is, describe your reaction upon glancing at Matt Turck’s latest data landscape diagram. And I think it’s going to show up right here. What’s your reaction, Nick, when you see this?

Nick:
I guess my first reaction is putting my mind in the position of a company trying to buy stuff from vendors, and the abject terror and confusion that must result from that. I mean, it is kaleidoscopic and fractured and… It’s just total madness. It’s total madness.

Juan:
I’m totally with you. My reaction is, “Oh my God, really?” Because I’ve been collecting these. I have a talk that I give where I show how this keeps growing. Right? I’ve been doing it for five, six years now. And it’s like, well, the first time it came out, it made sense. And then it just gets bigger and bigger and bigger and more complicated, more boxes, more logos repeated. And I’m like, “What the fuck?”

Nick:
The other reaction is kind of cool: there are so many people taking their shots this week and there’s so much activity and interest in the space. Behind each of these logos there’s a story of some founder or some group of people who are trying to make their stake in the world. So, on that aspect, I think it’s pretty cool.

Tim:
Yeah. Everybody trying to put their dent in the universe, in their particular area. Right? Whatever that might be.

Nick:
Totally.

Tim:
One thing I’ll add, one thought I have, is like, I wonder how many of these… And I haven’t actually looked, so I don’t know. I wonder how many of these logos are duplicates. Right? For example, I know Data.world shows up in two columns here. Right? Informatica probably shows up in 45 columns. Right? It’s that kind of thing. I wonder about that. And the reason why I bring that up is like, “Man, our space is not only complicated and fractured, but highly overlapping.” Right?

Juan:
Which is a good segue to the conversation we want to have here today. So, let’s dive into this conversation. And I like to kick it off with the honest, no-BS question. So, when we hear about the modern data stack, the first thing that comes to mind is a list of technologies, a growing list of technologies like we just saw here. Right? It’s all cloud-based, and there’s storage. There’s open source, there’s low code, no code, there’s ETL, ELT, transform, reverse ETL, analytics. Right? So, honest, no BS, is the modern data stack just that, a bunch of technologies? Or is there also some methodology around it to understand what technologies are needed and how to piece all this together?

Nick:
Well, I would describe it as an emotional state. No. I actually think it started out as a stack of technologies, but it is a methodology and a mindset. And the way I frame it in my head is that we’re effectively rebuilding data infrastructure from the ground up in the cloud era and also what I’ll call the modern era, modern being defined as every enterprise in the universe having complex data needs. They’re ingesting from all sorts of SaaS services, and being able to effectively use data is a base-level capability and expectation. What does that mean? One is that the cloud data warehouse… Or maybe a lakehouse. We can get into that. But some sort of centralized store like that is the center of a company’s data universe. Two, they use and bias towards using managed services. And three, there’s this software engineering mindset when it comes to data. I think that’s a really interesting thing to dig into because I think there are a lot of misconceptions around that.

Nick:
So, I think the short answer, I think the modern data stack started out as a fairly narrow definition where it’s like, “Okay, you choose a cloud data warehouse, you have DBT, an ingest tool, a BI tool,” that’s a modern data stack. But when people encounter the reality of the world, their needs expand and they grab for more tools. Like recently, the modern data stack has expanded to include reverse ETL, for example, and I think that expansion is going to occur, which means it’s not a static set of technologies. It’s a mindset and a methodology about how to build data infrastructure and data platforms.

Juan:
Okay. Let’s dig into the technology part. You started getting into this, because I want to talk about technologies, and I want to talk about methodologies. When it comes to technology, you said it all started out with a cloud data warehouse, DBT, ingest, which I would guess is the E and the L, and some analytics. Right? These four things are kind of how it all started. And this is basically just us reinventing ourselves; we need to go do things in the cloud. I always say the modern data stack is in the cloud and it has fancier UIs. That’s what makes it modern. That’s my honest, no-BS definition there, which is… I don’t know. Am I oversimplifying it too much? Or is that kind of… I don’t know.

Nick:
Well, I think the other definition, which resonated with me, is one I heard Benn Stancil describe, and he’s one of the co-founders of Mode and he’s a prolific blogger, [inaudible 00:09:12] serious, like, “Friday, let’s fight.”

Juan:
Friday, let’s fight.

Nick:
Yeah. Yeah. He’s the blogging pugilist, to use a fancy word. But he had the sense of like, “Yeah, the modern data stack is like data products that would appear on Product Hunt,” as another way of describing it. Meaning it’s startup-y, it’s targeted towards hipster data people, was his definition. Well, that’s my-

Tim:
I love it.

Nick:
That’s my editorial on top of it.

Tim:
That means there’s more modern data stack in Austin and Portland, for sure.

Nick:
For sure. Yeah. Yeah. Microbreweries and fancy UIs, here we come. But yeah, no, I don’t think it’s just fancy UIs. I do think that this software engineering mindset is a critical part of the modern data stack. We can get into that.

Juan:
Okay. That’s a key one. Okay. I want to dive into that, but I want to keep extending this. So, we start with these four: cloud data warehouse, DBT, ingest, and analytics. How is this extending? You say it’s not static, so it’s dynamic. Reverse ETL is something that we’re seeing here. Where else is this extending? Where is this going? And part of it is, is it aligned to use cases? So, for example, if I have this type of use case, I’m going to go in one direction, versus this other use case that goes in another direction where I would not need some tool, whatever. How are you seeing it?

Nick:
The way I see it is that companies are building up their data infrastructure from scratch, they’re cloud first, and they’re answering the questions that you answer in order. Meaning that the first thing you do is count things, like understanding very basic metrics about your company or your enterprise. How many users do we have? What is our revenue? Like basic business metrics. And in order to answer that question, you can just answer it with ingest into a cloud data warehouse, DBT on top, BI, and you’re done. Right? In order to do the basic counting. But that’s step one, not the final stage. So, then it’s like, “Oh, interesting, we’ve ingested our data from all our different sources. We want to re-inject that into those SaaS products so that we can surface the right information to stakeholders in their native tools.” Then you have reverse ETL.

Nick:
But then the people who build these data platforms, they naturally want to expand things, so maybe they want to build ML and experimentation platforms, it’s very naturally adjacent. Because, for instance, in ML, most of the work is ETL anyway. Right? Most of the work is in the data processing, so there’s natural bleeding between those two use cases. And then, things just expand beyond the scope of only SQL computation in general. People need to write custom code to do lots of things, and they need to… Among our user base, I’m always, not shocked, but it’s continually interesting all the different use cases that people come up with where they scratch together data platforms, where like, “Oh, we have these contractors, they need to insert this stuff into a Google sheet. Then we need to write some code to do that. We need to match that up with our payroll system,” and on and on and on and on.

Nick:
So, to me, the modern data stack is simply following the natural evolution of what happens within a company when you’re starting to expand. And then it’s like, “Oh, we have so much data that we need to catalog it.” Right? And then you start looking at cataloging tools. “Oh, there are enough stakeholders here and there are enough teams that we need to start doing data lineage.” There’s a natural expansion as you invest more and have more capabilities in your data platform. And I think, effectively, what’s happening in this modern data stack landscape is that people grab for more tools to solve those problems that they absolutely need to solve.

Tim:
So, even though we’ve started this conversation a little bit more from a technology perspective, and I’m sure we’ll explore a little bit more there, you’re talking about use cases and a use case progression. You’re also talking about a maturity curve here, where as you enter in, maybe you’re starting off more with descriptive analytics and then you’re moving into some more of the prescriptive things, you’re trying to build data applications, maybe you’re now starting to roll this out to a bigger company, so now it’s not just about one group in the company anymore, it’s about how do we federate our stack and federate our governance elsewhere in the company? Why do we keep on coming back to the diagram, the technologies, though? Does that become an anchoring point where we can at least talk about like the components as they grow? I’m curious why methodology and maturity haven’t been a bigger aspect of this. Do you have any thoughts there?

Nick:
Yeah. I don’t know if I have any profound thoughts on that, but engineers who are dealing with this like talking about concrete things that solve concrete problems rather than abstract process and methodology stuff. Process and methodology stuff is something that MBAs come in and do. Right? They come in and do a SWOT analysis or something like that. Most engineers’ relationship with a SWOT analysis is like how Gilfoyle on the show Silicon Valley would interact with one. You know that reference. So, I think there’s some level of skepticism towards these more high-level, abstract, think-piece ways of approaching things.

Nick:
People will ask like, “Hey…” There’s kind of a know-it-when-you-see-it feeling for people, and they’re like, “Oh, this technology feels native in the modern data stack.” So, I try to listen to those users and abstract away the general… Abstract out the general principles that apply there. But I think it’s kind of a dispositional thing, where the people who engage in this space, and engineers in general, are usually quite literal and very value- and thing-oriented. Right?

Tim:
Right? I think that makes a lot of sense. You have a particular thing you’re trying to accomplish, and you’re like, “Well, what’s the tool that helps me do that?” Right? It’s like, “I need to get data from A to B,” it’s like, “Oh, sounds like it’s time for an ETL tool, my friend.” Right?

Juan:
So, I want to dig into how the modern data stack has this software engineering mindset when it comes to data, but there are processes around software engineering. Right? I mean, yeah, okay, you don’t do a SWOT analysis when you’re doing software engineering, but you still have a process for how you commit, how you do peer review, how you’re checking in code, CI/CD, all that stuff. And I think we are seeing that inside of the modern data stack. Right? All that work on data ops and data observability and all these things, to go see if things are breaking or not.

Juan:
But I feel that we have some methodology in there, but then sometimes we’re just like, “Just give me the next… I just want to go solve this problem and just get the next tool for it.” And then that’s going to expand to more of these blocks that we’re going to see in Matt Turck’s thing. Right? [inaudible 00:16:40] “I’m trying to solve this very specific problem, so I’m going to create this new tool for it,” but we are not zooming out and realizing, “Well, I mean, it was really about…” It wasn’t a technology problem, but you’re solving it with technology. If we had just defined a methodology, a process around this, we wouldn’t have had to go reinvent a bunch of stuff again.

Nick:
Yeah. I think your example is interesting, which is code review and source control, which I think most engineers would often describe primarily in terms of the tools they use, as opposed to the process, because the process and the tools are interlinked. Right? It’s like, “Of course we use GitHub or some other tool for code review,” but they think of it as a tool, not a process. So, I think that that’s like an aspect of this that’s going on. And so, I think for engineers, tools and process are completely interlinked and inseparable in people’s minds.

Juan:
That’s interesting to see. It’s the mindset of a software engineer: even though you can see process and technology as two different things, it’s been done so nicely that they actually seem like the same thing, because the process and the technology work together very nicely.

Nick:
Exactly.

Juan:
And I think that’s not always the case when it comes to data.

Nick:
Right.

Juan:
So, one thing you haven’t said up to now, while we’re talking about all these different technologies… You mentioned the cloud data warehouse and all that stuff, and reverse ETL, catalog, and lineage. Where does orchestration come into this? [crosstalk 00:18:18].

Nick:
Yeah. That’s a great question. So, I was actually on a different podcast. Apologies, I cheated on you. I was on a different podcast a couple weeks ago, alongside Scott from Brooklyn Data, and he described a data platform without orchestration as a bunch of kids in a sandbox who aren’t even talking to each other; they’re just doing their own thing, but what they really need to do is coordinate and work together. And that’s really where orchestration comes in.

Nick:
So, in my mind, orchestration is… There are a couple things you need to do. One is you want to add operational robustness to your existing tools. So, without an orchestrator, what’s interesting is that we’ve regressed in the modern data… We’re talking about the modern data stack mostly. If someone’s using Fivetran, DBT Cloud, and a reverse ETL tool, they’re now stuck in a world where they have overlapping cron jobs, where you just have to hope and pray that one works after the other. If something goes wrong, you have no tool where you can debug things across those tools. You have three sandboxed operational tools, and you’re scratching through logs in each of them and figuring out, “Wait, was the error in the previous tool?” You have no single pane of operational glass. That’s a problem. And the data isn’t as up to date as you want.

Nick:
And then, God forbid, you want to do a computation which cannot be expressed in Fivetran or a reverse ETL tool or SQL, what do you do? Right? You have to write some-

Juan:
Code.

Nick:
Yeah. You have to write some code to do any number of things. So, this is really where orchestration comes in. I’d like to say that orchestration really comes in when you need to start assembling your modern data stack into a platform where there’s a single operational pane of glass, you want things to be more robust, and you need to use a heterogeneous tool set. So, that’s really when it comes into play.
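The difference between “overlapping cron jobs” and an orchestrator can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions. It is not Dagster’s API or how any particular vendor behaves. The step names `ingest`, `transform`, and `reverse_etl` are hypothetical stand-ins for an ingest tool like Fivetran, a DBT job, and a reverse ETL sync. Each step runs only after its upstream steps succeed. A failure causes everything downstream to be skipped rather than run on stale data, and every outcome lands in one shared log, a toy version of the “single pane of glass.”

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


# Hypothetical stand-ins for an ingest tool, a SQL transformation job,
# and a reverse ETL sync. Real steps would call out to those tools.
def ingest() -> None:
    print("loading raw tables")


def transform() -> None:
    print("building models")


def reverse_etl() -> None:
    print("syncing results back to SaaS tools")


STEPS: Dict[str, Callable[[], None]] = {
    "ingest": ingest,
    "transform": transform,
    "reverse_etl": reverse_etl,
}

# Each step maps to the set of steps that must succeed before it runs.
DEPENDENCIES: Dict[str, Set[str]] = {
    "ingest": set(),
    "transform": {"ingest"},
    "reverse_etl": {"transform"},
}


def run_pipeline(
    steps: Dict[str, Callable[[], None]], deps: Dict[str, Set[str]]
) -> List[str]:
    """Run steps in dependency order and skip anything downstream of a failure.

    Returns one combined log, so every step's outcome is visible in one place.
    """
    log: List[str] = []
    failed: Set[str] = set()
    for name in TopologicalSorter(deps).static_order():
        if deps[name] & failed:
            # An upstream step failed or was skipped, so don't run on stale data.
            failed.add(name)
            log.append(f"{name}: skipped (upstream failed)")
            continue
        try:
            steps[name]()
            log.append(f"{name}: ok")
        except Exception as exc:
            failed.add(name)
            log.append(f"{name}: failed ({exc})")
    return log


if __name__ == "__main__":
    for line in run_pipeline(STEPS, DEPENDENCIES):
        print(line)
```

The run order comes from Python’s standard-library `graphlib.TopologicalSorter` (Python 3.9+). With cron schedules, by contrast, each tool fires at a fixed time whether or not the step before it has finished.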

Juan:
So, with the modern data stack, we’re building a platform and we can start really simple where… I mean, if I’m just counting how many users and what their metrics are and stuff like that, and it doesn’t need to be up to date immediately, in real time, yeah, I can just put this stuff together, that’s fine. But the moment I’m starting to expand, and the moment there’s more complexity about how things need to flow and you need more security about like… I mean, security in the sense that I want to make sure that this stuff is actually working and all that, that’s where orchestration comes in. Basically the glue that puts everything together.

Nick:
Totally.

Juan:
I mean, it’s not one of the first things that happens. May not be the second. It’s something that’s going to happen once you start getting a more complex ecosystem.

Nick:
I think so, but I think it’s like building… I think it’s a critical building block, and I think by not incorporating it early, you set yourself up for pain later. To me, it’s like, “Oh, we’re working in a programming language, but we’re not going to use classes for now because it’s over-engineering.” It’s like, “Well, actually, you should probably use the building blocks you’re going to use from day one so you can build your stuff more properly.” And I think it’s also the type of thing where, when you start using a high-quality orchestration platform, you can’t remember life without it. You’re like, “Wait, I have to go into two different tools to debug something?” I think with the right combination, it should be what…

Nick:
In my opinion, the reason why orchestrators haven’t been used earlier in the development life cycle of platforms in general is that they’ve been really difficult to spin up and they take a lot of operational overhead, so the cost-benefit tradeoff only made sense further down the maturity cycle. But there’s still benefit to adopting it. So, a lot of what we’ve been focusing on at Dagster is making the spin-up super easy, making it super lightweight, just like writing code, and building a managed service. And we’ll actually be announcing that more broadly tomorrow: early access to our cloud platform. I promise I won’t make this whole thing a sales pitch.

Nick:
But to summarize, I think it actually is super useful quite early in the life cycle of a data platform, and the reason it’s been adopted late is that incumbents like Airflow have been so burdensome to adopt that you have to really reach a high… There has to be a cutoff point or a high bar, so to speak, to get enough value out of it.

Tim:
Yeah. No, that makes a lot of sense, and I like your way of approaching that. Orchestration has a ton of benefit, and if it were easier, if the bar to start doing orchestration could be hurdled in an easier way and more quickly, then it actually makes sense to bring it in earlier on your modern data stack journey. I think that’s a good takeaway.

Tim:
Honestly, a thought that comes to mind a little bit is the earlier days of infrastructure as code, when folks were starting to kick the tires on things like Chef and Puppet. And they were like, “Oh, well, when you get to the point of so much complexity that you really need that, make sure you refactor all your code to be infrastructure as code,” but now it’s like, “Well, of course you’re going to start with Terraform from the get-go.” Right? It’s interesting how, as things get easier, they get adopted sooner.

Nick:
Yeah. And another analogy here is actually from my previous life with GraphQL. Early on, we, the co-creators, used to go around and say, “Oh, if you’re using… Maybe start with REST and then move to GraphQL if things get more complex.” And we got a ton of pushback from our community on that because they were like, “Listen, you are making our jobs so much harder. Just tell people to use GraphQL from day one. It ends up with better systems, it’s not that much… You’re underselling it. It’s not much more difficult to use or anything. In fact, I think it’s easier.” We got a ton of pushback that it’s just like, just start using what you’re going to use from day one, because it has all these implications around the tools and processes built around it. And so, the cost of undoing it even a month later is actually quite high. And I think orchestration, properly conceived, will be like that, where it’s just what you do. Right? Because even if you just have two tools, having overlapping cron makes no sense. Things should run [inaudible 00:25:11] each other.

Tim:
Right. And then you don’t have to migrate later. That makes a ton of sense. This whole conversation has me thinking a little bit, though, about like… It’s not always as clear how to get started on this modern data stack journey to do things right and to get these tools implemented if you’re a larger company or you have a more complicated environment. Is modern data stack just really for the smaller companies or the younger companies that are earlier on their journey?

Nick:
I don’t think so, but it kind of depends on if you’re… It kind of goes back to the original premise of the discussion, which is, is it a methodology or a very narrowly prescribed set of technologies? So, that’s what I think is interesting. Should companies, to use the framing, move to the cloud, move to managed services, and apply a software engineering mindset to their data processes? I would say, “Yes.” You can call it the modern data stack or not, but that’s an undeniable win. And then you also see companies incrementally adopting technologies in the modern data stack within their organizations, as well. Yeah. I think it makes a lot of sense.

Nick:
And then the other thing that… Like for example, one of the reasons why Snowflake’s doing so well is that they’re doing such a great job of lifting and shifting workloads from on-premises data warehouses to their platform. They were kind of playing a different game the whole time, where I think a lot of the engineers in the Valley would be like, “Oh, you’re going to get people to migrate from Hadoop to Snowflake?” It’s like, “Guys, 99% of the world is still on their on-premises data warehouse and has no ability to use Hadoop at all.” They’re jumping straight ahead. So, I think it’s going to be very similar.

Tim:
Right. People are using Oracle and Teradata, et cetera, et cetera. Right?

Nick:
Exactly. Yeah. I think that there’s going to be a similar thing, because as people adopt cloud data warehouses and have the same problems as everyone else, they’re going to be grabbing for the next tool. So, I think the shift towards this style of data infrastructure is inexorable, both for greenfield and existing companies.

Juan:
So, I really like how you basically said that this is not just for smaller companies. Right? For older, legacy companies, call it, “Yes, you can use the modern data stack if your goal is to move to the cloud, you want managed services, and you want to be able to apply software engineering principles to your data. If that’s what you want to go do, then yeah, you can get on the modern data stack.” It’s not just for smaller companies who are growing. I think that’s a really good way of thinking about that.

Juan:
Which leads me [inaudible 00:28:11]. I think our initial question here was, is this just a list of technologies or is there a methodology? I think the answer is both. But the question now is, what are the dos and don’ts? So, I think there are two things. One, as we already talked about, if you’re a smaller company that’s just growing, you’re probably going to do the simplest thing, just count things, and… You don’t need all this stuff. I think part of that methodology is you choose the… There’s a simple, minimal modern data stack, and that’s the minimal thing you need to go do. Right? The four things, I think, right? The cloud data warehouse, DBT, the E and the L, and then analytics. Right? That’s the minimum. And then part of the methodology is that depending on the use cases, you might want to do reverse ETL or have more catalog and so forth. So, that’s one thing. But then when it comes to the, let’s call them, older legacy companies, what are the dos and don’ts for them to start that modernization process? What are your thoughts on that?

Nick:
Yeah. I mean, to me, the important… What I have learned through my years is the importance of having an incremental process in place, so that every stage of a migration, of moving from one technology to another, feels like you’re just hiking up a hill rather than jumping over a canyon. So, always construct these migration processes such that there are always intermediate checkpoints where you can stop, assess, and understand what’s going on, and so that the people who are participating in the migration get value as early as possible and can see the promised land, as opposed to being promised, “Oh, in two years life is going to be better.” I like to call this evolutionary means for revolutionary ends. It’s a talk I give about changing software architectures in place. Have a strategy, have a high-level vision, but have an incremental process where you can stop and check and make sure that things are on track, and then deliver value to your stakeholders as early as possible in the process.

Juan:
I think that’s very sound advice and actually very good life advice, too. Right?

Tim:
Well, it fits very well with some of the things that come up on our show a lot, like don’t boil the ocean, take a use-case-first approach. Can you iterate your way to value here? Are those some of the key tenets you would point to for a modern data stack methodology? And is there anything you would add to that?

Nick:
Yeah. I mean, we were talking about migration process. I think that one movement… So, there’s a fellow named Chris Bergh who is the CEO… I think he calls himself the head chef. They love these cooking analogies at Data-

Juan:
At DataKitchen.

Nick:
… at DataKitchen.

Juan:
Yeah. [crosstalk 00:31:12].

Nick:
Yeah.

Juan:
He’s been a guest.

Nick:
Yeah. So, what’s interesting is that I think that his work and his definition of data ops… He’s been pushing the data ops manifesto and all this stuff. What he said very early on resonated with me a lot. And he really talked about the software-engineering-ification of data. He’s like, “Analytics is code.” That was his mantra. And I really think that a lot of… Data ops has become such a buzzy term that there are tons of different definitions of it, and I think it’s hard to grasp onto. But really, I think that the ideas in data ops have been co-opted by the modern data stack wave. So, the entire software engineering mindset…

Nick:
Let me put some meat on the bones there. So, what do I mean by software engineering mindset? The most direct example is one of the crown jewels in the modern data stack, which is DBT. And the whole point of DBT is making analysts into analytics engineers. It’s getting analysts who used to write SQL and save it in files on their desktop to use Git and make their analytics work product part of a software engineering process. It is, in my view, the ultimate data ops technology in the Chris Bergh framing of the term. So, yeah, I mean, that’s my take on that.

Juan:
This is good… I’m going to pin this. Right? Minute 32, second 40. That was a really good analogy you made right there, that definition. I really like this, especially with DBT, which is such a popular thing. Quick parenthesis: we’re having a special Catalog and Cocktails next week for the DBT Coalesce happy hour edition. The official happy hour is going to be a Catalog and Cocktails podcast.

Juan:
And I think that’s one of the big things. DBT is such a popular, hot thing right now because it does one very simple, small thing in a very powerful way, and it makes a huge impact: it takes analytics as code, and it empowers this entire workforce that does a lot of technical work in SQL by adding software engineering practices. I think that’s another big theme right now and in this discussion: this is all about bringing the well-defined practices of software engineering into data. That’s something we just have never done, I mean, up to now.

Juan:
I think history will look back and ask what this whole modern data stack was. If there is one thing we’re going to say the modern data stack changed about the world, it was literally bringing these software engineering practices into data, and I think that’s what’s going to give us a better world of data instead of all the crap we have to go through right now, just because we didn’t have any practices around data. Anyway. That’s my little aha moment here.

Nick:
Yeah. I mean, you got me all fired up. I was just taking that in.

Tim:
Absorbing it. One thing this brings up for me, and Nick, I’m interested in your weigh-in on it, since we’ve discussed this topic with a few of our other guests: what you said about DBT, about how it’s turning the SQL that analysts were writing into something that’s now a GitHub project. You’re wrapping it with CI/CD, you’re applying software best practices to it. Right? To me, this feels like part of a codification movement. Things are becoming more code-oriented, more software engineering-oriented. So, that’s one thing.

Tim:
And then on the other hand, you’ve got this movement toward no code, low code, drag-and-drop-y interfaces. How do we take the everyman and the everywoman and turn them into an engineer? Right? To me, these things feel like they’re pulling in different directions. I know there’s some overlap in terms of how they could work together, but what is your thought process on that? Does one of these things win? Do we find a way to get oil and water to mix?

Nick:
So, it depends on the answer to this question. And so, the question is, what is your definition of low code?

Tim:
You want me to take a stab at that and just-

Nick:
Yeah, yeah.

Tim:
Yeah. [inaudible 00:36:09] place.

Nick:
I don’t have a [crosstalk 00:36:11].

Tim:
And Juan, feel free to give a different definition [inaudible 00:36:13].

Juan:
Yeah, you go. You go first.

Tim:
For me, it’s the user experience paradigm. Right? It is, can you make it so that you don’t have to write the code? Right? And maybe I’m already implying what the marriage can be here. Right? But if I can drag and drop and say, “I want this to join with this, and I want this to be the outcome,” and I never have to write JOIN and WHERE and SELECT and things like that, to me, the low code experience is, “I don’t want to program.”

Nick:
Right. [crosstalk 00:36:44].

Juan:
[crosstalk 00:36:44] I just want to say because-

Nick:
Go ahead. Go ahead.

Juan:
… a couple of days… Two episodes ago we had Cindi Howson from ThoughtSpot, and we actually got into this conversation about low code and no code. She’s very, I would say, low code, no code positive, or pointed about it. And I’m very skeptical about it, because I’m afraid that it works for all the general cases, but the world is not just one simple case. There are all these corner cases that you’re not going to be able to deal with. Then you’re just going to hack some stuff together, and you’re not going to have good practices around it. But her definition was that you just want easier ways to answer questions. Tableau’s drag and drop, I mean, that’s low code, and obviously it works for ThoughtSpot. She’s like [inaudible 00:37:28] ThoughtSpot is a way of answering questions. She would consider ThoughtSpot, for example, a low code or no code way of answering questions. Given that definition, I agree with that, but at some point you want to get into more of the details, and that’s when these low code, no code approaches won’t be able to satisfy you.

Nick:
Yeah. So, that all makes sense. I think there is definitely a space for low code, no code solutions. There are a few requirements. One is they have to be composable with other tools. Historically, when people think low code, no code, they think of completely siloed systems that are not composable at all and try to reinvent the entire world. Right? Take your example, though, Tim, of a drag-and-drop tool that, in the end, effectively generates a SQL statement. Right? There’s nothing preventing you from writing that tool so that it literally saves a file, and then the entire software development process takes over and it runs through a CI/CD pipeline and all that stuff. That’s what I mean by composable: the tool you described just generates a SQL statement, [inaudible 00:38:40] SQL statement in some sort of context, but at its core, that’s what it’s doing. That can be incorporated into a software engineering process and composed with other tools.

Nick:
So, I think there’s generally a space for that: a modern-data-stack-native low code, no code solution that can interface with the other tools reasonably well. And the other critical component is that the goal for one of those tools should be to solve 90% of the use cases, but not the last 10%, where you have to punch through it. Imagine there’s some low code tool where a business user has been able to scratch together a 70-stage pipeline out of prefab components, but there’s this one step in the middle that is just too bespoke and too custom, and there’s no other way to do it. Making it composable means an engineer can write that one step and then publish it to that business user so they can plop it in. So it kind of has to be part of an integrated platform instead of a complete silo that’s way over here, if that makes sense.

Juan:
So, I fully agree, but for that, basically, the whole low code, no code thing is just a higher-level abstraction, and we need to make sure that we can compile it down to something we can actually use. Something that scares me, too, is that you’ll have all these apps, and it’s like, “Well, you did all this work, but I want to edit it afterwards.” Right? You need to be able to use that work outside the tool, whether I want to edit it myself and then compile it, or bring it back up to the higher-level abstraction. So, I think that’s crucial, and that goes back to your point about composability.

Juan:
I mean, query languages… SQL is a beautiful language because it’s all composable. Right? Tables go in and tables come out. I can use this very easily. When it comes to declarative languages, that’s something that changed the way we think about computation. I fully agree with this. Composability is key for low code and no code to be successful. I went on another rant. Sorry.

Nick:
That was a very well-reasoned discussion.

Juan:
Well, there’s another thing we wanted to touch on before we get into our lightning round questions. When it comes to all these tools, and we started talking about the Turck diagram and everything, how much consolidation do you think is going to happen in this space?

Nick:
Oh boy, that’s the bajillion-dollar question, I guess. I think it has to happen. It’s just a question of what type of consolidation, and whether it happens in a way that’s beneficial for users. I’m going to invoke the gospel of Benn Stancil again: he described what’s needed as a data OS, sort of, and a good analogy is the way your apps get structured on your phone. Meaning that there’s still a huge amount of heterogeneity and room for innovation, but it’s within confines or a structure that makes it comprehensible, rather than just stitching together things from all over the place. So, I think there will be some consolidation, but not in the way that Databricks and Snowflake want it. I mean, maybe it will end up like that, but I don’t think it’s a good outcome for users if, effectively, one of those two becomes a new Oracle. Right? There’s this titanic battle happening between Snowflake and Databricks where they both want to become the Oracle of the cloud, meaning a one-stop shop-

Tim:
The new monolith.

Nick:
The new monolith. So, I think there needs to be simplification and consolidation, but without siloed monolithic architectures where early in your career you have to say, “I’m a Snowflake person and I can’t work with anything else.” That sucks. It’s kind of like going back in time to when there were Oracle people and Microsoft people, and I don’t think that’s great. So, I’m a big fan of open standards that can span those vendors.

Nick:
Now, call it consolidation, call it unification, call it simplification, but something has to happen in order to start to make sense of this world, so to speak. We’ll see how that plays out. I have my own theories about that. I think there will be a unified data management platform where you can plug in different solutions for different sub-components of it, whether it’s data quality or lineage or whatever. But yeah, I will end my rant.

Juan:
Well, let me follow up on this. So, connecting a lot of dots from previous conversations: we had Andy Palmer from Tamr, I think, last episode, and he totally… He says the monolith is dead. And I can imagine a Snowflake trying to consolidate and trying to be that one-stop shop. Right? The modern Oracle in the cloud type of thing. And for some people it’s like, “I just have one thing and I don’t [inaudible 00:44:17] deal with everything.” Yeah. There are reasons for that. But if we’re not going to go down that route, then things need to work together and talk to each other. And I think standardization is going to be key there.

Juan:
So, I ask myself, what does standardization around the modern data stack look like? I’ve seen talks about this, and I’ve actually had some conversations with Bob Muglia, the former CEO of Snowflake, and he is really betting on three things: we need the metrics layer, which is another thing I’d love to talk about, standardization, and knowledge graphs. But it’s easier said than done, and we always think about that XKCD comic. Right? You’ve got 14 standards, so we need one standard to rule them all, and now we’ve got 15 standards. Right?

Juan:
So, how much of that is actually going to happen? Is it just going to be little cohorts of folks saying, “We work great together”? Is there really going to be a standard that connects all types of modern data stack tools through metadata? Or is it really not going to happen, and we’re just going to see an Informatica 2.0 come around that buys all these companies, since investors are putting money all over the place? Right? And they’re just like, “You know what? Let’s consolidate, and that’s how we make our money.” I mean, not a question here, just another big rant, so let me throw [crosstalk 00:45:46].

Nick:
Yeah. No. I mean, I think it’s going to be a very interesting five to 10 years. Let’s put it that way. I think there’s a bunch of different potential world states for how things cohere and consolidate, and it’s effectively impossible to predict right now because it’s too complicated. To tout my own wares, I think orchestration is a natural pivot point for leverage around that because, by its nature, it is the thing in the stack that touches everything else. You cannot really exist within a data platform without an orchestration layer, because the orchestration layer determines where… All data comes from somewhere or goes somewhere. Right? Fundamentally. And the orchestrator is at the heart of that. So, I have my own theories around that, obviously.

Nick:
But there are a lot of other people who can say, “Hey, it’s going to be the data catalog that unifies everything,” or other people will say, “Actually, the cloud data warehouse, it makes sense for them to build a silo, because that’s where all the data’s going.” And you can know stuff there. The metrics layer is a candidate. We’ll see. I don’t have any-

Juan:
This is super interesting. You’re saying the orchestration layer touches everything. The catalog also touches everything. It seems kind of natural that if you touch everything, you’re already a candidate to be very opinionated about how these things should connect together. Because if I’m tool A, well, I have my opinions about tool A, and I talk more to tool B, but I probably don’t know what happens in tools D and E and F and so forth. Right? But the orchestrator and the catalog do talk to all these things. So, I know we’re non-salesy here, but let me take a bit of a liberty. Is it up to folks like you and me, I mean, Elementl, Data.world, or others, to say, “Because we have the experience of touching all these different tools, we’re the ones who have a strong point of view on what a standard could look like”?

Nick:
Yeah. So, I think that in a lot of domains in data right now, working on standards is kind of putting the cart before the horse because, to me, the precondition for a standard is proving value. In a lot of these domains, I don’t know if we know the best way to do cataloging or lineage or orchestration. We have a bunch of different people with their own theses around things. To me, there’s a proper process for building open standards, and the first step is proving the core underlying value, that it actually works. I think if you try to standardize stuff too early, you hamstring yourself and don’t have enough flexibility.

Nick:
To the GraphQL analogy, we came out with GraphQL as an open standard, but we had worked on it for three years inside of Facebook and built the whole company’s infrastructure around it, so we at least had one data point of, “Yeah, it works here.” And then we moved on to standardization. So, I think for a lot of these tools in data, we’re still figuring it out, and there’s a process that goes along with that.

Tim:
That makes sense. And I think this has given us a lot to think about because we’re thinking about how things fit together, we’re thinking about how they evolve. This is the beginning of lots of thinking and lots of conversations to come about how things should move forward. Hey, Nick, it’s been exciting to chat about how this can evolve together because I think there’s a lot of moving pieces and also a lot of exciting opportunity here.

Nick:
Totally. Yeah. It’s never been more exciting to work in this domain. There’s just so much activity, there’s so many smart people. I think also people are very open-minded to change right now, which is very exciting.

Tim:
Yeah. There’s an opportunity at this moment where people are looking for new approaches, new tool sets. I mean, you look at something like DBT, could DBT have been as successful six or seven years ago? I don’t know. Right? It feels like the conditions might actually be in a better place for something like that to get adopted today after the whole Hadoop movement happened, after some of these things have happened. Right?

Nick:
Yeah. I totally agree. I think there’s also an influx of… Another dynamic that’s opening up the space is that companies are so desperate for data engineers that they’re moving folks over laterally. I saw this happen in frontend 10 years ago, when more mainline, normal engineers moved into JavaScript and frontend to make sense of that world, and all sorts of interesting innovation came out of that. I think similar things are happening in data right now. We talk to lots of people who are looking for an orchestration solution, and they find Dagster and they’re like, “Wow, this makes sense to me,” because they came from a more traditional software engineering domain and this is intuitive for them. And I think that’s happening all over the place. Yeah. It’s super exciting.

Juan:
Oh man, there’s so much we can keep talking about, but we need to go wrap this up now. And I’m going to go into our honest no BS lightning round. I will kick it off. Yes or no. All right. Is the modern metadata stack separate from the modern data stack?

Nick:
No.

Juan:
Oh, okay.

Nick:
Am I saying why?

Juan:
Go, go, go. Quick.

Tim:
If you want to add a little context, yeah, please.

Nick:
Oh yeah. I think metadata is so core to companies’ internal data platforms that it’s inseparable. It’s like chips without salsa. I don’t know what the right analogy is. But anyway, it’s just you need both.

Tim:
They’re intrinsic to each other.

Nick:
Yeah.

Tim:
Question two. Is the modern data stack a pipe dream for large legacy data stack companies?

Nick:
No. I don’t think so, because I think there’s space… We see this a lot in terms of what I’ll call greenfield within brownfield, meaning that new projects with new teams who are empowered to make their own infrastructure choices within the context of a larger enterprise have much more flexibility around that. So, no.

Juan:
All right. Next. Is data load, the E and the L, the data warehouse, DBT as the T, and BI tools the core of the metadata stack? Or the modern data stack.

Nick:
Yeah. That’s like the initial core you start with. For now.

Juan:
All right.

Tim:
For now.

Juan:
[crosstalk 00:52:51].

Tim:
That’s a fun way to end that statement. And then the last lightning round question here. We talked a lot about standards and interoperability and things like that. Will DBT become that lingua franca?

Nick:
I don’t think so because I think there’s a large number of analytics use cases that can’t be described as SQL.

Juan:
Okay.

Tim:
That’s a smart stipulation.

Juan:
Yep. Well, it’s our TTT, Tim takes it away with take-aways first.

Tim:
Yeah, sure. This is the part where we get to tell you, Nick, whether we got all the right takeaways here. So, first of all, the modern data stack: we really talked about its core being the E and the L, the data warehouse, the T, and the BI, but then there are all these other aspects around it that are really interesting, such as orchestration, catalog, and so on and so forth. Right? Machine learning, AI, et cetera. But you talked about a process here. Right? The first thing you do is count things. You’re implementing your basic metrics. Then maybe you’re trying to figure out how you can take some of these metrics and push them back into other systems. Right? Maybe that’s where things like reverse ETL come into play. You’re trying to get smarter about your data. Now you’re evolving from descriptive analytics to prescriptive or predictive analytics. And your company is growing in size and getting more complicated; your data use cases, quantity, and complexity are growing. Okay, you need a catalog, you need lineage, you need monitoring, you need all these different things. Right?

Tim:
So, there’s a process here. There’s a methodology behind all of these different technologies. And ultimately, you talked about how the methodology has to be incremental. You said you want to hike up a hill instead of jumping over a canyon. And you said evolutionary means for revolutionary ends, which I think is a great phrase and one I want to use. I actually have a little backlog of T-shirts that I’m trying to create for Catalog and Cocktails. I might have to coordinate with you on whether we can stick that quote on a T-shirt.

Nick:
I’m down.

Juan:
All right. I got my takeaways here. So, first of all, the modern data stack is a natural evolution that a company goes through. Right? And its core is about bringing software engineering practices into data. We are in the process of reinventing data ecosystems: going to the cloud, with your warehouse or lakehouse at the center and all these key things around it. Another definition: the modern data stack is by hipster data people, for hipster data people. I love that. It started as a stack, but it is evolving into a methodology. When we talk about whether it’s just for smaller companies, the answer is no. If you’re moving to the cloud, you want to work with managed services, and you’re thinking about adding software engineering principles to data, it doesn’t matter if you’re small or large; you’re part of the modern data stack. So, those legacy companies can do it too.

Juan:
And when it comes to consolidation, I like how you said it: call it consolidation, unification, standardization, whatever it is, it needs to be simple. Is there going to be a unified data management platform? Is it going to be a monolith or a bunch of players working together? It’s still really early. We’re figuring this out. We can’t put the cart before the horse. But hey, there are tools or aspects of the modern data stack that connect to everything, like orchestration and cataloging, and they’ll have a lot to say about that. How did we do? Good summary?

Nick:
I thought it was great. You guys have really been paying attention to all my… what I’ve been spewing, which I appreciate.

Juan:
Well, this has been a fascinating discussion, and I’m really excited to actually meet you in person, hopefully, one day and have this conversation live. So, I want to throw it back to you, finally, two questions. Advice? What’s your advice about data, about life, or whatever? And second, who should we invite next?

Nick:
So, guest suggestions: I was thinking about this before the show. I hear lots of vendors and tool authors on these shows all the time, and I think we should talk to more power users about the problems they’re trying to solve. An early user of Dagster who was also a dev advocate in his previous life, a guy named David Wallace, who works at Dutchie, actually, is someone who comes to mind [inaudible 00:57:33] interesting takes on it, but from the standpoint of someone who’s trying to solve problems, not pitch a tool. I think that’s a perspective that’s often overlooked in the podcast sphere. So, that’s both an individual suggestion and a general guest-category suggestion.

Tim:
I love that.

Nick:
Yeah. In terms of advice, what is an important piece of advice? Can you narrow… Anything? Do you want to narrow a domain [inaudible 00:58:06]?

Juan:
Start with “have orchestration”; I guess that’s a good piece of advice. Right? Anything.

Nick:
Yeah. I guess this is probably going to be cheesy, but optimize for the people around you in your life, both in your personal life and your work life. I’m very blessed to… We literally had an all-hands that ended five minutes before I hopped on this show. I’m really proud of the team we’ve built and the people on it, and it’s just a joy to work with them all. It makes all this so much more fun. Yeah. Optimize for the people that you spend time with. It’s not-

Tim:
Love that.

Nick:
… less… Not as much about money and status and all that stuff.

Juan:
Love it. Nick-

Tim:
That’s great advice.

Juan:
… thank you so much. Appreciate it. And just quickly, next week we’ve got a lot of stuff. It’s DBT Coalesce, and we’re having the official happy hour. It’s going to be a special episode of Catalog and Cocktails with Claire Look and Meetesh Karia from The Zebra, and that’s going to be live on Tuesday at 2:00 PM Pacific, 4:00 PM Central. I’m also going to be at the Data Governance and Information Quality Conference, and we’re going to do some live talking with folks in person at the conference.

Nick:
That sounds like a rager.

Juan:
It’s going to be interesting. And then next week is going to be Kelly Wright, the president and CEO of Gong, and was a former VP of sales at Tableau. And we’re going to have a lot of conversation about data and analytics. Nick, cheers.

Nick:
Cheers.

Juan:
Thank you so much.

Nick:
Great time.

Tim:
Thanks so much for doing this.

Nick:
Thanks for having-
