New data tools drop daily, but are they worth the hype? Some launch with overinflated expectations, while others solve a problem that doesn’t really exist. On the flip side, there are tools that transform the enterprise and have the potential to change the way we manage data forever. The question is, how do you tell the difference?
Tim and Juan were joined on the Catalog & Cocktails podcast by special guest Erik Bernhardsson, formerly with Better.com and Spotify, for a conversation on data tools: the good, the bad and the ugly. Below are a few questions excerpted and lightly edited from the show.
Honest, no BS, which tools suck and don’t suck right now?
I’m kind of an anarchist. I think in a way, you should just let people use whatever tool they want. The fact that they use that tool probably means it’s good. People using things probably means they derive some value from it. What’s bad? I think it’s the extent to which data teams are wasting so much time on infrastructure stuff, stuff that’s not core business logic. So if there’s any tool I want to call out as “bad”, it would maybe be Kubernetes. I feel like AWS is kind of annoying. All these, like Terraform, Docker, all that stuff. I just want to do data. Why do I have to write YAML files? I don’t know. YAML, I’m going to call out YAML. I hate YAML.
Where do you see the business logic being implemented?
I mean, in a way, I’m the biggest fan in the world of SQL. I wrote this long rant of a blog post a couple of years ago about how much I hate random DSLs and query languages and how I just want my SQL back. And then a couple of years later, now there’s SQL everywhere and I’m like, “oh, let’s take it a little bit slower, what’s going on?” So I don’t know. Maybe it’s a good thing. Maybe it’s a bad thing. I think having a lot of SQL is certainly better than having a lot of bad code languages, for sure.
I mean, Materialize, I think, is building something incredibly cool. It’s kind of a risky thing. Materialize, for the people that don’t know: basically the whole idea is you can build an incrementally materialized view. So the idea is, in SQL, you can define a view and that gets incrementally refreshed with new data, which I think is pretty cool. You can do a lot of real-time data transformations using that. I think it’s still TBD; that’s a lot of business logic, again, that we’re putting in SQL. Is that a good idea? I don’t know. But Materialize is certainly like a tool… And full disclosure, I’m a nominal investor in that company and I know the CEO, but other than that, I think there’s a lot of exciting stuff.
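To make the idea of an incrementally refreshed view concrete, here is a minimal Python sketch. It is a toy illustration, not how Materialize is implemented: the class `IncrementalSumView`, the helper `full_recompute`, and the sample rows are all hypothetical. The view stands in for something like `SELECT key, SUM(amount) ... GROUP BY key`, and it is kept up to date by applying each change as it arrives instead of rescanning every row.

```python
from collections import defaultdict


class IncrementalSumView:
    """Toy stand-in for a view like: SELECT key, SUM(amount) ... GROUP BY key.

    Hypothetical illustration only; this is not Materialize's API.
    """

    def __init__(self):
        self._sums = defaultdict(int)
        self._counts = defaultdict(int)

    def apply(self, key, amount, diff=1):
        # diff=+1 inserts a row, diff=-1 retracts (deletes) a row.
        self._sums[key] += diff * amount
        self._counts[key] += diff
        # Drop the group once no rows are left in it.
        if self._counts[key] == 0:
            del self._sums[key]
            del self._counts[key]

    def result(self):
        return dict(self._sums)


def full_recompute(rows):
    """The non-incremental alternative: rescan every row on each refresh."""
    totals = defaultdict(int)
    for key, amount in rows:
        totals[key] += amount
    return dict(totals)


rows = [("us", 10), ("eu", 5), ("us", 7)]
view = IncrementalSumView()
for key, amount in rows:
    view.apply(key, amount)

# Both approaches agree, but the view only did work per change.
assert view.result() == full_recompute(rows)

view.apply("eu", 5, diff=-1)  # retract the single "eu" row
print(view.result())  # {'us': 17}
```

The point of the sketch is the cost model: each new or deleted row touches one group, so the refresh cost follows the size of the change rather than the size of the table.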
What I think is important to remember, too, is that the demand for data is just enormous. And so, there’s been this interesting shift in the last five years where basically making data available on a SQL level has been by far the easiest way to make it broadly accessible to people. I’m also quite bullish on code, and I think in a way, the fact that we’ve seen the pendulum swing so far towards SQL is a reflection of how large that demand is and how desperate people are to work with data. People really want data, and SQL just ended up being, for now, the fastest way to get people access to that data.
What else has been around for so long and will continue to be here?
I don’t know if you’re familiar with the Lindy effect. Basically the idea is the longer something’s been around, the longer it’s probably going to stick around. So when I’m looking at programming languages and whatever we are going to have in 50 years, I think we’re far more likely to have C and SQL in 2061 than we are to have… not to throw them under the bus, but just saying R, Python or Julia or whatever. And maybe it’s a little bit unfair to bucket them together; C is obviously a very different thing, a systems language. It seems weird, but when you think about it, I think we’re going to have a lot of people writing C in 50 years. I could be wrong. Who knows?
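One common way to formalize the Lindy effect, which is an assumption here and not something stated in the episode, is a Pareto (power-law) lifetime model. Under that model, something that has already survived `t` years has an expected remaining lifetime of `t / (alpha - 1)` for a shape parameter `alpha > 1`. The sketch below applies that formula to a few languages. The function name, the choice `alpha = 2.0`, the reference year `AS_OF_YEAR`, and the approximate first-appearance years are all illustrative assumptions.

```python
def expected_remaining_years(age_years, alpha=2.0):
    """Expected remaining lifetime under a Pareto model with shape alpha.

    Assumption: lifetimes follow a Pareto distribution, so the conditional
    expectation of remaining life given survival to `age_years` is
    age_years / (alpha - 1). Requires alpha > 1 for a finite expectation.
    """
    if alpha <= 1:
        raise ValueError("alpha must be greater than 1")
    return age_years / (alpha - 1)


# Assumed reference year and approximate first-appearance years.
AS_OF_YEAR = 2021
first_appeared = {"C": 1972, "SQL": 1974, "Python": 1991, "R": 1993, "Julia": 2012}

for name, year in first_appeared.items():
    age = AS_OF_YEAR - year
    remaining = expected_remaining_years(age)
    print(f"{name}: age {age}, expected remaining years {remaining:.0f}")
```

With `alpha = 2.0` the expected remaining life simply equals the current age, which is the "as long again" reading of the effect. The numbers are only as good as the model, so treat them as a way to compare languages rather than as forecasts.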
I think there’s actually another effect too. I’ve also increasingly been convinced that there’s value in a certain amount of conservatism. I think some languages are so eager to solve every new problem in some new way. And then they just get very bloated. I look at C++ and it’s like, I can’t use this, it’s insane. I’m a little nervous about Python. The surface area of the language is enormous these days. I’ve done a lot of asyncio in the last six months. And it’s a mess. It’s so complicated. I love Python. It’s by far the language that I’m the most productive in, but there’s just so much surface area today. And so I think that’s another interesting point: part of why SQL is successful is maybe also because it hasn’t really evolved much.
So what do you think about all this low-code and no-code?
I have a similar view on low-code and no-code to the one I expressed earlier, which is that to a large extent, I think it reflects the extreme demand for software that exists in the world today, and that people are so eager to build software that they want whatever tool is needed. And one very crude, reductive way to think about what a software engineer is: I think it’s actually their job to take business goals and express them as business logic. And programming languages are just one concise way to express business logic. So I think you’re always going to need people who are trained at taking a fuzzy objective and thinking through all the edge cases and how you make it into logic, and that’s just unavoidable as a problem.
I think you’re always going to need people to think through all the edge cases and all the things. To me that’s what a software engineer does. Right? And so, that’s why with no-code or low-code, you don’t really escape those problems, you just push them somewhere else. And so I think we’re going to see a lot of companies adopting those tools, but in the long run, they’re just going to reinvent software engineering. And then they’re going to realize they should just hire software engineers to take care of this. Because now they have a billion, trillion edge cases in this and it’s really hard.
And a software engineer comes in and they’re like, “yo, actually we figured it out, it’s called unit testing. And by the way, we have this cool thing called Git and we have this cool thing where we do continuous integration and pull requests,” and everyone’s going to be like, “wow.” And hopefully by that time, we’ll have even better programming languages for expressing logic. So, I think to me, the success of low-code and no-code in a way is a good sign if you are a software engineer because it means the demand for your services is going to be a bit higher in the future.
If you had your canvas of data tools, what are you pulling together for your modern data stack?
I think it’s funny because every startup has a blog post today and they’re like, “here’s the new modern data stack. And by the way, we’re like this big box in the middle.”
And then they’re like, “we’re right next to dbt,” because everyone loves dbt. So they’re like, “here’s our startup and then right next to it is, oh, it’s dbt and then there’s Fivetran and then there’s a bunch of all these ancillary things around it.” So I don’t know. I’m increasingly skeptical that the modern data stack exists. I think it’s just whatever people want it to be. It’s like a Rorschach blot test, you just see whatever you want to see in it.
I see a lot of boxes, and I don’t think anyone is happy with that fragmentation, because if you’re a company that works with data, do you really want to bring in 35 different tools and duct tape them together? Especially if you’re an enterprise company, do you want to go through 35 procurement processes? Certainly not. And so, I know I’m dodging your question, but I think we’re probably going to see a lot fewer boxes on that canvas. And I think over time, there’s going to be a lot of consolidation in this space and a lot of tools taking over adjacencies and doing multiple things.
- Observe: get to know your team and their existing workflows.
- Tools: what is in your tech stack, and what do people want to use?
- ROI: focus on key business outcomes and objectives.
Visit Catalog & Cocktails to listen to the full episode with Erik. And check out other episodes you might have missed.