Why Data & AI Needs to Embrace Interoperability with Dael Williamson

55 minutes

About this episode

Dael Williamson, EMEA CTO at Databricks, joins us to reveal how interoperability is the key to unlocking expertise, exchanges, and ecosystems. Tune in for an honest, no-BS conversation on navigating the complexities of data ecosystems—and how standardization, context, and interoperability can be leveraged to tackle organizational challenges.

Tim Gasper [00:00:32]:
Hello, everyone. Welcome. It's time once again for Catalog & Cocktails. It's your Honest, No-BS, non-salesy conversation about enterprise data management with tasty beverages in our hands. I'm Tim Gasper, longtime data nerd, customer guy, product guy at data.world, joined by Juan Sequeda.

Juan Sequeda [00:00:49]:
Hey, Tim, how are you doing? It's Wednesday, middle of the week, end of the day, and it's time to take a break and have our Honest, No-BS data conversations. Today we're having the conversation with Dael Williamson, who is EMEA CTO of Databricks. Tim and I both met Dael at a Gartner event back in London a while ago. It was one of those moments where we just kind of randomly got connected. We sat having lunch and, like, holy crap.

Tim Gasper [00:01:15]:
It was one of the most riveting conversations I've ever had.

Juan Sequeda [00:01:18]:
You need to be on the podcast immediately because, like, that's it. So. Hey, Dael. Great to have you here. How are you doing?

Dael Williamson [00:01:24]:
It's really cool to be here. And, wow, your intro's got a lot of energy. I love that, though. That was the funniest lunch, because it was just kind of a chance meeting over lunch, and I think we went in about 73 directions with that conversation. It was hilarious.

Juan Sequeda [00:01:41]:
So let's see where we end up here today.

Juan Sequeda [00:01:45]:
What are we drinking? What are we toasting for today?

Dael Williamson [00:01:49]:
Well, we were having a conversation just before, and I think it's a toast to Bingo and Bluey. I've got girls, and they remind me so much of that TV series. Tim, you brought it up, and you hit the nail on the head. And, Juan, you now actually have to go watch this.

Juan Sequeda [00:02:09]:
I have to, right? There's too much Minnie Mouse.

Dael Williamson [00:02:12]:
So I'm drinking a ginger beer mocktail. All right.

Juan Sequeda [00:02:15]:
And cheers to Bluey.

Tim Gasper [00:02:17]:
Cheers to Bluey. Yeah. Hey, I'm drinking a little bit of Woodford Reserve Double Oaked, and I will cheers to Bluey as well. The reason I even brought it up is because you said your kid said something like, I don't want to do that anymore, and that sounds like it came right out of Bluey. The darn things our kids say. They're, you know, mostly cute.

Dael Williamson [00:02:40]:
I know.

Tim Gasper [00:02:41]:
Sometimes annoying. Mostly cute.

Dael Williamson [00:02:43]:
Yeah, it's very cute until you want to leave the house.

Juan Sequeda [00:02:47]:
And I'm switching it up today. I love discovering all these different sparkling waters, and this, I think, is my favorite one. It's a peach guava sparkling water from H-E-B, our local supermarket here in Texas. I've had many of these already. So here's to Bluey, and to introducing it into my household. We'll see. All right.

Dael Williamson [00:03:09]:
We'll see if it lasts.

Juan Sequeda [00:03:10]:
Our warm-up question today, because our topic is interoperability: what is something that should be interoperable and just work, but doesn't?

Dael Williamson [00:03:21]:
So I actually heard this today. My wife was talking about how you have to prepare a PowerPoint for an executive. You have a team that has slides, and they might be prepared in Google Slides, but they have to convert them into a different format and push them up. How many times have you been at an event where you're presenting, the bloody presentation isn't in the right format, and you've got to sit there tweaking the damn thing? So for me, that's the absolute last-mile irritation of interoperable presentations.

Juan Sequeda [00:03:58]:
Oh, my. I mean, this is why my solution is: I just use Google Slides. All I need is your browser. I have a link, and it's going to work, because the web is the best interoperable mechanism we have. But, no, I hear you on that one.

Tim Gasper [00:04:12]:
I hear you there. The worst thing ever is when they say, oh, it's going to be in PowerPoint. And you're like, oh, did it convert okay? And they're like, well, take a look and see what you think, right? And it's ten minutes before your presentation, and you're like, oh, my God. All the fonts are messed up. All the sizes are wrong.

Dael Williamson [00:04:27]:
Yeah, exactly. Why is that so horrible? Proprietary. That's why interoperability is key.

Juan Sequeda [00:04:34]:
So, Tim, how about you?

Tim Gasper [00:04:36]:
My interoperability issue I'll bring up is getting things to show up on screens, right? I feel like AirPlay got this the best. But even with AirPlay: I have my TV at home, with Apple TV running on the PlayStation down there, and then I have Apple TV up here in my office, and I just want to present what's on my phone to the TV. How do I make this work? I just wish those things would work better.

Juan Sequeda [00:05:04]:
And I think for me it's things like Bluetooth. This shit usually works well, but sometimes, like with these headphones here, it doesn't. Then it's the plugs: all my adapters are proprietary stuff, all my little dongles. It's gotten better, but before, I had a whole bag of stuff. And then it's the freaking outlets. When I travel, I have to bring my own bag of all the different outlet adapters and all that stuff. Like, why? Do you ever...

Dael Williamson [00:05:32]:
Do you ever look at the scan of your bag as you go through airport security, with all the wires? These people are going, what on earth is going on here?

Juan Sequeda [00:05:42]:
Yeah, yeah.

Tim Gasper [00:05:44]:
It makes you wonder how they can even do their job. The things that come through there, like, that looks like a bomb. There are wires all over that thing.

Dael Williamson [00:05:52]:
One of our sales leaders has light-up shoes, and he literally got stopped, and they wanted him to dismantle the shoes and stuff. It's the funniest story ever. You can't make this up. But, yeah, I'm with you on all three.

Juan Sequeda [00:06:09]:
Okay, so this is good. Let's kick it off. Honest, No-BS: what do we all mean collectively by interoperability, and why and how do we embrace it?

Dael Williamson [00:06:22]:
Well, the thing that annoys me about interoperability: we talk a lot about integration. We talk a lot about ingestion. You know, it's actually quite funny. I've listened to your podcast before, and we talk a lot in the modern data stack about ETL. The only reason we have to do that is proprietary formats and the fact that the data has to be transformed, the T, into another format. Right? So ETL only exists because of a lack of interoperability. And if you actually think about it, seamless, smooth exchange of goods, of data, of data products, or whatever you're going to call it, is really hard, because things that work in one format just don't work in another format as they should. So to me, that's one of the big pain points. If you're going to set a problem statement, it's about exchange without that kind of time tax or integrity tax. And it's a funny thing, because it should be smooth; there should be standards, but there aren't. I mean, there are in some layers of the stack. Linux, for example, has won the standard. And that's why I'm a big advocate of open source, because there's a famous quote by a guy named Bill Gurley. He says, like, the most complex problems in the world are solved through open source. And that's because it's a community; it's a group thing. So to me, the problem statement is: how do you make anything (data, use cases, code) flow seamlessly from one bounded context to another? I think if we were to solve for that, we would all be working on much more interesting, higher-order problems, like cancer or anything...

Tim Gasper [00:08:31]:
Instead of solving integration problems.

Dael Williamson [00:08:33]:
Yeah, exactly. We're just plumbers, right?
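Dael's claim that the T in ETL exists mainly to convert between formats can be illustrated with a small Python sketch. Everything in it is hypothetical: "system A", its semicolon-delimited export with day-first dates and comma decimal separators, and the schema "system B" expects are invented for illustration and do not describe any real product.

```python
import csv
import io
from datetime import datetime
from typing import Dict, List, Union

# Hypothetical export from "system A" (invented for illustration): semicolon
# delimiters, day-first dates, and a comma as the decimal separator.
SYSTEM_A_EXPORT = (
    "customer_id;signup;revenue\n"
    "42;31/01/2024;1234,50\n"
    "7;05/11/2023;99,00\n"
)


def transform_a_to_b(raw: str) -> List[Dict[str, Union[int, str, float]]]:
    """The "T" in ETL: reshape system A's export into the schema that the
    (equally hypothetical) system B expects: integer ids, ISO-8601 date
    strings, and float revenue."""
    rows = []
    for rec in csv.DictReader(io.StringIO(raw), delimiter=";"):
        rows.append({
            "customer_id": int(rec["customer_id"]),
            # "31/01/2024" -> "2024-01-31"
            "signup_date": datetime.strptime(rec["signup"], "%d/%m/%Y").date().isoformat(),
            # "1234,50" -> 1234.5 (assumes the export never uses thousands separators)
            "revenue": float(rec["revenue"].replace(",", ".")),
        })
    return rows


if __name__ == "__main__":
    for row in transform_a_to_b(SYSTEM_A_EXPORT):
        print(row)
    # {'customer_id': 42, 'signup_date': '2024-01-31', 'revenue': 1234.5}
    # {'customer_id': 7, 'signup_date': '2023-11-05', 'revenue': 99.0}
```

Every rule in `transform_a_to_b` is pure translation between two conventions; none of it adds information, which is the "time tax" described above.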

Tim Gasper [00:08:35]:
Yeah. I think this is so important, and you gave a perfect setup here: imagine what we could be solving if we weren't spending all this time and energy moving things from one format to another, from one place to another.

Dael Williamson [00:08:54]:
I don't know, it's ludicrous. So I'm going to give you a data point, because it's kind of funny. I'm a biochemist by background. In your body, in this minute, you've just produced about 2.4 million red blood cells. They have a lifespan of about 120 days. Now, you have a genome inside that red blood cell that's about 3.4 billion nucleotides, and those are copied with absolute precision, no errors, because if there are errors, you get some really weird problems as a result. Absolute precision, every minute. So that's 3.4 billion times 2.4 million every minute. Our bodies can do this insane metadata copying and insane replication, yet when we go to work, we copy and paste data around, and it's slow and painful, and we have to check integrity and check quality, and it's time-consuming and it's not interoperable. So this is what I think about when I go to work. I'm like, oh my gosh, how crazy.
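Taking the figures exactly as quoted in the conversation (they are the speaker's, not independently checked here), the product works out to roughly eight quadrillion nucleotides per minute:

```latex
\[
\underbrace{3.4 \times 10^{9}}_{\text{nucleotides per genome}}
\times
\underbrace{2.4 \times 10^{6}}_{\text{cells per minute}}
= 8.16 \times 10^{15}\ \text{nucleotides copied per minute}
\]
```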

Juan Sequeda [00:10:24]:
The joke I always say, and I forget where I heard it the first time, is that we can take a rocket to space and bring it back to Earth, and it can land on a platform in the middle of the ocean. But I still can't tell if these two spreadsheets match.

Dael Williamson [00:10:43]:
So true.

Dael Williamson [00:10:46]:
This is even funnier. How do we tell if the two spreadsheets match? We send about 400 emails and set up 53 meetings. Then we write a PowerPoint, copy the two things in, and explain in little context bubbles to people up the chain why these things match, what's different about them, and why. Are you kidding me?

Juan Sequeda [00:11:12]:
So let's dive into this.

Juan Sequeda [00:11:16]:
I mean, my follow up to the joke is like, that implies that data management is harder than rocket science.

Tim Gasper [00:11:25]:
Data science is harder than rocket science.

Juan Sequeda [00:11:26]:
You can say yes. And I think the aspect here is the human aspect, right? In your biology story, you have the natural sciences; that's something that lives in a controlled world. But humans are much more complicated than that. So, your opening statement about why we do ETL: it's because of a lack of interoperability. We have to go pay this extra tax. And there are incentives behind this, right? There are people who are incentivized to make the tax higher. In a way, we could argue there are more incentives to make things more complicated than to make them easier, because otherwise, why haven't we done this? So that's my question to you. Look, you're bitching about this problem. I'm bitching about this problem. We're all freaking annoyed. I've got books about this stuff over here. But why do we continue to complain about this decades and decades afterwards?

Dael Williamson [00:12:23]:
I think it's because we can't get on the same page about what the standard is. It's a bit like language in general. Why are there so many programming languages, and why are there so many human languages? It's because, you know, my language is better than your language. No one can agree which is the standard. I came across this problem when I was in the fitness industry, trying to get data out of fitness equipment. You would think that would be a relatively low lift, right? This thing produces a bunch of data points from sensors and stuff. But as you went to each equipment manufacturer, they had their own standards and their own formats. They even had different words to describe the same thing. And I don't think we, as a unified community, have done a good job of coming together to agree on a standard, especially in this space. If you look at semantics, if you look at context, it's arguably the most proprietary part of the stack. Every vendor has its own version, and it's very, very proprietary. Whereas lower down, bottom-up, Linux has become the prevailing standard. It's actually won the OS standard, and it's an open source community. That's because enough people rallied around it to form a community. If we move one level up, you've got Kubernetes. Kubernetes has done an insane job. It had competition for a while (Mesos was competition for it), but Kubernetes won and everyone gravitated behind the standard. In data storage formats, Parquet has become a de facto standard. More and more people are backing Parquet. Even the table formats like Delta, Iceberg, and Hudi all write metadata on top of Parquet. So one could argue the standard is Parquet, and now there's a bit of jostling among the formats on top. So more and more, as you move up the stack from the bottom, it's standardizing.
And that's allowed us to do some very cool, gnarly new projects and things like that. But we're now getting to the inflection point of how we create semantic standards. I know you guys have been doing a lot of work on this, but when you really look into the community, it's that cult and that camp and that group, and no one can agree on the standard. And then there's a lot of proprietary stuff lying around. Most of the semantic logic is locked in proprietary formats. You've got DAX, and you have ABAP, and I could pick on as many proprietary vendors as there are out there. But that's where the lock-in is happening, and that's where the interoperability is missing. So slowly, as we move up the stack, it's starting to become more interoperable. At Databricks, we can move a Parquet file from company to company through Delta Sharing, seamlessly, in a real value stream. So you could go all the way from the ingredients that go into creating a product, through to activation, through to whoever's buying it, if you wanted to. So why can't you do that? Because the data about the data that sits on top is almost impossible to understand. I think we're getting there as a community. We just haven't hit it yet, but we're now fighting in the right zone. And that's why I'm a big fan of what you guys do, and also a big fan of what can happen in the semantic layer.
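To make "metadata on top of Parquet" concrete, here is a toy sketch, assuming a heavily simplified design: immutable data files plus an ordered log of JSON commits recording which files were added or removed. Real table formats such as Delta Lake, Apache Iceberg, and Apache Hudi are far richer and differ from one another; the file names, the commit schema, and the `snapshot` function below are hypothetical.

```python
import json
from typing import List, Optional

# Toy, hypothetical table log: immutable data files (imagine Parquet) plus an
# ordered list of JSON commits. NOT the real Delta, Iceberg, or Hudi protocol;
# the file names and the {"version", "add", "remove"} schema are invented.
COMMITS = [
    '{"version": 0, "add": ["part-000.parquet", "part-001.parquet"], "remove": []}',
    '{"version": 1, "add": ["part-002.parquet"], "remove": []}',
    # A compaction: two small files replaced by one larger file.
    '{"version": 2, "add": ["part-003.parquet"], "remove": ["part-000.parquet", "part-001.parquet"]}',
]


def snapshot(commits: List[str], as_of: Optional[int] = None) -> List[str]:
    """Replay commits in version order and return the data files that form the table.

    If `as_of` is given, stop after that version (i.e. read an older table state).
    """
    actions = sorted((json.loads(c) for c in commits), key=lambda a: a["version"])
    live = set()
    for action in actions:
        if as_of is not None and action["version"] > as_of:
            break
        live -= set(action["remove"])
        live |= set(action["add"])
    return sorted(live)


if __name__ == "__main__":
    print(snapshot(COMMITS))           # files in the current table state
    print(snapshot(COMMITS, as_of=1))  # files as of version 1
```

In this toy model the data files stay plain Parquet, so any engine that can read Parquet and replay the log sees the same table; that shared lower layer is the kind of bottom-up standardization described above.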

Tim Gasper [00:16:48]:
Yeah. Thank you, Dael. I'm excited about what we could all do together around this. I know Juan's very passionate about this as well. Before we dive more into the semantics topic, because I think there's a lot to unpack around how semantics and interoperability tie together, I want to come back to interoperability and standards more broadly. There's this age-old question: should we be more interoperable? Yes, we should. Should we have standards that make that interoperability happen? Yes, we should, but it doesn't always happen. You gave Linux as an example: a community backed it, and it became an unstoppable force. Sometimes standards happen because there's, let's call it, a monopolist, somebody who commands the market and is able to say, like, a NATO standard, right?

Dael Williamson [00:17:54]:
There's a fairly strong standard around NATO. So there's one for you, right?

Tim Gasper [00:18:01]:
Yeah. So there's this one pretty powerful body, and it was able to say, here's the standard. And then, closely related but a little different, there's some kind of regulation or standards body, and as long as it has enough teeth, enough power, it can actually say, hey, let's create that collusion. But prisoner's dilemma, right? The collusion and coordination can be quite hard. So, to cap off my long-winded setup: from your perspective, does one of these approaches work better than the others? Or are all three fine, and we just have to embrace them? How do we create more interoperability?

Dael Williamson [00:18:43]:
Well, I'll speak through the lens of open source, because it's one I'm super passionate about and one I've seen really prevail. So let's say, for argument's sake, we pick on the two major foundations: the Apache Foundation and the Linux Foundation. The Linux Foundation has achieved some great community scale and has a pretty strong governance community. Apache has got an incredible gathering. So both have been incredibly successful. If you look at Spark, Apache Spark was born out of the Apache community, and if you look at almost every vendor that does big data processing, they use Spark. Spark sits somewhere under the hood. Similarly, if you look at the Linux community, the OG is Linux itself. Have you ever looked at a git profile of the Linux contributions? Git was actually invented by Linus Torvalds, because he's a git. But the profile is mad. There are so many contributors, so many people doing that, but there's still governance over it, so it's not a free-for-all. So open source is a community-led project, and there are different variants of open source. You have stadiums. I call them stadiums because it's a small contribution base with large usage, and that's not a healthy community, because you don't end up with a two-sided network effect. But when you have a large contribution base, a good community, and a lot of users, then it starts to become a standard. It might have competition for the standard, but having two or three is better than having 500 proprietary things. If you flip to government-mandated standards, that's not the worst idea in the world, but the problem is we have, what, 200-odd governments. I'm in the UK; we left the EU, and that's 27 countries that couldn't really agree on many things. That's one of the main reasons.
So, without getting political, it's really hard to get those communities to rally together. Government's tricky. I actually think open source has a means of surpassing government, because people join and invest a huge amount of their time contributing to these things, and that's why these big, complex problems are well served by open source. It's why we see a lot of standards forming and rallying around open source despite there not being a government mandate to do so. One of the best examples ever is the Internet, right? But that was actually a government project linked with an open source foundation. So I think that was a really strong combo, because it was both.

Juan Sequeda [00:22:07]:
So one of the fascinating things I'm getting out of this conversation, which I never realized until you articulated it, is what happens going up the stack. Looking at the stack from the bottom, from Linux going up, a lot has been standardized. But the further up you go, the less standardization there is. The lower levels are literally the bits, the compute, the computers. And the further up the stack we go, the more we get into the human aspects, the people. That's how I'm seeing this right now, and it's a great realization: why are standards so hard to do at the higher levels of the stack? It goes back to our earlier rant and our jokes. It's because there are people, right, and people are complicated. And I think you can argue there are fewer things we need to agree upon lower in the stack, and if there is some competition there, you can actually produce some very scientific evidence: look at these experiments, this way is better because of this, let's go do it this way, period. It's kind of hard to go against that. But once you get into more subjectivity, everybody starts competing: my way, my way. And then it goes back to what you said before: you argued that semantics and context are the most proprietary part. So I'm asking myself now, and this is kind of my existential question: how much are we focused on an area that is doomed to never succeed? Maybe at the semantics level there are so many people that it's just not going to happen, so just let it be.
We can exchange the Parquet files, but we still need to go do the transforms, the ETLs, because...

Tim Gasper [00:24:10]:
Is this, like, an endless task that we'll never solve?

Dael Williamson [00:24:14]:
Right. Well, okay, so you've got to deconstruct it. Why do people have a genuine problem with understanding? I have laughed my entire career at the misunderstandings that exist in everyday professional life, and there are a lot of reasons for it. I had this really fun experience once. I was sitting on a plane to Doha, next to a guy, and I was writing a paper for a talk I was planning on navigating the organizational Game of Thrones. It had a lot to do with organizational politics, these kingdoms and fiefdoms, and how everything starts to prevail. The guy sitting next to me was an organizational therapist. So he starts laughing. I look across, like, why is this funny? He says, I'm an organizational therapist. I closed my laptop, and we spoke for about five hours. One of the things he explained to me is that a lot of it has to do with language, with how decisions are made, and with power. So there's decision science, and then a very under-thought-through area, which is choice science: we're going to make a decision, and we're going to be presented with the information and given a choice. It's a really old discipline, but it's one we don't typically examine when we're talking about intelligence. And generally it's managed through a lot of communication back and forth. We see this today in companies. He said one of the things you can look at is how a company makes decisions. Do they go sideways before they go up? Typically that represents a consensus culture and a risk culture. They have a low risk tolerance, and they have to get consensus before they go up, because everyone's covering their ass, right? So that politics is a really interesting problem, and a lot of it has to do with language.
A lot of it has to do with misunderstandings. Take a simple word in our domain: project. In a data science context, a project means something very different from what it means to somebody higher up in the executive boardroom. They're saying the same word, but they mean very different things. So that brings us to the holy grail: context. How many meetings have you been in where somebody goes, "just to give you context"? For the next month, every time you hear it, make a little note. You will hear that phrase so often, because people believe they have to explain before they get to their point. Now imagine you had a superpower that understood language and could be trained on very specific context, and this just happened 18 months or nearly two years ago. That's the power of these language models. They're amazing. They're a superpower, and that's where they play a role. How do they help reduce the friction and personalize language? I've got lots of questions I'm going to ask this thing rather than peppering people with them. And I think that's the missing piece of the stack: the translation of context. It's that adaptive piece. We all have gray matter sitting between our ears and our mouth; we listen, process, and then speak, and we don't have that naturally built into the stack. For me, that's been one of the biggest wow moments. So I'll give you a practical example, because I can see questions coming. We tried something internally as an engineering team: we have about 1,500 people in engineering. That's a lot of alignment to build a platform that's integrated and works for everyone across the world, and it has to be deployed to lots of different countries and lots of different data centers. So they have to do a lot of alignment. In a traditional context, they would have to meet all the time.
Imagine they could write down everything they did and feed it into a model, and then the other teams could just ask the model. That model can be quite accurate in its responses, and it plays the role of interpreter for the other person. You've got high-context people and low-context people, so people can ask big questions and people can ask little questions. That, I think, is the game changer that has happened. And if we can make those interoperable, and that's why I love the open source model movement, we can create things that stop humans from being so confused half the time and needing to spend lots of time with each other to get on the same page.
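Dael credits language models with translating context. As a much simpler, deterministic stand-in, the sketch below shows only the underlying idea: the same word resolves to a different meaning depending on who is asking, and an unknown context is flagged rather than guessed. The audiences, the definitions, and the `explain` function are all hypothetical placeholders, not anyone's official terminology; a real system might use an LLM or a semantic layer instead.

```python
from typing import Dict, Tuple

# Hypothetical glossary keyed by (term, audience). The definitions are
# illustrative placeholders, not official terminology.
GLOSSARY: Dict[Tuple[str, str], str] = {
    ("project", "data_science"): "A scoped piece of analysis or modelling work on a dataset and a question.",
    ("project", "executive"): "A funded initiative with an owner, a budget, and expected business outcomes.",
}


def explain(term: str, audience: str) -> str:
    """Resolve a term for a given audience, or flag the ambiguity instead of guessing."""
    key = (term.lower(), audience)
    if key in GLOSSARY:
        return GLOSSARY[key]
    # The term is known in other contexts: say so rather than pick one silently.
    known = sorted(ctx for (t, ctx) in GLOSSARY if t == term.lower())
    if known:
        return f"'{term}' is ambiguous here; known contexts: {', '.join(known)}"
    return f"No definition recorded for '{term}'"


if __name__ == "__main__":
    print(explain("project", "executive"))
    print(explain("project", "finance"))
```

Refusing to guess for an unlisted audience mirrors the "just to give you context" habit: the ambiguity is surfaced explicitly instead of being resolved by assumption.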

Juan Sequeda [00:29:46]:
Oh, my God. Okay, this is fantastic. So, the first aha moment I'm having right now, connecting dots here: I love this whole translation of context. That's why we have so many meetings. We're using the same word, but I need to make sure you understand my word this way or that way; there's context around it. And what you're saying is that these technologies, these LLMs and stuff, can help us do that translation of context. But what I'm really taking away here is that going up the stack, when we get into the semantics part closer to the people, it gets harder. And your point, as I'm interpreting it, is that it gets harder because we really have to make that context translation, and that's really hard to do. It's very manual. It takes time. Now we've got some technologies that can help us accelerate it. One part of my life that I've been working on so much is what I've been calling knowledge science, knowledge engineering. We talk about data engineering, and that's the bottom part of the stack: how do I move the data this way and that way, make sure the pipelines flow, and all that stuff. But the moment you start hitting a certain line, you realize this isn't just about moving the bits in a scalable way. It's where context starts evolving. That's the switch where it goes from data engineering to what I call knowledge engineering, knowledge science. And that's where we need to start talking to more people. But when I bring this up to folks, they usually say, "But that's a unicorn. You want somebody who can talk to the business and talk to this and that?" And part of me is like, yeah, maybe. I mean, I know those people.
I consider myself one of those people. And, yeah, we're rare; there aren't many more out there. I think that's something that's missing, and I'd argue we're not even doing the training for it in universities. But what you're saying is: wait, we have this new technology now that can actually help accelerate that change, and help us be a little more Socratic. Somebody asks us a question, and instead of just taking it at face value, we go ask more questions: why, and is that what we mean? Okay, I'm going on. Let me stop my rant. Does that make sense?

Dael Williamson [00:32:10]:
I clearly sent him off on one, but you're spot on. Your knowledge science context, your knowledge engineering context: it lacked a computer. If you look at the computers we use today, they're very von Neumann, very deterministic, whereas our brains are quite probabilistic in how we do anything. So any problem in the world has a kind of duality between probabilistic and deterministic. Take Uber, for example. I'm in London, so say I want to go to King's Cross and I take an Uber there. If I take that Uber now, at like 9:41 at night, it's going to get there in a fairly linear way. That's more deterministic than probabilistic. But the minute I do it during rush hour, there's traffic. So Uber has been designed with models that figure out the best path with the least congestion, and that's not going to be the same as the route I'd take off-peak. What that shows you is a deterministic solution mixed with a probabilistic solution. Now, that's more like how humans operate. And these language models, and actually any form of AI model, need to be put into more compound systems to get the accuracy up and the cost down. A compound system is more like the workflow that the model fits within. That approach will give you far more of that computer you're missing for context translation. And it's got to be domain-specific, so it's got to be company-specific. We've talked a lot about this for years: industry models, domain models, and stuff like that. And we're trying to fix something like: here, company, buy this industry model, and we will fit your company into it. But it's asymmetrical: a business fitting into this deterministic model. It's too fixed, it's too forced.
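The Uber example describes a deterministic algorithm running over probabilistic inputs. Here is a minimal sketch of that split, assuming a made-up road graph and hard-coded congestion multipliers that stand in for a learned traffic model (this is not how Uber actually routes): Dijkstra's shortest-path algorithm stays the same, while the predicted travel times change with the time of day and flip the chosen route.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical road graph: node -> [(neighbour, free-flow minutes, road id)].
ROADS: Dict[str, List[Tuple[str, float, str]]] = {
    "home": [("ring_road", 4, "A"), ("high_street", 3, "B")],
    "ring_road": [("kings_cross", 6, "C")],
    "high_street": [("kings_cross", 4, "D")],
    "kings_cross": [],
}

# Hypothetical congestion multipliers per road, standing in for the output of a
# probabilistic traffic model. Roads not listed travel at free-flow speed.
CONGESTION: Dict[str, Dict[str, float]] = {
    "off_peak": {},
    "rush_hour": {"B": 2.5, "D": 3.0},
}


def fastest_route(start: str, goal: str, period: str) -> Tuple[float, List[str]]:
    """Deterministic Dijkstra search over travel times adjusted by predicted congestion."""
    factors = CONGESTION[period]
    queue: List[Tuple[float, str, List[str]]] = [(0.0, start, [start])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in settled:
            continue  # this node was already reached more cheaply
        settled.add(node)
        for nxt, minutes, road in ROADS[node]:
            if nxt not in settled:
                heapq.heappush(queue, (cost + minutes * factors.get(road, 1.0), nxt, path + [nxt]))
    raise ValueError(f"no route from {start} to {goal}")


if __name__ == "__main__":
    print(fastest_route("home", "kings_cross", "off_peak"))   # (7.0, ['home', 'high_street', 'kings_cross'])
    print(fastest_route("home", "kings_cross", "rush_hour"))  # (10.0, ['home', 'ring_road', 'kings_cross'])
```

Off-peak, the sketch picks the high street (7 minutes); at rush hour the same algorithm picks the ring road (10 minutes versus 19.5 via the congested high street).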

Juan Sequeda [00:34:41]:
So we should not be having these industry models?

Dael Williamson [00:34:45]:
They're great for a base, they're great for a template, but they will evolve. Let's say you're a wealth management company and you operate in like 15 markets, and then you take that model, lift it, and give it to a wealth management company that operates in three markets. But they're three totally different markets. They're not the same model. So how do we handle that? They could have the same base. You could give them the base schema and then let that model evolve. And that's the beauty of language models: they can create that evolution and then propagate it. What does that model look like today? It's kind of like a teardown and restore of the graph. And that teardown and restore could be daily, it could be monthly, depending on the dynamic nature of the system. A logistics company would be far more dynamic than, say, a mortgage business; one is way more static than the other, and one has a higher propensity to change. I'm not being mean to mortgage people, but, you know, it's not a highly fluid environment. But you get my point. It gives you that computer to tweak these things, to adjust these things, and actually to store them.

Tim Gasper [00:36:12]:
This gives a lot to think about here. I think this is interesting, Juan. I'm going to take things in a slightly different direction. I want to see if you have any follow-up questions before I go in that direction.

Juan Sequeda [00:36:24]:
No, no, you go, Tim. All right, we've got our back channel going on here.

Tim Gasper [00:36:30]:
Okay, so Dael, and if we want to come back to the direction you were going here, we can. But I want to come back to interoperability for a second, because the premise of our chat today is around data and AI, and how we're going to establish interoperability around this, right? We could end up in a world where our data is highly interoperable, right? Let's say, things like Parquet and Iceberg. I think we're making progress. It's probably not as fast as we want, but we're making progress. We could end up in a world where our AI is interoperable, in the sense that we've got APIs, we've got tools, we've got tool chains, we've got LangChain, we've got this open source community that's swelling there. And then on top of that, a lot of these large language models are using language. And language is far from a perfect means of communicating context, but it's something. And so it does create some kind of interoperability where these things can literally talk to each other; the robots are talking to each other. But we might live in a world where data is interoperable, AI is interoperable, and yet context still isn't. Is that okay? Is that not okay? How can we avoid that? How should we avoid that?

Juan Sequeda [00:37:56]:
To add to this, here's one of the thought experiments from my friend and colleague Ora Lassila, with whom I wrote a book on designing knowledge graphs. He says: imagine you put all the data about your company on a USB stick, CSV files, Excel, and you bury it in the ground, right? A hundred years later somebody finds it, and thanks to interoperability, they'll be able to plug in the USB stick, open up a CSV file, open up the file explorer, and see the data. But will they understand what the business did? Will they understand how the business made money, or what the state of the business was? And I think to Tim's point, we'll have all this interoperability, right? But what about the semantics, the context: is that something that will be interoperable too? That's where my goal is: we need to get there. And I wonder how many people actually see that as the challenge we need to tackle.

Dael Williamson [00:38:56]:
Well, context isn't always interoperable. So you're right. I'll use an interesting analogy: traffic, right? We all follow the rules of the road. I drive on one side of the road, you drive on the other side. But when I'm in the US, it's not really difficult for me to rent a car and quickly make a subtle tweak in my brain to not drive on the side I'm used to, and not get in the side of the car I'm used to. That is about as interoperable as it gets, and it's universal. It's quite amazing, actually. I'm South African, and every now and then I go back to South Africa, and they have a lot of power cuts. Do you know that there's less traffic when the lights are off than when the lights are on? It's because everyone knows the rules of the road, so they just default to four-way stops instead of traffic lights, they default to traffic circles. They all know the rules. So we don't need to solve context; we just need some rules that govern context. The same way that we all publish websites. I mean, Tim, you hit the nail on the head. We all publish websites. Why? Because there is a markup language that the world has somehow got behind, and it makes things interoperable. Email is very interoperable because of SMTP. So we've got to get it right: not to solve context, but to manage context with some very basic markup rules. I'm not suggesting that we can fix this, because it's a complex problem, but I am suggesting that there are ways to manage this better. And wouldn't it be more fun to solve sustainability and to solve cancer? Please. You get where I'm going with this. It's not that this will make us redundant.

Juan Sequeda [00:41:20]:
No, no. This is 100%. And I love... we have to go to our lightning round right now, but I love how we're closing on this. Just a couple of quick thoughts here. Again, for me, I come from the web world, right? The web, that's my scientific community, everything. And it's just the most powerful system. Yeah, it was invented by one person, Tim Berners-Lee, who brought all this stuff together. That's the genius of it: he said, I'm going to make the stuff that makes it interoperable. And then, yeah, this thing just exploded. And he didn't mandate what the website itself should be. It's just: here's the language, the correct markup.

Dael Williamson [00:41:59]:
Right. Think rules of the road.

Juan Sequeda [00:42:01]:
The rules of the road. So the analogy here: I come from the semantic web community, so that's why we have things like RDF and OWL and SPARQL as the rules of the road. We're not mandating that this is the semantic ontology layer you use for this particular insurance industry. Some people do try to do that, and I think that's where we say: don't try to control it. Use it as a base. Right. But I think that's super important. And here's the T-shirt quote I have for you: we don't need to solve context; we need rules that govern context. That's your T-shirt quote right there. All right, Tim, lightning round questions. Oh, man, this is a fantastic discussion.

Tim Gasper [00:42:43]:
Yes.

Juan Sequeda [00:42:44]:
All right.

Tim Gasper [00:42:45]:
I feel like we're just starting to crack the kernel of this here, but we might have to save it for part two.

Juan Sequeda [00:42:50]:
Yeah. All right, let me kick this off.

Tim Gasper [00:42:53]:
Yeah, go ahead.

Juan Sequeda [00:42:54]:
First question. So will the data community create a groundswell in terms of adoption around a standard for semantics, let's say in the next five years?

Dael Williamson [00:43:03]:
We're starting to see it happen, right? Just basic stuff, like the ANSI SQL community creating measures. We're creating metrics in our Unity Catalog. So slowly but surely, there are little pieces coming out of the proprietary monster. There are semantic markup languages being open sourced. There was one today from AtScale, a really great contribution that they made. There's the work you guys are doing, and a lot of that is based on open standards. So I think it's possible. The problem is that we need enough community to rally behind it. So, yes, I think it will happen. I also think that language models mixed with this is the superpower. So I think that's the change that has to happen: your knowledge science needs a sort of LLM degree.

Tim Gasper [00:44:04]:
Yeah. I think there's an interesting interplay here that we're going to be able to unlock as we go forward. All right, second lightning round question. ETL tools, BI tools, data warehouses, semantic layers, right? There are all these different pieces of the pie, and each one of them has some kind of a stake, or wishes it did, around context and where the context lives. Do you see one of these places where context will most gravitate?

Dael Williamson [00:44:35]:
So I've worked a lot with graphs and with generalized data platforms, and I actually think the key is to split the heavy stuff, the data, from the metadata. There has to be an abstraction between those two things. And the reason I say that is that that's how our bodies work. So I'm going to go back to biomimicry. DNA is like metadata; it's the context of our bodies. And proteins, which are produced as a consequence of that, are like the data. They're heavier, they have more mass, but they're also made of interoperable building blocks, amino acids, which is why the food supply chain works. You know, we can eat animals and animals can eat plants, and we can then degrade into the soil.

Tim Gasper [00:45:33]:
Yeah, I love the mention of biomimicry. There's a lot about how our bodies work, not just proteins and things like that, but also our brains, that I think is going to unlock so much around data and AI, and we're just at the beginning.

Dael Williamson [00:45:45]:
Exactly.

Juan Sequeda [00:45:47]:
All right, next question. Will enterprises all take their context and train an LLM, or will it be more of a separate layer that the LLM interacts with?

Dael Williamson [00:45:58]:
So I've spent significant time in enterprises, and they all have their own language. You know, we have our language; we have a lot of three-letter acronyms. Every company has its own language. So there is very specific enterprise domain context that your stack needs to understand about you. But if you're going to do streamlined business with customers and suppliers and partners, there has to be a point of context interoperability and data interoperability between you. So smooth flow across the value chain will require that. It's a bit like how APIs were designed to create links between different companies, but it has to happen in a thicker communication and language way. So the answer is yes, but it's more complicated than that.

Tim Gasper [00:47:03]:
So it's kind of yes, both. Okay, last lightning round question for you. This takes us on a bit of a different tangent, but it revolves around open source. One thing that I've noticed is that open source has struggled to win in certain parts of the stack. One of those areas is business intelligence. Do you believe that it will ultimately win in business intelligence?

Dael Williamson [00:47:30]:
Yes, I do. And I think open source language models are the key, because one of the most proprietary layers is the business intelligence layer. It is also the most annoying, because... go ask. What I love about the language model movement is we're starting to prompt; we're starting to ask questions. Come up with a list of questions you want to know about your business, go to your business intelligence department, and ask those questions. You're not going to get the answers easily. And emphasis on the word easily.

Tim Gasper [00:48:09]:
Yeah, well stated.

Juan Sequeda [00:48:13]:
All right, we have so many notes, I think we need to do a part two on this. All right. There's still so much stuff. Tim, take us away. Takeaways. Go ahead.

Tim Gasper [00:48:21]:
All right, takeaways. So we started off with interoperability: what does that mean in the context of data and AI, and why is it so important? And you started off, Dael, by asking: why do we ETL? It's because of the lack of interoperability, right? We're spending all this time moving bits around from one place to another. Transform, transform, transform. And it's because these things don't operate well together. And imagine: what if we could have free exchange of goods? What if we didn't have this time tax? We could have solved cancer by now, right? And shouldn't we be spending our time and energy on that? And Juan, you used a great analogy, which I know you've used a few times on this show: we can land a rocket upright on a boat in the middle of the ocean, but we can't tell if these spreadsheets match. And Dael, a few times today you used some great biology analogies, right? We've got millions of blood cells every minute getting replicated with absolute precision. Yes, sometimes there's an error rate, but our body knows how to handle that; usually it processes it. This is an insane amount of metadata, and yet it's all seamless. Meanwhile, in our jobs we're copy-pasting and doing all sorts of stuff, which hopefully we won't have to do. And there are great examples of where interoperability has really flourished: the Apache community, the Linux community. These communities are really important. And when there are lots of contributors, that's best, right? You mentioned stadiums: few contributors, lots of users. That's an unstable model. Lots of contributors, lots of participation, and strong governance; I think you even used that phrase, strong governance is important. Government mandates may work in some cases, but it's these open models that really work best.
And then Juan, I'll pass it over to you. What were your big takeaways?

Juan Sequeda [00:50:14]:
The most important takeaway is the whole look at the stack, right? The lower parts of the stack have the most standardization. As you move up, it gets harder, because you get into more of the people aspect. So then we have to ask ourselves: why are we having these misunderstandings with people? And then it really depends on the language; we need to understand how decisions are made, where the power is. Look at the decision and choice sciences, right? There are all these disciplines that I think folks in our community aren't looking into. And I think we should ask ourselves: how does a company make a decision? Like your example: do they go sideways and then go up? That means they're very consensus-driven; they're covering their ass, they need to be careful. And that has implications for how we capture what things mean and how we keep track of that context. It's important. Like we said, once we go up the stack, the translation of context is what makes standardization harder. How many meetings do you have in a week, right? It's all about giving you that context, or people wanting to explain things to you. So context translation is the missing piece in the stack. And I like what you said: that translation piece is missing some computation, some computer, and by the way, now we've got these great LLMs that can help us compute this. So for the knowledge engineering that I've been pushing, the LLMs are here to help. We don't need to solve context; we need rules that can help govern that context. That's a super important quote, because just think about it: the web works because we're not mandating how people build a website; we're just giving them a markup language, right?
So the goal here is not to solve context, but to really manage it. And at some point, people may try to build these big industry models, which seems like trying to mandate context, but really make sure you're using those as a base or template. We've got so much more. But how did we do? Anything we missed?

Dael Williamson [00:52:10]:
Ah, that was a pretty good sum-up. Did you use a language model in the background to summarize?

Juan Sequeda [00:52:16]:
No. I have one. It's called my brain.

Tim Gasper [00:52:19]:
Yeah, the noggin model.

Juan Sequeda [00:52:23]:
All right, quickly throw it back to you. What's your advice? Who should we invite next? And what resources do you follow?

Dael Williamson [00:52:32]:
So my advice to people in our domain, and actually outside it, is to start interrogating how you can unlock context more in your organization. What is the unlock for context? Where is it locked? Because there is so much value in that. And what's interesting about that is, so many people are closing off the data walls, right? That is such a rich form of data that no one even thinks of as data. The person I think you should invite next is the person I quoted, and I don't know how you're going to get him on, but if you did, it would be gangster: Bill Gurley. He's amazing. I listen to quite a lot of his talks, and he's part of the Santa Fe Institute, and he's very much into the complexity theory stuff. As a biochemist, I really appreciate that. So I think he would be an insane guest. And resources I follow: I'm like a podcast ninja. I listen to so many things. I've been listening to you guys for quite a few years, and I listen to loads of other podcasts. So if I'm trying to get in the zone, I will find something and go for a run or go for a paddle. That's how I get my knowledge injection.

Juan Sequeda [00:54:01]:
Well, thank you. By the way, you mean Bill Gurley, the venture capitalist investor?

Dael Williamson [00:54:07]:
Yeah, that one.

Juan Sequeda [00:54:08]:
Yeah. Oh, man, I listen to him all the time.

Dael Williamson [00:54:12]:
He's great. His stuff is amazing.

Tim Gasper [00:54:16]:
Bill, we're coming for you. We want you on the show, man.

Dael Williamson [00:54:18]:
I think when it comes to examining the data stack, and since he's a big advocate for open source, he would have a very fascinating perspective on where we're going as an industry.

Juan Sequeda [00:54:31]:
Well, Dael, this was phenomenal. I look forward to revisiting this conversation in the future to see how things have advanced, what we think, and what we've seen. Dael, thank you so much. Cheers.

Dael Williamson [00:54:45]:
Thanks, guys.

Special guests

Dael Williamson EMEA CTO, Databricks