Live from Data Council Austin: what’s your honest no-bs take of the data world?

45 minutes

About this episode

Tim and Juan will be attending Data Council Austin and will be live interviewing anyone who wants to be interviewed! One question: what’s your honest no-bs take of the data world?

Ben [00:00:05] Is this on the internet right now?

Tim [00:00:06] It is.

Speaker 3 [00:00:06] We are now live.

Tim [00:00:07] We are now live from-

Ben [00:00:09] Who's on the other end? How many people are possibly watching this?

Speaker 3 [00:00:15] I don't know. We'll see.

Tim [00:00:15] Right now, zero.

Speaker 3 [00:00:16] Because it doesn't show LinkedIn.

Tim [00:00:17] Oh, it doesn't show LinkedIn.

Speaker 3 [00:00:18] Hey, we're live.

Tim [00:00:18] It's Catalog & Cocktails live from Data Council Austin.

Ben [00:00:22] I don't think there's booze in that.

Tim [00:00:23] There may not be booze in Ben's drink-

Ben [00:00:26] I think it's a [inaudible]-

Tim [00:00:26] ... and we've got beers, which is a little counter to the Catalog & Cocktails moniker.

Speaker 3 [00:00:30] There's some food around here too.

Tim [00:00:34] But it's been a fun conference so far. We're in day two.

Speaker 3 [00:00:39] We're going to start bringing in people here. This is going to be a weird episode. We're going to just be pulling people in and asking them, "What's your honest, no BS take on the data world at Data Council? What did you learn? What are you annoyed about?" Ben, how's that?

Ben [00:00:58] I don't know what that is.

Tim [00:01:01] Is it cheesy enough?

Ben [00:01:02] I think it's a fried mac and cheese thing. It's not great.

Speaker 3 [00:01:10] What did you learn in the last two days here?

Ben [00:01:13] What did I learn? I learned that, in 2021, we thought we were all changing the world, man. Turns out we're just hawking software, and mostly hawking software to each other. There's no booze in that one yet.

Tim [00:01:39] At least the bubble's a little burst now. Maybe we're leveling out a little bit.

Ben [00:01:42] Yeah. This is when the real work begins. There's this thing that VCs say every time you raise money. It's the second most annoying thing. The first most annoying is when the economy goes to shit and they're like, "This is when companies are forged," which I've already said earlier in this conference. The second most annoying thing is you raise money and they're like, "Congratulations. We're so excited to have you as a partner," and all that. Then they're like, "Now the real work begins." You're like, one, "You asshole." Two, "They're kind of right." Raising money doesn't really count for anything. We raised all the money. We spent all the money. We have none of it left, but now the real work begins. We're kind of in debt.

Speaker 3 [00:02:21] What's the actual real work that needs to be done, then? Let me rephrase this. What is the work that we have been doing if it hasn't been real, then?

Ben [00:02:31] We've mostly been making swag, and hosting conferences.

Tim [00:02:36] Stickers are nice.

Ben [00:02:37] Yeah. There's some nice stickers.

Tim [00:02:38] Everybody likes stickers, and things like that.

Ben [00:02:40] dltHub. I'm an investor in dltHub. That's a good sticker.

Tim [00:02:42] Okay.

Ben [00:02:43] Yeah. We'll promote that one.

Tim [00:02:45] We like this sticker.

Speaker 3 [00:02:46] All right, you're ... We got that. No BS. The non-sales line.

Ben [00:02:54] Buy dltHub. I don't know. Can you even buy it yet? I don't know.

Tim [00:02:59] Go to the website. Check it out.

Ben [00:03:01] I'm not an informed investor.

Speaker 3 [00:03:03] Not been working. We thought we were working.

Ben [00:03:05] We were working. We were working. There's a lot of stuff that got created that's like, it's noisy. It's what happens. It's how this stuff happens. It's progress. Progress is we all lurch forward and crazy things happen and we pick up the pieces and we keep going. And that's the optimistic stance that I probably should take.

Tim [00:03:26] Well, there's the AI thing now.

Ben [00:03:29] Heard of it.

Tim [00:03:29] Yeah. It may or may not save us all. I don't know.

Ben [00:03:33] I do think, I actually do think... I don't think AI will save us. I do think it'll be... It is an interesting discontinuity of sorts and we've been trying a bunch of things in a particular way. It hasn't necessarily all gone great. I think there have been things that have been better. There's a lot of workflows that are better. It's much easier to do a lot of things now than it was before. It's not that much more valuable perhaps, but it's easier. It's nice for people to do it. I do think the AI has the potential to change the way that we think about what data is for and how we actually approach it. That is not a step change in the sense of like, "Oh my God, now we have a magic pot that'll do everything for us." But a step change in the sense that sort of shuffles things up a lot, such that whatever comes out on the other side is suddenly a little bit of a different perspective on what we're trying to do with it. Or data or whatever. There's a couple of examples of this that are more specific.

Tim [00:04:26] That's a fair take.

Ben [00:04:28] I don't know if we have time for that on the Twitch stream.

Tim [00:04:31] On the Twitch stream.

Speaker 3 [00:04:32] Let me bring that on this side. We got Ernie, who's actually... Ernie was one of the very early Catalog & Cocktails guests before we officially started to have guests.

Ben [00:04:44] Oh, you have a proper cocktail.

Tim [00:04:47] Ernie was one of our early guests on the podcast. You've been in the data space for a very long time. You spent a good stint running product over at Manta. Now you're over at IBM, right?

Ernie Ostic [00:04:59] Yeah. So back at IBM, actually.

Tim [00:05:01] Back at IBM. That's true.

Ernie Ostic [00:05:03] Yeah. It was the second time. I've been through an acquisition there. I was with DataStage originally with Ascential in 2005. So then decided to leave.

Tim [00:05:15] The so-called legacy stack now?

Ernie Ostic [00:05:16] Yeah, exactly. But actually no. It's been reinvented on the cloud now and pretty exciting.

Tim [00:05:24] That's true.

Ernie Ostic [00:05:24] It brought back the brand from the ashes and really saw its value in a hybrid cloud scenario. And so DataStage is back and going, which is great from my perspective. I was a product manager for it back in 2000, so 24 years ago. I left IBM because I felt like they weren't paying enough attention to lineage. So I went to Manta because Manta was like an informal partner, but I left on good terms. It was all good. And then myself and others tried to work it out so that we had a partnership with IBM again, and I kind of thought maybe in the future that might be something that they'd want to pluck us and take us back. And I never thought it was going to happen as soon as it did, but in October I became part of the team again, and my mission now is to make sure that everybody on my former team is cooperating with everybody on the new family that I belong to. And so far it's going really well. It's pretty exciting.

Speaker 3 [00:06:19] So you got decades of experience and knowledge in the space and now what have you learned in the last couple of days here that you're like, "Oh, this is actually different, better. Going somewhere." Or is this the same old, same old stuff? What's your honest no BS take here?

Ernie Ostic [00:06:34] Well, I think the first thing I would say is having done this ever since day one in my career, 42 years of just working with data. And always in this same area that at least we're all in terms of helping data to be more consumable, better understood whether it's lineage or it's the wonderful glossary work and stuff that you guys do to help people understand it better. That the one thing that kills me is that 42 years ago, I was helping move data from MVS, COBOL files, moving it over to the user operating system at the time that was called CMS. And we've improved the technology, GUIs, we see stuff on our phones now. We create charts. They're all pretty and we can do so much stuff with it. But the fundamental problem of having data that's not fit for user consumption is still exactly the same. We haven't gotten any better at doing transformations at some other place. You have nicer tools to transform it with. If you want to talk about dbt and all these great things now, and we went through Informatica and DataStage and years ago we were doing fourth generation languages. It's the same damn problem. The data is not understood by the users. They don't know what it is. They don't know where it is. They don't know what it means. They can't find the people that really know about it and care about it without having to pick up the phone even with all the advancements that we've done. So the problem is still the same. And I think that's one piece that disappoints me a little bit.

Speaker 3 [00:08:02] But why? Why is it the same? Why do we still have these problems? I mean, this keeps me up at night.

Ben [00:08:12] Because nobody actually wants it? If you're trying to sell the same thing for 40 years and nobody buys it, at some point you got to wonder if you're selling something nobody wants.

Ernie Ostic [00:08:22] Well, I think there's been attempts over the years. Databases trying to put transformations deep inside the database. And then I will tell you this, subtle things have improved. No one messes around with how to format dates anymore. And back in the day, a couple of decades ago before we cared about representing dates as an integer from some base date, the nightmares that used to occur just because somebody wanted to represent something in a month day year versus year month day. I mean, it was just like that was manipulation. That was transformation to move bytes around. That's a simple primitive example. At least we eliminated that problem, but still, we're still transforming data all over the place. It's probably stored a little bit nicer when it's in JSON or something, but I don't feel like we've solved that fundamental problem.

Speaker 3 [00:09:05] So 20 years from now, we're going to solve this problem? It's not going to get solved.

Ernie Ostic [00:09:09] Maybe. Maybe we'll have AI to do the transformations for us.

Tim [00:09:13] Yeah, I don't know. Maybe. I think what's interesting is the rate of change, right? Obviously the data space, the AI space is constantly evolving. And because of that, I feel like we're always playing catch up. And I think that's one of the hardest parts I know, especially for anything that's going to be in the metadata layer or things like that, is you're always like, "We got to integrate with more things." So I don't know. And I don't think that means it's inherently a bad thing. It just means that we're always playing a game of catch up.

Ernie Ostic [00:09:49] And the users still feel the same way. You go to every site now, and even though they're using fancy tools, they're still frustrated with not getting what they need.

Speaker 3 [00:09:56] I guess this has also become just the status quo. People are just used to it. That's the way life is, period.

Ernie Ostic [00:10:03] Yeah, for sure.

Speaker 3 [00:10:05] We're even kind of depressed with this.

Ernie Ostic [00:10:08] We can keep trying to make fetch happen.

Speaker 3 [00:10:10] Yeah. So what should we be doing?

Ernie Ostic [00:10:16] I don't know. It's a good time.

Speaker 3 [00:10:22] All right. Well, did you learn anything new?

Ernie Ostic [00:10:28] At this conference? I'm only able to be here today, but I think seeing some of the cool stuff on vectorization and vector databases and a couple of things like that, that was cool. There's some good new startups here. We're doing some creative things. I think that was really neat. I think learning a little bit more about orchestration as well. I mean, all I hear is Airflow, Airflow, Airflow, Airflow all the time. And you can compare it to other orchestration tools from the past, but it looks like there's some challengers now. Which I think is good to see. You need more competition in that space to kind of level set.

Tim [00:11:13] Dagster, et cetera.

Ernie Ostic [00:11:14] Yeah, I was very impressed with the presentation that Pete did from Dagster. So I got to look into that a little bit more. So I think that'll be one big takeaway from me.

Speaker 3 [00:11:23] I think compared to previous years and just in general, I think the AI has gone up and I feel BI is kind of like that one is going down.

Ernie Ostic [00:11:34] Was BI big at this conference?

Ben [00:11:36] No, it's booming always.

Speaker 3 [00:11:37] Always booming?

Ben [00:11:38] Got to buy BI.

Tim [00:11:40] It's always booming. I feel like there weren't a lot of explicit talks about BI because I think it's so much just embedded in everything.

Ben [00:11:50] It's just understood that you need it. Nobody was here selling laptops either.

Speaker 3 [00:11:58] That's a valid point. That's a valid point.

Ben [00:12:00] It's true. Got to have your charts man.

Tim [00:12:04] Dashboards.

Speaker 3 [00:12:05] Dashboards too. Conversational BI.

Tim [00:12:08] But now that Mode is part of ThoughtSpot. Are dashboards dead?

Speaker 3 [00:12:12] I thought dashboards are good. I'm just saying things that I read. No, they're being reinvented, man. It's all good. They're all good. They're all good.

Ernie Ostic [00:12:34] Dashboard mashup.

Speaker 3 [00:12:36] You need a hashboards.

Ernie Ostic [00:12:36] We have different names for it all the time. It's like how many different ways can we describe a place where you store data, where you're going to access it for decision support? It was an information warehouse in the late eighties and moved to data warehouses and data marts, and now we have data lakehouses. But really when it comes all down to it, it's still just the storage location for data that you want-

Speaker 3 [00:12:59] The quote from last week, the lakehouse is where your data goes on vacation. That was from Scott last week. At the end of the day, I always say it's like you move data, you store and compute data, and you use data. That's it.

Ernie Ostic [00:13:15] And that problem is like what I said, it's exactly the same thing it was 40 years ago.

Tim [00:13:19] And BI falls into the users part, right? You got to use the data, you got to analyze it.

Ben [00:13:22] Most important thing.

Ernie Ostic [00:13:23] In all fairness, the technologies that have come along have been able to handle larger volumes far more efficiently. We've improved on the technology, but the problem is still the same thing that we're doing.

Speaker 3 [00:13:34] All right. Well Tim, what have you learned?

Tim [00:13:41] Well, I think the talk that I enjoyed was from Drew, the CTO of HoneyHive. That was today. And he was talking about LLMs for evaluating other LLMs, and that was a cool talk. I like that.

Ben [00:13:55] Go ahead.

Tim [00:13:59] That's good. And so I thought that was interesting. One of the reasons why I'm here is I want to understand the AI stack more and how all these things are fitting together. So that was an interesting thing for me. So different LLMs, an interesting thing was the biases that certain LLMs have. For example, GPT-4 actually has a tendency to prefer its own responses, whereas if you feed it Claude and Llama and things like that, it tends to be more critical.

Ben [00:14:32] What if you lied to it and you say this is your response, which it's not?

Tim [00:14:36] That's a good question.

Ben [00:14:37] What would it do then?

Speaker 3 [00:14:39] I mean also this morning, Joe Gonzalez from Berkeley, because I think the work from this lab, Berkeley was the one who started, we can use an LLM to do the comparison. And they realized, like, "Oh, well, if I say compare this with that, always the first thing you gave it is the thing it kind of had more bias, preference to." So when they actually started doing the analysis of the result of the thing they did, they realized, "Oh, we didn't do that well." So I think that's why people are realizing that we need to have different ways of evaluating this.
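
For readers who want to see the ordering problem concretely, here is a minimal sketch of one common mitigation: ask the judge twice with the two answers swapped, and only accept a verdict that survives the swap. This is not the method used in the work mentioned above. The `Judge` signature and the two stub judges are hypothetical stand-ins for a real LLM call.

```python
from typing import Callable

# Hypothetical judge signature: takes (first_answer, second_answer) and
# returns "first" or "second". A real judge would call an LLM here.
Judge = Callable[[str, str], str]


def debiased_compare(judge: Judge, a: str, b: str) -> str:
    """Query the judge in both orders and accept only a consistent verdict."""
    forward = judge(a, b)   # answer a is shown first
    backward = judge(b, a)  # answer b is shown first
    a_wins_forward = forward == "first"
    a_wins_backward = backward == "second"
    if a_wins_forward and a_wins_backward:
        return "a"
    if not a_wins_forward and not a_wins_backward:
        return "b"
    # The verdict flipped when the order flipped: treat it as position bias.
    return "tie"


def always_first(first: str, second: str) -> str:
    """Stub judge with pure position bias: it always prefers the first answer."""
    return "first"


def prefers_longer(first: str, second: str) -> str:
    """Stub judge that ignores position and prefers the longer answer."""
    return "first" if len(first) >= len(second) else "second"
```

With the purely position-biased stub, `debiased_compare(always_first, "x", "y")` returns `"tie"`, while the order-insensitive stub gives `debiased_compare(prefers_longer, "long answer", "short")` as `"a"`. The cost is two judge calls per comparison.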

Ernie Ostic [00:15:14] So what did you like the most or get the most from so far? I think it's tomorrow too.

Speaker 3 [00:15:20] I mean, I'm with Ben. It's like we're seeing the same thing over and over again. And yeah, the same conversations about AI. I mean, I'm not leaving with something, wow, blown away. I think we're just continuing to hear the same thing that we were hearing on the LinkedIns, on the blogs, but I guess that's more validation I guess, what's coming up.

Ernie Ostic [00:15:53] Now, have you guys done Data Council before? It seems like it's a very good place for vendors to certainly meet up. I don't know how many.

Speaker 3 [00:16:00] I think it's a good place for...

Ernie Ostic [00:16:01] Customers and decision makers come here.

Ben [00:16:03] Yeah, it's that. It's a party for the people who are in the industry.

Ernie Ostic [00:16:07] I don't know about party, we're all learning something, but I get your point. And a good place for startup vendors looks like...

Speaker 3 [00:16:15] Well, I think this is an opportunity for a lot of the startups who are just starting. That's how they kind of...

Ben [00:16:22] I'm going to go to this demo thing.

Ernie Ostic [00:16:25] And how much recruiting happens in these hallways, I wonder. Is that a good opportunity to...

Tim [00:16:31] Well, you've got VCs walking around, you've got startups walking around, people who are data engineers, data scientists looking for kind of where's the next spot they want to be at.

Ernie Ostic [00:16:41] Or vice versa. If you want to poach somebody.

Tim [00:16:43] I think of it kind of a cool kids in data kind of situation, which I enjoy, but it certainly is a specific thing, right?

Speaker 3 [00:16:51] Going back to my, I want to update my answer. I think yesterday there was a good data culture and people conversations. I think that's important stuff. People just understand. A lot of it kind of seems obvious, but then I realize it's not obvious. It's like, "Hey, how do you understand the business value so that you know what you're doing is kind of 101." But I realize that it's not 101 for a lot of people. So I'm actually very grateful and thankful to know that people are actually going to those types of conversations. And I think... I'd argue that the folks who are actually eager to learn more about the people, the culture, the business side, they're the next leaders.

Ernie Ostic [00:17:34] Makes sense.

Speaker 3 [00:17:36] They're the ones who are going to be connecting the dots all the way who'd be able to go talk, be technical and go all the way, talk right into the business. What's it called? This guy from AWS, Gregor Hohpe, he has this analogy going up the architect elevator. You should be able to go down to the engine room, go all the way to the penthouse to understand and navigate that. Not many people do that. I think that's a superpower. But I do genuinely believe that it's something you can learn, that's something you can go learn, and then you figure out where you feel more comfortable and who you partner more with and so forth.

Ernie Ostic [00:18:09] Now, the most successful data professionals I always thought were ones that may have had a programming background. They had some level of technology, they felt comfortable with it, but it wasn't their thing. They didn't want to dive in and just start coding Java or something. So they gravitated to something that once they learn the business, they could do exactly what you're saying and bring the two together. And those have been the people that then became data governance professionals. I mean, you can take that wherever you want, but it's always somebody that can kind of put one foot in both camps.

Tim [00:18:40] Bridging the two sides.

Speaker 3 [00:18:42] So from a generational point of view, people are just starting their careers now. Is that continuing to happen? Or let's rephrase this. I sometimes feel that we're going to not see that anymore, that there's a new generation coming out of just new technical folks and they're either going into one side or the other, but that bridging part is missing.

Ernie Ostic [00:19:09] It could be, but you want to get me on a soapbox here?

Speaker 3 [00:19:12] Yes.

Ernie Ostic [00:19:15] I think one of the things that is good is that there are degrees now in things like data science.

Tim [00:19:23] That's true.

Ernie Ostic [00:19:23] So some places are training folks to be data people. Now how well they blossom in that and whether they go down a rat hole or not, it'll be interesting to see, but at least we're teaching people the value of data. And that's got to be part of any curriculum that's designed towards a data science oriented degree. At the same time, especially coming from an ETL background, it kills me to see that people are coding in solutions like Python that are interpreted code. It's the same as the damn interpreted code that I was dealing with in fourth generation languages in the eighties. Can't figure it out unless you want to go poring through and reading through somebody's code. They can be as creative and free as they want to without any rigidity. And nothing comes out of it that tells you that it's metadata oriented. And that's what ETL tools really did. You dragged and dropped, you pointed and clicked. It built all this SQL behind the scenes. You didn't have to think about SQL. It just all took care of it. And now I see people pouring their life into Python and doing stuff in Spark and I have nothing against that, but it's just amazing that they just dropped all this tooling that did a lot of that work for you.

Speaker 3 [00:20:41] I mean, I wouldn't say, I have things against that. I think that's the wrong thing to do. Don't you think?

Ernie Ostic [00:20:47] Well, there's no-code, low-code tools that are surviving trying to make their case, [inaudible] and others. I suppose maybe dbt fits into that paradigm as well, but there's still a ton of people that are just saying, "Ah, open the code and just go for it." So somewhere there's a population in this younger generation that's coming into the world now that just says, "No, I just want to do it myself, the heck with it." And they're going to create code that two decades from now we're still going to be trying to decipher.

Tim [00:21:18] Yeah. So I think this is interesting because the modern data stack paradigm was very focused on EL, Extract Load into the data lake, and then of course dbt very much rose to prominence around the T. So now that it's in the data lake, I can go do T-T-T-T-T to it until I get into the shape that I want it to be in, right? We kind of threw away a lot of the ETL kind of approach.

Ernie Ostic [00:21:50] We threw the T.

Tim [00:21:51] People can complain about Informatica and things like that, but ultimately there was a paradigm that was established there around ETL to integrate data and transform it to get into shape-

Ernie Ostic [00:22:02] At the time though it was bashing its head against ELT that was pushed by companies like Teradata, which basically said, "Load the garbage data into Teradata and then transform it inside of there because it'd be more efficient to do it there." So that was also an ELT approach as opposed to an ETL approach. And those two used to clash with each other all the time.

Tim [00:22:22] So have we learned anything from all this? What's the takeaway?

Ernie Ostic [00:22:28] That's why I'm-

Tim [00:22:28] Integration sucks. I don't know, right?

Ernie Ostic [00:22:30] I'll tell you a story. Let me tell you a story.

Tim [00:22:30] Yeah.

Ernie Ostic [00:22:30] So I'm going way back now, but I did a presentation where I had actually constructed from the fourth generation language that I grew up on, which was FOCUS. That's where I cut my teeth on.

Tim [00:22:47] I'm actually not familiar with FOCUS.

Ernie Ostic [00:22:48] Information Builders.

Tim [00:22:50] Oh, Information Builders. Yeah.

Ernie Ostic [00:22:52] Awesome tool. But I built this sophisticated report that actually generated a calendar from a database. It was at the time in 1988, it was pretty cool. But I finished this whole presentation and I got a comment, back then it was written comments. There's nothing on their phones. They actually had to write out what they felt about your presentation. And it was from a guy at the time who is clearly my age now, who said, "Amazing presentation, Ernie, but we're still going to be deciphering in 15 years, 20 years, all the code that you're building now and how the heck you actually pulled it off." And so it makes me laugh because that was a 100% runtime-parsed language, no compilation, you go on the fly. And that's clearly where I see the attraction to Python now. You get instant results because you can just sit there and just code and go away.

Tim [00:23:54] Yeah. See what it looks like.

Ernie Ostic [00:23:56] But that people are writing code that is not maintainable worries me.

Speaker 3 [00:24:02] Okay, so write the letter for your old self, right? Write the letter for the junior kind of data professional now, what would you tell them?

Ernie Ostic [00:24:20] I would probably want to bring in a lot of the things that Pete from Dagster talked about, is that you've got to be conscious of the assets that you're working with. And so inside of your workflow and inside of your transformations, give respect to the assets and the metadata that they produce, and who they belong to and what they mean, what they own. Stuff that's your domain in the catalog while you're trying to code these transformations.

Speaker 3 [00:24:50] I think the key word there was respect, and I think that's what's missing, is that I should have respect. I am trying to interpret what this means, and I should have respect for what that meaning is, and I should have respect for the people who are actually going to make use of the stuff that I'm writing right now. And I think that is a sort of empathy also. My argument is that we don't treat the semantics, the metadata knowledge, with the respect it deserves.

Ernie Ostic [00:25:20] It's just a dumb ass string sitting in JSON. Nobody cares.

Speaker 3 [00:25:22] Exactly, "No, I'm just going to write this code."

Tim [00:25:25] Another thing I think about though, which is connected to respect is a code of conduct.

Ernie Ostic [00:25:31] I like this.

Tim [00:25:32] And this is where I... So we know the guys over at DataKitchen, for example, around the whole DataOps Manifesto and all that, right?

Ernie Ostic [00:25:40] That infinite diagram.

Tim [00:25:42] Yeah. So I mean that's good stuff, right? I mean that kind of thing exists on software. We all know what agile means in software, but we don't have that similar level of respect and thoughtfulness on the data side. And I'm not going to say that categorically about everyone, but it's not at the same level. And so people aren't thinking about how they're going to do the right things to balance resilience and future proofing with, "Let me sling some Python and just get from A to B." Right?

Ernie Ostic [00:26:15] And for what it's worth, it's also the pressures that's always been on the business. It's a whole lot cheaper-

Speaker 3 [00:26:21] Well, I think this goes back to what I always say, it's about efficiency and resilience and being respectful and empathetic is not always the fastest thing to do.

Ernie Ostic [00:26:29] And that's been true forever.

Speaker 3 [00:26:29] I need to get done. So I'm just going to do it fast. So sorry I'm barging through. Hey, we got more folks who want to come in. You can stick around.

Ernie Ostic [00:26:40] I can stick around over here so we can get somebody else in the camera.

Speaker 3 [00:26:47] Hey, Reza.

Reza [00:26:47] Hey, good to meet you.

Tim [00:26:50] Nice to meet you. How would you like to give a quick takeaway?

Speaker 3 [00:26:54] We're here live.

Reza [00:26:55] Doing hot takes?

Tim [00:26:55] Cool. Live with Catalog & Cocktails. Reza, talk about where you're from and how's your conference been for you?

Speaker 3 [00:27:04] How's the honest no BS take right now?

Reza [00:27:08] On everything. Okay. Yeah. So my name is Reza Puri. I'm CEO and founder of Product bot AI. Here today at the Data Council, and just got in a little bit late today, so I don't have a hot take on this in particular just yet.

Speaker 3 [00:27:26] In general, what's your honest, no BS take of the data world right now?

Reza [00:27:29] Of the data world? I think we're at this critical juncture between how generative AI uses data and uses it effectively to get high quality results. And so what I've been seeing over the past year running a gen AI startup is a lot of companies, they want to use their data, but they don't want to put their data into the AI systems yet. There's an issue of data privacy and data leakage and who controls the data. And so I've seen these closed box models like OpenAI getting a lot of flack. "We do not want to use OpenAI, this is our agreement with you. Sorry, use something else." And so these black box models are catching a lot of flack because they're not open and they're not transparent. And I think openness and flexibility of where your AI gets deployed helps the data get unlocked. Yeah, that's my hot take. I don't know.

Tim [00:28:36] I like it. Did you learn anything interesting in the conference that you didn't know before you came in?

Reza [00:28:40] I got here super late today.

Tim [00:28:42] Oh, did you? Okay.

Reza [00:28:43] I came in last minute coming in hot from a couple of meetings.

Tim [00:28:47] That's cool. Well curious to hear what you find out. I feel like at Data Council it's about the parties afterwards, so the happy hour of the parties, that's where you're going to figure out some new stuff.

Ben [00:28:56] We got another one, [inaudible]. Come on in.

Tim [00:28:58] Yeah, [inaudible].

Ernie Ostic [00:29:00] I have to get going.

Speaker 3 [00:29:01] See you man.

Tim [00:29:03] Cheers.

Speaker 3 [00:29:04] We're live right now. What's your honest no BS take on what you've learned here at Data Council.

Tim [00:29:10] Data Council and then the data world in general.

Speaker 6 [00:29:13] Oh my God.

Speaker 3 [00:29:13] Honest no BS, non-salesy take.

Speaker 6 [00:29:17] Let's see. Okay, so Catalog & Cocktails, non-alcoholic cocktail.

Tim [00:29:23] Yeah.

Speaker 3 [00:29:25] We got a beer. So no cocktail.

Tim [00:29:26] All good.

Speaker 6 [00:29:27] All good. My biggest hot take from this conference, I saw a talk on A/B testing and one of my biggest, probably controversial opinions that I've literally never shared anywhere until just now on this is that A/B testing oftentimes doesn't work for business cases in the way that we would like it to. A lot of the core assumptions on experimentation have been cargo-culted from academia into business where it's not as true, and listening to the confidence of some of the people who are super pro A/B testing, I don't think it's justified. That's my super hottest take that I have that I've literally never shared before.

Speaker 3 [00:30:19] Give an example on... People are saying you should do A/B testing on this particular scenario.

Tim [00:30:24] But here's how it actually plays out or whatever.

Speaker 6 [00:30:27] I mean, just to start, there are some really flawed core assumptions, like the 95% confidence interval for business decision making. Business decisions are not made with nearly 95% confidence in almost any case. For scientific testing, you're trying to find truths that people can build on top of in a peer-reviewed way. That's not business decision making. I mean listen, if you're Amazon and you want to be sure that nothing that you do accidentally kinks revenue by 20%, then yes, you want to stay mostly static and only make a change when you're absolutely sure that you can't mess anything up. For most other companies, for most other use cases, that's not your situation. And A/B testing should be seen more as a way to keep things static without ever making a mistake versus... So if you have that business, great, but if you don't have that business, then you don't have that business.
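
To make the point about the 95% threshold concrete, here is a minimal sketch in Python. It is not something shown in the episode: the conversion counts are made up, and the pooled two-proportion z-test is just one common way to score an A/B test, chosen here for illustration. It shows how the same result flips from "keep A" to "ship B" depending on how much confidence the business insists on.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for a pooled two-proportion z-test."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


def decide(p_value: float, alpha: float) -> str:
    """Ship the variant only if the p-value clears the chosen threshold."""
    return "ship B" if p_value < alpha else "keep A"


# Hypothetical experiment: 10% vs 12% conversion, 1,000 visitors per arm.
p_value = two_proportion_p_value(conv_a=100, n_a=1000, conv_b=120, n_b=1000)

print(f"p-value: {p_value:.3f}")
print("alpha = 0.05:", decide(p_value, alpha=0.05))  # the academic default
print("alpha = 0.20:", decide(p_value, alpha=0.20))  # a looser, assumed business bar
```

With these made-up numbers the p-value lands between the two thresholds, so the 95% rule says stay put while the looser bar says ship. Which bar is right depends on how costly a wrong change is, which is the Amazon versus everyone else distinction above.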

Tim [00:31:34] So this is an interesting topic here, so I want to even double click one more click here, which is that why do you think this is? Is it because of instrumentation? For example, if you're an e-commerce company or Amazon, you can instrument more detailed or is it around things like the difference between trying to optimize a local maximum versus a more interesting global kind of point that you're trying to jump to? Or what's your take on why it doesn't work?

Speaker 6 [00:32:00] It's so many things that are wrong.

Tim [00:32:03] It's many. It's a plethora.

Speaker 6 [00:32:05] The 95% confidence interval.

Speaker 3 [00:32:11] I like this.

Speaker 6 [00:32:11] This is a real hot take. This is a real hot take.

Tim [00:32:15] But I think it's a practical point. I mean it is not to say don't do A/B testing because there's a lot of places where it's very useful, but it also explains why a lot of companies aren't doing a ton of A/B testing or A, B, C or whatever kind of testing you want to do on various decisions and various things because it's maybe in many cases it's not that you're bad because you're not doing it. It's like, "Oh, it's actually not applicable."

Speaker 6 [00:32:39] Yeah. And even when it is applied, it's oftentimes applied incorrectly.

Speaker 3 [00:32:45] Yeah, I get the assumption. So that was a good one. Thanks.

Speaker 6 [00:32:51] inaudible.

Speaker 3 [00:32:50] We got somebody else.

Tim [00:32:52] I appreciate it.

Speaker 3 [00:32:53] Julian, want to join us?

Julian [00:32:55] Sure, what are we talking about?

Tim [00:32:56] Yeah, come on in.

Speaker 3 [00:32:57] Come on in. We're here.

Julian [00:32:58] Hello inaudible.

Tim [00:33:01] Welcome to Catalog and Cocktails.

Julian [00:33:03] Are we live?

Speaker 3 [00:33:04] We're live.

Tim [00:33:04] We are live.

Speaker 3 [00:33:05] Live over half an hour now.

Tim [00:33:06] So we're here at Data Council in Austin.

Julian [00:33:08] Hello.

Tim [00:33:10] Great talks today.

Julian [00:33:11] Thank you.

Speaker 3 [00:33:13] What's your honest no BS take on what you've learned here at Data Council and in general, in the data world right now?

Julian [00:33:19] I mean my favorite part is the hallway track. You catch up with people like with Jon, say hi Jon. With Wes, with other people. And so I've been busy today because I was track host and I spoke and I was moderating a panel, so I've been all over the place. So I was just distressed and all of that and just getting on the relaxing now. It's been great. A lot of good people, good discussions. It's been great. So on the data engineering track, which is the one I focused on, it's been good to see more talk about DataFusion, Arrow, talk about orchestration, talk about lineage, talk about various things like that. And it's been great. Oh, I met Andrew Lamb in person. That was great. I think we had interacted before on social media and stuff, but it's the first time we met in person and we actually reconnected. When I started Parquet, I had read this Vertica paper, there's a "Vertica seven years later" paper. And he actually is the main author on the paper, which I didn't realize and inaudible so there's lots of influence, old post directions.

Tim [00:34:45] I feel like this conference is always bringing the data kind of influencers and ground shakers, ground breaker kind of type of people. Let's bring them all together.

Speaker 3 [00:34:59] Come on over.

Julian [00:35:00] Decision of things, open source things that happening.

Speaker 8 [00:35:05] What am I joining here?

Speaker 3 [00:35:07] We're live right now.

Tim [00:35:08] Welcome to Catalog and Cocktails.

Speaker 8 [00:35:11] I sincerely did not mean to do this.

Speaker 3 [00:35:14] What's your honest no BS take on what you've learned in the conference, just in the data world in general?

Speaker 8 [00:35:21] Wow. That I don't want to do too much AI and I want to keep focusing on data.

Speaker 3 [00:35:28] That's an honest no BS take right there.

Speaker 8 [00:35:31] That's my honest...

Julian [00:35:37] Unpopular opinion.

Speaker 8 [00:35:37] I don't care.

Tim [00:35:37] I'm sure there's some people who are listening right now who are like, " You know what? This AI thing's kind of weird. I'm not sure I'm into it."

Speaker 8 [00:35:41] At least a new number is going up.

Speaker 3 [00:35:43] Oh yeah. This is not even counting the LinkedIn well listeners and stuff. So why? AI, just you're annoyed?

Speaker 8 [00:35:49] No, I'm not annoyed. I just really love data. I really love data infrastructure and it's just always been the thing that it makes me happy. So I'm going to keep doing that.

Speaker 3 [00:35:59] So you don't do AI infrastructure?

Speaker 8 [00:36:02] What's the difference?

Julian [00:36:03] Exact same thing.

Tim [00:36:03] You may get drafted into it automatically.

Speaker 8 [00:36:07] I guess I am. That's okay. But not going to call it what it is.

Tim [00:36:11] No, that's good. I did appreciate that. Of all the tracks, only one of them was generative AI track. I was worried that maybe it was going to dominate the conference, but actually I think that it's been balanced.

Speaker 8 [00:36:21] In our data lineage panel, no one said AI. I think maybe Sherry, she brought up AI at the end.

Julian [00:36:27] We should have kicked her out.

Speaker 8 [00:36:30] Exactly.

Speaker 3 [00:36:31] But it was a take on you can't expect the lineage to be done by the AI. That's why you actually need the real lineage.

Speaker 8 [00:36:42] I guess we could try, yeah. Yes.

Tim [00:36:46] That's a good point.

Speaker 3 [00:36:48] Long-lived data AI.

Julian [00:36:49] Long live data, that's all I'm going to say. Not going to listen to anyone. We have to go.

Speaker 3 [00:36:59] You guys have to go.

Speaker 8 [00:37:00] Thank you for listening everyone. Follow me on Twitter.

Julian [00:37:04] Don't follow me on Twitter. Follow him.

Speaker 8 [00:37:05] Hit the subscribe button.

Speaker 3 [00:37:08] Cheers guys. Thanks for playing along. Well Tim?

Tim [00:37:13] I mean, it's always fun getting to meet up with friends, with colleagues, with contacts.

Speaker 3 [00:37:22] There's people over there walking by. Ryan? All right. I'm just trying to follow people to get in here. We're live here.

Ryan [00:37:33] Oh, you are?

Speaker 3 [00:37:35] Come over here.

Tim [00:37:37] Welcome to Catalog and Cocktails. We're streaming live.

Speaker 3 [00:37:44] We're still here. So what's your honest no BS take on what you've learned at the conference in the data world in general?

Ryan [00:37:53] Oh, man. Put me on the spot. My hottest take, I would say my hottest take is that we still have a long way to go with AI, folks. It's all we're talking about, and yet we still have a long way to go. Especially one thing I'm sick of is talking to people about text-to-SQL. I will tell you that because I think you and I, Juan, both know that this is not going to work, at least without some good metadata layer in the middle that is giving the AI meaningful information in order to generate query results.

Tim [00:38:30] Without the AI is going to start.

Ryan [00:38:32] Right. Exactly.

Tim [00:38:32] Without the context, there's no magic.

Ben [00:38:34] There is no magic.

Speaker 3 [00:38:36] I'll tell people very explicitly, " If you're going to do all this, chat with your data over your structured SQL data, you're not putting any context, you will fail."

Ben [00:38:45] There you go.

Speaker 3 [00:38:46] Don't be a failure.

Tim [00:38:50] I agree completely.
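
As a rough illustration of the "metadata layer in the middle" idea, here is a small Python sketch. Everything in it is hypothetical: the `Table` and `Column` classes, the example schema, and the prompt wording are assumptions made up for this note, and no real LLM or catalog API is called. It only shows the difference between handing a model a bare question and handing it the question plus business descriptions of the tables and columns.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Column:
    name: str
    description: str  # business meaning, as a catalog might record it


@dataclass
class Table:
    name: str
    description: str
    columns: List[Column]


def build_prompt(question: str, tables: List[Table]) -> str:
    """Assemble a text-to-SQL prompt that carries schema context with it."""
    lines = ["Translate the question into SQL.", "Schema with business context:"]
    for table in tables:
        lines.append(f"- table {table.name}: {table.description}")
        for column in table.columns:
            lines.append(f"  - {column.name}: {column.description}")
    lines.append(f"Question: {question}")
    return "\n".join(lines)


# Hypothetical schema: the cryptic column names are the whole problem.
orders = Table(
    name="ord_hdr",
    description="One row per customer order",
    columns=[
        Column("amt_net", "Order total in USD after discounts, before tax"),
        Column("st_cd", "Order status code: 'C' means cancelled"),
    ],
)

prompt = build_prompt("What was net revenue excluding cancelled orders?", [orders])
print(prompt)
```

Without the descriptions, nothing in `ord_hdr`, `amt_net`, or `st_cd` tells a model which column is revenue or how cancellations are marked. That missing context is what the speakers mean by "there is no magic."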

Speaker 3 [00:38:52] inaudible, what's your honest no BS take here?

Speaker 10 [00:38:54] My hot take is that I'm part of the grandfather club here at...

Tim [00:39:00] You're looking good.

Speaker 10 [00:39:00] I'm one of the ancients. These young kids are solving such phenomenally interesting problems.

Jon [00:39:11] It's okay, old guys.

Speaker 10 [00:39:11] But while I feel like we do have a long ways to go, it's really inspiring to see the iteration speed that this young generation is building. My ultimate goal in life is to upload myself, and so I feel a little bit better after seeing what's happening. I'm one step closer.

Tim [00:39:25] You're like, " We might get there in the next couple of decades here."

Speaker 3 [00:39:28] This is interesting because we had Ernie Ostic from IBM Manta, and he's saying the opposite. He's like, "We're still solving the same problems that we were solving 20 years ago, 40 years ago."

Speaker 10 [00:39:40] We still get flat tires, we fight about where to go eat dinner, we can't hook up a laptop in a conference room, we get a common cold.

Tim [00:39:48] Printers get jammed.

Speaker 10 [00:39:49] Yeah. AI's amazing.

Speaker 3 [00:39:51] I love this. I love how we just go back here. That was a great one. That was a great one. We'll still be complaining about that. What should we watch on Netflix? And all that stuff, right?

Speaker 10 [00:40:02] Maybe we don't need to solve those problems, right? Those are the true human things.

Speaker 3 [00:40:05] This is the human thing. That's what keeps us human. All right, Ryan. Thanks for coming.

Ryan [00:40:09] Yeah. It was good seeing you.

Tim [00:40:10] Appreciate it.

Speaker 3 [00:40:11] Take care, man.

Tim [00:40:13] You live from inaudible.

Speaker 3 [00:40:13] Yeah. Anybody else? Ready? We've been talking for 40 minutes. People are inside, probably getting another beer.

Tim [00:40:21] Yeah. I guess we'll get another beer. Jon, did you want to get any hot take in, or are you good?

Jon [00:40:25] I'm good.

Tim [00:40:26] You're good? All right.

Jon [00:40:28] How about this for a hot take?

Tim [00:40:29] All right.

Jon [00:40:30] Hey, everybody. It is awesome to be able to work with these two guys. I hope everybody's enjoyed the hot takes from Data Council. It's been a long time since I've made an appearance on Catalog and Cocktails, maybe four years even.

Tim [00:40:44] Yes. I think you're one of the true OG's.

Jon [00:40:48] My hot take is it has been incredibly fun to be here at Data Council, to be able to see old friends like HO, and Julian, and new friends like our friend Reza, who's just behind us right now, gave a hot take earlier. And it is always great to just see what people are actually working on. And I think that's the point of this event, is to see people working on real problems, figure out what those solutions look like, and then figure out how to get them to scale. I think that's where we're at with a lot of this generative AI talk is how do we scale these solutions? How do we make them trustworthy and how do we get these things into production? I saw so many great talks on that from the one that Tim and I were in on evaluating LLMs to things like you meet new friends here too. Folks from Pinecone who are talking about how you make RAG applications better and different methods of fine-tuning. And I think that to get practical like that is really special. And to be in a space where you can do that, it's been great to do it in our hometown.

Speaker 3 [00:42:06] The another thing.

Tim [00:42:06] It's always fun to do this in Austin, in our backyard here. And yeah, I think that was well stated.

Jon [00:42:12] We got more.

Tim [00:42:13] Yeah, absolutely. We got a podcast going. We're live.

Speaker 3 [00:42:18] Paul, honest, no BS takes about Data Council and the data world in general.

Paul [00:42:24] So I think one of the most interesting talks that I went to was this one on evals and large language model evals. Was it the Honey one?

Speaker 3 [00:42:34] Yeah, I think that's the fourth time it's already come up.

Paul [00:42:37] Really?

Tim [00:42:37] That may have been one of the breakouts of the conference.

Paul [00:42:40] So I think it was, and the reason for that is that the evals are very, very unsolved right now. And the reason that is the case is because most of the actual development happens locally. It's stuff that you're doing in CI when you are running these processes to determine, "Hey, is this change good to deploy to production? Is this legit?" But most of the vendors trying to help with the evaluation are up in the cloud. So what he went through is the actual pipeline that some really serious companies are doing to be able to do this, to be able to get the right systems in place to do eval through this sort of multi-stage process where you go through this sort of vibe test and then you go through a little bit more sophisticated testing and then that just gets more and more sophisticated going on. So it was really cool. It was one of the better things I've heard on LLM evals, which I feel like is a very under-talked-about area because it's so important to building a non-deterministic app.
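
The multi-stage idea Paul describes can be sketched in a few lines of Python. This is not the pipeline from the talk, which isn't detailed here. The stage names, the 0.9 pass threshold, and the `toy_model` stand-in are all assumptions for illustration. The point is only the shape: cheap checks run first, more expensive ones run later, and the run stops at the first failed gate, which is something that can run locally or in CI.

```python
from typing import Callable, List, Tuple

Model = Callable[[str], str]
Stage = Tuple[str, Callable[[Model], bool]]


def toy_model(prompt: str) -> str:
    """Stand-in for a real LLM call; deterministic so the sketch runs offline."""
    answers = {"2+2": "4", "capital of France": "Paris"}
    return answers.get(prompt, "I don't know")


def vibe_check(model: Model) -> bool:
    """Stage 1: cheap smoke test, does the model return anything at all?"""
    return all(model(prompt).strip() for prompt in ["2+2", "hello"])


def labeled_suite(model: Model) -> bool:
    """Stage 2: exact-match accuracy on a small labeled set (threshold assumed)."""
    cases = [("2+2", "4"), ("capital of France", "Paris"), ("capital of Peru", "Lima")]
    correct = sum(model(question) == answer for question, answer in cases)
    return correct / len(cases) >= 0.9


def run_stages(model: Model, stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order; stop at the first failure and report what passed."""
    passed: List[str] = []
    for name, check in stages:
        if not check(model):
            return False, passed
        passed.append(name)
    return True, passed


ok, passed_stages = run_stages(
    toy_model,
    [("vibe check", vibe_check), ("labeled suite", labeled_suite)],
)
print("deployable:", ok, "| stages passed:", passed_stages)
```

Here the toy model clears the vibe check but misses one of three labeled cases, so the run stops at the second gate. A real setup would likely add further stages after these, such as model-graded or human review, which this sketch leaves out.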

Tim [00:43:40] That was a good one. And he helped it be not cryptic, so it wasn't like some black magic. He was very incremental.

Speaker 3 [00:43:47] I'm pissed off I missed his talk.

Tim [00:43:48] You should have been there, man. I took a couple screenshots of some of his slides. Yeah, it was very good. Drew over at HoneyHive. So appreciate you if you're listening or this makes its way over to you. So yeah, appreciate it.

Speaker 3 [00:43:59] Any other hot takes for learnings?

Paul [00:44:01] That's my one right there.

Tim [00:44:02] Awesome. So cool. Thank you.

Speaker 3 [00:44:04] All right.

Tim [00:44:06] All right. Maybe final takeaways and hot takes?

Speaker 3 [00:44:09] This is the first time that we haven't done the takeaways and takeaways.

Tim [00:44:12] I know. No takeaways and takeaways. Well, my at least conclusion statement's going to be this is a fun conference because everyone's just trying to... It's smart people trying to figure stuff out, and sharing what they're learning, sharing the progress, and sharing some open questions and skepticism too, like modern data stack, data mesh, all these things. There's a lot of good stuff happening, AI, a lot of good stuff happening, and there's also a lot of question marks. So we've got work to do everybody in the data and AI space, and let's roll our sleeves up and figure it out.

Speaker 3 [00:44:42] That's it.

Tim [00:44:43] Cheers.

Speaker 3 [00:44:44] Thanks everybody for listening in, and we'll be back next week.

Tim [00:44:48] Yep. See you next week.

Speaker 3 [00:44:49] Bye, everybody.
