00:00:06 Tim Gasper
Hello, everyone. Welcome. It's time once again for Catalog and Cocktails presented to you by data.world. It's an honest no-BS, non-salesy conversation about enterprise data management with tasty beverages in hand. I'm Tim Gasper, longtime data nerd, product guy, customer guy at data.world, joined by cohost Juan Sequeda.
00:00:25 Juan Sequeda
Hey, Tim, how are you doing? I'm Juan Sequeda, principal scientist at data.world. As always it's a pleasure. It's Wednesday, middle of the week, end of the day, towards end of the day. Well, really end of the day as we'll see where our guest is coming from today. I'm super excited to have our guest, Simone Steel, who is an experienced CDAO. Until recently, CDAO of Nationwide Building Society. I met Simone over a year ago at the CDOIQ, I think in 2022. By just pure coincidence, we got on a bus, we were going, and I just sat next to her, and we just had a phenomenal conversation. We met up again last year. Last year... A couple of weeks ago, and she gave, I think, one of the best talks at CDOIQ because it was the most thought-provoking talk. Simone, how are you?
00:01:17 Simone Steel
I'm very well, Juan, and nice to meet you, Tim. I'm really delighted to be here, and I'm delighted that we made this work across the pond. I'm in France right now and it's a quarter... Well, it's 11 o'clock at night, so proper cocktails, post-dinner cocktails today.
00:01:40 Juan Sequeda
I'm super happy that...
00:01:41 Tim Gasper
So glad that you could join us.
00:01:42 Juan Sequeda
Yes, I'm super excited about that. So, hey, what are we drinking and what are we toasting for? Simone, kick us off.
00:01:48 Simone Steel
It's a post dinner podcast for me, and I've just had a lovely dinner here in the Dodo, in France. I went for an Irish coffee to keep... Wow. It is decaffeinated though, so it's kind of cheating. The important thing is the brandy. It's one of my favorite drinks and I'm glad to have an opportunity to drink with you guys.
00:02:10 Tim Gasper
That sounds great.
00:02:13 Simone Steel
I am toasting for... Actually, I'm celebrating the fact that finally data has come out of the nerdy world, even though we might all be a bit nerdy here in this podcast, and into the wild of people understanding now how AI is going to be trained by all this data. Now, all of a sudden we have a voice and a platform, and the interest of a much, much broader part of society.
00:02:41 Tim Gasper
I like that. Yes. I feel like data is finally coming into its own now, which is great. AI is creating a great platform for folks to get involved.
00:02:51 Juan Sequeda
That's a good [inaudible] for episode last week with Wendy Turner-Williams, which is about data as a first class citizen. So I definitely agree that that's the change that's happening right now. Hey, Tim, how about you? What are you drinking and what are you toasting for?
00:03:04 Tim Gasper
I'm actually enjoying some of the final days of summer here with my family. And so I wanted to have a summer drink while on my final summer vacation here. I'm drinking a summer spritz with gin, orange, and peach with a lemon. So something a little light and fruity while hanging out by the pool. Done with the pool now, time for podcasting.
00:03:27 Juan Sequeda
Well, I discovered I don't like ciders that much, but I was at the supermarket and they had all these very different ciders. This one is called a Pineapple Paradise and it's vacation libation. I poured it in and it was really sweet. So then I mixed it with some sparkling water, but then it became too light. So then I put a little bit of rum in it. It actually turned out really nice. So I had to give it a name. But hey, Simone, this is how we run here. I came up with this minutes ago, but-
00:04:01 Tim Gasper
It's a Juan last-minute smash.
00:04:03 Juan Sequeda
There we go. That's what it is.
00:04:05 Simone Steel
Yeah, yeah. It's The Alchemist. You have to change the podcast. It'll be the Alchemist podcast from now on with all this drink mix.
00:04:12 Tim Gasper
Yeah, yeah. The data Alchemist.
00:04:14 Juan Sequeda
I'm going to cheers for vacations. I love... We need to take time, we need to disconnect from our work and just spend time with family. I really appreciate that you're both on vacation and you continue to join us here. I'm actually on vacation on Monday, and I actually feel a little bit bad because next week we're not going to have a live podcast because I'm on vacation. I don't even know where I'm going to be with my internet connection. I'm going to be actually in Oaxaca, Mexico, so that will be a fun place. Anyway-
00:04:44 Juan Sequeda
...Here's my weird drink, which is [inaudible].
00:04:46 Tim Gasper
Cheers for vacation and the data.
00:04:47 Simone Steel
Cheers to vacations and data.
00:04:50 Juan Sequeda
All right, so we got our funny warmup question because the topic today is data sustainability. If you could give a funny TED Talk on sustainable living, what would the title be, or what would the quirky advice be that you would share?
00:05:08 Simone Steel
The title you guys are going to have to help me out with. But the theme has to be, "Why do people worry more in the realm of sustainability about single-use plastic cutlery and straws than they worry about streaming useless videos?" I was going to single out the funny cat videos, but... I don't want to make enemies here with your audience. They might love cat videos, but there are other silly videos out there. They are infinite. We are using this marvelous compute power, telecommunications infrastructure on earth, on space to stream that kind of nonsense. I keep thinking, "Come on, people, let's be more responsible with the use of our scarce resources." Anyway. So I have a particular thing with cat videos, but that's because I don't surf a lot of the internet. There might be other things I would talk about. But responsible surfing, but responsible scrolling, something like that. That would be the TED talk.
00:06:17 Tim Gasper
I like that.
00:06:19 Juan Sequeda
I like that. That's a good one. Hey, Tim, can you top that one off [inaudible]?
00:06:24 Tim Gasper
I don't know if I can top that, but now I'm starting to think about not just cat videos, but some of the... Sometimes you're watching Netflix or Hulu or something like that, and you're watching a show and you're like, "Why are we watching this? This isn't even that good."
00:06:35 Simone Steel
It's not even good. Yeah. That makes me think, "How does it even get made?" Because it's not just the resources for the watching. It's like a lot of people's... Some of them have a lot of talent. People invest a lot of their time in it. So you think, "These are clever people. They should be trying to solve world hunger and peace everywhere, and not producing these videos," I keep thinking. But anyway, life is-
00:07:05 Tim Gasper
If anybody from Netflix is watching, we want to chat to one of your data scientists, because I think that you're looking at our behavior here and you're like, "Oh, these dummies, they're going to watch this."
00:07:16 Simone Steel
They're going to fall for it.
00:07:17 Tim Gasper
They waste all their-
00:07:19 Juan Sequeda
All right, well this is a good way to segue into our discussion. Simone, honest, no-BS: what is the relationship here between the growth of data and the environmental and business sustainability goals?
00:07:32 Simone Steel
Yeah, well I think it's twofold. Yeah. One thing that we debate a lot in professional circles, professional circles in cybersecurity, engineering of all new marvelous technology solutions to our business problems. And data for training models and for producing value for businesses. There is a component of our technology ecosystem that can help shift the burden from the physical world of transportation of goods, improvement on logistics that I think is important for sustainability of human presence on the planet. But then there's the flip side that we joked about the cat videos, but there is the flip side, which is technology being used to manipulate human behavior that adds very little value to... Well, a lot of value to very few pockets, but actually consumes a substantial amount of not just electricity but natural resources to keep expanding data centers. And keep launching a few more thousand satellites and other things that are justified by our demand for digital and data. I think this balance of, "Is data enabling a more sustainable life on the planet, a clarity of climate modeling, environmental destruction and all that comes from data?" But then we apply for a tiny segment of our economy, which is economic gain for marketing sales, and sometimes just profit maximization via analysis of data for no societal benefit. When I talk about sustainability clearly, a lot of people who listened to my talk at the CDOIQ immediately thought about natural resources. But I think there's sustainability of the practice as professionals and as consumers that includes the social aspect. Are we training professionals appropriately? Do we have enough of those professionals? Is it sustainable to manage businesses with such legacy we have with potential flaws and debt building up? Is it sustainable to be just consuming more?
A lot of what I worry about with data and sustainability is the fact that we become far more fine-tuned to certain segments of the population that can consume more. And we don't apply the principles of diversity and inclusion when it comes to our data modeling practices. We repeat history because we are looking at data historically. Sustainability for me is, "Are we bringing everybody along with this great new world of technology enablement or not?" I don't think it's sustainable. I think we are creating very powerful elites and very many disconnected groups because data is really clear saying, "Don't worry about these guys. Just create solutions for the ones that can buy a phone every two years." That is really what I think is the thread, the links, both the positive and the harm that data can bring to sustainability conversations.
00:11:40 Juan Sequeda
I want to go back and just remind the audience here that you gave this talk at the CDOIQ, and I was very excited when we attended your talk. That's why we wanted you to be on the podcast, so we can basically review your talk again. In your talk you actually drew something on the... Well, there's no whiteboard [inaudible]. Which I have a picture here which I'm going to show it to you so people who are actually seeing this can see it. But then I'd love for you to explain what we're seeing here because it was very insightful what you have. Here you're showing basically the x and the y-axis. The X is about the time and the Y is the benefits of-
00:12:22 Simone Steel
I will narrate, yes. Let me narrate the video. Yeah.
00:12:26 Juan Sequeda
Yes, please. So just... Right.
00:12:30 Simone Steel
A little backstory for the ones that were not there. As you get jet-lagged in these conferences sometimes I woke up literally at 3:00 in the morning thinking, "Huh, how do I explain sustainability and data in one simple image?" I had not prepared that for my talk because it was a 3:00 in the morning jet lag moment. I realized that over time in the x-axis that you showed, there are two things happening. One is the fantastic development of technology that I mentioned as one of the relationships between data and progress and sustainability of our activities. It goes really in a steep exponential curve upwards. Every little discovery or commercialization opportunity, say cloud computing, is one. Large language models being available in open source is another one. These are inflection points where we exponentially improve and increase the potential benefits of technology. And then I drew the second curve, which is lower, still improving, but at a much lower rate. Which is our capacity as humans to make that potential benefit, real benefit. We trail a little bit. We have to learn the technology, we need to create transition programs. We have to adopt new ways of working, thinking, modeling and coding and all sorts. So we start from a much lower gradient with the inertia of the past. And then we have maybe a bit of a disaster from the new entrant in the market and we get slammed with more regulation. Say, "Ah, now you have these privacy concerns." Maybe now you have these architectural concerns, these cybersecurity concerns. Instead of being able to accelerate at the same rate as the potential benefit, we start to flatten out a bit. So the bottom curve that Juan was just showing actually shows us that yes, we can make incremental progress, and make the benefit real. But it's not at the same pace as new technology potential benefit releases. My provocation to the audience was, "How do I move from the bottom curve? How do I piggyback on the top curve?
What are the things that stop us operating at the edge of the technology development safely and sustainably?" Because that is the thing in an IT function or a data function. We are always challenged by finance partners that we look like a bottomless pit of money. You keep throwing a lot of investment in there and it gets sucked up by all the debt from the past. I'm paying interest rate all the time and more and more as I go along with my own technology. Something is not being designed correctly in our corporate behavior to understand what is needed to be on that curve. Not at the sharp edge of the curve, but on that same trajectory of the benefits that come with new data technologies available to us. So sustainability is, "Can I keep up?" If I had to replace the word like, "Can we keep up? Can we keep up with resources? Can we keep up with knowledge? Can we keep up with the environmental destruction around us?" And so on and so forth. But it was, I think, I needed to find a mechanism to bring the audience to this place of sustainability is not just about net-zero. Sustainability is about next generation of leaders. They'll be here in a blink of an eye. How are they leading this discipline of digital and data going forwards? And that's where it all started in that day at the CDOIQ.
00:17:00 Tim Gasper
No, this is really cool.
00:17:01 Simone Steel
Did I do justice?
00:17:04 Tim Gasper
Yeah, this is really cool. Juan was able to be there and unfortunately I wasn't. So it's very interesting to hear you walk through this model. Actually my background is I'm an economy major, economics. And so whenever I see charts like that, it always reminds me of economics, and the different models around looking at whether it's GDP or different macroeconomic or microeconomic scenarios. I see you describing a model here that helps us think about, "Hey, there's this..." As you're saying, "...potential benefit." As these major enhancements are happening around technology, we could be doing so much with that. It sounds like you're mentioning that in reality we tap into quite a bit less than that, and especially as new progress happens that gulf becomes wider. You mentioned regulation and things like that as being one aspect to this, which can either make that gap widen. Or maybe if it's the right regulation, it can actually make that gap become smaller. I imagine also sustainability when you think about resources, whether social resources or physical resources. Also, when that's not managed properly, that also is a factor that's making that gulf wider?
00:18:39 Simone Steel
Yeah. I assume that we are not applying our data knowledge to the market of data science. As an economist the faster the money moves around, the more it appears to be... It's this inflationary aspect of the velocity of money moving around. It's the same with data scientists. We, industry, keep stealing data scientists from one another, because we have to pay a premium and get them and then somebody else... trumps, pays a bit more and they get them. This is part of the sustainability discussion. Say, "Are we bringing people up sufficiently for the labor market of the future? Or are we just simply inflating the value of very, very few people and just creating this merry-go-round that is good for the individual?" I consider myself part of that because my background is threefold. It's computing, which I studied first as an apprentice in a technical college. Then I majored in economics, specifically econometrics and statistics. And then I did a data science course much later in life just to see what the fuss is about. As it turns out, it is econometrics, and predictive models have not changed very much. But I keep thinking, "Who is explaining this path?" Perhaps the visual representation comes from this background of demonstrating economic models. They're quite sophisticated in terms of demand, supply, and elasticity of prices and interest rates and stuff. How do we represent this to an audience that is not a computing audience? There are executives making decisions about big picture decisions in their companies. I think the picture, if I had perfected it and put on a PowerPoint, it would be much more useful. But you will be able to see the video I'm sure once the CDOIQ production opens it up on their YouTube channel.
00:21:02 Tim Gasper
Yeah. That'd be awesome.
00:21:03 Simone Steel
Hopefully it will make a little bit more sense for your audience as well.
00:21:06 Tim Gasper
No, I'm looking forward to them being able to check that out when that's available. I couldn't agree more on how do we make this accessible to understand? Because I think of the term, "Externalities," and the externalities of these technologies, it seems like a lot of people don't fully recognize the externalities around things like data and the progress around technology.
00:21:32 Simone Steel
Absolutely. Yeah. I discovered very recently as I was writing the book that even though I... There is a chapter dedicated to externality. You'll be happy to know, Tim. But I might have to change the title because it alienates people. Technical terms, I realized that in technology and data in any profession, economics, in medicine, you throw in a new concept and then it's a strange name, people disconnect from what you say to learn what the term actually means. Somebody came to me and say, "Why don't you just say, 'Unexpected effects,' or, 'Side effects,' or..." Well, technically speaking it's not the same thing.
00:22:18 Tim Gasper
Pros and cons or something like that, yeah.
00:22:20 Simone Steel
Yeah. Technically speaking it's not the same thing. It's something that is not considered in the model that could be beneficial or harmful. But I think with data, the simpler the language, the more people will want to engage. I think the talk as well, so just going back onto this talk, it'll be very mysterious now for your audience, but it is about making it accessible. There is no good to anyone creating this veil of complexity, and just talk amongst ourselves, data professionals. I think very common habits of archiving, destroying your documents on paper, people used to do that a lot. I keep my university certificate, but I don't need to keep all the receipts after I've done my tax return maybe for seven years and then I can destroy them. People were used to doing that, and that wasn't complicated. But now we talk about data retention, it gets elevated to a really complex level that it doesn't need to be. I think sustainability is about how many more people can I bring along with me rather than just say, "You are not the expert. Just talk to me about externalities."
00:23:47 Juan Sequeda
I would like to get into some concrete examples. You were getting into one right now, for example, like retention. We're like, "Oh, so here's something that we do. Let's call it..." In the real physical world, we'll keep track of physical documents and stuff, but now we also have this notion of data retention, but we're like, "Retaining everything. Let's log everything. Let's keep track of everything that we can. Just because we can, does it mean that we should?" That for me seems like an aspect of data sustainability. I'd love for you to go through some examples across different spectrums here about what does sustainability look like? And what are the pros and cons about doing one thing or another?
00:24:30 Simone Steel
I think the data retention is probably one of the most important, I would say, hygiene habits that every citizen and every professional needs to understand and practice. Not just important documents, or perhaps I don't even know what threshold to draw on metadata and log files. Let's take for example, the unstructured data that we all generate in big corporations. I think that is for me a really good example, because it's close to my heart because I saw a person at work struggling to explain to their boss, "Why is it that we keep buying more storage and it never seems to be enough?" This person turned up in a data governance forum and I said, "I'm really sorry. I can give you the policy, but this is a governance forum." That person was so desperate because there was no forum that cared about her problem. It seems to be like she had this monkey on her back. It felt like, "I need to help that lady." That's how the whole sustainability thing started for me. Because I couldn't just say, "Here's the policy. Next. Who's next with another problem." Why is it the most important? Because in order to do it well, in order to know what to keep, what to throw away, what has potential value and what is just rubbish, you need good hygiene of classification, labeling. You need to know what it is. I hope that one day people will turn AI onto the mess that we created and AI will help us to figure out, "Oh, this thing, you didn't know what the meaning was. Maybe I can give you some context," or whatever. But the matter of fact is we hoard data because we don't know what it is. Cataloging, classifying, et cetera is maybe not so glamorous, but there are lots of librarians that I'm sure lost their jobs when we thought we didn't need libraries anymore because we could Google stuff. So bring them on.
Can I get librarians that lost their jobs from all the local libraries that are closing down, and they can tell us good information management techniques that we can apply to the digital world? The diversity here of experiences in different fields... Because data is not actually a field of its own. Unstructured information like books may have a technique. Scientific data for genetic research may need a completely different technique. So, the fact that both need data is neither here nor there. But the example that... I worked with this lady who brought the problem and we started to figure out, "First of all, can we stop creating more rubbish? Even if we don't know exactly the mountain that we are sitting on, what are the things that we do know that people are doing that they could, with the change in habit, create a great benefit for your data growth to not be so exponential in the future?" We came up with a blog that was not for the tech people, for the entire organization to just start to educate people. "Did you know that when you send a four-megabyte PowerPoint with two pages of text, you are most likely carrying a huge template that you cannot see?" "Did you know that if you email 50 people in an executive and minus one position, they have to keep their emails for 20 years for regulatory financial regulation in the UK? Did you know..." Because when people know what the effect is, the cumulative effect is with images of how much it grows, how much it multiplies itself, how much... People don't know that we have to have a seven-day rolling backup of all our mail servers. Four megabytes times seven, times 50 people that received it, times 50 people that sent, "Thank you," and forgot to not send the attachment back, yada, yada, yada. You can see that the actual text of that, the value of that PowerPoint, two pages of text might be 100 kilobytes. But what they've generated in terms of load was over six and a half gigabytes.
People still look at me like I have two heads. I say, "Let's draw this kilobyte-megabyte-gigabyte thing." When I showed them the ratio between a 100 kilobytes and six gigabytes being my little drink and the height of the Eiffel Tower for example. Then they say, "Oh my god, I didn't realize that my 100 kilobytes is probably, I don't know, 10 centimeters. And the Eiffel Tower is what, 300 meters? So it'll be like three and a bit gigabytes." Then they feel included and they change the habit, because they know they are creating a problem for a friend. They're creating a problem for that department over there. That for me was the aha moment on data retention. Well first stop creating it, and then we can look at how do we unpack, how do we compress useless conversations? How do we encourage people not to say, "Thank you" to a hundred people in that project via email. Go on the chat channel, go into the coffee place or whatever it is you do. But just don't do silly things that just create the appearance of free data, but actually it's somebody else paying the price. It's the externality from their point of view. Somebody else who has the problem, has the monkey on their back. This, I think is the most useful example that was immediately recognized by people whose job titles did not have data in it. Because quite frankly, we're all generating... This podcast is generating data. So we need to be really careful. Like, "Is this value?" "Yes, let's go for it." "Is this value?" "No. Just don't do it." I think that is a habit we should teach our kids before it's too late.
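The multiplication Simone describes can be sketched as a quick calculation. This is an illustrative sketch only: the decimal units, the function name `email_storage_bytes`, and the choice to take the straight product of every factor she lists (a 4 MB attachment, 50 recipients, 7 rolling backup copies, 50 replies that re-send the attachment) are assumptions, not figures confirmed in the episode. Taking the full product gives a larger total than the six and a half gigabytes quoted in the conversation, so that figure presumably rests on a different combination of factors.

```python
# Illustrative arithmetic only; names and unit choices are assumptions.
KB = 1_000        # decimal kilobyte
MB = 1_000 * KB
GB = 1_000 * MB


def email_storage_bytes(attachment_bytes, recipients, backup_copies, sends):
    """Straight product of the factors listed in the conversation:
    attachment size x mailboxes x rolling backup copies x number of times
    the thread is sent with the attachment still included."""
    per_send = attachment_bytes * recipients   # one copy lands in each mailbox
    with_backups = per_send * backup_copies    # seven-day rolling backup
    return with_backups * sends                # each re-send repeats the load


useful_text = 100 * KB                                  # the two pages of actual text
original_only = email_storage_bytes(4 * MB, 50, 7, 1)   # the first send alone
with_replies = email_storage_bytes(4 * MB, 50, 7, 50)   # 50 replies carry it again

print(f"Original send with backups: {original_only / GB:.1f} GB")
print(f"After 50 replies:           {with_replies / GB:.1f} GB")
print(f"Bytes stored per byte of useful text: {with_replies // useful_text:,}")
```

Under these assumptions the first send alone already reaches the gigabyte range, and the replies multiply it by another factor of 50, which is the point of the anecdote: the stored load is several orders of magnitude larger than the 100 kilobytes of text anyone actually wanted to share.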
00:31:46 Juan Sequeda
There is so much deep thought in what you just said. I do want to take this as a quick joke on the side. Actually, one of the coolest things that came out, funny things is like, "Don't reply, 'Thank you' by email to 100 people. Stop doing that..."
00:32:03 Simone Steel
I don't even know if they're thankful. Oh, somebody picked on me. Juan, seriously. Somebody said to me, "Oh, but that's just bad manners, and one little thank you..." They think they're sending nine characters and I'm just trying to explain to them, "This email thread has been running for decades. This project is like... If you scroll down, you lose the will to live. This thing in itself is already 500 megabytes."
00:32:31 Juan Sequeda
I'm curious. Are there any studies on the amount of space in compute and money for superfluous data retentions? Because it would be fascinating to know. It's like, "Well, this organization, out of the 100% that's being stored and all this money and all this compute, actually we only needed to do, I don't know, 20% whatever." That 80% of compute of energy, there's energy spent, money spending on storage and all that stuff, and people's time to go manage this stuff because you got to keep backups and... That's useless.
00:33:09 Simone Steel
There are many studies, Juan, that... Well, there is one very deep detailed study of a team here in the UK called the DevoTeam, but they are dedicated to looking at data created by software developers. They are the guys who go deep into the history. Every company is very different, but they go deep in history, say, "Oh my God, this source code thing isn't even built anymore. What are you doing? It was written 20 years ago for an operating system that we don't even have." They do this kind of archeology. I think that's how they position it. There are studies and people are already becoming quite tuned to the problem. But also there are business studies, I think IBM has produced one, and there are many academic ones. The more studies you look at, the more the range of usable to unusable... or unused information becomes clear. Only about 10% of information generated is ever accessed again after a week. About 20% is accessed, I think, in the first month. Most of information generated is never accessed at all. This is kind of rule of thumb, but it looks a bit like the 80-20 thing, but it could be 10-90 or 90-10. But it is that kind of level of magnitude. It's not 50-50, is what I'm trying to say. There's more rubbish than useful information everywhere, everywhere. But I'm sure there will be companies that will say, "Oh, I'm really a good use case here. My company's very lean and we don't carry any of that baggage." And I'm still waiting to see who they are. Because everybody keeps looking at these numbers and say, "Oh my god, I think my number is 95% unused." People actually have a bit of a panic moment and say, "Actually, I don't see my business being any different. I recognize the pattern in my business." But we dedicate very little time to looking at that kind of internal problem because it's so invisible. The only person who sees it is the CFO when he gets the bill for your cloud storage. If I might say just one more thing, Juan, very quickly, I promise.
There is an economic incentive building up for us not to throw away data, which is cheap cloud storage, deep storage, spot instances and things that say, "Oh, don't worry your head about this, your pretty head. Just buy this service. The problem will be cheaper for you right now." But clearly my business model as a storage provider, banks on the fact that there'll be more demand. Counts on the fact that there'll be more demand.
00:36:19 Juan Sequeda
That's where the incentives are aligned to like, "Oh, this is cheap." You hear this a... "That storage is cheap." So I'm like... I don't even consider thinking about having to go deal with that. We've talked a lot about data retention as a sustainability opportunity. You brought up something earlier before on, "We repeat so much stuff." There are so many wheels being reinvented. And you brought up something, if I understood correctly, around even data modeling and stuff. I'm curious to know what other sustainability opportunities, and is there anything here related about how we should be doing more... Just doing data modeling so we don't reinvent the wheel?
00:37:00 Simone Steel
Yeah, yeah. I think the fact that of course our systems copy our human structures within the organization clearly influences system design... Of course the systems are designed to cope with the dynamics of people... has encouraged all of us to duplicate everything that we cannot reuse. I'm a big fan of... There is a data standard for scientific information exchange called the FAIR principles. I think it is well known in this kind of circles, they're findable, accessible, interoperable and reusable attributes of data. It is a shame that this is not the law. If you already have it, why... But of course people think that if I send you a file, we use the wrong verb. I'm not sending you a file, I'm giving you a copy of my file. This day and age, we might send each other links and we might look at the same book, basically, we look at the same reference. But most lay people, they send stuff as if we put something in the post. With that thought... Inside the company the same thing. We're sending the sales today to the accounting department. We are not sending anything. We're just copying the whole thing. The implementation of the FAIR principles should be just like when surgeons go into the operating room. They wash their hands, they put their masks, they disinfect the equipment. They have those standards, they can't deviate from that. The patient would die. Why are we professionals being... It's a psychological thing. I really enjoyed one of your previous guests, Tom Redman's talk about people and data. Because people are not doing this because they want to be evil. They're doing this because they don't know any better. I think we know better. We professionals know better. What are the things that are triggering, that are making us do the wrong thing? "It's out of my professional character to do, but they pay me. They told me to do it and I'll just do it."
I think we need to embrace a few more sociologists, psychologists, anthropologists to explain what the hell is going on inside these companies that push people to do the wrong thing. Anyway, it's money. We know the answer is. We know the answer-
00:39:55 Tim Gasper
All the incentives that make it hard. I think what's interesting as we talk through these different examples is just how much it's not top of mind for data leaders to think and talk about sustainability. I think that there's a lot of attention paid to innovation. There's a lot of attention paid even to things like trust and regulation and things like that, but not nearly as much on sustainability, which I find very interesting. Because as AI becomes more of a thing, as data science continues to mature, we're talking a lot more about bias for example. Bias in data and in models a lot more than we're talking about sustainability. I know those two things have a relationship with one another. I was just curious, why do you think that is, and how do we start to make sustainability more a part of the data leadership conversation?
00:41:01 Simone Steel
I'm going to have to quote... I don't know who said that, but it's a famous physicist, and you guys probably know, probably Max Planck, who said, "Physics progresses one funeral at a time." There are so many strong leaders in their field that new practices struggle to break through. I'm considering myself here one of the old folks. I've worked in data for 36 years now, and I would love to retire at some point and do something a bit more creative in... I keep saying 10 years' time, I've been saying 10 years' time for a long time. "In 10 years' time..." But to answer your question more directly, Tim, I think we do these things because our personal lifespan and our professional lifespan is a tiny little episode in a company's system lifespan. Our longevity within this problem is not compatible. So, we make decisions as leaders that have to balance my professional survival, my budget for this year, my competition and the pressures of AI and whatnot. It is a real tightrope walk here. I think it's nice to have a philosophical, well, coffee-infused discussion at this time of night for me. But I struggle and I want to share this struggle with as many people who want to listen, to say, "How do we put ourselves in a sustainable curve of the benefits?" Because it is a different leadership style. But the leaders are being incentivized to do what leaders 50 years ago were rewarded to do. Action: meet the budget, increase the sales, market penetration, cross-sell... Whatever there is. I can't even as a data leader say that I have a data strategy. I have to have a data response to my business strategy. This is how second class we are.
00:43:19 Simone Steel
Well, not everywhere.
00:43:20 Juan Sequeda
...This is a fascinating conversation we're having, and I wish we could keep going another hour here, but you need to go to bed after [inaudible]. This is the time when we have this philosophical discussion. First of all, I always say, "Science is a social process," and it is all about being able to convince our peers. We talk about peer review because I have some theory, and I do these experiments to provide evidence to support this. I have this belief and I want to convince my peers and I present this as I write it. I write papers, I make presentations, I go to conferences, I talk, we now do podcasts, we do blogs, all these things to be able to convince our peers around this stuff. From there it takes time. The discussion I always come up with is this balance between efficiency and resilience. We're so focused on efficiency because that's what we're incentivized for. You're so spot on, Simone, that has been what we've been trying to go do for so many decades and we're finally doing that. I think now we're realizing we need to think about also being resilient and thinking about the future, finding this balance. But our incentives are not there yet. Hopefully, these discussions are happening such that we're inspiring the next leaders. In 20, 30 years' time, they're going to find this balance and they're going to go back and look at history like, "Oh, 30, 40 years ago all they did was care about what's going on in the next quarter. And that was it. Wall Street was looking..." And then we're going to shift, it's like, "How stupid were they in the past?" Because they thought... I don't know, maybe that's the discussion. Hopefully that's a discussion. Going back to something you were saying before on the FAIR principles. The FAIR principles come from my academic home base, the semantic web, knowledge graph community and pharma space. I've been around all... A lot of my colleagues who actually wrote the FAIR principles. I was around this time when they were doing this.
It's something that I see and I'm like, "This makes so much sense, but why aren't we doing this today?" Because it just takes time for some people to pass away and that's the Planck principle around that. Hopefully, we're inspiring the next generation of leaders that we need to start thinking more about-
00:45:43 Simone Steel
It's so true. I think it's the next generation, but also for us to understand that continuity is not bad. I grew up in the middle of two generations. My dad thought that me changing jobs all the time was clearly a bad thing. And then the millennials thinking, "Why would I want to stay for more than two years here? Because I want to learn, I want to grow." It's egotistic. "I want to." My dad is a different generation. It's like my role in this community matters more than what I'm going to achieve as an egocentric person. So I can see those two dynamics clashing. I think this succession planning shouldn't be a dirty word. This volatility is damaging our profession. People coming and going, and knowledge being dissipated in ways that are not increasing the collective. I feel the tools I'm asking, the problems, is like somebody giving a Ferrari to my 18-year-old that just passed his theory test on the driving test [inaudible]-
00:46:57 Juan Sequeda
I'm the first to be the grumpy old man and being pessimistic and look what we're doing. But then I stand back and acknowledge that the field of what we call computer science has been around for 60 years. Since Alan Turing, so it's like in the 40s. Then the field of math goes on, but this is a very young field and it's a very immature field-
00:47:26 Juan Sequeda
... to compare it to math, to physics, to... All these things. So, we should give ourselves credit too, that we are doing so much advancement in a small amount of time that maybe we don't have to wait a generation. But I'm going to say it's another generation. Hopefully I get to see it.
00:47:43 Simone Steel
Yeah. We are the pioneers. Yeah, we are the pioneers. Yeah.
00:47:49 Juan Sequeda
This is a fantastic discussion, but we got to start winding down going into our next segment. Next segment is, we're going to go to our AI minute, one minute to rant about AI, anything you want. I'm going to time you. Ready, set, go.
00:48:04 Simone Steel
Do we really know why we need it? I think we are losing... It's just a neighbor jealousy thing that we are going through right now. People, just calm down. Not everything needs to be action. A lot of what we do needs to be thinking. Now every company has an AI strategy, and if you don't, you'll be shamed on the cover of The Economist. "These people missed the boat, and they'll be dead next year." Just why are we rushing? Obviously the answer is also money. Tim, I know that. I know the answer is money. But guys, we could do a lot of damage. I'd rather stick to my natural stupidity than having more artificial intelligence at this point in time, because we are destroying a lot of things we touch. It's just the why is missing for me, that's my whinge. I don't get the why. There's a lot of sales brochure, but show me you believe in it.
00:49:07 Tim Gasper
I think that's fair and that's a very sobering perspective because it is not a naysayer perspective. You're not saying that it's not going to be valuable. It's saying, "We really got to understand why we're applying it and how. And do some thinking about it."
00:49:21 Juan Sequeda
Be critical about it.
00:49:23 Simone Steel
Just a little bit. Stop for five minutes.
00:49:28 Juan Sequeda
All right. Let's go to our lightning round, which is presented by data.world. I'll go to the first question here. Is there a close relationship between good data governance and data sustainability?
00:49:39 Simone Steel
Absolutely. Absolutely. I think that is... They should be elevated to the guardians of good practice and the guardians of our future in data. I think that's probably an undervalued aspect of data governance that if you're attached to it, maybe people won't be so afraid of taking that as a profession. The new librarians. Go.
00:50:05 Tim Gasper
Yeah, the new librarians.
00:50:09 Juan Sequeda
We've talked before about that. We need to rebrand data governance to data empowerment or whatever, but also data sustainability be something in there. I like this. [inaudible].
00:50:18 Tim Gasper
Yeah. I like that.
00:50:18 Simone Steel
Oh, yeah. Good old information management. Yeah. Okay, go.
00:50:23 Tim Gasper
There you go. All right, second question. I'm trying to think about short-term incentives around how do we encourage more sustainability. There's ESG initiatives and budgets. For those that aren't familiar with that, environmental, social and governance, things like green initiatives. Also, data.world is a B Corporation and so there's different corporate movements that try to align more. Do you feel that these short-term incentives make a difference?
00:50:58 Simone Steel
I think they need to be fewer, and they need to be sharper and then they will make a difference. Using the example as you did, because I was very encouraged by ESG until I understood that my sustainability department really was very preoccupied with what's going to go on the website. Like, "Are we publishing our net-zero commitment? Are we able to evidence how much water we are recycling?" Less so about the sustainability of our human practices. I think we are trying to bite off too much with the ESG 12 principles of sustainable development and that gets people confused. I think every company, every responsible leader needs to pick one or two things, and then I think they'll be powerful.
00:51:59 Tim Gasper
I love that. That's good, honest no-BS right there.
00:52:03 Juan Sequeda
All right, next question. In the next five years, do you feel like AI will make a net positive impact on sustainability like with innovation or a negative impact, more resource consumptions, physical, social, natural?
00:52:20 Simone Steel
I think we will only hear about the positive impact. I think unless you dedicate... Back to one of your questions, how much of the data we produce, we don't use. Of course the academics that [inaudible] as much as I am, they're preoccupied with that. At the moment I get the results, I might change my behavior. But with AI, people are going to act and they will publicize so many successes. They will publicize a lot. I hope they happen actually in areas that do matter, where data-crunching, combinatorial problems and things, hard computational problems coupled with quantum computing because it's just around the corner. It might be a different podcast one day. But quantum computing is around the corner, and it likewise needs to be applied to problems worth solving. And so I do hope it'll be positive, but I'm absolutely certain we will only hear about the positive. It'll be hard to dig out the real costs.
00:53:26 Tim Gasper
That is a very wise statement.
00:53:29 Tim Gasper
Yeah. People ask questions when we're training ChatGPT and things like that, "How much compute is that really consuming?" and stuff like that. We all assume it's probably quite large, but we don't really know.
00:53:42 Simone Steel
Well in the UK, when I moved from Brazil to the UK, I was astonished and shocked by the fact that my house did not have a water meter. How much water you consume in your house so you pay the bill. It was all done in an estimate thing because it was an old Victorian house. It still is. I still don't have a meter, but I try to consume less water. I'm going to pay exactly the same if I have a swimming pool or not. I don't, by the way, but I was just saying. Data is the same. Compute power and data need to have a meter. I need to get a bill. Like, "How much have I used to train my ChatGPT? Oh, dear. Maybe I should use less next time. Maybe I don't actually need it after all because this bill here is too much for me." It's exactly like, "Give me a water supply with no accountability for how much I use, and it's likely that I'm going to abuse it."
00:54:45 Juan Sequeda
I think that-
00:54:46 Simone Steel
I try very hard not to. No.
00:54:49 Tim Gasper
Yeah, this is good. Yeah. It goes back to your comment about AI. You can emphasize all the benefits, but if you never connect it to the costs. You have to measure.
00:54:58 Simone Steel
There's no meter. Yeah, there is no meter, there is no counter, there is nothing. Yeah.
00:55:06 Tim Gasper
All right, I want to keep on asking questions about that-
00:55:08 Simone Steel
Juan, you're going to say something? I just cut you off.
00:55:10 Juan Sequeda
You got one final question there, Tim?
00:55:13 Tim Gasper
Last lightning round question. All right, is the best thing that we can do, everybody who's listening to this podcast right now, is to become more educated in sustainability around technology, around data, and to spread that education? Is that the best thing folks can kind of focus on?
00:55:30 Simone Steel
Yes, yes, yes, yes. I cannot emphasize how important that is. Not just us working in this field. Your mom, your dad, your neighbor, your children, your girlfriend, the postman. Education. This is not a mysterious thing. We like to use very complicated words. No, we should remove that veil of complication and get people to say... You know how plastics are made? You don't know exactly, but you have a mental model. People understand now the life cycle and how we ended up with microplastics in the ocean and the water that I'm drinking. People now understand. They need to do the same for digital services and data. They need to start now being... Well first of all, we don't have time to reeducate everybody, put them back in school and reeducate and create a new curriculum. There's no time for that. We need inquisitive, curious people. They understand. They always, always behave like a five-year-old asking, "Why is this?" "How does this work?" "Why do I need this?" As adults, we need to be more like five-year-olds. Education is everything because of the accelerated curve that we were talking about. There is no time. There is nobody at the head of the curve telling us, pulling us up with them. There's nobody doing that for us. So it's up to us to read, to listen. Honestly, if there's one thing that is available to us now, it's information. It's sometimes too much. It's hard to tell the truth from the untruths, but we could really educate ourselves instead of looking at more cat videos.
00:57:20 Tim Gasper
I think that's a good reminder to everyone here. In between each cat video, please learn something new.
00:57:28 Simone Steel
Yes, go listen to a podcast or TED Talk.
00:57:34 Juan Sequeda
All right, well Tim, takeaway time. There's so much here, take us [inaudible].
00:57:38 Tim Gasper
I know and I'll do my best to condense it, which is... I really think that Simone, you hit a lot of really wonderful points here. I think you really educated me and our listeners around how to think about sustainability in a broader way, and how technology and data can have an impact here, both positive and negative. I really think that one of your biggest points that you started with here is that data can be a double-edged sword when it comes to sustainability. There is a flip side to using technology that isn't just the physical resources, but also the social impact as well and the behavioral impact as well, and you have to think about those issues too. We can use data to help, but there are going to be a lot of these negative side effects. Sustainability isn't just about natural resources. You really have to think about that social aspect and that's very important too. And are we bringing everyone on this journey of enablement or are we leaving people behind? That's not just the people that have the data or don't have the data, but it's also what decisions that data is leading us to where it's benefiting some people. It's also not benefiting some people. That can result ultimately in a lack of sustainability, and a lack of the benefit that could happen. You tied it to, really, a model where on one axis, the x-axis, you have time, and on the y-axis you have progress or benefit. There's really two curves going on. One of them is the potential benefit. As all these new technologies... You mentioned cloud computing, LLMs, the potential benefit is quite immense. Every time we hit those inflection points it pops up. But the second curve, which is the real benefit, is actually, there's a gap forming between the actual benefit and the perceived benefit. Things like regulations, things like sustainability, all these things if done right, should be bringing those curves closer together. I think that's a really powerful mental model to think about this. Juan, what about you?
What are your takeaways, Juan?
01:00:01 Juan Sequeda
I loved how we actually got very specific on examples of sustainability and we talked a lot about data retention. Why do we keep buying and paying for more storage? People are like, "Why are we spending so much money on this stuff?" You have to really understand the needs behind it. One of the things is that we may not even have good data hygiene habits and classifications around this, so we know what needs to be stored or not. That's why we hoard so much data because we don't know what we have. We don't have good data governance that can enable us to be able to have better sustainability around our data. We should educate our community, our colleagues about these little things. For example, did you know that when you send a PowerPoint that has four megabytes in it and you're sending this template, and the most important information is really this text, which is really 100 kilobytes, that's all really that you needed to go share. If you're emailing X amount of executives, you need to keep all those emails for, I don't know how many years around that. Explain these types of things to your community and make them included so they understand the impact around that. One important thing is, "Do not reply, 'Thank you' to an email, Fred. That's it. Stop doing that." 10% of saved data is looked at only after a week and that's it. We save all this data, people aren't using it at all. Now there's always these incentives to store because "Hey..." quote, unquote, "... Storage is cheap." But just because we can, does that mean that we should? Another aspect that we talked about sustainability opportunities is the FAIR principles. Findable, accessible, interoperable, reusable. I'm not just sending you a file, I'm actually sending you a copy of the file. It would be fantastic if we start getting into this new reality where we're actually sending you pointers of this thing, so you can reuse it.
That's why modeling, and making, and knowledge is so critical so we don't reinvent the wheel over and over again. Talking about sustainability and leadership. But then we got into this philosophical discussion and following that Planck's principle: science progresses one funeral at a time. And so I think we have to acknowledge that we are right now doing the things that people were trying to go do 20, 30, 40 years ago. Hopefully this is the opportunity we're having to inspire the next generation of leaders to make sure that in 20, 30 years we are having this data sustainability as a first-class citizen around that. I think a takeaway I have here personally is that anytime you're doing... Whenever you're doing something with data, replace data with water. Ask yourself, "Is that how you would treat water?" Store more of it. Just throw it away. That's something I'm going to start thinking about. Simone, how did we do? Anything we missed?
01:02:54 Simone Steel
No. You guys are great at summarizing. I'm so impressed. Yeah. It's a shame that I spoke for an hour and you could all synthesize it in like a minute each.
01:03:01 Juan Sequeda
No, but there's-
01:03:02 Simone Steel
I'm like, "What the hell did I talk about?"
01:03:06 Juan Sequeda
No. We always publish our takeaway episodes first. I want people to realize if you're listening to the takeaway episode, you need to listen to this entire episode right now. So with that, before we wrap up, just three final quick questions. What's your advice? Who should we invite next? And what resources do you follow?
01:03:25 Simone Steel
Well, my advice is never... How to say this in English... abdicate responsibility. Is that a word? Abdicate?
01:03:35 Simone Steel
Like kind of... Don't think it's somebody else's problem ever when you're talking about data. Just stop and ask why. That's my advice for everyone. My recommendation would be, and I can help connect you with him, if you would like this sustainability thread, Gerry McGovern, who is the author of World Wide Waste. He's a digital designer, has been for as long as us possibly in the field. And has a wealth of end-to-end design and development lifecycle that now I think marries up with data quite nicely. It could be an interesting conversation. The third, where do I go for knowledge? I do read The Economist every week. And to try not to just get into tech literature, look a bit more broadly at other things as well. But at the end there is always a recommendation of a book. At the end of each weekly edition, there's always a recommendation of a book. I really look forward to seeing what they are reading, and whether that can broaden my horizons as a data professional and as a human being. That is my go-to source. I don't watch the news, I don't look at Tweets, I don't look at breaking news. Every Saturday I'm the person finding the news late. I just say, "Oh my God, this happened this week." It's like I'm living in the 1940s. It's like, "Oh my gosh."
01:05:18 Juan Sequeda
I think this is a good thing. Yeah, there is a lot going on-
01:05:23 Tim Gasper
It's totally healthier for you. Instead of reacting to whatever the headline is in the moment, you get to really look for the meat, the good stuff.
01:05:29 Simone Steel
Yeah, yeah, yeah.
01:05:31 Juan Sequeda
Simone, this was a fantastic conversation.
01:05:34 Simone Steel
A bit late though.
01:05:34 Juan Sequeda
Thank you so much. Just a reminder, next week Tim and I will have a short and sweet rant episode because I'm going to be on vacation, and I don't know what my internet situation's going to be. So we're just going to have a quick little break out there. Just a quick reminder, upcoming August 23rd, we have Aaron Wilkerson from Carhartt on data leadership. August 30th we have Ari Kaplan from Databricks and he is the actual Real Moneyball guy. So that's going to be a fun conversation. And then on September 6th we have Alexa Westlake from Okta talking about data value. With that, Simone, thank you, thank you. Thank you so much. And always thanks to data.world-
01:06:10 Simone Steel
Thank you, guys.
01:06:10 Juan Sequeda
... Who lets us do this every Wednesday. Fantastic conversation. Thank you so much.
01:06:14 Simone Steel
Thank you very much, guys. Take care. Enjoy your holidays.
01:06:17 Tim Gasper
Awesome. Thanks for staying up late with us. Cheers.