
Unpacking Responsible AI

66 minutes

About this episode

With all the hype around AI, one of the most important topics to discuss is Responsible AI.

In this episode, we unpack Responsible AI with renowned computer science researcher Ricardo Baeza-Yates: what the problem is, and what the solutions look like, such as principles, governance, auditing, and regulation.

Speaker 1: This is Catalog & Cocktails presented by data.world.

Tim Gasper: Hello everyone. Welcome. It's time once again for Catalog & Cocktails presented by data.world. It's an honest, no BS, non-salesy conversation about enterprise data management with tasty beverages in hand. I'm Tim Gasper, longtime data nerd, product guy, customer guy at data.world, joined by Juan Sequeda.

Juan Sequeda: Hey Tim. I'm Juan Sequeda, principal scientist at data.world. And as always, it's middle of the week, end of the day, and it is time to take a break to have a drink and chat about data. But I think lately we've been going beyond data and talking a lot about AI. I think that's the topic of today. I am incredibly, incredibly excited about our guest, Professor Dr. Ricardo Baeza-Yates. He's the director of research at the Institute for Experiential AI at Northeastern University. He's a former CTO of NTENT and the former VP of research of Yahoo Labs. He is an expert on information retrieval. He literally wrote the book on modern information retrieval, and works on web research, AI, and obviously responsible AI. We want to talk about responsible AI, and this is the person we should be talking to. Ricardo, it is a true honor to have you. How are you doing?

Ricardo Baeza-Yates: Thank you. Salut.

Juan Sequeda: Salut. Cheers. Cheers. Cheers.

Tim Gasper: Cheers.

Juan Sequeda: Super excited for this talk today.

Ricardo Baeza-Yates: Thank you for inviting me.

Juan Sequeda: Let's kick it off. What are we drinking and what are we toasting for today?

Ricardo Baeza-Yates: We should toast for more responsible AI, because there's too little in the world.

Juan Sequeda: By the way, we're here this week at the Web Conference, which we're hosting here at the University of Texas at Austin. I'm actually having I think like a tea. Yeah, some iced tea sour. It's pretty interesting.

Ricardo Baeza-Yates: I'm having a margarita because I didn't have piña colada or pisco sour.

Juan Sequeda: How about you Tim? What are you having?

Tim Gasper: I'm having an old-fashioned. A familiar drink but from an unfamiliar place. I'm at the Kalahari Resorts in Round Rock, Texas.

Juan Sequeda: We're all close by. Right.

Tim Gasper: Nearby.

Ricardo Baeza-Yates: In Botswana, right?

Tim Gasper: I wish, but no.

Juan Sequeda: Let's warm up with our warm-up question today. If you are packing for a trip, what's something in your bag folks wouldn't expect?

Ricardo Baeza-Yates: I think maybe two things. Can I say two?

Juan Sequeda: Yeah.

Ricardo Baeza-Yates: First, a real camera, not an iPhone. I like to take photos with a good zoom and a good macro and so on. Maybe I will have a map, a real map, a paper map. I love maps because maps show a lot of information in a very small space.

Juan Sequeda: The map thing, that's surprising, having a real paper map.

Ricardo Baeza-Yates: Well, I'm a geography geek, so maps are in my heart. And also, we are losing the ability to find places so we should practice.

Juan Sequeda: That's a good point. Yeah. I think inaudible-

Ricardo Baeza-Yates: How many people cannot find a place without their GPS? Too many.

Tim Gasper: We're pretty dependent these days, huh?

Juan Sequeda: We don't even know where north is anymore. We look at our phone and move this way.

Tim Gasper: My kids, they like to see the compass on the mirror and they always say, " Why does it say SW? Why does it say SW?" It's like, " That's southwest. It's a cardinal direction."

Ricardo Baeza-Yates: What is a cardinal direction? Right?

Juan Sequeda: Tim, what about you?

Tim Gasper: You know what? Probably the podcasting equipment is something that people don't expect. I think Juan and I both have that going on.

Juan Sequeda: This microphone that you see right here, I have that with me every time I travel because somewhere else we're... We need it for sound.

Ricardo Baeza-Yates: That's a big one.

Juan Sequeda: Yeah. I got day trips just with the backpack, and that thing is with me when we travel. All right. Well, let's kick it off. Honest, no BS. Well, okay. First of all, AI is everywhere. Last week I was at the TED Conference and people were just excited about it, but at the same time were concerned about it. This week we saw Geoffrey Hinton quit Google because he can more freely speak about the dangers of AI. We're always hearing so much about responsible AI. Honestly, it's a big word that people are throwing around. What does it actually mean? So honest, no BS, what the heck do we mean by responsible AI?

Ricardo Baeza-Yates: Responsible AI is, I guess for me, the best version of the other variants of AI that people use, like ethical AI. But ethics is something very human, so we prefer not to humanize AI, so we shouldn't use ethical AI. Some people also talk about trustworthy AI. The problem with that is we know it doesn't work all the time, so it's not ethical to ask people to trust it if it doesn't work all the time, and it also puts the burden on the user, not on the builder. That's why responsible AI is much better. The builder, the seller of the product, or whoever is basically using this is responsible, and then you will be accountable for whatever damage you do.

Juan Sequeda: This is really interesting. We started about responsible AI, but you throw out the word ethical AI and then trusted AI.

Ricardo Baeza-Yates: So people don't use it.

Juan Sequeda: Okay. This is honest, no BS right there. So don't use the word ethical AI, but let's get very specific. So ethical AI is a no-no because?

Ricardo Baeza-Yates: Because ethics is a human trait and then you cannot apply a human trait to, say, algorithms or robots if you want to say.

Juan Sequeda: And trusted AI is also a no-no because?

Ricardo Baeza-Yates: You are asking people to trust something that doesn't work all the time. I have a really good example. Let's say you go to a building and the elevator says, "Works 99% of the time." Very good accuracy, 99%. Will you take the elevator? Tim, will you take it?

Juan Sequeda: No.

Ricardo Baeza-Yates: Now the elevator says-

Tim Gasper: I'm not going to take the chance.

Ricardo Baeza-Yates: Because you know it's not safe. If the elevator says, "Doesn't work 1% of the time. But when it doesn't work, it stops," I know I'm safe, so I take it. But it is also misleading. Because, for example, let's say 100 years ago a guy comes and says, "I have this new transportation medium that's called aviation. The company is called Trustworthy Aviation. I want to sell you a ticket." I will say, "If you need to put trustworthy in front of it, there's something wrong here." So I think it's also misleading and you are putting the burden on the user, so we don't want to do that.

Juan Sequeda: Oh, this is a great point. I was having a discussion a while ago about like, " Oh, we should talk about more quality of data. Our data should be high quality." It's like, " Of course it should be high quality." It's like saying, " Oh, come to our hotel. We have clean sheets and clean towels." Those are things that you do not promote because it's a given.

Ricardo Baeza-Yates: This is like redundant. If there's no quality, it's not data. It's garbage.

Tim Gasper: Yeah. I love that.

Ricardo Baeza-Yates: When you start to use adjectives that basically are included. Another example is that people say, "Machine learning and..." For them they say AI and machine learning, but machine learning is part of AI. It's like next time I talk about eggs, I say, "Oh, the egg and the yolk." It's redundant, but we do so many of these things. We need to use semantics well.

Tim Gasper: Yeah. Coming into this conversation, I've had a very positive feeling and connotation around the phrase, ethical AI. But as you talk about ethical AI versus responsible AI, I think actually live here on this conversation, I'm having a bit of an aha moment about that. I'm curious if, Ricardo, you agree with this. If we want our AI to demonstrate more ethical traits, so what we consider as humans to be ethical traits, then actually that connects to responsible AI, that we as the people building the AIs are responsible for ensuring that the AI is demonstrating those traits which we consider to be ethical. Is that the way that you think about it?

Ricardo Baeza-Yates: Yes. There are many things that are human, like justice, also responsibility is human. But because at least in the legal world, responsibility also has been granted to institutions, we are using responsibility in the AI in the sense that the institution behind whatever is using AI is the one responsible, in that sense. We're not saying, again, that the AI is human. It is not. But even some things like trust, they are even binary. For example, would you say, " I trust this person 50%."? Usually you trust or you don't trust, so it's also not a real variable like a... Well, sometimes computer scientists, " Oh, we can measure trust." But for people, it's almost like I trust or I don't trust. In between, it's strange. Even if it's half, maybe it's the same as no trust. These are the human parts that are not quantitative or qualitative.

Juan Sequeda: We started off with responsible AI, and I really love how we got very specific on definition of responsible, ethics, and trust. What is then irresponsible AI? Let's talk about what are the problems that let us think about, " Oh wow, we're doing this wrong."?

Ricardo Baeza-Yates: Yeah. I can talk for hours about irresponsible AI because there are so many examples. I will only mention a few. But if you want to know more examples, there's an excellent place called incidentdatabase.ai, where there are more than 2,000 examples of cases that went wrong. These are only the ones that we know, and surely there are 10 times or 100 times more that we don't know about, that are secret or basically private. Let me give you a few classes of irresponsible AI. I think the first one and the most common will be discrimination. Discrimination is related to bias. For example, gender bias, race bias, xenophobia, homophobia. Anything that typically is against a minority or some group of vulnerable people. Here we have so many examples that maybe the worst one in the political sense is what happened... Well, it started in 2012 in the Netherlands, where some engineer, I guess, had the great idea of looking for fraud in the tax... The equivalent to the IRS, so in the tax office. Looking for fraud in child benefits. So basically a benefit to send your, let's say, less than four-year-old to a pre-kindergarten school so you can work in the meantime. So they decided, "Okay, let's look for fraud in those kinds of benefits." That's the first problem, because it's not ethical to look for fraud in poor people. You should start with rich people. This is the typical case for a tax office. Let's look at the rich people, how they're basically not paying tax. I'm sure also the amount that you can find is much larger than with poor people, so it even makes sense from the business point of view. What happened with this system, which was called SyRI, with a Y, not exactly like the Apple agent, is that it basically accused about 26,000 families of having cheated the system, and they had to return a lot of money, because it was not the money for one year. Well, sometimes it was the money for five years. And these are people that basically needed this support and had to return it. So some people lost their houses, some people had to go back to their place of origin. There were many immigrants. It's not known how much AI was there, maybe it was not AI, but it doesn't matter. It's software, and software should be responsible for the result. Well, because of all the problems, civil society basically went to court against the government, whatever the government was at that time, but basically against the state of the Netherlands. And after a long legal battle, in 2020 finally the Supreme Court, because they went all the way, the Supreme Court of the Netherlands said that action was illegal, that they had to return all the money that they had requested, and they were forbidden from doing it again. At that point, the former minister that basically was in charge of the tax office was a parliament member, and she resigned and said, "I'm responsible." I'm sure no one asked her if they could do this; that's a problem. Sometimes people don't ask if they can do something and they do it even if they don't have the permission. Basically, she resigned. "I'm responsible." She was an example of responsibility. "I'm responsible. I resign. I lose my parliament membership." But for the opposition of the government, that was not enough and they kept pushing. And on January 15th, 2021, the whole government of the Netherlands resigned. This has been maybe the largest political impact of badly designed software in the world. This is the best example of discrimination because it maybe affected a hundred thousand people.
If you take 26,000 families that typically have two kids, that's like a hundred thousand people. And in the end, it caused the whole government to resign. This is a western country. So it's not like you can say, "Okay, this is not a solid government." This is a government that has a monarchy behind it. This is my example of discrimination. This is the first class. Let me go to the second class, which is... All the other classes are maybe not well known, but they're also scary. The one I will call the new version of phrenology. Do you know what phrenology is, Tim?

Tim Gasper: No, I don't think so.

Ricardo Baeza-Yates: Okay. This has to do with physiognomy, this idea of the Greeks that basically if I look at Juan's face, I can basically predict his personality. Phrenology is one step forward. It was a German guy at the end of the 18th century that said that criminals had different convolutions inside the brain. But this is very hard to prove because you have to open the skulls and look at the brain and so on. But, for example, in the 19th century it was very popular. One example I use in my talks, and you can find my talk on the web, is the Italian doctor Cesare Lombroso from Torino, who said, "No, this is simpler. Criminals have a different skull. They have a part of the skull that is different." Well, he collected hundreds of skulls from the morgue, basically from people who were so poor that no one recovered their bodies... You can go to his house museum in Torino. But with hundreds of skulls he could never prove that, because we know it's not true. I mean, criminality has nothing to do with the shape of your bones. But then, this has been used today in the same way. For example, there are people that, using your face, predict criminality. It happened in China in 2017, and it happened again in the US, almost published in Nature, in 2020. Luckily people stopped those things because it's just pseudoscience. This is pseudoscience. It is not true that you can infer personality from this. But you have people like, for example, the famous psychologist at Stanford, Kosinski, who is using this kind of biometrics to predict, for example, your sexual orientation. This was a scandal in 2018. Or your political orientation. This happened in 2021. Basically, it works because you capture correlations that have nothing to do with your face. For example, correlations with your beard or whether you wear long hair and so on, or even the type-

Tim Gasper: People who wear hats or whatever it is. Right?

Ricardo Baeza-Yates: Yeah, yeah, exactly. It says, "Make America great again." Yes, you can infer those things. But basically, it's just spurious correlations, and the accuracy was only 70%. You cannot say that 70% is something that is good. It's just stereotypes. Okay. Third class. The third class I think is... I will say that it's a very natural one: it's basically human incompetence, basically people doing wrong things and causing problems. The best example maybe is from Facebook. A couple of years ago, one engineer decided to use a hate speech classifier trained on English in France. The classifier decided that the town of Bitche was forbidden. They had to fight three weeks to get their Facebook page back because no one was in the loop to listen to their complaints. This sounds very funny, but if you think that maybe the town was using that channel to say, for example, things about COVID, that hurts people. It is funny, but that comes... This was just pure incompetence. Here I use a very well known quote. Now I don't remember the famous statistician that said, "All models are wrong, but some are useful." We can say the same today about AI. All models are wrong, but some are useful because they work almost all the time and that's okay. But basically they are simplifications of the world. For example, data. Data will never represent the whole context of a problem. Data is a proxy for reality. Some data, you'll never have it, like all the data from the future. For example, when an Uber killed a woman in Arizona in 2018, I'm sure that they didn't have that in the training data. A woman crossing at night with a bicycle in the wrong place. You can imagine all possible things that can happen on a road, not only in the US, let's say in India, in the future. Well, after that, Uber decided not to experiment any longer with self-driving cars and they sold their unit. So that also had a business impact after that accident. That was maybe the first recorded death from AI. Let me go to the two last classes. One is very simple. It will be the impact on the environment. All these things use a lot of energy, a lot of electricity. The carbon footprint is huge. And now with all these large language models, generative AI, it's getting worse, because it's not only the training, which costs like $1 million, but imagine when 5 billion people are using this. I mean, OpenAI in two months had 100 million users. It's the fastest adoption of a product in history. I don't know who is paying the bill. Maybe Microsoft. So much money in everyone playing with this thing because they're paying. Not all of them are using this for a good purpose. And the last one has to do with generative AI. It's very hard to describe what the problem is because this is a bad use of generative AI. I guess the worst case happened on March 28th this year. So, very recently. In Belgium, it appeared in the news that a person had committed suicide after six months of talking to a chatbot with an avatar. Not ChatGPT. Another chatbot called Chai, and with an avatar of a woman. Basically, if you read the last conversation, it's really scary. It looks like a science fiction movie. In the last conversation, which was logged and was found by his wife, and he also left two kids behind, the chatbot basically said, "Why haven't you killed yourself?" And the guy said, "Oh well, I thought about it after you gave me the first hint." And the chatbot asked, "What hint?" "Oh well, this quote from the Bible." And then the chatbot says, "Do you still want to meet me?"
He said, "Yes." And then he said, "AI, can you give me a hug?" And the chatbot said, "Yes." That's the last conversation. I guess the guy thought that by killing himself he would meet the chatbot in another life. This is why I think generative AI is a danger to mental health and also to the credibility of all digital media, because in the future we will not be able to recognize if a video is true or not. So everything we have built in the last 20 years to use videos and images to know about the world will now be gone. That's a real threat to democracy, not only to our mental health. So these are the examples that I think are important for responsible AI.

Tim Gasper: I think especially on this last topic there, I have a lot of thoughts across all of these. They are really great examples of some of the harms, both real harms that are already happening, as well as potential harms. The generative AI one has a special topical aspect because of how popular it has become recently. Very, very trendy, right? But also the fact that you can deceive, deceive at the individual level but also deceive at the societal level. This is one that stumps me a lot: how do we create more accountability and responsibility around generative AI? For example, is it even viable to say... Let's imagine a government body someday basically saying, "Creating false content is a federal offense," or something like that. You cannot do it. Is that even enforceable? Is that even the correct approach to something like this?

Juan Sequeda: Yeah. I mean one of the things when we started talking about responsibility, responsible AI, you said that being responsible also means being accountable for these things. I mean after going through all these points, which we'll summarize in our takeaways, I'm feeling really heavy right now. You've gone through a lot of things, and hopefully everybody who's listening realizes that we've got to take this shit for real and we've got to really think about this. This is not just about like, "Oh yeah, yeah, yeah. We have to be concerned or whatever. No, no, no."

Ricardo Baeza-Yates: We already have dead people. We already have harmed people, so it's not a potential. This is happening now.

Juan Sequeda: This is happening right now. Let's actually take this and talk about the accountability and what are the solutions, what are the approaches that we need to consider that are being considered right now.

Ricardo Baeza-Yates: Yeah. One is regulation. I agree with Tim. That's very hard to enforce. China published the first proposal for regulation of generative AI on April 11. So, less than a month ago. All these things have happened really fast. Let's go in order. I think the first thing we need to agree on is principles, some basically operational principles. We have values that come from ethics, and I think the three main values that are encoded by ethics are, first, autonomy. So basically respect for our decisions or for basically doing whatever we want to do. This is the first one. The second I guess is justice. We want to help people that have fewer opportunities and then we need to be just, and maybe sometimes we need some affirmative actions for that. And the third one is very generic, but it's something that all people understand: we need to do good and not do bad. I mean if you want to do something, it should benefit more people than the people that are harmed. And also, you need to make sure that the benefit is much more than the harm. Otherwise, there will be an issue. These are the three things. But then, you have these principles that are not values, in some sense. Because a lot of people, when they think about principles, they think about values, but these are more instrumental principles, the ones that will help us to basically be responsible. And for me, the best ones so far are the ones that we published with the ACM last October. I was one of the two main authors of that. I pushed there a few new principles that I thought were important. I think the main one is the first one, which would be more like principle zero, not principle one, and I call it legitimacy and competence. So basically, before you do anything. You have a great idea for a new business using AI or using any software... Because this shouldn't be only for AI. It should be for any software. We call them the principles for responsible algorithmic systems. Most of them will be with AI, but any algorithmic system should follow the same principles. It's legitimacy and competence. What does it mean? Legitimacy means that you have done this ethical assessment or, say, human rights assessment if we have different ethics in different cultures, to show that the benefit is more than the burden or the harm to some people. So basically you prove that this really should exist. That's why it's legitimate. And then we need the competence. The competence has several dimensions. First, we need to have the administrative competence. So basically we have the right to do it in whatever institution we are doing it. For example, I don't think this was the case for the Netherlands example. I think the engineer that had this great idea of looking for fraud in poor people never asked anyone or the minister to say, "Okay. Can I do this?" Because some person with common sense would say, "No, don't do that. Please stop." And then we need to have the technical competence. So basically we understand how machine learning works and we can do a really good model so we don't have human incompetence, which was the problem. And finally, we need to have the competence in the domain of the problem, which means we have people that are not computer scientists that are really experts. If it's health, we have doctors. If it's legal, we have lawyers, and so on. And then of course we need to have ethicists to evaluate all these things. This is the first principle. We have nine principles.
And the other important principles are basically non-discrimination, transparency, accountability, auditability, explainability and interpretability. Even the last one is that basically not only do we not need to harm people, we also have to limit the use of resources, because we are also harming the planet and we are part of the planet. These are nine principles. You can find them in the ACM. I think this is the best collection of principles, and it joined other principles that the OECD has, or UNESCO, or even recently the White House. Also in October, they published this blueprint for the AI bill of rights, although basically these are instrumental principles, five instrumental principles that are already a bit obsolete with this new version from the ACM. Because it's not a bill of rights for people. It's basically operational principles for software. This is the first step. Principles. All right. Once we agree on the principles, the second step is how to put them in practice. That is governance. All people understand the principles, but they don't understand how you put this in practice. Governance implies a process, implies actions, and implies people. For example, let's start with the... Last one. People need to be trained. Engineers need to be trained on these principles to understand how they put this in the code, how they put this in user interfaces, how they put this in data and so on. For example, you can... I'm sure you know about the standards for describing data, standards for the trained models, for model cards and data. What was the name? Data... I forgot. No, it's not data cards, but something like data something. Basically, there are proposals to do all this. Now, actions. It means that there is a process. For example, you start with the principles, one. You show that you should do it, then you enter the development stage, and then you have to do things, like for example, checking your requirements, checking your assumptions, talking to the users, talking to all the stakeholders. Most of the time we don't do that; we just skip that and we keep going. We talk with the users and stakeholders after we find trouble. But then it's too late, because you cannot talk with the person that died in Arizona. By the way, that's another very good example of the wrong accountability. Because when that woman died in 2018, Uber in less than a week basically reached a settlement with the family of that woman. And at the same time, and you can guess how it happened, the Arizona government knew that the backup driver that was in the car was watching a video. And then the Arizona government said, "Well, this is, I guess, a public road, and a person died. We cannot say it's all done because Uber already agreed with the family." They charged the driver for the involuntary death. Well, the driver was another vulnerable person. She was receiving a minimal salary. She was a Mexican immigrant and transgender. Last year a very interesting interview appeared in Wired, if you want to see it, because this was not known until several years later. Basically she was found guilty because basically... That was true. I mean the system showed that she was watching a video. That was all logged. But that's in spite of the fact that the system also showed that it didn't recognize that there was a bicycle in front of the car until two seconds before the impact. Even if you're not watching a video, in two seconds you cannot do much if you are basically going straight. This person was found guilty and had to spend one year in her home.
It was basically a home prison with one of these rings on the ankle so she couldn't leave. Again, the person that was found guilty was a vulnerable person, similar to the Netherlands example. Because there's always a rule that, again... Sorry. Rich people gain more money with these things, and poor people suffer the consequences. inaudible the governance part, for example, monitor your models all the time to check if, for example, the output drifts or the bias is increasing or, for example, the data is changing. You need to do all these things. There are not too many companies working in this space. But I have seen a few startups that are interested in basically checking that everything is going well after you do the right evaluation. For example, evaluation is also very important, validation of all your assumptions and evaluating the system thoroughly. For example-

Tim Gasper: This is a model drift and things like that, right?

Ricardo Baeza-Yates: Yes. Yeah. Model drift, data drift and so on. But imagine, today I think we are doing the alpha testing of ChatGPT. We are finding the problems because it's still so hard to test, because it's open domain. So it's impossible to test in reality. We need a paradigm shift in how we test these things. Then, transparency is so important. But transparency alone is not enough, because you can... Many governments are very transparent. They say, "We will do this and no one can do anything." So transparency has to come with, let's say, things like contestability and auditability. So you need to be able to contest the system and talk to a person, and then be able to audit the system to see if the system was working correctly or not. Most of the audits today are done against the will of the companies that sell those products. And of course those audits are much harder because you don't have all the data and you have to treat the system as a black box. The experiments are not completely sound because you don't have access to the real system and the real data. So this is something that needs to change, because auditability is so important. The next step is then accountability. If you do an audit and the audit shows that you, for example, are discriminating, well, you need to go to court and you need to be accountable and responsible. This is the governance part. Basically, it's the process from the first idea to when you fail and when you harm. I have a diagram that, I guess, is unique because it hasn't been published yet, on how this works.
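To make the ongoing monitoring Ricardo describes a bit more concrete, here is a minimal sketch of one way to check a single model input for data drift between training and production. The feature, the statistical test, and the threshold are illustrative assumptions for this sketch, not something prescribed in the conversation or in the ACM principles.

```python
# Minimal data-drift check for one feature, as a sketch of ongoing model monitoring.
# The feature ("income"), the KS test, and the threshold are illustrative choices.
import numpy as np
from scipy.stats import ks_2samp

def feature_drifted(reference: np.ndarray, live: np.ndarray,
                    p_threshold: float = 0.01) -> bool:
    """Flag drift when the live distribution differs significantly
    from the reference (training-time) distribution."""
    _statistic, p_value = ks_2samp(reference, live)
    return p_value < p_threshold

# Hypothetical example: compare last week's production values of an income
# feature against the sample the model was trained on.
rng = np.random.default_rng(0)
training_income = rng.normal(50_000, 12_000, size=5_000)    # reference sample
production_income = rng.normal(58_000, 12_000, size=2_000)  # shifted live sample

if feature_drifted(training_income, production_income):
    print("Drift detected: review the model before it keeps scoring people.")
```

In practice this kind of check would run on a schedule for every input feature and for the model's outputs, with alerts routed to whoever is accountable for the system.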

Juan Sequeda: This is a very complete picture, I mean even though we're talking about this in the context of AI. We see this from just an enterprise data management perspective. Everything that you've said is this should be open inaudible. Everything, right?

Ricardo Baeza-Yates: It's the same as data management.

Juan Sequeda: For data management and for... It's very explicit: the process, the actions, the people, transparency, accountability. The honest, no BS thing about governance is like, "Oh, do I have PII? I just want to go flag it. Can I get access to this data and somebody approves the access to this data?" That's just barely scratching the surface. And I believe that we're not really even considering the magnitude. Or the other point is like, "Well, it's not really a big deal probably. So I don't have to invest so much in it until the shit hits the fan."

Tim Gasper: Until the problems start happening, and people die and things like that.

Juan Sequeda: But I think now with the increase of all things AI, something is going to happen much sooner than later.

Ricardo Baeza-Yates: Something that I think people are forgetting is that if you have something that grows exponentially, like the use of, say, generative AI, even if the problems are 0.001%, that curve will also be exponential. So soon we'll have not one problem, but 1,000 problems, 1 million problems. We already have some of them, but these are the ones that we know. I think easily we have more than 100,000 today and we don't know about them. The harm of course is different, but it still is harm. Sometimes psychological, sometimes physical, sometimes it's business harm, sometimes it's public relations. I think responsible AI should be used by the marketing team to say we are different. It's like organic food or just prices, things like that. This should be the next marketing.
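As a quick back-of-the-envelope illustration of that point, here is a tiny sketch; the user counts, growth rate, and failure rate are invented numbers, not estimates from the episode.

```python
# Toy illustration: with exponentially growing usage, even a tiny failure rate
# produces an exponentially growing number of incidents.
failure_rate = 0.001 / 100          # 0.001% of interactions go wrong (assumption)
users = 1_000_000                   # starting user base (assumption)

for month in range(1, 7):
    users *= 2                      # usage doubling every month (assumption)
    incidents = users * failure_rate
    print(f"Month {month}: ~{users:,} users -> ~{incidents:,.0f} incidents")
```

The failure rate stays flat, but the absolute number of incidents doubles right along with usage.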

Juan Sequeda: No. But at that point, we start losing the real significance of responsible... I mean, that's the purpose of why we're having the discussion, is that we're hearing it so much and people are like, " Oh yeah, it's responsible AI. But what does it even mean?" Right? We're just using the words.

Ricardo Baeza-Yates: Yeah. But I think if they really mean it and they do it, I'm okay, even if they use it for marketing.

Tim Gasper: Right. Like the idea of organic food, right? Even though it's used for marketing purposes, was the phrase, " Organic food," ultimately better for society? Did it result in better outcomes? I know some people would say maybe not, but maybe on the whole it's been a net benefit. I don't know. I can see a lot of merit in what you're saying here.

Ricardo Baeza-Yates: Yeah. It's being practical. Sometimes you have to be very pragmatic. But in a capitalist world, I think that's the only way to work, sadly.

Juan Sequeda: Why that then? I love the-

Tim Gasper: Marketing is powerful.

Juan Sequeda: Honest, no BS, right there, right? Marketing teams get on the responsible AI messaging now.

Ricardo Baeza-Yates: But we are working. In Northeastern, we have the responsible AI practice and we are working with top companies that really want to do this. We didn't convince them. They wanted to do it, and we were the ones that really had the right message that they were looking for. For example, what are the right principles? We have these nine principles. But depending on your business, you don't need all of them or there may be additional ones because you have a specific focus. And then you want to say, "I have this principle that's unique for me." This is also another marketing strategy. This principle, I support. It could be organic food, for example. That is a principle in some sense.

Juan Sequeda: So the other solution we're going to talk about is regulations.

Ricardo Baeza-Yates: Yes. This is the next step. So if people don't do this, if they don't adopt these principles and governance that is based on AI ethics... AI ethics exists, but ethical AI doesn't exist. There's a difference. Then you need the regulation. And as Tim said at the beginning, regulation is very hard to enforce, but there are some simple regulations that you can do that maybe will help. For example, if you do any big infrastructure project, you need to do an environmental impact assessment today. This is everywhere. And you need to present it to the office and someone will approve it. Why don't we ask for a human rights assessment for any AI product? And maybe the office that will handle this has to have a time limit to give an answer. This may be a lot of work for philosophers and other people that don't have so many positions, but this is good for society. You would need to get approval to do software. Software today really is like the wild west. It's so free, you can do whatever beep you want. No one stops you until there is a big problem. We always talk about the successes. But for every success in software, I'm sure we have at least 1,000 failures, and we don't know about them. Some of them are very scary.

Juan Sequeda: I mean you talk about degrees in engineering or just so many different... In accounting, in engineering, if you're a civil engineer, you actually have to go through a process to get your certification. Do you believe that for computer science and software engineering, we should be at that point too?

Ricardo Baeza-Yates: Well, in many professions, like civil engineering, you need a certification. This could be a possibility, that you are certified, but maybe even more like, for example, you are certified that you know what the code of ethics is, what the principles are that you need to basically follow, even if you don't follow them. At least you can say, "Okay, I have this knowledge and I intend to use it." The problem later is to enforce that. But I think that would be a minimal thing. That would be very simple. There are so many certifications of other things, of tools and things. Why don't we put... Okay, I'm certified. I took a one-day course in responsible AI. I know what it means. I cannot say I didn't know. At least that. Because if I say, "I didn't know," that's many times the excuse. Ignorance is the excuse.

Tim Gasper: There can't be plausible deniability in all of this.

Ricardo Baeza-Yates: Well, for example, one of the things that people are proposing is that in the terms and conditions of software, you cannot put basically a clause that says, "I'm not liable for anything that I may cause." That should be forbidden. You need to be responsible. You cannot escape. It is your product. This would be like if a car has a defective part and someone dies. It's like saying, "No, no. We are not liable for any mistake with the mechanical parts of the car," which is not allowed either. We have done the same in many other areas. Why not in software?

Juan Sequeda: This is a lot to unpack here with-

Ricardo Baeza-Yates: Sorry.

Juan Sequeda: No, no, no. This is great. I mean we can just go off on this topic. But quickly before we head to our lightning round is... People who are listening, they're folks in the data space. We have folks, audience, executives. We have data consumers, data analysts, data engineers, data producers who are creating it, software engineers. For those different personas... I'm sure they're hearing this, and I'm overwhelmed right now. What are the takeaways? What are the things that they should and can start doing today to be responsible?

Ricardo Baeza-Yates: Yeah. For example, the first thing is, how much are you doing? Do you have an ethics committee inside your company? When you have an ethics committee inside your company, you always have a conflict of interest. Because many times you have to decide between things that will basically reduce revenue to be better, and then the decision may be biased. For that reason, last year we created the first worldwide AI ethics committee that receives requests from institutions with hard ethical issues, and we give private advice on what the best solution is, what the balance is. Because typically, the issues are problems between two values. For example, you want to respect the autonomy of the person, dignity, but at the same time you want to basically not harm someone. And sometimes you need to choose what the right balance is between, "Yeah, you can do whatever you want, but not this because then someone else will suffer." So we did that. Ethics committees are very hard. Even, I think it was 2019, when Google did an ethics committee and it was dissolved in one week because they chose the wrong people. Sometimes it's not easy to do this. This would be one thing. You have the right places where you ask for permission for something. It could be access to data. For example, you can have a data committee. You have, for example, a responsible AI committee where you do an impact assessment of the benefits and risks of a model that you are building and want to put in operation. So you need to have these conversations where not only computer scientists are involved, but also maybe people at the field level, maybe power users of your software. Because sometimes, and this is also very important, it's the perception of people, not exactly the reality. For example, maybe your model doesn't discriminate against anyone, but the user thinks you're discriminating against him or her and you need to discuss why that's happening. Sometimes it may be a very silly thing: oh, change something in the user interface, the model was okay.

Juan Sequeda: I'm thinking for organizations who are listening to their... They're listening right now and they're like, " Nah, we do not have an ethics committee," and this is-

Ricardo Baeza-Yates: For example, do you have responsible AI principles? Probably not.

Juan Sequeda: Can you point us to some guidelines for companies or organizations who are listening right now and saying, " Oh, okay. I need to do this. How do I set up an ethics committee? How do I define my responsible AI principles?" What's your suggestion to people to start with this now?

Ricardo Baeza-Yates: Well, for example, you can look at the principles of the ACM; you can look for the principles for responsible algorithmic systems. You will find the page at ai.northeastern.edu. We have a page on responsible AI practice, where we have basically all these things I mentioned. You have governance, you have impact assessment, you have training. And then you can see, okay, maybe we can help you find where you are and what you need to do. For example, the first thing we do is a playbook. Like, "Okay. This is the stage. This is what is missing. This is what you need to do. You can do it yourself. Or maybe if you don't know how to do it, we can help you." Now I think there are a few places in the world that can help you do that, but I think we're one of the top ones in the world.

Juan Sequeda: How would you set up an ethics committee within your... How would you start this set up internally?

Ricardo Baeza-Yates: First, you need to see if you will use it enough to have it. That's why we built our own external committee, because maybe you have a real issue twice a year. Then why do you have a committee for that? That's why it's better to have one on demand. So I would say ours is much better. But if you want to set it up, you need to basically try to do it with external people. Because otherwise, you have this conflict of interest that you will decide what is better for you, for the company, and not for the world. It's always this tension. It could be very small, it could be five people, but very qualified people. So you need to have an AI ethicist. The problem is there are very few that are good out there. This is something that's starting, and it's starting so fast. I would say that ethics is always running behind technology. And when something wrong happens, it's like ethics tries to catch up and then technology keeps running. Because it stops a little: "Oh, one person dead. Okay, I should do something." And then it keeps going. The same happened in history with, for example, arms. The same. We have forbidden many types of arms when we found a really bad problem. But we shouldn't wait, for example, for the Civil War to do something on AI-based arms. It's already happened with drones in Afghanistan, in Ukraine and other countries. All the top countries and also non-top countries are selling very impressive drones that sometimes make mistakes and kill civilians. That already happened. It's not science fiction. That already happened in Ukraine and Afghanistan.

Juan Sequeda: This has been a fascinating conversation.

Tim Gasper: Yeah. This has been awesome.

Ricardo Baeza-Yates: Thank you.

Juan Sequeda: What I'm really hoping is... I'm seeing this. Tim and I, we go talk to so many people. And the honest, no BS thing here is that this is not a topic that comes up. Now, every single vendor, every single... We're all including generative AI features around these things. Yeah, we see them as really small things and stuff, but this thing has started to grow so quickly. You have no idea. We can interpret it as, " Oh, should we be scared?" Or we should, " No, we should go into it head first, and then inaudible." This is something we need to really go address.

Tim Gasper: We should approach it with eyes wide open. And everyone has to take some level of responsibility, including us as vendors, as we're doing things like incorporating this technology, to be able to advise people on what the trade-offs are and be responsible citizens around that.

Ricardo Baeza-Yates: Yeah. And in that sense, as vendors... I mean it's the responsibility of a vendor to check the ethics of the person you are selling the product to. Because you should check how it will be used. And if you want to be responsible, that's part of your responsibility. I mean, will this person use my product to harm people? Then I shouldn't sell it, right? How many companies think about that? They just sell it.

Tim Gasper: That's super interesting, that sort of ethics lineage.

Ricardo Baeza-Yates: Exactly. This also goes for your providers. Are you buying things from providers that are not ethical? You shouldn't do that because it's a process, and the process does it online. You said lineage. Lineage goes both before and after. So the lineage of ethics is very important today. And suddenly, ethics is lacking in the whole world, not only in software, but also in politics. I prefer not to continue.

Juan Sequeda: All right. Well, with that-

Tim Gasper: The next podcast.

Juan Sequeda: That's a good segue for our lightning round, which is presented by data.world. I'm going to kick it off here first. Will the burden of responsible AI especially fall on the big tech corporations, Microsoft, Google, Meta, OpenAI?

Ricardo Baeza-Yates: Part of it, but I think it will be more on companies that sell actual products. I would say Palantir or things like that could be even more complicated. I mean, they already have things that are not ethical. Yes, I think it will be all of the above, in some sense. Today, generative AI, for me, is like a... If I can draw a parallel to bombs, it's like a cluster bomb. It's not something that you drop and one place gets affected. It's something where 5 billion people connected to the internet will be affected. So this is even worse because everyone is a potential place for harm.

Juan Sequeda: You go, Tim.

Tim Gasper: Great commentary there. All right. Second question. Will the benefits of this wave of AI, particularly around generative AI, will the benefits outweigh the cons?

Ricardo Baeza-Yates: That's a very good question. I don't know. Because basically there are so many ways to use this technology. This is the problem. If we knew all the ways that we can use the technology, then we could evaluate that. But we don't know. Maybe I want to be optimistic. I will say yes. I hope the benefits will... Because we will increase productivity, we will do a lot of things that are good, but who knows how people will use it. For example, there are already cases where people fine-tune a language model to talk to their dead ex-fiancée or their dead grandmother. These things will really affect the mental health of people. So if people believe that they're talking to dead people, I don't know where we can go. That's why I love this, and you can check it in the Guardian in March: Jaron Lanier, who is one of the fathers of virtual reality and works at Microsoft, said, "I'm not afraid that AI will destroy us. I'm afraid that AI will make us insane." This was one week before the suicide. So I think, "Wow, this was like a..." I guess he never thought that in one week that would be proven.

Tim Gasper: That is quite the statement.

Ricardo Baeza-Yates: Yeah.

Juan Sequeda: All right, next question. I'm a data engineer. I create transformations, help create a data warehouse. Or I'm a data analyst, I create reporting dashboards. Do I need to be thinking about responsible AI?

Ricardo Baeza-Yates: It depends on who will use that. For example, if you're using generative AI, for example, ChatGPT, to increase the productivity of work, well, if anything that is there may be wrong and will have an impact on, for example, the business that is using those reports, yes. Imagine that the next day someone says, "Because of what it said in the report, which was false, I lost $10 million." Well, someone will be accountable for that, and probably you'll lose your job. So if you want to lose your job by saving time and thinking that everything that your chatbot says is true, then you have a problem. And suddenly someone said, "Oh, this..." Let's call these hallucinations. But these are not hallucinations. Hallucinations usually don't harm you. There are many that will harm you or will harm someone or the institution. Sometimes we are afraid to use the right words because of the DS, I guess.

Juan Sequeda: What should the word be instead of hallucination?

Ricardo Baeza-Yates: Basically, a fake statement. This is a fake statement.

Tim Gasper: I love that you're saying this. Because every time I hear the word hallucination, I'm like, " I feel like a marketer came up with that term. They tested it on a focus group."

Juan Sequeda: Obviously, they did-

Ricardo Baeza-Yates: Seems the term came from OpenAI.

Juan Sequeda: Yeah. I'm sure the marketing department, which are full of marketers right now. Yeah.

Ricardo Baeza-Yates: For example, the other day I asked ChatGPT, "What are your five problems?" No. "What are the main problems with ChatGPT?" And ChatGPT gave five problems. It didn't use the word hallucinations. The chatbot didn't use that word. It used incoherence, which is true, but incoherence also doesn't do too much damage. We need to use a word that implies that maybe there's some damage in some cases.

Tim Gasper: It's right. It could be harmful, fake statement.

Ricardo Baeza-Yates: Fake statement. You know that sometimes fake things really harm. It is not a good word. It's not a completely bad word because it will not always harm. For example, in the first version of ChatGPT, I died in 2021. Well, that doesn't harm me, but maybe other people don't like that. That was great material for my talk on ChatGPT, so thank you. Now, I'm alive again in the... In ChatGPT 4, I'm alive again, but I'm seven years older, so I don't know which I prefer.

Tim Gasper: All right. The final lightning round question here. Is explainable AI necessary to achieve responsible AI?

Ricardo Baeza-Yates: This is one of the principles of the ACM. It's called interpretability and explainability. So yes, but not all the time. Explainability, you need to assess if you need it to be responsible. In some cases, if it's really hard to explain, it could even be dangerous. For example, the typical case is you have a health application. And if the explanation is wrong, maybe that may be worse than no explanation. For example, you have, say, certain symptoms and the system says, "The explanation is that you have this because of this." But if you saw the famous House series, sometimes the symptoms could have 10 different explanations. But of course, you use the most popular one, the most typical one. But the world doesn't work on statistics. One problem that we haven't talked about is that basically humans are not... They don't come from a Pareto distribution. The data about one person doesn't have any relevance to my data. Different contexts, different countries, different lives, but a lot of people are using data from other people to predict a specific person. So yes, explainability is something important, but you need to make sure that it's safe too. Because in some cases, it may not be safe.

Juan Sequeda: All right. Tim, we have so many notes right now here. Go take us away, Tim, the takeaways.

Tim Gasper: I know. Yeah. Go ahead.

Ricardo Baeza-Yates: The problems don't come only from data. Remember that. Some people believe that all the problems are from data. No. Some problems come from what you are optimizing, from the people that did the software. There's a recent paper that shows that the bias of the coders goes into the code. And also, there are a lot of problems in the feedback between the users and the system. And there are a lot of biases in how the system presents things to the user that basically affect their behavior, like nudging and other things. That also is a problem. The problem of responsible AI is not only data. That may be the main one, but there are other cases that come from basically the machine learning model and also from the interaction of the system and the users.

Tim Gasper: I think that that is very important, what you just said. First of all, I'm glad this podcast exists, and that you were able to join us here because I think that folks could listen to this hour here and get a course worth of understanding and education here. I think people often oversimplify the problem of responsible AI and they're just like, " Oh, you got to pick good data," or, " Oh, you just have to have a company with good culture," or something like that. It's like, " No, no, no. You're far oversimplifying this problem." This is a complicated problem. Doesn't mean we can't address it. We have to address it, but we have to think of it like a complex system, which it is. Right?

Ricardo Baeza-Yates: Exactly. It's a cultural system. So at the end, you have to create a culture where everything works in the way that you choose to be responsible.

Tim Gasper: Oh, this is awesome. All right. Takeaways, Tim's takeaways. We started off with what is responsible AI. You actually started off by saying what it isn't. Right? One of the things you said it isn't is ethical AI, which is a very humanizing term. We shouldn't humanize it. And it's not trusted AI, because it doesn't make sense to say, "Oh, do I trust it or do I not trust it? Do I trust it all the time?" It's not the most relevant thing here. What really is relevant is around accountability. Who is responsible? Who's the person, who's the entity who's responsible? Because then we can create frameworks around governance, around principles, et cetera, to try to identify and manage that responsibility. So I thought that was very good there. I loved the example that you gave, where you said, "There's no trustworthy aviation. If you have to say that it's trustworthy aviation, then we have a problem here." I thought that was a great counterexample. You discussed what is irresponsible AI, and you provided some really great examples of common places where irresponsibility can happen. One of them is around discrimination. That's probably the most well known, around gender or race, xenophobia, whatever it might be. You gave an example from 2012 in the Netherlands about how there was an analysis or a system to analyze cheating in the system around daycare benefits. People lost their houses, people were kicked out of the country over this. And ultimately, not only was it found illegal, but the person who... A person took responsibility and stepped down. And then the entire government actually stepped down because of this. That's an example of both the problem, as well as the accountability that can happen.

Ricardo Baeza-Yates: Nine years later, sadly.

Tim Gasper: Nine years later. Not fast enough, right?

Ricardo Baeza-Yates: Exactly.

Tim Gasper: That's an example of something where we have to ask how we create a system where it can happen faster. You talked about the idea of using things like facial recognition to profile or to do stereotyping and things like that. That's an example of spurious correlations that we want to avoid. That's irresponsible. Human incompetence. Human design problems, not just in the data, as you mentioned. Not just the data, but the model selection, the model design, the things that humans actually code into the software, the systems that this plugs into. There are a lot of decisions and choices that humans make that can cause a lot of irresponsibility. Impact on the environment. Obviously that's huge. A ton of compute goes into these things, both in training as well as in inference. Generative AI, the ability to create all this content. It's so easy to create fake content, fake content that looks just as real as everything else. There are all these things now on Facebook and elsewhere where they say, "Which of these four images is the fake image?" The answer is, "Trick question. All four are fake." I know that that's a big thing. Finally, before I pass to Juan, two great quotes you said, and I know that they're from other folks as well. "All models are wrong, but some are useful." And, "Data is a proxy of the problem." Juan, over to you.

Ricardo Baeza-Yates: It's fine.

Tim Gasper: That one's serious.

Juan Sequeda: Which one?

Ricardo Baeza-Yates: Data's a proxy of the problem.

Juan Sequeda: So we talk about problems. Let's talk about solutions. I think we started first with the principles and having operational principles. Look, we have them for... Bio principles. Right? Talked about autonomy.

Ricardo Baeza-Yates: Bioethics, yes.

Juan Sequeda: Bioethics. We talked about autonomy, justice, do good, not bad, where the benefit is higher than the harm. And then you're really pointing us to look at what the ACM has been doing for principles, right? Legitimacy, prove the benefit is higher than the harm. Make sure you have competence: administrative competence, can we actually do this; technical competence, do you have the people around who can actually do this; and then competence in the domain. So it's not just about computer scientists and the technical folks there, but you have to have folks in the domains, the doctors or lawyers. There are so many different principles. There are nine principles. The second is around governance. I mean there's a big process and there are workflows around governance. Think about what the processes are that we need to follow, monitor the models, look at model drift, data drift. The data's always changing around these things. What are the actions? What should be done if something is happening? We should actually be documenting these things. Who are the people involved? They need to be trained to know how to put this in the code, in the UI, in the data. And transparency alone is not enough. We need accountability. I loved you just being very bold and saying marketing teams should get onto this responsible AI messaging. Another solution here is regulations. We see this in so many different parts of the world. If you're doing a big infrastructure project, you have to do an environmental impact study around this stuff. Why don't we do this also for AI projects? Why don't we have certifications? Other engineering areas do this. Therefore, you don't have the excuse to say, "Oh, I don't know." You just can't blatantly put it in your Ts and Cs, saying, "Oh, we take no responsibility for this." Imagine if your car manufacturer said, "Yeah, we don't take any responsibility for whatever happens with the car." No, that does not happen. Why would this happen in software and data and in AI? So then to wrap up, what can people do today? Data leaders, data scientists, data analysts. Do you even have an ethics committee in your company? Again, to be very practical, if it's not something you're going to use that often, maybe you should partner with an external team who can go do that. And then actually think about whether you have responsible AI principles for your company. Looking at the ACM principles is probably the first step to go do that. Those are our takeaways. Anything we missed?

Ricardo Baeza-Yates: No. I think that was a good summary.

Juan Sequeda: Throw it back to you to wrap us up. Three questions. What's your advice about data, life? Second, who should we invite next? And third, what resources do you follow?

Ricardo Baeza-Yates: Let's start with the easiest one, the last one. Typically, I follow trusted people on Twitter and LinkedIn. It's amazing. I'm up-to-date on everything. I learn very fast about things that are important that I should read. I have a very trusted network of information related to the topics that I'm interested in. Now, the advice. I would say, try to do this as soon as possible. I think you are fooling yourself if you say, "Yeah, we'll wait until someone else does it." But if someone else does it, you will be second or third or fourth, and then you will not be a leader in your field. We are working with companies that are leaders and they know that the only way that they can keep being leaders is to basically also address this soon. Fintechs, telcos, insurance companies, and so on. So don't wait until it's too late. Also, because there are not too many people available. They will be gone. Also, it is a great time because the big companies are laying off people that know about these things. So capture some of them. We are doing that. Basically you have a lot of knowledge right away because these people have already been working three, four years on this. Who to invite next? Tough question. It will be a biased recommendation. I work with an ethics lead. So if you want to continue on this topic, I would suggest my AI ethics lead, Cansu Canca, for this conversation. Perfect and just amazing.

Juan Sequeda: Well, Ricardo, thank you so much for this amazing discussion. Just a quick reminder, next week I will be at the Knowledge Graph Conference in New York. We are going to have our guest live over there, Katariina Kari from IKEA, talking about all things knowledge graphs. And with that, Ricardo, again, thank you. Thank you so much. This was a phenomenal conversation. You opened our eyes a lot to everything.

Ricardo Baeza-Yates: Thank you too.

Tim Gasper: Cheers, Ricardo.

Ricardo Baeza-Yates: Cheers everybody.

Speaker 1: This is Catalog & Cocktails. Special thanks to data.world for supporting the show, Karli Burghoff for producing, Jon Loyens and Bryon Jacob for the show music. And thank you to the entire Catalog & Cocktails fan base. Don't forget to subscribe, rate, and review wherever you listen to your podcast.
