The Data Mesh-terclass

Tim Gasper [00:00:00] It's time once again for Catalog and Cocktails, your honest, no BS, non-salesy conversation about enterprise data management, presented by data.world, with tasty beverages in hand. I'm Tim Gasper, longtime customer guy, product guy, data guy at data.world, joined by Juan Takeda.

Juan Takeda [00:00:17] Hey Tim, how are you doing? I'm Juan Takeda, principal scientist at data.world, and it's Wednesday, middle of the week, and time to have... We're back talking about data. We're back in our season seven and episode 167. I don't know, lost count.

Tim Gasper [00:00:32] Yes.

Juan Takeda [00:00:32] I'm super excited. And, finally, we've been trying to have Samia on our podcast, met her at DGIQ, finally, in person. I love it when we have guests who suggest next guest-

Tim Gasper [00:00:46] Mm-hmm.

Juan Takeda [00:00:46] ... And I think Erin Wilkerson suggested you, and we were like, "Guess what? She's already going to be on." So, we have Samia Rahman, who is the director of Enterprise Data Strategy and Governance at Seagen, now Pfizer. Samia, how are you doing?

Samia Rahman [00:00:58] I'm good. I'm just going to start coming off maternity leave. So, that's a new phase, new season of my life. Someone made a joke that I have a new data product in my life. I don't know if it's data or AI. It does all human-like things. So, that's how I'm doing.

Tim Gasper [00:01:19] Congratulations.

Juan Takeda [00:01:20] Congratulations. Well, our tell and toast is, what are you drinking? What are we toasting for? Well, I know what we're toasting for.

Tim Gasper [00:01:25] Yeah.

Samia Rahman [00:01:25] Yeah, yeah. So, this is for my son. It's herbal tea, fennel and fenugreek tea, supposed to be great to help feed him, et cetera. So, I drink water religiously now and lots of herbal teas for him, so I toast it to him.

Tim Gasper [00:01:46] That's awesome.

Juan Takeda [00:01:47] That's awesome. We're just having coffee.

Tim Gasper [00:01:49] Coffee, water. Yeah, it's too early here.

Juan Takeda [00:01:51] Too early today-

Tim Gasper [00:01:52] It would be great to have a cocktail, but-

Juan Takeda [00:01:52] Well, we'll do that later tonight. But, yeah, so-

Tim Gasper [00:01:57] Cheers to your new family member.

Juan Takeda [00:01:58] Congratulations to your family.

Samia Rahman [00:01:58] Here's to that.

Juan Takeda [00:01:59] So, all right, our warmup question. So, given that the title today is the Data Mesh Masterclass, let's talk about masterclasses. Have you taken any masterclass, or if you haven't, which class would you like to take? Or, if it doesn't exist, which one would you want to take?

Tim Gasper [00:02:16] Samia, have you taken any masterclasses?

Samia Rahman [00:02:19] There's a website or a platform called masterclass.com, I believe.

Tim Gasper [00:02:24] Mm-hmm.

Samia Rahman [00:02:24] I got the subscription a couple of years ago and they have a... Comedy is my thing, or used to be a big hobby of mine. So, I took a class, the masterclass with Steve Martin, and that was fun. I didn't obviously get to interact with him, but I learned a lot. And then, there were lots of cooking classes and so on. So, from the personal hobby side, that masterclass was awesome. From the tech world, I don't know if I've taken any masterclasses. I like to learn through experience and through workshops with people. I don't think those folks ever call it a masterclass, because there's no mastering of the craft. It's always just being refined, at least in our space, data and tech. So, DDD Europe has some great domain-driven design classes and hard... There's a great workshop on the hard bits of architecture. So, I'm hoping I can take that at some point. It's been on my bucket list just to refresh and rethink architecture in the scalable future that's ahead of us.

Tim Gasper [00:03:35] That's awesome.

Juan Takeda [00:03:35] How about you?

Tim Gasper [00:03:37] So, I also signed up for a masterclass I think two years ago, and I think I accidentally renewed it, because I think it's on auto-renew, yeah. It was like, "Okay, whatever." They're actually pretty cool classes. So, I don't know who knows this about me, but I actually really like electronic music, and so I took a masterclass from Armin van Buuren-

Juan Takeda [00:03:54] Oh, wow.

Tim Gasper [00:03:54] ...Going through how to create electronic music. So, I found that to be very cool.

Juan Takeda [00:03:58] So, I am not on masterclass, but I've been wanting to get it and I've been wanting to do the one with Steve Martin and comedy-

Tim Gasper [00:04:04] Oh.

Juan Takeda [00:04:04] ...Because I love... For me, when it comes to comedy and presentations and giving talks, these two things go very nicely connected, and I am a huge Jerry Seinfeld and Larry David fan. I just love how they tell their stories. So, yeah, I want to do the comedy one, but I want to do some cooking ones, too. So, yeah, I think we're very much aligned on that.

Tim Gasper [00:04:26] Yeah. And this is a free ad for masterclass.com. We have no sponsorship relationship with them.

Samia Rahman [00:04:32] Yep, no sponsorship.

Juan Takeda [00:04:34] Now, okay, going back to the tech one, one of the first pandemic stuff I did, I took Zhamak Dehghani's class at DDD, so that was-

Tim Gasper [00:04:43] Ooh, nice.

Juan Takeda [00:04:44] ... Early on. So, I really, really appreciated having the opportunity to do that.

Tim Gasper [00:04:48] Mm-hmm.

Juan Takeda [00:04:50] All right, well-

Tim Gasper [00:04:50] That's a great segue into data mesh, huh?

Juan Takeda [00:04:52] Yes. All right, well, okay, so, we're going to talk about data mesh, knowledge graphs, AI, all this stuff. But, all right, let's kick it off. Honest, no BS, we're a couple of years in now with data mesh as a trend. Sanjeev Mohan, in his predictions and stuff, actually put data mesh as kind of declining, "it's going to die" stuff, and you saw a bunch of people saying, "No, it's true" or "No, data products are-" Okay. Honest, no BS, where are we? What have we learned? Where are we going? Is it dead? Okay, you go.

Samia Rahman [00:05:22] So, it's been four years. I actually had the privilege of, one, having the mentorship of Zhamak, and also, working with her when data mesh was first published back in May 2019. So, it's been four years. I was just doing the math today. I was like, "Whoa, it's been four years." To me, the data mesh principles are not going to go away. It has reinforced in the data community that you have to... The principles are also nothing brand new. They've been applied in software and even great AI solutions. Any product has been applying those solutions. So, to me, the principles are not going to go away. They're just going to get reapplied and reinforced, over and over, and we're just going to get better and better at it. Now, you don't need to... I spoke to Ole Olesen-Bagneux, I think he was on your podcast a few months ago. You don't need to sell data mesh, just do it. You will see the ROI with good, trustworthy data, hopefully enabling your AI solutions that help progress your business with innovation, with solving actual problems. It's all about the business value. So, to me, people are doing it without realizing it, is one of my observations over the last four years. And then, people who have intentionally done that are seeing the value, especially in complex spaces. So, biotech is where I've been at in the last three, four years. I also did healthcare. Those spaces are great. NCIH and all these other public spaces are adopting the data mesh principles, because it makes sense for them. Their domain is just way too complex. Modeling a baby's data growth charts and all these things. Very, very difficult. The Epic software that's out there, it has the data model and it has data products underneath it that allow for someone to raise a baby. So, to me, it's just happening. We are just defining and reinforcing those things.

Juan Takeda [00:07:35] What I'm taking away from what you just said is like, you know what? Just stop saying the words data mesh. Just fricking do it. You're probably already doing it. I think, if you're just spending too much time. So, probably this is a... I mean all the pundits and the talkers everywhere on LinkedIn. Like, you know what? Just shut up. Just go do it.

Samia Rahman [00:07:50] Yep.

Juan Takeda [00:07:53] You mentioned a couple of different industries and different cases in which you've seen data mesh implemented. Are there any success stories that you can talk about? What was that journey to implementing data mesh? Why did they do it, and then, how did they go about it and what was the value provided?

Tim Gasper [00:08:12] And to add to that, I think, what we're seeing, and what everybody's been talking about the last couple of weeks in the whole LinkedIn data world, is that, "Oh yeah, it's dying," but no, there have been all these success stories and we just don't talk about them. And Scott Hirleman is like, "Oh, there's all this stuff." This is the opportunity. Can you please share?

Samia Rahman [00:08:30] Yeah, so, I will stick to the bio-IT space, but I think there have been great things in the data mesh community from the banking industry. But, I went to the Bio-IT conference last year, and Roche and other companies presented. I think Omar Khawaja, he has talked about how he's applied data mesh with lots of success. And when you see the work that they present at the Bio-IT conference, those people are not talking about data mesh. They're talking about the actual biomarker data, all the complicated clinical trial design data and how they're creating interoperable feedback loops back into research. So, you serve patients and then from that, all the real world evidence that emerges, people are leveraging it in the next drug that they're going to either optimize or develop. So, to me, pharma has seen a lot of value. There are other companies, I'm drawing a blank, but they're applying the principles consistently. It's become a no-brainer that you must do data mesh in pharma because of that complexity. The value chain from molecule to market is what it is: research has a lot of unstructured creative data. This is where knowledge graphs and LLMs are super useful. Then you have a little more structured clinical trial data, then you go downstream to manufacturing. Again, lots of structured real-time data emerging, and then, further downstream, there's a lot of compliance. So, data integrity, trustworthiness, and accuracy become important. To me, all those things, people don't realize it, but the verticals within a biotech organization have been doing data-intensive curation and delivering insights in the way they operate. They have to submit the FDA reports, they have to prove no bias in the way they develop drugs and sell drugs. So, to me, the principles, because of that matrixed and very complex space, it was happening, but now, data mesh is just optimizing that space even further.
So, that would be my spiel on, or my top use cases, where I've seen consistent success. And you go to these conferences. If you've done any data work, you'll start seeing, "Oh, they're applying product management. Oh, they have a business value." They have the culture because the domain SME or the domain lead, there's the bioinformatician, they're dealing with data in and out. Let's say the biomarker director or the research bioinformatics director, he or she or they are probably going to be your domain lead who's curating that data. There's genomics data, there's so much data out there from external and internal sources that they're having to harmonize, and now the data mesh principles are accelerating them and the way they operate. So, their working groups become the governance committees. They're standardizing their data and actually activating real value out of it.

Juan Takeda [00:11:54] So, going back to my previous comment of like, " Hey, if you're talking about data mesh or just go do it," what you were saying, frankly, the folks who are just talking about data mesh or complaining or going into these details and stuff, is that an indicator that they're really disconnected from how the business and the folks who are not... They're doing it. They're actually just, they're part of the business, meaning, they're doing the biomarkers, they're assigning the regulations. That's, I think... How do you figure out where is the noise from actually people doing stuff, and...

Samia Rahman [00:12:27] Right.

Juan Takeda [00:12:29] We got to call. We got to be honest and no BS and call out where people are too much, blah, blah, blah.

Samia Rahman [00:12:33] Yeah, yeah. So, I think in one of my early implementations, we were very disconnected from the business and that, I would say, was a big learning, and we know this from any digital transformation initiative, as well. If you're disconnected from the business and you just say, "Hey, I'm going to go execute a digital platform, it's going to give you all this value," it's usually a failure because you are so disconnected from the business. That collaboration is super key. The partnership is super key. I don't think... A lot of the problems I've seen are IT trying to drive data mesh execution or any data transformation or digital transformation, and they forget to partner with the business. That partnership is key. Without that, and that's where the sociotechnical of data mesh is. People often couch it as a technology. Nope. Anyone who associates it with tech, it's like, "No, you have it wrong." You have to work with the domain space, you have to work with the business. And don't even bring up data mesh. Just understand the business problem. You're a product guy, right, Tim? Continuous discovery. And then identify the opportunities. "Hey, if we invest in these data products in the value stream and build these optimization solutions or throw in a chat agent that will allow for a researcher to quickly learn about the historic development of drugs." Those are great things. Those are accelerators. They're going to be time savers for those knowledge workers. So, to me, business is a must have. You cannot do any work without the business.

Tim Gasper [00:14:21] I think that is such an important point. I'm really glad that you're emphasizing it. If I look back at some of the episodes that we've done on the show and some of the different guests and some of the different examples, I'm definitely seeing a bit of a trend, which is that we've got more of the tech-first companies and they tend to have their data much more affected by, owned by, and managed by the tech teams. Oftentimes, even the engineering teams, because it's coming from software and things like that, and their vision of data mesh tends to be a little bit more technical, and they're also gravitating towards more technical implementations of data contracts and things like that. That seems to be one kind of body of data mesh adoption. And then, we've got the everybody-else bucket, which is other industries that are not so tech-centric, where it seems like, in that second model, which I think is more companies in general, data mesh ends up having to rely very heavily on the domain experts, distributing the responsibility. The role of data mesh there much more is a balanced blend of socioeconomics and sociotechnical. So, that's at least a hypothesis I'll put out there. I'm curious if Samia and Juan, you agree with that or disagree?

Juan Takeda [00:15:35] I think even going back to one of the previous episodes we had with Andrew Jones on data contracts, one of the things that came out of that, it's, contracts will come in from the software developers and you want to shift left and it's level, all exclusive from the tech teams. And then, the issues come up as like, " Well, I'm not actually generating the data myself, I'm bringing it from somebody else." And that's why, " Well, is it really contracts for tests and stuff?" But that's that technical conversation. While you can't have that same conversation with folks who are on the other side. So, I do agree with that description.

Samia Rahman [00:16:07] Yeah.

Juan Takeda [00:16:07] How about you, Samia?

Samia Rahman [00:16:08] Yeah, so I've seen this pattern. I think people who start out as a startup that's eventually grown, they're very tech-heavy. Their domain might also be very simple. So, it's already codified in their API. So, the software team has been building that intrinsic knowledge about the domain from day one. Whereas you go into healthcare or pharma, these are 30, 40, 100-year-old companies, and the domain knowledge is distributed across the business. And, for me to learn about how research does drug development, I need to go get a PhD, then, or go get a bachelor's again. I did electrical engineering. No way... I don't remember my chemistry at all. So, to me, it depends on the domain. Domain and also at what point the company started. Where does that intrinsic knowledge lie? The other thing. I think it's easy to start from the platform, because we get tech, we know how to do APIs, we know how to do contract testing. We've done it all. So, to me, the technology or the platform, most teams will start there because they understand it well, they can implement it well, but to me, the data product managers or your product managers, those are the key people who really need to partner with the business, build that domain knowledge, and then, start executing it on the platform. To me, technology has been solved. It was there even before data mesh. You could have put up a storage solution, compute, what else do I need? A catalog? Great, I have something that can allow me to do things, but intentionality with product management thinking and business discovery is so, so key there, right?

Tim Gasper [00:17:56] Yeah.

Samia Rahman [00:17:56] So, to me, agree with you, but also, I would encourage people to really identify where they are in that spectrum of, where does the domain knowledge exist, because your team makeup will be very different.

Tim Gasper [00:18:12] Yeah, that's a really great point, and one thing that you've mentioned a few times now is around data as a product, data products, data product managers. I know that, as data mesh has been around, as some concepts now, I think that of the four different tenets, the one that a lot of people seem to really gravitate towards, it really resonates a lot. I know with me being from a product background, it resonates a lot with me, because I love the analogy of software product development and how that can extend to data in various ways, but also different in certain ways. What is your take on data product management, data product managers, and the role that needs to play and its importance?

Samia Rahman [00:18:56] So, I think, in my reflection, I've been in a use case. There was heavy data stewardship. They had these really long healthcare descriptions of, these are the business glossary terms, this is where the data is, but it was never up-to-date, and it was over 20 years people had, or the data stewards had curated it, and no product management discipline was applied to it. I think that role of a data steward is now being embedded into the role of a data product manager. So, product managers bring that obsessive... I would say they should be obsessive about the business problem with the domain experts, your executive sponsors in a given business domain. They're helping discover, put the roadmap together using techniques like Wardley Mapping and the various, I'm drawing a blank right now, but there's so many great techniques, value stream mapping, et cetera, the user experience mapping, to really identify those data products. So, the data product manager is not only discovering, they're also playing a role of change management there. Them and the team, it's not just them. The change management is happening organically with applying product management, which is, people are starting to converge on, "Hey, this is the value, and yes, we agree that these are the first three data products we should invest in so that we can answer these three business questions or we can answer these three scientific questions." So, to me, that role has become super, super important, and it's not just about putting the glossary and granting access to people. The granting of access should actually lie with the true domain SME, the domain lead who's sitting in the business because they have ultimate legal responsibilities, actually, on who can and who cannot because they would know, "Yes, this is private data or not," but the data product manager is playing the facilitator, the executor, and bringing all the parties together to make that happen.
And I would also, another thing I tweaked last year is, it's not just being a data product manager, it's a data and AI product manager. The two have to be thought in... Because you're thinking about the business as a whole, right? Or, when you're doing that discovery, you're not only looking at the opportunity with data, but you're looking at the opportunities of, "Hey, this AI solution could really accelerate this business problem along with these data products." So, yeah, that's my evolved reflection over last year on how key that role is.

Tim Gasper [00:21:51] Yeah, that's a good point on AI products, as well, and I think in general. At least, I see this across our customer base as well as across the market, that folks are tending to say data product managers. That's the go-to phrase, but more and more, folks mean, "Well, data or analytic products." Could be a model, it could be a dashboard or a report. AI products. If I create a custom agent that is focused on answering certain types of questions or analysis from my organization, that also has to be treated like a product. It also has to be focused on business value. It also has to fall against the other priorities. And, a lot of times, maybe, it's similar teams and resources that are having to focus on data and analytics and AI.

Juan Takeda [00:22:33] Something, I'm just, what's going through my head right now. Let's drop the "data." It's just product management. So, think about it. We bring in... I think the reason why we call it data product management or data product manager is because, "Oh, well, product management is for software," and if you are a tech company, software company, you want to distinguish that. But a lot of companies are like, "No, we're not developing software technology. We're a pharma company, we're a financial company." And what you want to go do is that there's data, there's a bunch of stuff, not just data and we're now doing AI, we're doing all these. Dashboards and analytics, all this type of stuff, all that stuff that people go use to make their decisions and go run the business. You just need to have a product mindset around that, and that's how...

Samia Rahman [00:23:25] Right.

Juan Takeda [00:23:26] Why do we have thousands and thousands of dashboards? Those shouldn't exist.

Tim Gasper [00:23:30] Yeah.

Juan Takeda [00:23:30] Yeah, well guess why you have thousands and thousands of dashboards. Because you don't have a product management mindset around that stuff, and someone needs to take some ownership and accountability around that. So, I think, you've convinced me. Starting today, my new thing is it's not just data product manager, it's just product management in general.

Samia Rahman [00:23:47] Agreed. To me, it should be a domain product manager. So, in a given-

Juan Takeda [00:23:54] Okay.

Samia Rahman [00:23:54] ...If you're splitting up your domain-driven design, then, I have a domain product manager who looks at the portfolio of applications, hardware, external partnerships, all of that to solve a business problem. It's there, I don't want to reinvent new roles or anything, but that role is split into multiple people. That business department or vertical is being led by someone. They are a product manager. They are applying, "Hey, these are the value, these are the profit and loss things we need to look at in this vertical, and this is what I need to share with another part of the organization to launch that next drug or launch that next consumer good." So, to me, product management is a mindset. Fully agree with you, Juan. Everyone should be applying it. It should be the bread and butter of everyone's role.

Juan Takeda [00:24:51] So, this is a good segue into jobs and roles and AI today. That's something you wanted to chat about. How are you seeing the evolution of jobs, one from product management. Product management is one of those, but AI is coming to the mix here, and how is that changing things all the way from just traditional, the data governance work, data stewards, and now we have data product managers, and now we're bringing in these LLMs and we're doing prompt engineering. How are you seeing that, and how is this... What's going through your brain?

Samia Rahman [00:25:22] Yeah, so, there are two things, and they're both interrelated. One is to be more of an effective, let's say, data strategist or data product manager or the AI product manager. I can use generative AI or your ChatGPT or GPT solutions to help me even define business problems, do business discovery, put together my data dictionaries quickly. I was able to ask ChatGPT, "Hey, give me a data model for a healthcare professional." Spat out all the time, and it was pretty correct. I had the knowledge. I read through it, I checked it, and I was like, "Okay, I approve it, and I can use this. I can publish this as part of my data product." So, I think there's that one optimization loop of helping yourself with AI to do your work better and faster. The other aspect is the domain leads that I talked about, the bioinformaticians or the domain SMEs, they are going to, as part of their responsibility, become natural prompt engineers because they're always asking the right questions in their specific domain. So, from an AI perspective, everyone's role will have this, again, have the product mindset but also have, "How can I leverage AI mindset in the work that I do or in the research that I do?" I think that's going to be, or, I already see it, that teams are starting to do that, right, and that's going to start driving value.

Juan Takeda [00:27:06] This last point is something... I'm having a little aha moment here. You said the SMEs will be the natural prompt engineers, because they're the ones asking the right questions, and I'm already imagining a world where we have the titles and the roles of the prompt engineers and you get trained and you took all this stuff, but then, you're disconnected from the business, you're disconnected from the domain, and then, you go back and fall into all the technical stuff and then we go back to the same stuff. So, we're wasting our time and you're doing all these things and then we're going to come up with other tools that will help us go manage our prompts. You're like, "You should let the domain, even train them and let them be the access." So, this is an important, important call-out, right there.

Samia Rahman [00:27:47] Yep. And I think it should be treated as a discipline, because then it's something people can learn and apply, because I can learn how to do prompt engineering, or if there's a committee or a Center of Excellence, is what some people call it, or a community within an organization. People come up with templates of, "Hey, this is how you can do X, Y, Z activities with these prompts." So, the domain SMEs can go and learn from that community or even develop their own templates from it. So, to me, it's a discipline that starts getting embedded and that, to me, is again that shift-left mindset. At the point of creation or innovation, that's where you want to embed data governance, you want to embed security, you want to embed privacy thinking, you want to embed the AI thinking there, as well.

Juan Takeda [00:28:46] So, I want to plug in here. I'm trying to find a nice, nifty way of connecting it, but I don't know, so, I'm just going to throw it out. Okay, where do knowledge graphs fit into all of this that we've been talking about?

Samia Rahman [00:28:55] Yeah, so, I'm trying to remember when this knowledge graph conference happened. Zhamak actually presented at it. It's one of the big knowledge graph conferences out there. Paco Nathan, I believe, hosts it, if I'm not mistaken.

Juan Takeda [00:29:11] Yeah, the one in New York City.

Samia Rahman [00:29:14] Yes.

Juan Takeda [00:29:16] She was there and I actually hosted the panel, which is a podcast episode on Catalog and Cocktails, so you should-

Samia Rahman [00:29:23] Oh, okay, awesome. So, to me, and her message was that, when you implement data products with... There are key characteristics that are called out for data products. One, it has to be understandable, it has to be trustworthy and interoperable. So, linkability is key: knowledge graphs look to create links between different data sources. So, to me, when you have valuable, well-defined, or well-curated data products, then you're starting to build your knowledge graph as the next layer, and it could be, so, a data product becomes a node, and it itself can represent its information and serve it out on an output port in the format of your knowledge graph database, whatever your output. Could be a data warehouse. I know you guys published that paper on like, "Hey, I can use LLMs to generate text-to-SQL to query actual structured databases." So, to me, data products can now start serving their data in the right formats so that you can leverage your large language models to query it. There's another aspect where I see data products also allow the development of your domain-specific language models. So, we talked about these domain SMEs. Because they are curating and building, not implementing the engineering bits, but putting definition and putting the business value on these data products. The domain-specific language model is also starting to emerge as you build it. You're starting to capture, "What does molecule mean? What does biomarker mean? What is the role of a bioinformatician? What is genomics?" right? All that kind of vocabulary, and the linkage is now being captured as part of those data products and registered, hopefully, in your data graph and your larger knowledge graph. So, you're creating tenants in these various spaces for your data products so that you can start interoperating.
And that's where, now with your domain-specific LLMs, you can register a chat or develop a chat agent that now allows you to query one data product or multiple data products across that knowledge graph. I know I'm doing a terrible articulation. It's very well-defined in my head, but to me, it's all layered. Every data product has a layer of ontology with it. There's an LLM, there's an extension of a chat agent. I'm a researcher, I go to a catalog, I find the data product, there's a chat agent, I can ask it questions. Those are some of the things I would imagine.
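
As a rough illustration of the layering described above, here is a minimal Python sketch of a data product acting as a node in a knowledge graph and serving the same content through different output ports. Every name in it (`DataProduct`, `output_port`, `build_graph`, the `linksTo` and `defines` predicates, the sample products) is hypothetical and chosen only for this example; it is not taken from any product or paper mentioned in the episode.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical data product that doubles as a knowledge-graph node."""
    name: str
    domain: str
    glossary: dict = field(default_factory=dict)  # term -> definition
    links: set = field(default_factory=set)       # names of related data products
    rows: list = field(default_factory=list)      # the tabular payload

    def output_port(self, fmt):
        """Serve the product in the format a consumer asks for."""
        if fmt == "rows":
            # Tabular view, e.g. for a warehouse or a text-to-SQL layer.
            return list(self.rows)
        if fmt == "triples":
            # Graph view: links to other products plus the terms this product defines.
            triples = [(self.name, "linksTo", other) for other in sorted(self.links)]
            triples += [(self.name, "defines", term) for term in sorted(self.glossary)]
            return triples
        raise ValueError(f"unsupported format: {fmt}")


def build_graph(products):
    """Collect every product's triples into a single edge list."""
    edges = []
    for product in products:
        edges.extend(product.output_port("triples"))
    return edges


biomarkers = DataProduct(
    name="biomarkers",
    domain="research",
    glossary={"biomarker": "A measurable indicator of a biological state."},
    links={"clinical_trials"},
    rows=[{"biomarker_id": 1, "label": "example"}],
)
trials = DataProduct(
    name="clinical_trials",
    domain="clinical",
    glossary={"trial arm": "A group of participants receiving one intervention."},
)

graph = build_graph([biomarkers, trials])
for edge in graph:
    print(edge)
```

The point of the sketch is only that the glossary and the links travel with the data product, so the larger graph can be assembled from what each product already publishes.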

Tim Gasper [00:32:23] This is cool. So, I hear you pushing two motivations here. Two things here. One of them is the data products, and you mentioned data product management and things like that, and it being a driver of where there's going to be value and focus and attention. And then, also, it ends up being the right lens by which to develop the domain-specific language, because in order to have an effective data product, it has to be understandable and discoverable and useful and all these different things, which requires you to have that semantics around it, and the AI is a whole nother piece on top of this. Can you talk a little bit more about, when you say domain-specific language, I think you started to talk about some of these different pieces, and in my mind I'm thinking, "Oh, it sounds like glossary, ontologies, taxonomies, relationships, all those are part of the DSL," but can you talk a little bit more about what you mean by a domain-specific language and why it's important?

Samia Rahman [00:33:18] Yeah. So, it's all of those things you mentioned, Tim, right? But, to me... So, the large language model that's out there, it's trained on all the text that's out there, it's not very specific on, let's say, the healthcare domain or the biotech domain. To me it's training and developing your own LLMs in the context of your domain with all the data sources and the relationships you've curated. So, again, it's a duality in the sense that you can use large language models to help you build your domain specific language model, and that's where I think RAG, or retrieval augmented generation, is super useful, because now you have your general LLMs and then you have your domain specific LLMs that, combined together, allow you to have those very domain specific GPTs that accelerate the user experience for a researcher and so on. So, the architecture pattern is very consistent, I think, right? You just have to put in those data products, you have to put in your domain specific language model that's built based on those data products, and it's interconnected because it feeds back into your system to help answer your chatbot questions and so on.
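
The retrieval half of the retrieval augmented generation pattern mentioned here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not anything shown in the episode: the glossary contents, the function names `retrieve` and `build_prompt`, and the word-overlap scoring are all hypothetical stand-ins (a real system would likely use embeddings and would pass the prompt to a general LLM, which is left out here).

```python
# Hypothetical glossary curated alongside a data product (term -> definition).
GLOSSARY = {
    "biomarker": "measurable indicator of biological state",
    "molecule": "candidate compound under study",
}


def retrieve(question, glossary, k=2):
    """Return up to k glossary terms ranked by naive word overlap with the question."""
    words = set(question.lower().replace("?", "").split())
    scored = []
    for term, definition in glossary.items():
        text = set((term + " " + definition).lower().split())
        score = len(words & text)
        if score:  # keep only entries that share at least one word
            scored.append((score, term))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [term for _, term in scored[:k]]


def build_prompt(question, glossary, k=2):
    """Prepend the retrieved domain definitions to the question."""
    terms = retrieve(question, glossary, k)
    context = "\n".join(f"- {t}: {glossary[t]}" for t in terms)
    return f"Context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    print(build_prompt("What does biomarker mean?", GLOSSARY))
```

The point of the sketch is the division of labor: the domain knowledge lives with the data product, and the general model only sees the slice that is relevant to the question.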

Juan Takeda [00:34:44] So, a couple things. One is, from the way I see this, I'm curious if we agree or not, is, you have your general LLMs, and then, you want to be able to go take your internal context of your organization. So, you basically will build out the knowledge graph, and there's different layers of building this out through data products and so forth.

Samia Rahman [00:35:02] Yep.

Juan Takeda [00:35:02] But that is what's going to be... Once you create that which is accurate and it can be explainable, all these things, that's what you're going to go use to go train or create that domain specific LLM. So, you're shaking your head, we're in agreement there.

Samia Rahman [00:35:16] Yes, and it doesn't have to be sequential. I think it's an ecosystem. It's a holistic system that you need to think about as a whole. So, to me, the unit of architecture has now expanded for a given... In data mesh, we talk about a data product as a unit of architecture, but I think now, it's being expanded to, " Hey, this domain unit of architecture, where the knowledge is captured, you need to have this ecosystem of data products and to build the training of LLMs." I don't think we should... There's always a sequential nature to development, but I would encourage to think of getting your first MVP out there with the holistic in mind, because then you get the feedback loop.

Juan Takeda [00:36:03] Yeah, you want to pay as you do it.

Samia Rahman [00:36:04] Yeah, you're building everything in thin slices across the stack, then.

Tim Gasper [00:36:09] Yeah, you talk about iron threads a lot, and I always think about that. A holistic approach.

Juan Takeda [00:36:15] So, going back to, you were talking about a layered approach, and so, let me repeat it in my words and how I've been thinking about things, and I'm just curious to hear what you think. I separate the graphs when it comes to... Two perspectives. I call it the metadata technical graph, and then, a domain graph, and the metadata technical graph would be one kind of... What you said before. You have the data products, they need to be linked. The data product itself is a node, so this is more of, at a high level metadata... The metadata, right? So, here's this thing called the data product. So much stuff in here, but I'm just, at a high level, describing it. This data product is related to this one and so forth. That's that technical metadata graph, and it would go first, and you want to be able to go search on it, find it, share it, all those things. And then, so that goes all the way up to the chat bots that you want to have on top of that technical metadata graph. And then, the next layer deeper is you go now into the domain knowledge graph, where my node isn't the data product, my node is actually the patient, the node is the biomarker, the node is the very specific things that I'm... So, the ontology here around it is more about your particular domain, while, in the previous example, the ontology is more about the metadata, the things that we described. That's how I've always been separating things. I'm curious to know, does this resonate with you?

Samia Rahman [00:37:43] Yeah, I think of it that way as well, and there's a link, a data product can register itself and link itself to the right ontology. So, it's also not just technical metadata. A data product has to also describe its business metadata. So, to me, it is a part. So, the layers itself are connected to each other, and it's not square boxes that are connected. It's like multiple data products belong to this ontology and definitions of terminology, the domain terminology, and you can register or link the right data products to it. So, to me, yes, to what you said, but it's also that linkability between the layers. I don't know if that makes sense.
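
The two layers and the links between them can be sketched with plain Python data structures. This is only an illustration of the idea as discussed, not a schema from the episode: every product name, concept name, and function name below is hypothetical.

```python
# Layer 1: technical metadata graph. Nodes are data products (hypothetical names).
PRODUCT_EDGES = {
    ("clinical_trials", "biomarker_assays"),  # these two products are related
}

# Layer 2: domain knowledge graph. Nodes are domain concepts, not products.
CONCEPT_EDGES = {
    ("Patient", "has", "Biomarker"),
    ("Biomarker", "measured_in", "ClinicalTrial"),
}

# Cross-layer links: each data product registers the ontology terms it describes.
PRODUCT_TERMS = {
    "clinical_trials": {"Patient", "ClinicalTrial"},
    "biomarker_assays": {"Biomarker", "Patient"},
}


def products_for_term(term):
    """Return the data products registered against a domain concept, sorted by name."""
    return sorted(p for p, terms in PRODUCT_TERMS.items() if term in terms)


def shared_terms(a, b):
    """Return the domain concepts that two data products have in common."""
    return PRODUCT_TERMS[a] & PRODUCT_TERMS[b]


if __name__ == "__main__":
    print(products_for_term("Patient"))
    print(shared_terms("clinical_trials", "biomarker_assays"))
```

Keeping the cross-layer links as their own mapping reflects the point about linkability: many data products can attach to the same term, and the layers stay connected without being merged into one graph.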

Juan Takeda [00:38:36] Going back to... No, I think we're very much aligned. I think, what we need next, get on a whiteboard and go draw this out. I think we're very much aligned on this. I strongly believe we are. Going back a little bit to the jobs and roles. We're talking now about ontologies and stuff. That's my, where I come from, and not even a long time ago, we wouldn't say the word. " Oh, don't say the O word, ontology. That's a bad thing." And I think, now, we're starting to hear it more and more. Where do ontologists and taxonomists come into play right now with all these different types of jobs and roles?

Samia Rahman [00:39:10] That's a-

Juan Takeda [00:39:11] Or do they, well, is it this one?

Samia Rahman [00:39:12] Yeah.

Juan Takeda [00:39:12] Or is... I don't know.

Samia Rahman [00:39:14] Yeah, so, to me, ontology is again like a discipline or a practice. So, ontologists can become facilitators that help your domain SMEs capture that knowledge into the ontology, but will they have the ability to go generate this ontology without the partnership of those domain SMEs? It's a recipe for failure, in my opinion, because a generalist will never succeed without that partnership with the true domain experts. So, to me, it's more about enabling folks to build the ontology incrementally in their specific spaces. And then, again, thinking of that unit of architectures, I think our unit of architecture is slightly expanding now where, " Hey, I need to think about not only publishing, the simplest thing I can do is a data dictionary, but I also need to think about how does it link to the bigger domain ontology that we've been curating as a data community, the domain community, and linking it as part of the practice of building anything that they build."

Tim Gasper [00:40:26] That's interesting. So, it sounds like you're saying that... Earlier you mentioned user journey mapping, right? Ontology development and taxonomy development, you're thinking of it less as a role and more as, it's a discipline, it's a practice, it's a tool that we're going to leverage that is important.

Samia Rahman [00:40:45] And it's built into your product lifecycle development process. So, before you go to production, you can enforce or establish a standard gate for release of your product. "Hey, must register to" or "Must be published on the ontology map so that it can be used for X, Y, Z purposes." So, you can start introducing those into your product development lifecycle.
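
A release gate of the kind described could look roughly like the following Python sketch. It assumes a very simple data product descriptor; the `DataProduct` fields, the `release_gate` function, and the specific checks are hypothetical choices for illustration, not a standard or anything prescribed in the conversation.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical descriptor for a data product awaiting release."""
    name: str
    glossary: dict = field(default_factory=dict)      # term -> definition
    ontology_terms: set = field(default_factory=set)  # terms it registers against


def release_gate(product, known_terms):
    """Return a list of blocking problems; an empty list means the gate passes."""
    problems = []
    if not product.glossary:
        problems.append("no data dictionary published")
    if not product.ontology_terms:
        problems.append("not registered on the ontology map")
    # Terms the product claims that the shared ontology does not contain.
    unknown = sorted(product.ontology_terms - known_terms)
    if unknown:
        problems.append("unknown ontology terms: " + ", ".join(unknown))
    return problems


if __name__ == "__main__":
    ontology = {"Patient", "Biomarker"}
    candidate = DataProduct("assays", {"biomarker": "measurable indicator"}, {"Biomarker"})
    print(release_gate(candidate, ontology))
```

A check like this would typically run in the release pipeline, so that registration happens as part of building the product rather than as a separate stewardship task afterwards.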

Juan Takeda [00:41:12] So, what would you tell... So, I'm obviously biased here because knowledge graphs, semantics, ontologies, this is like my life's dedication. So, for you, what would you tell people, who are the types of organizations and people who should be starting to focus on knowledge graphs? Is it for everybody? We talk about data mesh, right? "Oh, data mesh is not for everybody."

Samia Rahman [00:41:36] Yeah.

Juan Takeda [00:41:37] Where do... How do... Knowledge graphs, who should be paying attention to it? Who should be doing it, who's missing out if they're not doing it, and so forth?

Samia Rahman [00:41:44] Yeah, that's a very good question. I haven't put a lot of thought to it, but I think, in general with the generative AI space, especially with text, it's for all, in my opinion. A company that does marketing for consumer goods, they can benefit from it, and they will need... If they invest in a trustworthy knowledge graph, then that will give them more accurate marketing outputs and more tailored things. And they already do that. A lot of these companies invest into customer master data management platforms or curating that customer data. So, to me, any company, especially larger companies... If I'm a small company, I don't think I would invest in a knowledge graph, but I definitely would use the generic LLM to get my creative content out there and so on. But, yeah, I think most enterprises will benefit from even a small scale of, again, capturing that knowledge and capturing it in the form of graphs so that they can have more trustworthy decisions and content being generated.

Juan Takeda [00:43:05] Well-

Samia Rahman [00:43:07] I'm curious on your thoughts, Juan.

Juan Takeda [00:43:08] Well, first, I'm biased. Well, I would say that I agree with that. If you're smaller, you're very focused on what you go do, and I think, one, I just remind people, you bring in knowledge graphs when you want to really integrate distributed data. If you're just focused on this one application... My argument for knowledge graphs. If you have a known problem, a known use case, and you're just focused on what's immediate, don't do that. Just build your application, be done with it. But if you have to focus on the known use cases of today and unknown use cases of tomorrow, and you're interested in how to figure out the scale, you want to go invest in this, otherwise, you're just going to... That's how we end up in silos. So, that's my definition.

Tim Gasper [00:43:49] Yeah, no, and I think the only thoughts I'll pepper on top is, I really think that, because of what's happening with AI, we're going to see interest in knowledge graphs become even greater than ever, especially because of the importance of what you mentioned, of it being an important way to represent and to incorporate some of the domain specific language and other types of things that we know are going to be necessary to make AI useful for our organizations.

Samia Rahman [00:44:18] Yep.

Tim Gasper [00:44:20] And my second and last thought is that, it seems like, over the years, and I'm curious, Juan, if you would nod to this or shake your head to this, that knowledge graphs have been a relatively complicated endeavor, and therefore, it's tended to lean towards the larger companies that have organizations that want to create labs teams and stuff like that, want to work on it, et cetera, or specific use cases that are tailored to it. But, as knowledge graphs are becoming easier and easier to manage and create, and maybe become embedded in things, then it's becoming a lot easier for it to be more widely adopted.

Juan Takeda [00:44:51] I also think, with LLMs, they're going to help create it, and especially in this whole... The product management, the knowledge acquisition, we'll be able to create these chat bots that will help us acquire that knowledge.

Samia Rahman [00:45:03] Exactly.

Juan Takeda [00:45:03] And then, people's like, " Well, there's different ways of saying this. Well, guess what? I want each of you, get on your little chat and go chat with-

Tim Gasper [00:45:10] Tell me all the things.

Juan Takeda [00:45:11] ... three to four minutes so we can all get the different versions of this and we'll start." So, I think that's how we're going to go start.

Samia Rahman [00:45:17] Agreed.

Juan Takeda [00:45:17] Wow, so much, and time flies. And now, what I really want to go do next, which we can't do, we'll have to figure out, is get on a whiteboard and start drawing all this stuff. So, we probably need to go do... The next evolution of the podcast is like a live show with whiteboards, and-

Tim Gasper [00:45:30] Maybe a masterclass in front of the whiteboard.

Samia Rahman [00:45:35] Yes.

Juan Takeda [00:45:35] We're in that balance of no BS and start BSing because we're figuring shit out as we go.

Tim Gasper [00:45:40] Hey, don't take away my time.

Samia Rahman [00:45:42] This needs masterclass presentation style, not a masterclass per se.

Tim Gasper [00:45:47] Lowercase m. Anyway.

Juan Takeda [00:45:48] All right, let's hit our lightning round. So, I'll kick it off, though. First question, AI is taking the world by storm. Do you think, in a couple of years from now, we'll be talking about... By the way, I did not write this question. Tim wrote this question. So, about AI mesh instead of data mesh?

Samia Rahman [00:46:06] Uh-

Juan Takeda [00:46:06] Whatever that's supposed to mean?

Samia Rahman [00:46:11] Maybe, sure. I don't believe in buzzwords. What's the value in it?

Juan Takeda [00:46:17] So, the answer is yes, but it will become a buzzword and we don't know what it is.

Samia Rahman [00:46:23] Yeah, yeah, sure.

Tim Gasper [00:46:24] Who knows. All right, so, for all of those predicting the AI mesh, we look forward to hearing your thoughts. All right. Second question. As a governance leader, which you are, if you had to choose only implementing domain product managers or domain stewards, you could only choose one, which one would you choose?

Samia Rahman [00:46:48] Product managers.

Tim Gasper [00:46:50] Yeah?

Samia Rahman [00:46:51] They can bring a lot of value with that product discipline, and the stewardship can be distributed across the domain SMEs, the data engineer, the data analyst who's using the data, and the data product manager's helping capture those things within the releases of those data products. So, yeah, I would always go with product managers, because they are ultimately making sure that the product is usable, and it becomes part of their responsibility. It's something I would at least hold them accountable for. Where's the value, and where is the data product glossary and the ontology associated with it? You can start holding-

Tim Gasper [00:47:36] You can ignore all that.

Samia Rahman [00:47:37] Yeah.

Tim Gasper [00:47:38] That work that needs to happen, right? No, that makes sense. I love that you're so forward- thinking and pursuing around data product managers, because I think there are a lot of people out there who would answer that question. " Well, of course I got to have data stewards, I got to have data stewards, and then data product managers on top. That's a useful additional thing." But I think you're very, like, " Hey, if we do data product management well, then why are we focusing so much on the traditional stewardship approach?"

Juan Takeda [00:48:03] Yeah, the stewardship is part of the product management. That's what-

Samia Rahman [00:48:05] It's embedded.

Juan Takeda [00:48:06] It's embedded in now. All right, next question. You are leading a data strategy and governance. Is this the home for AI strategy, as well?

Samia Rahman [00:48:16] That's a great question. I think it's... There are two parts to it. It's too much for at least one team to take on responsibility. So, there needs to be a cross- functional team that brings data and AI together, because when you think about, I've had to think about privacy a lot when it comes to patient data or healthcare provider data. To me, there's a partnership with the legal team. There is a partnership with the AI specialists who are going to say, " Hey, is this ethical?" I myself and anyone who I am partnering with to make any solution happen, we have to think as a collective around, what are the implications of that AI solution and what are the compliance implications or compliance things we need to account for? And we have to think about the whole. So, again, I would go back to that unit of architecture is now expanding. It's not just about the data, it's the data, the AI solution as a whole, just like we do with any software application that's going out there. We think about all those things together. Now, depending on your complexity and the business, your team size, et cetera, you can have different team topologies, but they should always be working in partnership.

Tim Gasper [00:49:37] Yeah, that makes sense. Yeah. All right, last lightning round question is, is it a good thing that the hype around data mesh is fading?

Samia Rahman [00:49:47] The hype? I guess so. I am assuming people are adopting it, applying it, and it's like I said at the beginning, it should be, just do it, right? Quoting Nike over here, but just do it. I think the hype around it is fading because people are already doing it or it's just become an inherent part of like, "Hey, when we go apply or implement this digital transformation for this organization, we are going to apply the principles of data mesh to drive out things at scale."

Tim Gasper [00:50:28] No, I love that. So, I think that, maybe, the response from you to some of the naysayers around data mesh would be like, "We've gone very quickly from, 'This is new and interesting,' to, 'Wait a second, no, this is obvious. This is the way that we should be doing things.' So, it's become, 'Yeah, duh,' very quickly, and hopefully, folks are implementing it."

Samia Rahman [00:50:47] Yeah. Yep.

Juan Takeda [00:50:49] Wow. All right. Takeaway time. Tim, kick us off.

Tim Gasper [00:50:53] Let's do it. This was an amazing session. So many great takeaways today. So, we started off with some honest no BS around, "Okay, data mesh has been here as a trend, a hot trend, for a few years now. What have we learned through this whole experience here?" And you had said that you got involved in this movement a long time ago, four plus years ago, collaborating with Zhamak and others, and to you, the principles are not going away, the four key principles around data mesh. Data products, federated computational data governance, et cetera. And they have been applied in many different places across data and AI. By the way, I think that's a theme of what we talked about today, the applicability of these practices to data analytics and AI, and it's just getting applied and reinforced in various industries over and over, and you don't need to sell data mesh, right? I think there was a lot of preoccupation in the market about like, "Oh, how do you convince your CDO to buy into data mesh or your CFO or whoever you got to get to buy into?" Like, no, you're a data leader. You are working on a data team. Take the best practices, implement it, just do it, and you will see the ROI because of the trustworthy data and the fact that you're generating value from these different applications, data and AI applications to solve business value. People should be doing it, and don't worry as much about the selling, the value will do the selling. We asked about success stories and you mentioned banking and many other industries, but specifically, you zoomed into the bioinformatics and pharmaceutical application of this. You mentioned Roche and Omar Khwaja, who also gave a talk on Catalog or joined us as a guest on Catalog and Cocktails in the past, gave a talk around the biomarker data, clinical trial data, creating these interoperable feedback loops, is what you said there, and going from a molecule to market.
Really focusing on the value, data compliance, trustworthiness, accuracy. So, that's a great example of how, in a complex domain with rich data, with a variety of different experts, could really come to value, and you mentioned that these industry data experts, one of the great things about data mesh is the industry data experts are being able to be leveraged as the domain representatives, and data mesh is accelerating the way that these experts can operate on the data and work with the data. Standardization, activating value from data. So, really, really critical there. Finally, before I pass it to Juan, we talked about, if you're disconnected from the business when you're implementing with data mesh or working data mesh, you're doing it wrong. It's going to be a failure. That's why we talk about it as a sociotechnical phenomenon. There has to be continuous discovery, which is a ding, ding, ding phrase for me because I always think about product management, continuous discovery, continuous delivery. So, the tie in there is really important. It's easy to start with platform and technology because that's the thing that we gravitate towards as technologists, but we have to remember that, if you're going to implement something like a data product manager, that's a tee up for Juan. They really have to be focused on the value and focused on the impact it's going to have. So, Juan, over to you for your takeaways.

Juan Takeda [00:54:09] Here we go. So, we talk about the data product management, data product managers, right? Product managers are obsessed with understanding the value and that user experience, the value chains, the user journey mappings, and this is where change management is coming in. When you apply product management, people converge on, what is the value and what are the priorities? By the way, there's so many T- shirts that are coming out of this episode.

Tim Gasper [00:54:29] In our notes, we bold everything that's like, "Oh, that was a magic phrase."

Juan Takeda [00:54:32] Yeah, so that's one of them.

Tim Gasper [00:54:33] Samia, you got a lot of T- shirts that are coming out.

Juan Takeda [00:54:37] Here's another one. It's not just data product management, it's data and AI product management. And then, we had this riff on like, "Wait, is this really just product management?" But I think, also, it could be domain product management. But, at the end of the day, it's just bringing this discipline of product management. That's the most important thing. We talked about how we're seeing the evolution of jobs and AI come into the mix here. So, really, AI is about helping yourself to do your work faster and better, and I really love this discussion we had about SMEs. They will be the natural prompt engineers because they're already asking those right questions, and that's why it's really important to treat prompting as a discipline. Everybody should be learning about this. There'll be templates, we should learn from a community. Get this in there. And then, we started wrapping up with knowledge graphs, and I think, we talked about connecting with data products, the data products need to be linked, so the data products themselves are nodes, but then, data products, when you're doing the whole process of creating them, you are capturing that knowledge of semantics around that stuff. And then, you're going to have these combinations of the foundational AI models that you want to be able to train with your specific knowledge and you want to be able to go use the data products and the knowledge graphs around that, and it's a holistic view between all these things. We actually talked about this differentiation between, there's the technical metadata knowledge graph, where your data product is a node there, versus your domain knowledge graph, but they're all connected, at the end of the day. Talking about disciplines and practices. Ontology, this is a discipline or a practice, and another great quote. A generalist will never succeed without the partnership. I love that one. And then, we talk about, "Hey, data mesh is not for everybody."
So, knowledge graphs, who is this for? Who should be paying attention, and your argument's like, " Hey, everybody should be paying attention to this. Marketing, consumer goods can benefit if you want to invest in a trustworthy knowledge graph that will result in better marketed products." It does lean towards large organizations, and as we were riffing there, if you are only focused on your efficient, known use case of today, that's it, probably not your focus. But if you want to deal with your known use case of today and the unknown uses of tomorrow, that's where you need to fall in. How did we do? What did we miss?

Samia Rahman [00:56:44] Great. I love how you guys do this playback and it's a much better articulated summarization of what I tried to communicate. So, thank you, you guys.

Juan Takeda [00:56:54] This is just, we're just repeating what you said.

Tim Gasper [00:56:56] Reflecting you.

Juan Takeda [00:56:57] Thank you so much for all the valuable knowledge that you shared with everybody here. To wrap up, three final questions. What's your advice, who should we invite next, and what resources do you follow?

Samia Rahman [00:57:09] My advice, based on reflections over the last couple of years, is drive value and innovation with collaboration and eliminate ego. One of the big things I see with product management, Tim, we can maybe chat offline. I've seen a lot of egos get in the way of who owns what application, what team owns what, whose portfolio and whose resumes-

Tim Gasper [00:57:34] Never.

Samia Rahman [00:57:34] ...Making headway. I think it's very important, with data mesh, a reflection of mine is, be comfortable with evolving collective ownership of the products you build. One day, those data assets might belong in research, but you have new customers. Now, they also have a share, invested share, into that data. So, it's very important to put aside ego and think about that collaboration. So, that's my... Hopefully, I can summarize that one day. I might use ChatGPT to do that, but that would be my key.

Juan Takeda [00:58:10] Very well said, very well said.

Tim Gasper [00:58:11] Yes. Could be a manifesto, all on its own. That's awesome.

Juan Takeda [00:58:13] So, who should we invite next?

Samia Rahman [00:58:16] Okay, so, I have two nominations, I guess. Gary Kretchmar, he is a colleague of mine. He actually built the self-service internal data platform to empower data product teams to deliver their data and AI solutions faster. I think the community has a lot to learn from him in the way to execute with empathy, not just the platform team. Platform teams struggle a lot with a lot of pressure. He's done a phenomenal job, and also, the change management and empathy for the customer. So, I think he's a great person, and I'll just quote him a little... He put a great, fantastic definition on data mesh. He said, this was to a C-suite person, that, "Hey, think of data mesh as a data lake. It's a lake with a bunch of boats on it. Nobody knows each other. They're all in silos. With data mesh and the tenancy model, it gives each boat a visible ID so we can all talk to each other, lowering the silos and bringing us all together." That needs to go somewhere on some manifesto or somewhere. I would love to see him talk more about his perspective on everything. And the other person I've been following is Tony Seale from UBS and knowledge management. I'd love to hear.

Juan Takeda [00:59:35] He's been on the show.

Samia Rahman [00:59:37] Sorry?

Juan Takeda [00:59:38] He's already been a guest.

Samia Rahman [00:59:39] Oh, he's already been a guest. I'll go check out the podcast then. But yeah, I've been following [inaudible] for a while.

Juan Takeda [00:59:45] Yeah.

Tim Gasper [00:59:45] He is amazing.

Juan Takeda [00:59:46] Yeah. Awesome, following all the stuff that Tony does with knowledge graphs and talking, and we'll reach out to Gary. I'd appreciate it if you give us an intro to him. And finally, what resources do you follow?

Samia Rahman [00:59:58] Product management content from [inaudible], Akash Gupta, and Powell Herrin, I'm probably butchering his last name, but I found their content to be phenomenal, and I've shared it with anyone in my community who's aspiring to be a data product manager, AI product manager, or is already executing and is new to the discipline, but has a lot of domain knowledge. So, I think they bring a lot of those best practices so that anyone in the business can apply that product mindset. And then, people that I'm following, Bruno Aziza, he's at Alphabet. [inaudible] Rushdi on AI strategy. She's been posting a few interesting things and I think she's going to publish a book soon on AI strategies, so I'm going to be looking forward to what she has to say. And then, Team Topologies, again, super key with even more complexity coming with AI and product. Matthew Skelton and Manuel Pais. Their work is just super, super key. And, of course, I am always in touch with Zhamak. I know she's busy with her startup right now, but always looking out for numerous talks that she does and so on.

Tim Gasper [01:01:16] Great suggestions. I love Team Topologies, by the way. I'm a bit of an org chart nerd, like org structure and things like that. It's a great book to talk about agile approaches.

Samia Rahman [01:01:25] Yeah. Yeah.

Juan Takeda [01:01:26] Well, thank you. Thank you. Thank you so much. It was a phenomenal episode. So much knowledge in here. Just a reminder, next week, we have Santona Tuli from Upsolver. We'll be talking about data and productivity. Really excited to finally have her on the show. With that, Samia, have a great rest of your day, rest of week, and thank you so much for finally being on the show.

Samia Rahman [01:01:44] Thank you for-

Tim Gasper [01:01:44] Cheers.

Samia Rahman [01:01:44] ...Having me.

Juan Takeda [01:01:45] Cheers.

Samia Rahman [01:01:45] Cheers, you guys.
