About this episode
Mesh is everywhere. It’s in our clothes, our fishing nets, our Wi-Fi networks, and now our data architecture. But what is a data mesh? Do you need one? And if so, how do you start?
Zhamak Dehghani is the Director of Emerging Technologies at Thoughtworks and the leading expert on data mesh. We’ll chat about the emergence of the data mesh as a concept, why the approach works for eliminating architectural silos, and how it’s producing more data-driven cultures.
Director of Emerging Technologies, ThoughtWorks
This episode features
- Key tools, technologies, and skills for adopting a data mesh
- What to do if you’re data mesh curious
- Inspiring architectural designs outside the data space
- Data mesh is not a “thing.” It’s an approach based on decentralizing
- Don’t overthink it. Do what makes sense for your organization
- Compute + Policy + Data = one autonomous unit
Hello, everyone. Welcome to Catalog & Cocktails. This is a weekly live hangout: an honest, no-BS, non-salesy conversation about enterprise data
Unknown Speaker 0:24 management, with tasty beverages in hand. I’m Tim Gasper, Director of Product at data.world and longtime data nerd, joined by Juan. Hey Tim, I’m Juan Sequeda. I’m the principal scientist at data.world. And as always, it is the middle of the week, middle of the day or end of the day, and it’s great to pause and chat about data. And today we have, I think, the name that I hear almost every other day of my life right now: Zhamak Dehghani. She is the Director of Emerging Technologies for ThoughtWorks and the founder of the data mesh concept. If you have not heard about data mesh, or have not heard Zhamak’s name, you have literally been living under a rock. So I’m super excited to be able to spend time here talking with Zhamak, and you, Tim. So Zhamak, nice to see you. Thank you so much for joining us here today. Welcome.
Unknown Speaker 1:14 Thank you. Good to see you too. I have to disappoint: I don’t drink, and it’s 2pm in San Francisco, so I’m drinking a mushroom coffee.
Unknown Speaker 1:23 No worries.
Unknown Speaker 1:25 No worries, no worries. Well, quick reminders. Please give us your review on Apple Podcasts and follow us on Spotify. And also, to let you know, we are partnering with the Knowledge Graph Conference, which is taking place May 3 to 6, and we’re going to have a special edition of Catalog & Cocktails where I will be moderating the data architecture panel with Zhamak, who’s going to be on the panel together with Teresa Tung from Accenture, Jay Yu from Intuit, and Mohammed Aaser from McKinsey. You can go to knowledgegraph.tech, and the special 10% discount code is cc KGC. So with that, let’s do our tell and toast. What are we drinking, and what are we toasting for? Zhamak, we know you’re drinking mushroom coffee. What is that? I have never had mushroom coffee.
Unknown Speaker 2:09 It’s some potion. I drink the Four Sigmatic one; it’s just some Northern European potion. Mushrooms that give you superpowers. This particular one is supposed to make me smarter, so you can be the judge of that by the end.
Unknown Speaker 2:26 I want some tea that makes me smarter. That sounds good. How about you, Tim? Um, I am drinking a whiskey smash. I’ve got way too much mint growing in the backyard. So it’s basically lemon and whiskey and simple syrup, and I’ve got some backyard mint in here, all smashed together.
Unknown Speaker 2:44 Well, I’m having a nice margarita. And I do want to make a special toast today. I think this is Episode 44, and we have so many people behind the scenes at data.world who help us produce Catalog & Cocktails. One of those is a really good friend and colleague, Sean sweco, who is going to be going off to his next adventure. We would not be able to pull off everything that we do with Catalog & Cocktails without him. So this is a toast to Sean. Sean, thank you so much for everything. I know you’re listening. So thank you so much.
Unknown Speaker 3:19 Cheers, Sean, you’re destined for great things. Thanks so much for all you’ve done for us; we really appreciate it.
Unknown Speaker 3:24 And we also have our warm-up question. It’s a fun question here: inspiring architectural designs outside the data space. So let’s not talk about data architectures; let’s talk about some other architectures. So, like, real architecture.
Unknown Speaker 3:37 I’m going to go outside of even technology. I would refer to the work of a female architect, a building architect, Zaha Hadid. I think she had a Middle Eastern background, I think Lebanese background if I’m not mistaken, but she had this beautiful, organically influenced architecture for commercial spaces and living spaces. And actually, she’s no longer with us. She was very, very young. So her work has been inspiring for me.
Unknown Speaker 4:11 Awesome. I’m actually not familiar with her work; I’ve got to check that out. How about you, Tim? You know, I don’t know a ton about architecture, but I’ll tell you about architectural designs that I like lately. I really like when you take traditional architecture and you blend it with new and modern architecture. Lately I’ve been looking at some house designs where the core house is there, right, and maybe it’s a Tudor-style house or colonial or things like that, but then you actually have parts of the house which are modern, and maybe are a little boxy or have interesting angles and things like that. And I think that’s cool. I think you can make that work and blend it together. That’s really awesome. That’s my little architectural insight for today.
Unknown Speaker 4:53 A couple of weeks ago I was in IKEA, and I just love seeing how they set up their little spaces of two meters and stuff like that. I think that’s great, cool design. But anyway, let’s dive into the discussion, what we’re here for. So, honest, no BS: what is a data mesh? And what is not a data mesh? Let’s kick off with that.
Unknown Speaker 5:18 Alright, so, um, data mesh is an approach. It’s not a thing. It’s an approach for designing and architecting your big analytical data management based on a decentralized architecture, and governing that architecture based on federated and computational governance. But it also has to address the concern of how you do that efficiently and effectively, so it also talks about the foundational infrastructure that you have to put in place. That’s why it confuses people: it started as an architectural paradigm for managing big data and analytics, a kind of data architecture, but it had to go further and become more in order to not create a mess. So it also addresses how to think about the architecture of infrastructure in that space, how to think about your organizational structure in that space, and how to think about governance of that. That’s the paradigm, and you can apply it to your organization using different technologies. It doesn’t try to be prescriptive about what technology to use, even though I am very opinionated about that. But data mesh is an approach.
Unknown Speaker 6:42 I love that. And we definitely want to go into those opinions. I mean, I have my opinions too, and the interesting thing is going to be how much we align and don’t align. So let’s start on the technology side, then. One of the key things that you just brought up there, for me, was that it is an approach, and it’s about decentralization. So what are the technologies that you’re seeing, at a high level, that need to be involved inside of a data mesh?
Unknown Speaker 7:10 Yeah, I have to disappoint you here, because I can’t really call out any specific technology that fits perfectly. But there are complementary technologies that we can use.
Unknown Speaker 7:24 I mean, let me pause there for a second. This is one of the things that we were talking about before: I call BS when people say, well, here’s my product, here’s my one-stop shop that does a data mesh. And people are starting to say that. And I’m like, no. The whole point of the data mesh is that there isn’t one thing; there are different ways, and you decide which ones you want to bring in. You don’t buy a data mesh, right? Nobody’s going to go off and say, hey, are you looking for a data mesh? I don’t think so. Well, that’s my perspective. What do you think? Are we on the same page here?
Unknown Speaker 7:57 Yeah, I completely agree. I mean, if you think about this as an ecosystem, you first need a set of standards and conventions that we agree upon as the interfaces between components and agents within that ecosystem. That doesn’t exist for a large part of it. And then you have to think about which technologies plug in and provide different capabilities. I know I’m talking abstractly, so let’s put it into concrete examples. Right now, as we defined data mesh and started building it, there is no prior art, no language, no concept with which I can describe the smallest unit of this architecture and put a boundary around it. We call this thing a data product: the smallest unit of architecture around which you can form teams, like a microservice in, say, the operational world. But that thing doesn’t actually exist to start with. For it to be a truly distributed architecture and satisfy analytical use cases, that thing needs to have access to storage of data in a way that scales. It needs to have a computation engine that you can inject computation into, because in a lot of analytical use cases you want to run your computation where the data is. It needs to have the APIs and interfaces to serve that polyglot data, with a way of injecting your policies around it: I want to access the data, but I don’t have that access, so give me, you know, the differential privacy mode of access, so I can just do analytics and see the forest without seeing the trees. So there’s so much that needs to be encapsulated in something that can be a meaningful unit of your architecture, of which you can then say, this is my complete data product. And that thing doesn’t exist.
So then how do we even talk about the technology when we don’t have a language to describe the pieces of architecture that we need to build? We’ve got to build a language first; we’ve got to build a system of designing this world. We’ve tried to create that language to some degree. And then we can think about how to plug in the technology that exists today underneath and above that, and we can talk about those layers and where the gaps are. Does that help, or did I completely derail this conversation?
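Editor’s illustration: the idea of a data product as one unit that bundles data, an access policy, and a place to run injected computation can be sketched roughly as below. This is a minimal sketch under stated assumptions, not a definitive implementation and not something described in the episode: the class name `DataProduct`, its fields, and the "only scalar aggregates may leave" rule are all hypothetical, and the aggregate-only rule is a crude stand-in for a real privacy-preserving access mode such as differential privacy.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

Row = Dict[str, float]


@dataclass
class DataProduct:
    """Hypothetical unit that keeps data, policy, and compute together."""

    name: str
    owner_domain: str
    rows: List[Row]
    # Consumers allowed to read raw rows (assumed policy model).
    raw_readers: Set[str] = field(default_factory=set)

    def read(self, consumer: str) -> List[Row]:
        """Serve raw data only to consumers the policy allows."""
        if consumer not in self.raw_readers:
            raise PermissionError(f"{consumer} may not read raw rows of {self.name}")
        return [dict(row) for row in self.rows]

    def run(self, computation: Callable[[List[Row]], float]) -> float:
        """Run an injected computation where the data lives.

        Only a single number may leave the product. This is NOT
        differential privacy, just a simple placeholder for it.
        """
        result = computation(self.rows)
        if not isinstance(result, (int, float)):
            raise TypeError("only scalar aggregates may leave the data product")
        return float(result)


# Example: marketing cannot read raw orders but can still run analytics.
orders = DataProduct(
    name="orders",
    owner_domain="order-management",
    rows=[{"amount": 20.0}, {"amount": 30.0}],
    raw_readers={"finance"},
)
average_amount = orders.run(lambda rows: sum(r["amount"] for r in rows) / len(rows))
```

In this sketch the consumer never sees individual rows; it only hands a function to the product and receives an aggregate back, which loosely mirrors the "forest without the trees" idea.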
Unknown Speaker 10:29 No, I think that’s a good framework. And I think that really elevates this to the way that you want to approach the conversation, which is not to get pulled too much into: is it this tool, is it that tool? I guess a question for you would be, you mentioned language. It seems like the words that we use and the frameworks that we apply here really help us define why we need a data mesh and how it is going to play in our organization. I’ve heard the term domains, for example, as being a key aspect of how to change your thinking a little bit and prepare yourself to think more in the mindset of a data mesh. What are some of the key terms, would you say, that are the key drivers here?
Unknown Speaker 11:18 Sure, this is a really good way of actually unpacking the problem and perhaps describing it. So domains are a big part of it. And the reason is, well, maybe we can go one level back and abstract away. If, just for a moment, every one of us can stay quiet for a few seconds and imagine a world 10 years down the track, where everything that we do is somehow augmented with a form of intelligence: recommendations, machine learning models that augment our understanding of the world. And the data that feeds those things can come from every touchpoint, every place: my data, your data, the organization’s data, medical data; it can come from anywhere. Then how can we build something that can scale to that? So just take a few seconds and imagine that world. Up to now, what we’ve been doing is just dumping things into a lake: extract it, dump it, and somewhere else we’d define it and make meaning out of it. In that world, you have to bring the ownership and the quality and all those affordances that make this data actually useful as close as possible to its source, give it an owner, and then give the owners all the tools so that they can run the operation of that data being shared, consumed, and discovered, but in a truly decentralized and distributed way. And given where we are in our journey towards that world, today we break up our organizations around boundaries and functions, around domains, and we have boundaries of trust between organizations; that’s how we’re organizing our systems, at least. So then maybe the way to break up this big problem into smaller problems, and to own this data as close to the source as possible, is a domain-driven distribution of ownership of the data, and then of the structure of the data and everything around it.
So the domains are the bounded contexts within which we can establish a language and perform a function of the business, like order management or customer management, or whatever your business functions are. That’s where we are. If we did that exercise, sat together, drank a few more mushroom coffees, and thought about what that world would look like, we would probably end up with a different model, in which the ownership comes to the people, the real owners of the data. My data would be organized around me; I’d probably have some sort of a grid where I can keep my data. But that takes the conversation to the future. We’re not there; we are here. So I use the domains as a way of decomposing a complex problem into smaller problems.
Unknown Speaker 14:16 This is a great exercise, because when you start doing it, you realize: oh, I need to have data that comes from this place, and this place, and this place. And then your original mindset says, well, yeah, we’ll just put it in the same lake or whatever, where all the data is. And then you realize, wait, that’s what we’ve been doing already for 20, 30 years, and we’re still not able to accomplish this idea, this future thinking. And the way you’re proposing to think about it, by domains, really goes back to taking this big, gigantic problem and splitting it into smaller pieces. Honestly, that’s how computer science works, right? You take this big block and you break it into something smaller, where the output of one black box is the input of the other one, and so forth. And then you keep breaking it down smaller and smaller. I think that’s a great way of managing very messy problems. And when you start doing that exercise, you end up realizing that you have all these different domains. And I think at the end of the day, everything should be decentralized. Now, this is something I want to get your take on. You’re talking about decentralization: is everything decentralized, or isn’t it? Or is some part centralized? What’s the true balance here?
Unknown Speaker 15:32 Yeah. I always try to be pragmatic and see this as an equilibrium that we constantly have to manage and sustain. I sometimes feel centralization and decentralization are, in fact, two sides of the same coin. The way I think about it is that the moment we decentralize the data ownership around domains, sharing it through the APIs of the domains and all of those things, in that moment you realize: if I go and decentralize all the way down to the bottom of the stack that supports this model, to the bare metal, does it mean that every one of my teams and every domain builds its own decentralized stack, and we hope that they will also talk to each other? From the cost perspective, and just for pragmatic reasons, is that possible? Probably not. So what you end up doing is saying: okay, I’ll give a layer of utilities to these domains on top, the tech stack that they need to build these data products. And likely, from their perspective, they’re seeing this as a centralized layer of APIs, a centralized platform. Within that, you can still have decentralization; you can have different teams looking at different aspects of it. But to have that ease of use of the technology, it’s probably, from the perception of the user, a centralized layer of utilities that they can use. And then within that, you can again have decentralization: I do access management, you do encryption, I do storage, you do pipelines, whatever it is that sits in there.
Unknown Speaker 17:11 Right. You know, this is super interesting. It makes me think of some questions that I’ve gotten from some people about how to get started with data mesh. Usually we start with domains, and we start talking about that: what’s the right number of them? What’s the balance between centralization and decentralization? And a lot of times you start to get into this question of, well, how premeditated does this all need to be? Do I need to think ahead of time: okay, I don’t want to have more than 10 domains, so what are those 10 domains going to be? Oh, we’d better premeditate it right now. So I guess, how do you think about getting started with this kind of approach?
Unknown Speaker 17:54 Yeah, I find those kinds of exercises amusing, and definitely engaging, but are they giving us results? I would think about it very pragmatically. Why did we want to decentralize in the first place? Because we want to mirror how we are decentralizing our business and other applications. If you haven’t done that, then don’t bother with data mesh, perhaps. But if you have, and if you already have different teams responsible for different functions or capabilities within your business, then just use that as a starting point. And if you don’t yet have the platform capabilities to allow these autonomous teams, if you’re not there yet, well, maybe for a period of time you go from a centralized model to a decentralized model. Because having that economy of scale, where every team runs around and does its own thing and has its own data, and yet these things are connected, and yet these things are monitored and understood at a global level, requires a level of maturity of the platform that enables that. So there is an axis of evolution, the adoption curve of data mesh within your organization, or the curve of transformation. Where you start looks very different from where you end. And you have to be pragmatic: where I am today, does it make sense to have 50 of these things running around? Probably not. I mean, my thinking is influenced by seeing that kind of migration to microservices and so on, from more than a decade ago now, and this was the same conversation: you have to be this tall to be able to run microservices. It’s the same with data mesh, and being this tall means a set of data platform capabilities that you need to have in a self-serve fashion.
So if you’re not this tall, maybe start with a smaller number, but mirror your business; mirror how your world has been distributed.
Unknown Speaker 19:56 So don’t overthink it. Don’t try to boil the ocean. Start where it makes sense, do what’s natural for your organization, and iterate.
Unknown Speaker 20:04 Yeah, a foundation so that you can scale out, right? The whole purpose of having these domains is so that you don’t have to scale up like a lake; you can scale out based on boundaries of trust and boundaries of domains.
Unknown Speaker 20:17 And you said mimic the business and the different kinds of domains that already exist in the business. I think this is key, because that’s how you start small. It’s like: let’s define what those 10 domains are, but let’s just start with one. Let’s get one started, the one that is most interested in participating, and then get the next domain involved, and so forth; start with the marketing department or whatever. At the same time, you will start building best practices. At some point you can provide best practices that are fairly generic, but at the end of the day these things are part of the culture within a company: how you deal with data, how you’ve set up teams, what type of governance style you have. Are you really focused on risk, or are you taking things to the next level and being more open about it? I think it really depends on the culture.
Unknown Speaker 21:07 And work backward, right? It’s interesting that you mention marketing. Marketing is one of the hottest use cases, in fact, to bring data to life, because the marketing function is probably one of the few parts of your organization that wants to look across your products and across your touchpoints. So they want data from many different domains. So you can pick even one use case in marketing and work backwards: for this particular use case, like customer segmentation or whatever it is, where I need to feed these machine learning models or recommenders, which domains do I need to have access to?
Unknown Speaker 21:43 I’m also working on a marketing project exactly like this. And it’s fascinating, because they’re like, well, I’ve got this thing, and then this, and everything that they look at is touching a customer or touching the product. So they’re involved in so many places. I think that’s probably another interesting takeaway here: the marketing domain is one that lets you get in touch with different aspects of a business.
Unknown Speaker 22:05 But you’re not going to consolidate that data in the marketing department; you’re going to be bringing up a mesh that feeds the use cases.
Unknown Speaker 22:15 This is the aspect of decentralization and centralization that I’m seeing. This is my opinion, and I want to hear what you think about it. Look, the typical thing: what do you call a customer? Well, you know what, the marketing department has a definition of the customer; let them define it, right? Customer success has another definition for it, and the sales folks have another definition for it. Okay, let every domain have their own definition of a customer. They will write it down in English, in natural language, and they’ll generate data for that. And at the end of the day, they’re going to deliver a data product: here’s the data product that involves customers. And the people who are consuming it are the ones who are going to say, I’m happy with this, or they’re going to complain about it. And then there needs to be a central point that is cataloging these reviews, cataloging the complaints and the recommendations. And I think, as we always mention, let’s enable friction and let people know whether they agree or don’t agree. Then you put them in the same room and you say: hey, look, not only do we not agree on what a customer is, because we already know we don’t agree on what a customer is, but here’s the actual data that you’ve generated. And the best way is for you to talk to Bob and Alice, because they’re the ones who own that stuff. So I think at a central point you want to be able to centralize those core models, and that centralized group at some point is the one that’s going to be paying attention to what the consumers are doing and the complaints that they have, and they’re going to take it back to all the domains and let them know about it. That’s how I think about it.
And it’s like a living organism within a company. You’re never going to get this perfect so that everyone’s going to be happy; it’s always going to be changing. That’s my perspective.
Unknown Speaker 23:58 I do agree with that. I do think, though, that there is a slippery slope we have to watch out for, and I don’t think there is a convention I can point to and say, let’s solve it that way. That slippery slope is this: in my customer domain and in the marketing domain, the customer actually looks different, because I look at different aspects of the customer, and order management looks at different aspects of the customer. I do agree that when this is data on the inside, inside of my app, I can just design it however I want, because it’s just for my app, right? But then the data becomes data on the outside, and data on the outside, in data mesh language, is the data products, and particularly data products that don’t look at just the current state; they look at the historical state of the customer and the audit of it. For that data on the outside, I do agree that it needs to have a mapping context, a way to map that internal context to an external context that other people understand. But if the mechanism to do that was "let’s define the customer in one place and everybody agree with that," we end up with this bloated definition of the customer that needs to encompass all those different views, and nobody is actually going to use it. So then the question is the mechanism of arriving at that consensus, so that I can link the customer from this place to that place and still understand that it’s the same thing even though it looks different. What are those minimal mechanisms? Unification of IDs, having links between those entities to be able to link them; those are some of the fine-grained things we have to put in place. But I do agree that you need to have a way to just connect these things.
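Editor’s illustration: the "minimal mechanisms" mentioned here, unified identifiers plus links between entities, can be sketched as below. This is a rough sketch under assumptions, not a method described in the episode: the domain names, the local IDs, the global ID `cust-001`, and the helper `views_of` are all hypothetical. Each domain keeps its own shape of "customer", and only a small link table is shared.

```python
from typing import Dict, Tuple

# Each domain keeps its own view of a customer, keyed by its own local ID.
domain_views: Dict[str, Dict[str, dict]] = {
    "marketing": {"mkt-17": {"segment": "frequent-listener"}},
    "orders": {"ord-903": {"open_orders": 2}},
}

# The only shared artifact: (domain, local ID) -> global customer ID.
identity_links: Dict[Tuple[str, str], str] = {
    ("marketing", "mkt-17"): "cust-001",
    ("orders", "ord-903"): "cust-001",
}


def views_of(global_id: str) -> Dict[str, dict]:
    """Collect every domain's view of one customer without merging schemas."""
    views = {}
    for (domain, local_id), linked_id in identity_links.items():
        if linked_id == global_id:
            views[domain] = domain_views[domain][local_id]
    return views


customer_views = views_of("cust-001")
```

The design choice in this sketch is that nothing forces the two domains onto one bloated customer schema; a consumer can correlate the views because the link table says they refer to the same entity.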
Unknown Speaker 25:46 That’s a good one, that’s an exact one: identifiers. That’s something that needs to be managed in a central manner, because otherwise we’re just going to end up having more and more identifiers. You’re telling people, go reuse this. And it’s the same thing for some types of schemas and models out there.
Unknown Speaker 26:02 But again: data on the inside, data on the outside. Data mesh tries to address the difference from prior thinking, like virtualization, fabric, and so on. It’s like, let’s be respectful of the autonomy of different domains and applications. The data on the inside is designed and optimized for them to move fast and do what they need to do. The data on the outside, which is a data product, is designed to be shared, to get consensus, and to be correlated across. There might be a gap between the two, and the bigger the gap, the bigger the problem, because that data on the inside is what we’re turning into data products on the outside. And then those things feed machine learning models that get embedded back into the application. So the moment you come and say, okay, for this customer that comes in, I don’t know, recommend the next music track they’ve got to listen to, you’ve got a disconnection. So you need to keep those things close, and yet you need to allow them to be different, because they’re built for different reasons. The database for my application that plays music has a very different access model from the data on the outside that says what music people have listened to. So there are just some nuanced things in there, and we’ve got to be respectful of the differences.
Unknown Speaker 27:21 Yeah, no, that’s interesting. And to your comment about respecting the differences but also making sure that things make sense and come together, that makes me think a lot about the governance side of things. I know that with decentralization, and balancing that in data mesh, there’s this sort of umbrella function, the governance function, that needs to be effective to keep everything together. How do you think about managing governance and handling that overhead? For example, how does stewardship play a role? And how do tools like catalogs play a role? Do you have some frameworks that you recommend around that kind of stuff?
Unknown Speaker 28:07 I have the caveat that I’m really no expert in this. When I thought about the governance model, which I call federated computational governance, I felt that as human beings we’ve been struggling and wrestling with this balance between individual rights and the common good. On one side, it’s: this is my domain, I want to do my own thing, I want to move fast, I don’t care, I know my own data, or whatever. And then the common good: well, great, you’re moving fast, but nobody can use your data; you’re breaking everybody else. So, I don’t know, from Aristotle’s times there has been this balance between the common good and individualism. So I think the mesh governance had to have both an incentive model and a structure of people and roles that constantly tries to counterbalance these two poles. The thinking behind it is: we’ve got these data product owners who have local incentives to make their data product awesome. What does that mean? A lot of people are using it, data scientists are recommending it to their friends, it’s easy to discover, all of these good things. But then counterbalance that with global incentives: you’re going to get extra bonuses if your data product actually talks to other data products, if other people on the mesh are using it, if they’re connecting to each other. We want the network effect, right? So the incentive model is a dual kind of incentive model, and the group that governs the mesh is composed, federated, from the folks that have that local responsibility. And then let’s make it real. To make this real, we have to push complexity down, make it automated, and make it embedded into every one of the nodes on the mesh.
So we’ve done this with, you know, zero trust architectures and so on in the operational world, when we went from on-prem to cloud, in how we thought about policy application and configuration at each microservice. We can take that learning and apply it to the data concerns. So let’s take everything we agree is a global policy that we all adhere to, say how we describe our schemas, what meta language we use to describe them, and put that into the platform layer, and make it so easy for people to just adopt it. And then you get bonus points, the global incentives: if you are up to date with the version of the schema language that we are using, then you get extra.
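Editor’s illustration: pushing an agreed global policy into the platform layer, so that every data product is checked automatically, might look roughly like the sketch below. Everything here is an assumption made for illustration and is not taken from the episode: the descriptor fields, the constant `CURRENT_SCHEMA_LANGUAGE_VERSION`, and the functions `check_global_policies` and `earns_global_bonus` are hypothetical names.

```python
from typing import Dict, List

# Assumed global agreements, owned by the federated governance group.
REQUIRED_FIELDS = ("owner", "schema", "schema_language_version")
CURRENT_SCHEMA_LANGUAGE_VERSION = 2


def check_global_policies(descriptor: Dict[str, object]) -> List[str]:
    """Return the global policy violations for one data product descriptor."""
    violations = []
    for name in REQUIRED_FIELDS:
        if name not in descriptor:
            violations.append(f"missing field: {name}")
    version = descriptor.get("schema_language_version")
    if version is not None and version != CURRENT_SCHEMA_LANGUAGE_VERSION:
        violations.append("schema language version is out of date")
    return violations


def earns_global_bonus(descriptor: Dict[str, object]) -> bool:
    """A product with no violations qualifies for the global incentive."""
    return not check_global_policies(descriptor)


up_to_date = {
    "owner": "marketing",
    "schema": {"customer_id": "string"},
    "schema_language_version": 2,
}
stale = {"owner": "orders", "schema_language_version": 1}

up_to_date_violations = check_global_policies(up_to_date)
stale_violations = check_global_policies(stale)
```

In this sketch the policy is code that the platform runs for every data product, so domain teams do not each have to interpret the rule by hand.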
Unknown Speaker 30:47 I love this about getting bonus points. I’ve called this in the past being a good data citizen, right? If you’re part of the mesh, but you don’t use a schema people know about and your stuff is not documented, people are not going to go use it. It’s like, I’m not gonna go use Juan’s data, that sucks. They have to go, right?
Unknown Speaker 31:07 Very, very rarely are people measured on this kind of stuff, right? Like the idea of a KPI around, well, how many users are there of your data product, and are they happy? How many organizations are measuring that? Maybe they should be.
Unknown Speaker 31:19 Right, I think they should be measured on that stuff. And then, thinking about what you say: the data product is one thing, but it also has to be connected with other data products. I think that’s important. It’s kind of also my background, and this is why I think knowledge graphs, and just using graph technology, is an apt technology for implementation, because you get that for free: being able to connect your data across things, being able to share your metadata. So from the technology perspective, take catalogs, for example. I see catalogs playing two roles. One is as a tool for the data engineers to catalog the existing data, which is not a data product; those data sets are ugly or unorganized. You don’t want to release that, but you need to understand what it is. And once you’ve organized that, you’ve created a data product that needs to be cataloged, so other consumers can use the catalog to search the products, not the underlying data sets; those are what I call the inscrutable, ugly enterprise data.
Unknown Speaker 32:19 Yeah, I do agree. So the way I think about it, and I’m sorry, I didn’t answer the question you previously asked as well, is that once you have these distributed, nicely playing citizens of the mesh, the data products, they talk to each other, they consume each other’s data, they connect, because they have relationships with each other in terms of their semantics. And even though I think very bottom-up in terms of decentralization, like each one of these nodes should be self-sustaining and autonomous, you’ve got to be able to hit an API on this data product and discover it and have all the information about it available right then and there: its metadata, its timeliness, its schema, all of that. But you still need, as a user, to have the global view of this mesh, right? You need to have a way of searching it, browsing it, all of those things. And that’s where, again, I refrain from using the word catalog, because I want us to imagine new words, I want us to imagine 10 years down the track. So I just call it, for now, a data discoverability or data exploration tool, something that lets you discover and explore. And what you can discover and explore could be a knowledge graph that has emerged from the mesh. So the knowledge graph emerges from the mesh; it is not the mesh itself, because the mesh itself is data and computation and schema, all of those things. So it’s an execution context as well as the data, and then the knowledge graph emerges from it. So if you want to browse that knowledge graph, you have to have a window into it, and today’s window is basically what we’ve called the data catalog. But it may be an inverted model: instead of, as you said, Juan, going and looking at the data inside and trying to apply a ton of machine learning to figure out what these column names and tables actually meant
and what the relationships were, try to invert it on its head and say: well, it’s great to have some sort of intelligence at the top to look at this, but let’s assume the nodes themselves are self-descriptive and self-discoverable and have some level of quality. So meet somewhere in between, given that we have thought about data catalogs as these master tools that get all that intelligence out of non-intelligent things.
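The inverted model described here, where each node describes itself and the global view is assembled from those descriptions, might look roughly like the following Python sketch. The class, the `describe` method, and the field names are hypothetical stand-ins for whatever API a real data product would expose.

```python
from dataclasses import dataclass, field


@dataclass
class DataProductNode:
    """A self-describing node; all field names are illustrative assumptions."""
    name: str
    schema: dict[str, str]            # column name -> type
    freshness_minutes: int            # how recent the data is
    consumes: list[str] = field(default_factory=list)  # upstream products

    def describe(self) -> dict:
        """What a discovery API call on this node might return."""
        return {
            "name": self.name,
            "schema": dict(self.schema),
            "freshness_minutes": self.freshness_minutes,
            "consumes": list(self.consumes),
        }


def build_discovery_index(nodes: list[DataProductNode]) -> dict[str, dict]:
    """Global view built only from what the nodes say about themselves."""
    return {node.name: node.describe() for node in nodes}


def emergent_graph(index: dict[str, dict]) -> list[tuple[str, str]]:
    """Edges (consumer, producer) that emerge from the nodes' declarations."""
    return sorted(
        (name, upstream)
        for name, meta in index.items()
        for upstream in meta["consumes"]
    )


def search(index: dict[str, dict], column: str) -> list[str]:
    """Find the products that expose a given column."""
    return sorted(name for name, meta in index.items() if column in meta["schema"])


nodes = [
    DataProductNode("orders", {"order_id": "int", "customer_id": "int"}, 15),
    DataProductNode("customers", {"customer_id": "int", "email": "str"}, 60),
    DataProductNode("revenue", {"month": "str", "total": "float"}, 1440,
                    consumes=["orders", "customers"]),
]
index = build_discovery_index(nodes)
print(search(index, "customer_id"))
print(emergent_graph(index))
```

Nothing in the index is inferred by inspecting tables; the graph exists only because the nodes declared their own schemas and upstream dependencies.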
Unknown Speaker 34:37 By the way, I always say 30 minutes fly by, but this is going really well. And actually, Tim, I’m going to make an executive decision here: let’s keep going for a bit. I’ve got a couple more things I want to talk about. Yeah, maybe a bonus section here. So one of the things, since we’re talking about definitions, honest, no BS: what is a data product? How do you describe a data product? My answer at this moment is: it is a beautiful table that people understand. The columns make sense, they have definitions, and it ends up being a table that I will still open up in Power BI or similar. That’s it for me. I want to hear from you: what is a data product?
Unknown Speaker 35:25 I have a lot more hopes for this little data product than just being a table. If I want to see this data product grow and be the thing that I had hoped, it is in fact a completely new architectural quantum, a unit of architecture that abstracts everything you need to compute and provide access to a domain’s data, with the ability to also execute policies on it. So it’s a new abstraction, and maybe when you go three levels down the APIs, you actually get to a nice, beautifully designed table. But to create that table, you need computation; you need those transformations to actually create that table. To serve that table in many modes of access, and a table is just one mode of access, you need the APIs and projections and transformations that do that. To actually get to that table with the right access control, and make sure you have access to that table, you need policy engines right next to it. So the container that I put around all of this is compute, policy, and data as one unit of architecture, so that now I can really put my hand on my heart and say: this is an autonomous unit, right? And if you can have many of these things, and they connect to each other, then it becomes more than just a table. But that beautiful table that you described has to be somewhere in there, right? It’s all about data anyway. Yeah. But we need a new... and that’s really hard to convey, because we just don’t have it, we have
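A toy Python sketch of "compute + policy + data as one unit" may make the abstraction more concrete. It assumes a deliberately simple role-based policy and an in-memory list of rows; every name here is hypothetical, and a real data product would sit on actual storage, pipelines, and a policy engine.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict[str, object]


@dataclass
class DataProduct:
    """One autonomous unit bundling data, compute, and policy (illustrative)."""
    name: str
    source_rows: list[Row]                        # data
    transform: Callable[[list[Row]], list[Row]]   # compute
    allowed_roles: set[str]                       # policy

    def read_table(self, role: str) -> list[Row]:
        """One mode of access: a table, served only if the policy allows it."""
        if role not in self.allowed_roles:
            raise PermissionError(f"role {role!r} may not read {self.name}")
        return self.transform(self.source_rows)


def completed_only(rows: list[Row]) -> list[Row]:
    """Example transformation: keep only completed orders."""
    return [row for row in rows if row["status"] == "completed"]


product = DataProduct(
    name="orders",
    source_rows=[
        {"id": 1, "status": "completed"},
        {"id": 2, "status": "cancelled"},
    ],
    transform=completed_only,
    allowed_roles={"analyst"},
)
print(product.read_table("analyst"))
```

The "beautiful table" is still in there, but a consumer can only reach it through the unit's own transformation and access policy, which is what makes the unit autonomous.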
Unknown Speaker 37:05 Exactly. That’s the thing I struggle with. At the end, yeah, it kind of seems underwhelming: all of this, and I’m just getting a table, which is still Excel. But think about it. For me, it’s: look at the column. The column has a name that you understand, it has a description, the data underneath it is well defined, it should be clean, you know where it comes from, the exact lineage. If you don’t like it, you know who to complain to, you know who’s responsible for it. All of these are the things that go around treating data as a product. And yes, in its simplest terms it’s like a table, but there’s much more around that. And I think that’s what we need to convey and physically show to people, and that’s what I have right now to show. So that’s one of the things that I’ve been thinking a lot about. And the other thing: we were Slacking earlier today, and you said we have a choice to reimagine or rebrand. I really love that, because there’s a choice of a path of change, instead of just putting a fresh coat of paint over what we’ve always done. I think that’s really when you start thinking about something very, very different. And I think that’s part of the message here: let’s stop, let’s sit down, think about what we’ve been doing for so long, think about what life should be. There’s this big gap. Let’s not put lipstick on a pig here.
Unknown Speaker 38:31 I think that, yeah. Not just, you know, data fabric’s not cool anymore, data mesh is the cool thing, okay, cool, I’ll say mesh now instead. Like, wrong.
Unknown Speaker 38:40 Well, I think this is a good time to go to the thing we do here; it’s called the honest, no-BS lightning round. We prepared five questions. The questions are yes-or-no answers, and we’ll give you a small amount of time to support your answer. So, kicking off, question number one: is a data fabric a data mesh?
Unknown Speaker 39:05 No. But they’re complementary. If you think about data fabric, when it was created by the US folks and what problem they tried to solve: they tried to solve access to data wherever it is, and being able to integrate it. And that was at a point when people were going to the cloud, so they had to solve the problem of hybrid. I’ve seen data fabric implementations where they get this data extracted from all sorts of databases placed everywhere, but at the end of the day, they dump it into a lake or a warehouse to actually run analytics on it. So I think they’re complementary. I think fabric can be the bottom layer of this stack, the bare-metal layer of this stack, and then you look at it logically, with a new set of technologies, as a mesh overlay. I think there are synergies and they complement each other, but they’re not the same thing.
Unknown Speaker 39:54 I like that. So for everyone listening, you can do both. It’s not one or the other. All right. Question number two: is data mesh an architecture?