Why we need to focus on Knowledge Management, NOW! with Andrea Gioia

66 minutes

About this episode

Andrea Gioia, CTO at Quantyca and Co-founder of Blindata, shares his insights on the real issues holding back data management. Spoiler: It's not the technology. Andrea discusses how the biggest challenges stem from people, collaboration, and the ways we handle knowledge. As AI continues to evolve, poor data management becomes an even bigger obstacle, making it clear that prioritizing knowledge management is more urgent than ever.

Tim Gasper [00:00:32]:
Hello everyone. Welcome. It's time once again for Catalog & Cocktails. It's your honest, no-BS, non-salesy conversation about enterprise data management with tasty beverages in hand. I'm Tim Gasper, longtime data nerd and product guy at data.world, joined by Juan Sequeda. Hey Juan.

Juan Sequeda [00:00:49]:
Hey Tim. How are you doing? It is Wednesday, middle of the week, end of the day, and for some folks it's literally the end of the day. We're super excited to have this new episode here live, and I'm excited about today's episode for a particular reason. We have Andrea Gioia, who's the CTO at Quantyca and a co-founder of Blindata. Andrea is somebody I've never met in person, but I think we've passed each other many times in conference hallways, and we've come to the same positions on how we think about things, just from different perspectives. I love following his posts and all the comments on LinkedIn, and it is fantastic to finally have you. I know it's live and it's late for you, so thank you so much. How are you doing, Andrea?

Andrea Gioia [00:01:39]:
I'm fine. Thank you for having me on. I've followed the podcast, not since the beginning, but for a long time now. So it's really an honor to be with you today. Thank you for having me.

Tim Gasper [00:01:54]:
We appreciate it.

Juan Sequeda [00:01:55]:
Well, let's tell and toast. So what are we drinking, and what are we toasting for?

Andrea Gioia [00:02:01]:
I thought a lot about what I could prepare for this special occasion, but it's like when I order a pizza: I check the whole menu, and in the end I take the Margherita, the simplest one. So in the end I decided to make a rum and cola, which is one of my favorites. Very, very simple, but always one of the best. And I toast to-

Tim Gasper [00:02:28]:
I love it.

Andrea Gioia [00:02:29]:
Simple and easy and fast. Yes, absolutely. And what am I toasting for? Well, I think we can toast to the fact that we do, as a job, something that we love and really care about, and that's something we cannot take for granted. We are lucky, let me say. We can't ask for more. Definitely.

Tim Gasper [00:02:57]:
I would love to cheers to that.

Juan Sequeda [00:02:58]:
Let's cheers to that. And just to add, I think we are very lucky. We kind of live in this bubble on LinkedIn and so on, but everybody we interact with is so passionate about it, and you can feel that energy. We're just really lucky to have a career in a field that we really love and enjoy, with really fantastic people around it. So, cheers to that.

Andrea Gioia [00:03:23]:
Yeah.

Juan Sequeda [00:03:24]:
Tim, what are you drinking today?

Tim Gasper [00:03:27]:
I am drinking a little bit of something I picked up recently. I really like Edradour. Edradour is a nice Scotch whisky distillery, and I picked up something unique called Caledonia. So, pretty tasty. What about you, Juan? What are you drinking?

Juan Sequeda [00:03:45]:
I'm going back to my Aperol mix. This time it's Aperol with a Scotch, a blended Scotch (Grant's), and some bitters. So it's like a very nice punch in the face. Old fashioned-y. Really enjoying it. We experimented with the Aperol spritz. All right, our warm-up question today is: what is something you need to focus on now? Non-work related.

Andrea Gioia [00:04:16]:
Of course, the podcast, and then for me it's calling it a day, because it's very late in Italy. So I have to focus on going to bed and sleeping well, because tomorrow a new day begins and I'll have to run very, very fast from the morning through the whole day. So I'm focused on having a good sleep.

Juan Sequeda [00:04:41]:
Tim, how about you?

Tim Gasper [00:04:43]:
I think right now I need to make sure I'm practicing the piano a little bit more because I have this nice keyboard over here and I use it maybe once every two weeks. And so I need to practice more piano. That's what I need to do right now. So how about you, Juan?

Juan Sequeda [00:04:57]:
I need to focus now on doing more reading. I have a backlog of books, and I'm kind of cheating: I'm not reading them, I'm listening to them as audiobooks, but I'm going through them. I started asking myself how much I've been listening to books right now, so I'm actually cutting back on podcasts and trying to increase my reading of that backlog.

Andrea Gioia [00:05:18]:
Okay, if I can add: you both specified something related to learning, and I just specified something related to sleep and rest. What I want to learn and improve a little bit in the near future is my Elo score on chess.com, to become a better chess player. Right now I'm really terrible, but I like the game and I want to improve a little bit.

Juan Sequeda [00:05:38]:
All right, there we go. You see, this is what I like about these questions: we get to know so much about our guests, and even about ourselves. All right, let's kick it off. Honest, no BS: why do we need to focus on knowledge management now?

Andrea Gioia [00:05:53]:
Because we are in the information era, and information is related in some way to knowledge. If you look at the most recent economic theories, what makes a company competitive on the market is the way in which it collects, produces, formalizes, and shares knowledge. So knowledge is very important, and not only because of data and data management: for a company, formalizing this knowledge and becoming a learning organization is very important to compete in a very volatile market. To make it very simple to understand, just consider how many meetings we have every day, every week. A lot of these meetings exist just to synchronize our mental models and make sure that everybody has understood the same thing from the information provided. So we have a lot of meetings where we say, "Have you understood what I said? Please repeat it back to me, and I'll repeat you." We try to sync our mental models without ever drawing the model we are talking about. The model remains in our minds, and we try to use only language to synchronize mental models that are, of course, different. So why not write down and formalize this knowledge? We could save a lot of meetings and a lot of misunderstandings, and we could create better solutions, not only in data management but in our organizations in general.

Juan Sequeda [00:07:27]:
I liked how you started off, and you went somewhere I had not thought of: just imagine how much time we spend in meetings trying to synchronize our understanding of things, and we don't even write it down. I mean, we write a little of it down in natural language, but a lot of these things are just mental models, as you mentioned. We should be able to draw them down, right? This goes back to business processes: this thing impacts that thing, so let's make sure it's captured. I think one important takeaway I'm hearing right now is: let's try to track how much time and energy we're actually spending on this communication, on this knowledge that's being communicated.

Andrea Gioia [00:08:14]:
A lot, a lot. In fact, the basic idea behind the concept of a learning organization is an organization that is capable of transforming its implicit knowledge, the knowledge stored in the minds of its employees, into explicit knowledge, and then reasoning on that explicit knowledge instead of having a lot of conversations to try to synchronize or coordinate people around something. And that's even without talking about learning organizations. Think of domain-driven design, which is quite popular in the technology field. Domain-driven design is basically domain-modeling-driven design, because the main idea is that creating a solution is, first of all, creating a model of the problem space, and the solution is something that comes out of this shared model of the knowledge space. So the idea is not to define the needs, translate them into a requirements document, pass that document over to the team that writes the code, get the final product, and hand it back to the user. Instead, we create the model of the problem all together, with one single language, the ubiquitous language, and we discuss over that model how we want to solve the problem. Not just a document passed from one team to another while the model stays implicit in the minds of each team, of each individual belonging to each team. So this is the idea: make the knowledge explicit. The more knowledge you make explicit, the better the coordination and the better you can solve the problem.

Tim Gasper [00:10:00]:
Now this is an interesting clarification, I think, of a lot of the work that people do and sometimes call by other names, right? Some people call it governance. Sometimes people call it documentation. Sometimes people call it process modeling. Those are just a few examples, but ultimately, what are you trying to do? You're trying to collect knowledge that was maybe more implicit, and you're trying to formalize it. I guess these are all different techniques for formalizing knowledge. Oftentimes technology plays a part, but it isn't the be-all and end-all of formalizing knowledge, right? And then you want to actually activate that knowledge in some way: how do we leverage it to create a force multiplier for the organization? Is it fair to call all of that knowledge management? Sometimes people get a little pedantic about the phrase and say, "Well, no, there's a specific thing called knowledge management." So is this knowledge management, or is knowledge management something else?

Juan Sequeda [00:11:09]:
Yeah. So let's talk about your definition of knowledge management. You're also mentioning data management, and we're seeing this so much now, right? We're talking more about knowledge management and going from data management to knowledge management. What does that mean? How are you defining this?

Andrea Gioia [00:11:26]:
Yes. For me, knowledge management is the way in which an organization manages its knowledge life cycle: how knowledge is created within the organization, how it is socialized, how it is formalized, and how it is applied to the activities of the organization. All these activities, together with the definition of the processes, procedures, and operating model used to manage this knowledge value flow, are what knowledge management means to me.

Tim Gasper [00:12:04]:
So did you say create? Did you say socialize? Was that the second one?

Andrea Gioia [00:12:10]:
Yes: create, socialize, combine of course, formalize, and then use it.

Tim Gasper [00:12:17]:
And then apply it.

Andrea Gioia [00:12:18]:
Yeah, and apply it. There is, I think, a famous model, which maybe I'm not quoting exactly: Nonaka and Takeuchi's model of knowledge management, a four-phase process that describes knowledge from its creation to its application. It is named, I have it here, the SECI model: socialization, externalization, combination, and internalization. It's quite a popular model in economic theory on how a knowledge organization can manage its knowledge. Basically, these are the four phases: you move from how you create, to how you socialize, to how you formalize, to how you use.
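
For readers new to it, the SECI model is usually summarized as four conversions between tacit (implicit) and explicit knowledge, which ties it back to the implicit-to-explicit shift Andrea describes earlier. The following is a minimal Python sketch of that textbook summary; the `Form`, `Phase`, and `next_phase` names are illustrative only and not part of any library.

```python
from enum import Enum


class Form(Enum):
    TACIT = "tacit"        # implicit knowledge held in people's heads
    EXPLICIT = "explicit"  # knowledge that has been written down or formalized


class Phase(Enum):
    # Each SECI phase is a conversion: (source form, target form).
    SOCIALIZATION = (Form.TACIT, Form.TACIT)
    EXTERNALIZATION = (Form.TACIT, Form.EXPLICIT)
    COMBINATION = (Form.EXPLICIT, Form.EXPLICIT)
    INTERNALIZATION = (Form.EXPLICIT, Form.TACIT)


def next_phase(phase: Phase) -> Phase:
    """SECI is usually drawn as a spiral: after internalization, socialization starts again."""
    order = list(Phase)
    return order[(order.index(phase) + 1) % len(order)]


for phase in Phase:
    source, target = phase.value
    print(f"{phase.name.lower():<16} {source.value} -> {target.value}")
```

Each phase's target form is the next phase's source form, which is what lets the cycle keep turning.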

Tim Gasper [00:13:04]:
Yeah, I like that. This gives you these phases of knowledge capture. How do you compare knowledge management to data management, then? What is data management in comparison?

Andrea Gioia [00:13:20]:
They're strongly related, because for me there is this concept of an information architecture in which data is just the base. Then you have to put the data into context: you have to describe the data with metadata in order to provide the context in which the data was generated. Metadata usually answers the questions who, why, when, how, and where about a specific data set, so the context in which the data was generated. Then knowledge is the way you organize that information into a model that makes it actionable, one that allows you to describe the domain and grasp the meaning: how a new fact, a new piece of data contextualized by its metadata, can be used in a purposeful way according to the specific goals and ways of working of your company. So formalized knowledge is a kind of information, but information with an end, with a goal attached. It's a model designed for a specific purpose, information structured to support a specific goal. So they are related because they are part of the same information architecture. If you want to use data and extract maximum value from it, you need metadata to contextualize it, but you also need a knowledge model to understand how to use the data within your domain.
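
The three layers Andrea describes (data, metadata that answers who, why, when, how, and where, and a knowledge model that ties information to a goal) can be sketched as plain Python data classes. This is a minimal illustration under our own assumptions: the class names, fields, example values, and the `datasets_for_goal` helper are hypothetical, not a standard metadata schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Metadata:
    """Context layer: answers who, why, when, how, and where for a data set."""
    who: str
    why: str
    when: date
    how: str
    where: str


@dataclass
class Dataset:
    """Data layer plus the metadata that contextualizes it."""
    name: str
    rows: list
    metadata: Metadata


@dataclass
class Concept:
    """Knowledge layer: a domain concept, the goal it serves, and the data sets that carry it."""
    name: str
    definition: str
    goal: str
    carried_by: list


def datasets_for_goal(goal, concepts, datasets):
    """Use the knowledge model to find which contextualized data sets serve a goal."""
    wanted = {ds for c in concepts if c.goal == goal for ds in c.carried_by}
    return [d for d in datasets if d.name in wanted]


orders = Dataset(
    name="orders",
    rows=[{"order_id": 1, "customer_id": "C42", "amount": 120.0}],
    metadata=Metadata(
        who="sales domain team",
        why="record confirmed orders",
        when=date(2024, 1, 31),
        how="nightly export from the order system",
        where="EU web shop",
    ),
)
customer = Concept(
    name="Customer",
    definition="A party that has placed at least one order",
    goal="measure customer lifetime value",
    carried_by=["orders"],
)

matches = datasets_for_goal("measure customer lifetime value", [customer], [orders])
print([d.name for d in matches])
```

The data set alone is just rows; the metadata says where it came from; only the knowledge layer connects it to a purpose.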

Juan Sequeda [00:14:58]:
So, okay. Look, I'm totally with you, of course. But let me keep my devil's advocate, skeptic hat on: so what? We're solving problems today, right? We are being somewhat data driven. Could we be better? Sure. But what is the actual incentive to start creating that model of the problem, to start formalizing this? Why aren't we doing this today? You could argue that the way we're doing it today is probably enough, and we haven't found enough incentive to spend time creating that model and formalizing it. Why now?

Andrea Gioia [00:15:47]:
Yeah, I can tell you my story, because my background is not in the semantic world or linked data. I've always worked with data. I started as a data engineer when the term did not really exist yet, so I was doing ETL and data warehouses. Now I work as CTO for a company that specializes in data management. So I come from data management, not from the semantic world. But three years ago, I started to see a problem with our customers, in projects we delivered for them. We had pushed hard on an approach of modularizing data management solutions and managing data as a product: a more modular solution to better manage the complexity of these solutions, which are very, very complex. And we saw that the most successful customers, the ones that built a big portfolio of data products, also had a big problem: they were not able to reuse these data products. In particular, they were not able to combine data products across different business domains. So basically we were not delivering what we had promised. They had a lot of data products. They had spent a lot of extra effort to create not just a data set but a data product around the data set. Everything was perfectly good and fantastic within the domain. But as soon as they tried to combine the data products made by the marketing and sales domains, it became very difficult. There was a lot of back-and-forth discussion to understand the meaning and how the data could be joined. Basically, the products were not reused, and every domain started creating copies of products that already existed in other domains. We saw that even with this approach, the old problems of the past started to return. So I was looking for a way to create something that is really composable, not only technically but also at the semantic level, so that the business can join two products from two domains without having to ask:
Is this address the same address that appears in this other product in the sales domain, or what? Without having to call the person, ask, and in the end hold a meeting to understand things. So when I was looking for possible solutions, I found this book, which I have here: The Data-Centric Revolution, which I read about three years ago. I bought both books (I don't know where the red one is, but okay): I bought the problem and the solution together. I read the problem first, but I already knew the problem.

Tim Gasper [00:18:51]:
Just for those who are only listening to the podcast: Juan also has this book and the prequel to it.

Juan Sequeda [00:18:56]:
People who listen to the podcast will know, because I always say this: a former podcast guest, Dave McComb, has what I call the must-read book for every data professional in the world. It's called Software Wasteland. That book only describes the problem, and then you have The Data-Centric Revolution, which is the solution, which is what we're talking about. Sorry.

Andrea Gioia [00:19:14]:
So when I read the problem, I just got confirmation of my bias, and I became impatient to read the solution. And when I read the solution, a new world opened up to me, because I was completely unaware of it. Yes, I had heard a lot about linked data, but I saw it as something really theoretical, outside the field of a data management practitioner. Instead, I started to study it. It was hard at the beginning, because it's not very simple for an outsider; some specifications are quite complex and articulated. But in the end I got through it and started to understand how these things can really be used, not only in a theoretical way, to create the model that enables the interoperability I wanted between data products. And we started to apply it with customers. So this is what brought me where I am now. I'm not a researcher or a PhD who studied this at university. I came here from a very practical problem. For me it was very real: it was a solution to a problem I faced every day, even before the generative AI revolution. So I'm not doing knowledge management just because it's now so important for applying generative AI to domain-specific use cases. Of course, that's another big plus that came later. But I came to it because I needed to integrate data products across a large portfolio spread across different domains.
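
One way to read "composable at the semantic level" is that each data product annotates its columns with concepts from a shared model, so consumers match columns by meaning rather than by name. The sketch below assumes that approach; the `DataProduct` class, the `joinable_columns` helper, and the `example.org` concept IRIs are hypothetical placeholders, not an existing tool's API or a real ontology.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """A data product whose columns are annotated with shared-model concept IRIs."""
    name: str
    domain: str
    # column name -> concept IRI (hypothetical IRIs, for illustration only)
    semantics: dict = field(default_factory=dict)


def joinable_columns(left, right):
    """Pair up columns of two data products that point to the same concept.

    Matching on the concept rather than the column name is what lets a consumer
    combine products from different domains without a meeting to ask whether
    two differently named address columns mean the same thing.
    """
    by_concept = {}
    for col, concept in right.semantics.items():
        by_concept.setdefault(concept, []).append(col)
    return sorted(
        (lcol, rcol)
        for lcol, concept in left.semantics.items()
        for rcol in by_concept.get(concept, [])
    )


CORE = "https://example.org/core#"  # placeholder namespace, not a real ontology

marketing_leads = DataProduct(
    name="marketing_leads",
    domain="marketing",
    semantics={"lead_email": CORE + "emailAddress", "addr": CORE + "postalAddress"},
)
sales_orders = DataProduct(
    name="sales_orders",
    domain="sales",
    semantics={"customer_address": CORE + "postalAddress", "order_total": CORE + "amount"},
)

print(joinable_columns(marketing_leads, sales_orders))
```

The column names `addr` and `customer_address` never match as strings; the shared concept is what connects them.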

Juan Sequeda [00:20:53]:
So, okay, this is fascinating, and I really love how you describe the problem, your personal pain point around this. But before we go further, I want us to dive a little bit into data products. The way I understand it, you already had a problem, you said data products were the solution, and you started implementing data products. How long ago was this?

Andrea Gioia [00:21:22]:
I think three years ago. Yeah.

Juan Sequeda [00:21:24]:
So this was the height of data mesh. You start focusing on data products, and you're like, oh, this is what we've been missing; treating data as a product is the solution to everything. And then that's still not enough. What I personally find fascinating is that the story you described is one I've heard constantly throughout my career: how people end up realizing, oh, we need to invest in semantics, invest in knowledge. I actually gave a talk a couple of weeks ago at the dbt Coalesce conference about what enterprises miss by not investing in semantics and knowledge. The next slide was "Reuse of data." That's it. That's the answer. You can leave now. And that is exactly what you just described. So I still ask myself: is this enough of a motivator? Everybody who came to that talk was nodding and nodding. And, sorry to toot my own horn here, I actually got called out in the keynote on the last day, with people saying I gave a great talk about that. So hopefully people are realizing this. But what do we need to do to convince more people to start paying attention? Otherwise, as you just said, the problems of the past are going to keep returning. Are we just doomed to keep repeating this, or what's missing to get more people converted, like you were? Do they just need to go read the book? I don't know.

Andrea Gioia [00:22:59]:
Yeah, yeah. I think that first and foremost it's a problem at the business level, because we are talking about creating a shared model of knowledge. Not a single model that explains everything at the maximum level of detail, but a minimum viable model that brings together what the C-level leaders of the different domains can agree upon about the core concepts of the company. For example, take the concept of customer. We want to create a model of the customer that contains the basic information every department can agree on, and that guarantees interoperability. Then each domain will have extra information, extra specification of the customer, in its subdomain ontology that extends this concept. But we need to find an agreement and push that agreement to the limit, to the point where agreement becomes impossible and we need to specialize. The bigger the common concept, the more we are able to coordinate and integrate the work we do at the domain level. This is not something technical people can do alone. It requires having the VPs of the different areas at the table, because it's a very high-level discussion: what do we have in common about the concept of customer? It's not a simple conversation. And I think that businesses sometimes still live in the last millennium. They are really focused on an organizational model in which the most important thing is to optimize the activities within their own department, not to optimize the overall outcome of the company as a system. So why should we spend time getting VPs to agree on what a customer is, to create this interoperability? We don't need to. We have objectives to achieve by the end of the year, and to achieve them we need to optimize our department, not the system, not how we collaborate with other departments.
So I think there is a real economic and organizational reason, because this is difficult. You have to find situations in which the business, because of pressure coming from outside, starts to feel that the traditional organizational model, which concentrates on optimizing the parts rather than the system, should be challenged. You need business people who have this mentality, and with this mentality you can bring them to the table and start discussing the core concepts. So it's not something really technical. We think it is, but I think it's more at the organizational level, in the way enterprises operate right now. Consider that, from a systems-thinking point of view, a system composed of optimized parts is generally not an optimal system, because over-optimizing the parts creates systemic problems. So you need some resources that are managed in common across departments to reduce the negative externalities generated by the activities of each single department. If you don't have this mentality, each department works to optimize its own results without caring about the negative externalities it generates for other departments.
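
The "minimum viable model" idea, a core concept every domain agrees on and then extends in its own subdomain ontology, can be approximated with class inheritance. This sketch uses Python dataclasses as a stand-in for an ontology's subclass relation (a real implementation would more likely use an ontology language such as OWL); all class names, field names, and example values are hypothetical.

```python
from dataclasses import asdict, dataclass, fields


@dataclass
class Customer:
    """Core concept: only the attributes every department agreed on."""
    customer_id: str
    legal_name: str
    country: str


@dataclass
class MarketingCustomer(Customer):
    """Marketing subdomain extension of the shared core concept."""
    segment: str = "unknown"
    consent_to_contact: bool = False


@dataclass
class SalesCustomer(Customer):
    """Sales subdomain extension of the shared core concept."""
    account_manager: str = ""
    credit_limit: float = 0.0


CORE_FIELDS = [f.name for f in fields(Customer)]


def core_view(c: Customer) -> dict:
    """Project any domain-specific customer onto the agreed core attributes."""
    return {name: getattr(c, name) for name in CORE_FIELDS}


def combine(m: MarketingCustomer, s: SalesCustomer) -> dict:
    """Merge two domain views; they must agree on the shared core to be combinable."""
    if core_view(m) != core_view(s):
        raise ValueError("domain views disagree on the shared core concept")
    return {**asdict(s), **asdict(m)}


m = MarketingCustomer("C42", "ACME S.p.A.", "IT", segment="enterprise", consent_to_contact=True)
s = SalesCustomer("C42", "ACME S.p.A.", "IT", account_manager="Rossi", credit_limit=50000.0)
print(sorted(combine(m, s)))
```

Because both domain classes inherit the same core fields, a consumer can check the agreed attributes before combining the marketing and sales views, without having to ask what "customer" means in each domain.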

Juan Sequeda [00:26:47]:
No.

Andrea Gioia [00:26:48]:
So if I create terrible data but achieve my goal because I move very, very fast, I create extra costs for the other departments that need to integrate my data. This is not visible today within the company, because the MBO system does not measure these negative externalities; it just measures the MBOs of the marketing VP. That's the problem.

Juan Sequeda [00:27:12]:
I am really thankful for this conversation right now, because it connects with the talk I gave a couple of weeks ago at dbt Coalesce. One of the positions I had, which I always talk about, is this balance between efficiency and resilience. We're incentivized to be efficient, but what about being resilient? You gave it a different angle, which I really appreciated: part of that efficiency means we focus so much on being efficient and optimized for our own department, and we're not incentivized for resilience, which is the communication across the system. So thank you for giving me a way of articulating that point to people, with a different point of view and example. Thank you very much. Tim, I know you had a follow-up.

Tim Gasper [00:28:01]:
Yeah, I have a follow-up question. Andrea, I think what you walked through here spells out a business motivation and an approach that more people need to take really seriously. And I think a lot of bigger companies that have started to drink this Kool-Aid really understand it. We have a lot of listeners who are data leaders and data practitioners, and knowledge management is maybe a bit outside of what they're used to, right? What is your advice, your practical considerations, best practices, whatever you think, for somebody who's a data leader, a data engineer, or an analytics engineer? How could they leverage knowledge management a little bit more in what they're doing?

Andrea Gioia [00:28:53]:
Yes, I think a couple of things: one practical and one more about positioning, let me say. On positioning: data engineers, especially those who lead data engineering teams, in a consultancy or inside a specific customer, must understand that they have to open up to different disciplines. They must be very interdisciplinary in their approach. It's not just about data anymore, or about how well you manage a team of engineers. It's not enough to have a cross-functional team: you must be cross-functional yourself. You must be a T-shaped professional who knows something about organizational design, economics, sociology. You have to be curious, because the problem is not a data problem. Data management is not just a data problem; it's a problem related to people, to the organization, even to biology, systems thinking, and cybernetics. So you have to be curious, study a little of all these fields, and try to analyze the problem from different angles. Of course, as a T-shaped professional you come from data and the technical side, but you need to broaden your vision. Then, to start doing knowledge management, I think the idea is that we already have a lot of knowledge-related discussions; we just don't write down the model. Every time we have a business case, the business is in the room to explain the needs, and we translate the needs into requirements. So when there is a problem, that's the best moment to say: okay, we can slow down a little here to go faster later. If you are telling me the needs and the requirements, why can't we try to model them? We can create a better solution now, and then the model remains and we can reuse it in the future.
So don't decouple modeling from the problems the business has; piggyback the modeling on the business analysis and the creation of the solution, working in a value-driven, incremental approach to knowledge modeling driven by use cases. Another very important point is to show value as soon as possible, since you are working with use cases. Then you also have to show that the model is not only useful for producing a better solution because we made it explicit, but also because it can be reused in the future to connect products that are already deployed. For me, the aha moment for the business is when it comes with a new business case and we can say: okay, there's no integration cost here. We can just put these two products together and start working on the new KPI you need to implement.

Tim Gasper [00:32:10]:
Yeah, you don't have to reset all that foundation all over again or create a brand new pipeline. You say, oh yeah, we have 80% of the ingredients already. We just need to do this extra 20% which we will do in a repeatable way and then we can get there very fast.

Juan Sequeda [00:32:25]:
Which gets me thinking: the data teams listening should challenge themselves, as individual contributors or as leaders. What if we gave ourselves a metric or KPI where our goal is to minimize the amount of code, pipelines, or whatever we build, while at the same time maximizing the number of questions being answered? I think that's a way to actually incentivize reuse and to focus on the particular business questions that need to be answered. And if we do need to write new code, then we've actually gone through the exercise of confirming that it just doesn't exist and that there's a new problem. Therefore, if I'm going to create something new, my incentive is to make sure it will be reused, because I want to minimize future new code. And hopefully the challenge here for data leaders is: you want to show value. Specifically, I want to show: look how many more answers I'm providing you while minimizing the time and cost around that. So that's a concrete, tactical takeaway I'm taking from this.

Andrea Gioia [00:33:53]:
Yeah, absolutely. Consider one metric that I suggest to data teams, and that I've adopted myself for some time. The point of a data product is not its use but its reuse. So: how many data products are reused for multiple use cases, and especially, how many data products are reused in combination with data products outside their original domain? These metrics are very important, because if all the data products are combined within their domain and never cross-domain, this is a problem. If every data product is used by only one use case, that means you are implementing data products with an engineer-to-order approach. You ask me something and I implement a data product. Next time you ask me something else and I create a new data product. Every time you have a need, I need to create a new data product. But we want to create a portfolio of data products that work as building blocks and that can be reused and recombined over time. That's what I want to do. That's the reason why I invest in building data products and not just having a data set or a monolithic data warehouse, whatever.
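Andrea's two reuse metrics are concrete enough to compute. The Python below is a hypothetical sketch, not a tool mentioned in the episode: it assumes an inventory that maps each data product to its owning domain, plus a list of use cases naming the products each one combines. The `UseCase` class, the `reuse_metrics` function, and the sample product and domain names are all invented for illustration. It reports the share of data products used by more than one use case and the share combined with at least one product from another domain.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class UseCase:
    """A use case and the (hypothetical) data products it combines."""
    name: str
    products: frozenset[str]


def reuse_metrics(product_domain: dict[str, str], use_cases: list[UseCase]) -> dict[str, float]:
    """Share of products used by more than one use case, and share combined across domains.

    product_domain maps each data product name to its owning domain.
    Assumption: every product referenced by a use case appears in product_domain.
    """
    if not product_domain:
        return {"reused_share": 0.0, "cross_domain_share": 0.0}

    use_count = {p: 0 for p in product_domain}
    cross_domain = {p: False for p in product_domain}

    for uc in use_cases:
        domains = {product_domain[p] for p in uc.products}
        for p in uc.products:
            use_count[p] += 1
            # More than one domain in this use case means p was combined
            # with at least one product from outside its own domain.
            if len(domains) > 1:
                cross_domain[p] = True

    total = len(product_domain)
    reused = sum(1 for n in use_count.values() if n > 1)
    crossed = sum(1 for flag in cross_domain.values() if flag)
    return {"reused_share": reused / total, "cross_domain_share": crossed / total}


# Hypothetical inventory, for illustration only.
PRODUCT_DOMAIN = {
    "customer": "sales",
    "orders": "sales",
    "claims": "insurance",
    "campaigns": "marketing",
}
USE_CASES = [
    UseCase("churn_kpi", frozenset({"customer", "orders"})),          # within one domain
    UseCase("claims_by_segment", frozenset({"customer", "claims"})),  # cross-domain
    UseCase("campaign_report", frozenset({"campaigns"})),             # single product
]

metrics = reuse_metrics(PRODUCT_DOMAIN, USE_CASES)
print(metrics)  # {'reused_share': 0.25, 'cross_domain_share': 0.5}
```

Read this way, a low reuse share points to the engineer-to-order pattern Andrea describes, while reuse with no cross-domain combination points to products that are reused but stay inside a single domain.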

Tim Gasper [00:35:07]:
I actually don't think this is a well-known truth yet. I think for the three of us it's pretty obvious. Like, of course this makes sense, right? But there are a lot of people out there who are probably newer, whether to data mesh or to data product management, and they say, oh, I need to make data products, right? And they think the strategy is that if I implement data products, things become better. And I don't want to poo-poo data products. I actually think there's a lot of good that comes out of having more repeatable interfaces that you can define, that create more of a data contract, whether literal or figurative, with users and things like that. So there's a lot about data products that's good even on their own. But not all data products are good data products, right? There's a spectrum. On one end: okay, we have a data product, but how different is it actually from the old way of doing things? Maybe not that much different. On the other end: I have not only a heavily used data product, but it is being reused, across different domains. And what is the smallest surface area of data products that I need in order to serve the organization? If my data organization could have 400 data products to serve the organization, or 40 data products that can serve the whole organization, then the 40 is better than the 400, because I found a way to create more reusability.

Andrea Gioia [00:36:33]:
Yeah, absolutely. In fact I'm really skeptical when it comes to the discussion about value, because a lot of people talk about value as an absolute thing: we have to create value for the customer. But who is the customer? And what kind of value are you producing? Here we are returning to the systemic problem that we talked about before. If I always try to create the best value for the customer, basically I'm optimizing locally. A customer comes with a need, and I try to implement, whatever it takes, the fastest possible solution adapted to that specific user. In doing that I satisfy the customer, I have created value. Probably the product is a great product, because I have a happy customer. But in doing that I've created technical debt. I've created a lot of negative externalities for the company. So the value for the specific user and their needs should be balanced with the value for the system. A good data product is a product that is designed because there is a specific use case. I don't want to do make-to-stock, creating data products before having use cases because maybe in the future someone is going to use them. That's what I don't want to create. A stock of data that is not used is a liability. So I want a use case that provides the resources to implement the data product. But I also want to manage this data product considering that it should be reused in the future, that it should have a life after that specific use case. So make that little extra effort to avoid creating negative externalities and to increase the potential value of the managed asset, because I can recombine and reconfigure it over time without reimplementing it. A lot of the integration that companies are implementing now is on the same data that has already been integrated in other ways for other use cases. The same data, the same customer data, integrated in different ways.
That happens because we are not able to factor the common part of the integration into a product that can then be reused for all the other use cases.

Juan Sequeda [00:39:01]:
What I love about this discussion right now is that it goes back to the main takeaway, which is reuse of data. And this is making me realize I also need to correct my own talk track of, oh, invest in semantics and all of that. I mean, yes, invest in semantics. But what we should really be striving for, the value message, is reuse. Because when we reuse, we save time. We're able to do things faster, right? We're able to focus on the more strategic things. And how do we get there? That's where the semantics and the knowledge and all that stuff come in, right? So this really goes back to driving the incentives that we need to have, and I'm going to start pushing this message now: we really need to set up the incentives for reuse. This is why I always say that we need to make sure that one plus one is greater than two. And the economic argument for reuse is economies of scale around these things, right? And yes, there's a balance we need to strike: if I'm investing in doing that extra little work, it's time and energy that I could have put into the next thing I was going to do. But we need to have that balance, right? So, takeaway: reuse, reuse, reuse. I mean, think about the T-shirts, right? The first T-shirt I'm hearing is "create the model of the problem."

Andrea Gioia [00:40:32]:
I usually point to two things that increase the value of the asset in the future: reuse, and composability with other data assets. So it's very important not just to reuse that asset but to have the capability to compose it with other assets.

Juan Sequeda [00:40:48]:
But that would mean reuse too.

Andrea Gioia [00:40:50]:
Yes, but it's not just reused by itself. It should also be very simple to combine, for example, the customer data product with another data product without paying integration cost, because I can compose them: they are integrable without doing integration.

Juan Sequeda [00:41:06]:
I would say it's reuse by, in a way, eliminating or reducing integration costs.

Andrea Gioia [00:41:16]:
There is a good article by an MIT research center that calls the sum of reusability and composability the data liquidity factor. So data is a liquid asset if it is reusable and composable. That's their definition, but I like it.

Tim Gasper [00:41:39]:
I like that too. There are two additional topics that I think we should make sure we hit before we go to our lightning round today. One is the challenges of shifting to this approach, right? And the other is, you know, a topic that we love to talk about these days: AI. So we should make sure we talk about that too. But first, the challenges. Two big areas of challenges that I see around embracing this focus on reuse and composability: one is the culture of the organization, and the other is technology. I'm curious about your takes on both, and I'll explain each of them. On culture, I'm thinking about how a lot of organizations are just reacting a lot, right? Like, oh, we need this report, need this report, gotta create this report. And because they're reacting so much, they're not necessarily thinking about tech debt or reusability as often as they should. Maybe they talk about it sometimes. So how do we break through that? How do we demonstrate the ROI? Is it cost savings? Is it agility? And then on the technology side, I think one of the comments here on LinkedIn was really good. It said, you know, there are many tools on the market for data management, but I have yet to see any platform that ties knowledge management, data, and governance together. And it is true that for most data practitioners today, it's very easy to grab a database, to grab dbt, to grab an open source BI tool or even a low-cost cloud BI tool. You can grab these ingredients and start to stitch them together. But none of them are inherently knowledge management tools. They're data tools: Kafka, et cetera. So how do we solve the cultural problem? How do we solve the technological problem? Do you agree that both of those are problems?

Andrea Gioia [00:43:29]:
Yeah, it's a sociotechnical problem. That term is quite popular nowadays, but I absolutely agree. It's a systemic problem, and the system is the organization as a whole. The organization is composed of people and technology, and these two things are related. From an organizational point of view, the tone of the discussion that I have with IT and with the business in my consultancy work is a little bit different. With the business, the important thing is to get their buy-in and have them at the table to start designing the conceptual model of the domain. I tell them that this is about being faster in answering their questions. It's not about rationalization, better managing complexity, governance and all that stuff. That's not something they are interested in, especially the business lines that are really close to the market, at the boundary of the company. They need to be very fast to provide hyper-personalization of the service, and they need to produce ad hoc solutions very, very fast. So what I tell them is: consider having a portfolio of building blocks that you are free to recombine as you want, so you can go very, very fast. The message that this is designed to control cost, to rationalize complexity and so on is the kind of talk I have with the IT people. With the business, I say: you are slowing down a little bit to build this model because this will make you much faster in the near future. As soon as we have the model, we start to build the building blocks, and these building blocks are not siloed applications. They are really building blocks that, like Lego, you can combine to create your solution without relying on IT. You are in an economy of differentiation; you need an economy of speed. You have to go very, very fast. And the reality now is that data management is centralized in IT, and IT works with an economy of scale.
So IT works to control access to the resources and to reduce cost through central control, and you end up going very, very slow. But you cannot remove IT, because if you start to do everything without any rules, it becomes chaos. So what is the compromise? Manage these resources in a common space between the IT economy of scale and your economy of differentiation. Create a common space in which we can manage this stuff, the data, together in order to make it reusable and go very fast. About technology: of course technology can help, but it's last on my list. Of course there is the problem of how you formalize the knowledge as soon as you have agreed on the model, but it's a very technical problem. It's not necessarily simple for someone who just came out of university; creating a good formalization of the model is an art. But it's a technical problem. The difficult thing is to get the VP of marketing and the VP of sales to agree on what a customer is. We can do that even on a whiteboard. When they reach this agreement, translating it into JSON, YAML, RDF, whatever, is not a problem; it's technical. The business people can leave the room and some technical people can draw the final model. But the big discussion, considering healthcare for example, is: is the patient a customer? What kind of customer is the patient? And is the insurance company that pays for the patient actually a customer? These are the kinds of discussions that are really important and that create value. Then, whether I represent it with a document or with RDF, of course I have preferences. And this brings us to the final topic, artificial intelligence: I would like to formalize the model we have agreed upon in a way that is machine-readable and can be fed to artificial intelligence agents, of course.
So I prefer to have an RDF ontological model rather than just a document or a BPMN model.

Tim Gasper [00:48:08]:
Right. At the point that you're capturing it, why not capture that knowledge in the most reusable way as well?

Juan Sequeda [00:48:14]:
Yeah. All right, this has been such a phenomenal conversation. And I'll tell you that I've been texting, while we talk, with Ora Lassila, my co-author, who is also a former guest. Ora wrote the first specification of RDF back in 1998 and is one of the fathers of the Semantic Web. I told him, oh, you gotta listen to this episode, we're talking about reuse and composability. And he texted me back and said: and it's also extensibility. You want to design X so that it is easy to extend, and good extensibility promotes reuse, because people can extend X to be exactly what they need. So to add to this: it's reuse, composability, and extensibility.

Andrea Gioia [00:49:04]:
Absolutely, I agree.

Juan Sequeda [00:49:08]:
All right, man, there's more stuff. We didn't even get into the whole data products discussion and all the standards and all that stuff. I think we'll have to have another complete episode on that.

Tim Gasper [00:49:19]:
A follow-up.

Juan Sequeda [00:49:21]:
All right, lightning round, one of my favorite parts. Question number one: do you need to implement a knowledge graph in order to achieve better knowledge management?

Andrea Gioia [00:49:36]:
No, absolutely not. You need to manage knowledge; the knowledge graph is a way to represent it in a machine-readable way, so you can do AI stuff. But no, the most important thing is to agree on the concepts.

Juan Sequeda [00:49:49]:
I agree with you. I will say that the knowledge graph is the best way of managing that, but it's not needed.

Tim Gasper [00:49:56]:
Second question. We talked about data mesh in today's episode, and how just implementing a data mesh or data products doesn't solve the broader knowledge management problem. But does it help? Does implementing a data mesh and data products help you on your knowledge management journey?

Andrea Gioia [00:50:18]:
Yes, I think it's very important, because the more you decompose your architecture into modules, the more you need something to put these modules back together again: to make sense of the different modules and understand how they compose a system that can be composed and reconfigured over time. So knowledge is even more important. If you think of a monolithic data warehouse, for example, there is a really clear model of everything; it is the single source of truth. So even if you do not create your ontological model, it's basically easy to infer it from the data warehouse that contains all the data. But as soon as I have a lot of pieces around, developed maybe by different domains, it's very important to have this ontological view that keeps it all together.

Juan Sequeda [00:51:11]:
All right, next question. I talk a lot about this role, whether it's a literal role or a hat that people need to wear, with the skills around it: the knowledge scientist, the knowledge engineer. Do we need more of these knowledge engineers, or not?

Andrea Gioia [00:51:30]:
Yes, I think they are important. For me they are facilitators. They are enablers who bring the business together, facilitate the discussion to arrive at an agreement, and make people understand why this is important and what kind of agreement we want to find when we are modeling something. Because we do not want to model the organization as such; we want to create a model of the organization for a specific goal. So it's very important. It's an interdisciplinary person who is able to formalize the model but is also able to talk to people, a negotiator. It's the person we talked about before: an interdisciplinary person who has multiple skills, can play different roles, and can facilitate this process of knowledge management. Yes, it's very important. It's someone who does not exist yet, but I think that different professionals can evolve to cover this role.

Juan Sequeda [00:52:26]:
Bonus question here. Will this roll into data engineering? Will data engineers turn into this, or are these two different roles?

Andrea Gioia [00:52:39]:
I think you need so many different competencies. We can have data engineers like me, for example: I'm a data engineer who is progressing toward this role. I hope to achieve it, or at least to have this capability. I come from data, and I'm learning how to manage knowledge and learning the organizational and economic aspects. My background is in data. But I think that people who come from the business, maybe from the demand side of the business, can also transition into this role. And of course people who originally come from knowledge management can transition into this role. So different people from different starting points can arrive there, but no one is there yet.

Juan Sequeda [00:53:26]:
Okay, so the second takeaway I'm having is: if you're trying to get into data and you're coming from the business side, consider knowledge engineering.

Andrea Gioia [00:53:36]:
All right. Absolutely, absolutely.

Tim Gasper [00:53:43]:
Final question, final question. We ran out of time to talk about AI, but I'll work it in anyway for our fourth lightning round question. Okay, AI right now. Both of these are probably true, but which do you see as the bigger value: AI helping us capture and manage knowledge, or knowledge being fed into AI?

Andrea Gioia [00:54:03]:
The second one, because knowledge is something that is in your mind, not in a generative AI system. Of course AI can help in the process, but more in the later phases, in the formalization, let me say, not in the externalization, not in discussing to reach an agreement. And because this is something that only the people within the company, only the organization, can formalize, this knowledge is the most value-added asset for the company. Foundation models will change in the future and will become more and more powerful. But one thing that the next generation of models will for sure not have is your knowledge. Your knowledge is your own. Only you have that specific knowledge. So you will always have to feed this knowledge into the model to make it operate in your domain. So the more ready a company is with this knowledge model, the better it can leverage existing models today, and the better positioned it will be to leverage future models.

Tim Gasper [00:55:22]:
Yeah, I think I agree with you. There's some startups and other organizations that are testing this theory though, right? They're kind of throwing all the documents and transcripts from conversations at the LLM and they're saying, okay, LLM, can you just, just derive knowledge from this? But you know, I suspect that there is something very human about knowledge creation and knowledge formalization that can be assisted by AI, but, but may not be replaced by it. But I guess we'll see.

Andrea Gioia [00:55:52]:
Yeah.

Juan Sequeda [00:55:53]:
All right. We have so many, so many notes. What happens in the back end is that Tim and I are always taking notes. But today Tim did a lot of the note-taking, because you got me.

Tim Gasper [00:56:08]:
Juan, you're making me blush.

Andrea Gioia [00:56:11]:
I am always surprised by how well you are able to summarize everything that has been said.

Juan Sequeda [00:56:18]:
Let's see, let's see, let's see how we do today. All right, Tim, take us away.

Tim Gasper [00:56:24]:
We started with the honest, no-BS question: why do we need to focus on knowledge management now? Right now. And, you know, Andrea, you said that we are in the information era, and this also connects to knowledge. The companies that are the most competitive are the ones that are learning organizations, where they are collecting, formalizing, and activating that knowledge. And you gave an example: consider all the meetings that you're in every day just to get synchronized, to synchronize our mental models. If you can formalize those conversations and that knowledge and turn it into an asset for the organization, becoming a learning organization, then you are going to create multiples of value for the organization and for the impact that you're having. You said that you have to create a model of the problem: you need to take the implicit and make it explicit. And one of the fundamental things that makes knowledge management different from data management is that, as you said, knowledge management is about how the org manages the knowledge lifecycle, so the creation, socialization, formalization, and application of knowledge, plus the operating model to actually govern and support that. Whereas data management is more literally the base of the architecture. It's the bits that are moving. But then you need to put those data pieces in context and layer the semantics and the management and the governance and the knowledge on top of that. And that's how it combines together. I know people always talk about data, information, knowledge, wisdom, and I know that's a little hokey, but there's also a truth to it. There's a reason why people think about that hierarchy, right?
You talked about your background and, and how you actually came from the data world and you experienced these problems firsthand when you were trying to implement sort of a data product, data mesh type approach. But even though you were implementing these data products, it wasn't actually solving the problem of the complexity for the organization. You still weren't able to reuse it across domains until you really took a knowledge and semantics orientation. Right. And that was what allowed you to really kind of help. So, you know, reuse is key. It's not just about the use of those data products and it's not just about gen AI. It's not just about cost efficiency. It's about how do we really create economies of scale and so much more. But Juan, over to you, what were your big takeaways?

Juan Sequeda [00:59:05]:
So how do we start doing this? Let's go define your model, right? And it needs to contain the basics of the main concepts, like customer, the stuff that we agree upon at a high level. And then let each domain extend it with its own definitions. And to do this, it's not just about technical people. You need to have these conversations with the business. And we talked about how the business still lives in the last millennium, right? They focus on optimizing the activity in their department and not on the system, the organization as a whole. So there's a lack of incentive for collaboration in the system, and I think the traditional model of optimizing for the part and not for the system needs to be challenged. What can data people do to bring more knowledge management best practices? There's a positional point, which is that it's not just about having cross-functional teams. You must have cross-functional knowledge: understand the business, understand the economics. Because at the end of the day, data management is not really just a data problem. It's a problem of people, strategy, systems thinking, economics, incentives. So you've got to be very T-shaped on this. From a practical point of view, you already have all these knowledge-related discussions, but then we don't write them down, right? So the aha moment is when a new business case comes in and you realize, oh, I don't have to redo the work, all of it is already done, or 80% of the ingredients are done. Perfect. That's what we should be striving for. How many of the data products are being reused, and not just used, across different domains? If we're only focusing on creating value for the particular customer that we have, we're always going for that local optimum, and that's creating tech debt.

Andrea Gioia [01:00:40]:
Right.

Juan Sequeda [01:00:40]:
So the needs of the user should be balanced with the needs of the system. That's why it's about reuse and composability, which means eliminating, reducing, avoiding integration costs. And then there's the other discussion we had about extensibility. Tackling the cultural challenge is about saying: look, we need to explain that these building blocks aren't just about saving money. They're about actually moving faster, while acknowledging that we're moving slower in the immediate term in order to move faster soon. So ultimately this will save money, improve efficiency, improve innovation, and make sure that we have that common space to manage these data assets and pipelines and all that knowledge together. And once the business people define something, they can get out of the room, and then it's our time to capture the technical part. But the definition is the hard part, and we need to really appreciate the value of having those conversations and prioritize them. And the little AI angle here is: hey, if we're going to work with AI, let's give it something formal and machine-readable instead of just documented text.

Andrea Gioia [01:01:39]:
As usual, perfect. It's perfect. I don't know how you can write down all these notes and summarize the whole conversation so well, but it's perfect. Nothing to add, nothing to, you know.

Juan Sequeda [01:01:50]:
While we're having a drink.

Andrea Gioia [01:01:52]:
Exactly.

Tim Gasper [01:01:53]:
And it's all you. So thank you for the amazing, amazing content.

Juan Sequeda [01:01:57]:
This is your content.

Andrea Gioia [01:01:58]:
Thank you.

Tim Gasper [01:01:59]:
Thank you.

Juan Sequeda [01:01:59]:
Thank you so much. All right, we'll wrap it up with three questions: advice, who should we invite next, and what resources do you follow?

Juan Sequeda [01:02:10]:
What's your advice about data, about life?

Andrea Gioia [01:02:14]:
My advice about life: have fun. Don't be too serious, and enjoy life. About data: as I said, try to be curious in your growth path within the field. Look at the disciplines beside the practice of data management as a technical thinker. Look at systems thinking, look at economics, look at sociology, and if you have some specific colleague or customer, look also at neurology or pathological neurology. Depends. But be curious. This is also fun for me: it's a multidisciplinary field. It's very fun to read an economics paper and then an organizational paper. It's something that you must find fun and must invest in, because it's an integral part of the practice. So this is my suggestion. The second question was who should we invite next. It's difficult, because you have already invited a long list of people that I follow. But I think JB Bloom; I like that work. It's a PhD in economics on a theory that is very related to data management and how data can be managed in an organization, and it provides an economic model for doing that: managing common-pool resources like data. It's a very interesting theory. You can find it online, and I think it would be a very nice discussion. Because I think we need to reframe, as much as possible, the data management and knowledge management problem in economic terms. We need to see the economic aspects that are behind this. We need to build an economic theory of data management to explain it to the business and to the field. Last question?

Juan Sequeda [01:04:13]:
What resources do you follow? People, conferences, newsletters, podcast, whatever.

Andrea Gioia [01:04:19]:
So, as you can see, my memory is terrible, so I'll never be able, like you, to summarize a one-hour talk, because I cannot even remember three questions. But it's okay. I follow a lot of different people. In the last period I've liked very much the Boundaryless podcast, a podcast on organizational management, and they always have guests specialized in organizational design and systems thinking. So I like it very much. I also suggest reading a lot: books, papers, conference and event recordings. It's very useful. And if I have to suggest one resource that is a must for data management people, it's to read something on systems thinking. A simple book to start with is Thinking in Systems by Donella Meadows. It's very simple, but it's a good introduction to what systems thinking is. If you want to go further, I think the masterpiece in the field is The Fifth Discipline by Peter Senge. This is about learning organizations: what a learning organization is, and what the relation is between learning organizations and systems thinking. So I think a little bit of systems thinking is important for data management and for management in general.

Juan Sequeda [01:06:00]:
We started off this episode with me saying I need to focus on more reading, and Thinking in Systems has been on my list. So, Andrea, thank you so much. I am so thankful we had this conversation, because personally you helped me organize a lot of the thoughts I've had. I think you made them crisper. So I'm very, very lucky.

Andrea Gioia [01:06:20]:
You do the same for me. It's also important to go outside of the bubble. We are in our bubble, but we exchange a lot of information that reframes our problems, and it's nice to be there. Yeah.

Tim Gasper [01:06:36]:
And we find the path forward that we need to walk upon. And. And I know we got some comments here from our listeners who were very happy with the conversation today. So thanks to everybody who listened and Andrea, cheers. Cheers. Excellent to have you.

Andrea Gioia [01:06:50]:
Thank you. Thank you very much. Till next time, bye.

Special guests

Andrea Gioia CTO at Quantyca & Co-founder at Blindata