About this episode

What’s the old saying? “The journey IS the destination.”

What started as an experiment and a way to kick back with colleagues and peers on a video call turned into a thriving, honest, no-BS podcast about enterprise data management.

Hosts Juan and Tim embarked on this journey that turned into a 50-episode series. The episodes covered topics ranging from identity graphs, modern data stacks, and building data teams all the way to data lineage, data trust issues and learning what a CDO actually does. Our audience heard vulnerable and transparent talks from leaders at companies like Airbnb, McKinsey and Company, Wunderman Thompson and more.

Join us in this season finale for a look back at season one and a candid talk about the lessons learned.

This episode features

  • Season Takeaways
  • Best Moments

Transcript

Speaker 1:
This is Catalog and Cocktails. Don’t forget to subscribe, rate and review wherever you listen to your podcasts. Here are your hosts, Juan Sequeda and Tim Gasper.

Tim Gasper:
Hello everyone. Welcome. It is Wednesday, and it’s time for Catalog and Cocktails, your honest, no-BS, non-salesy conversation about enterprise data management with tasty beverages in hand, once again live from data.world HQ. I’m Tim Gasper, longtime data nerd and product guy at data.world, joined by Juan.

Juan Sequeda:
I’m Juan Sequeda, principal scientist here at data.world. It is as always a pleasure to spend Wednesday afternoon, middle of the week, end of the day, chatting about data. It is an interesting day today because it is, first of all, episode 50. It is our season finale. It’s bittersweet that we’re taking a quick pause for Catalog and Cocktails, but we’ll definitely be back because there’s so much stuff we want to go-

Tim Gasper:
Little summer break here, but we’ve got exciting things planned for season two. Don’t expect that we’re going to be gone for long.

Juan Sequeda:
No, no, no. What are we drinking? What are we toasting tonight?

Tim Gasper:
Well, first of all, to 50 episodes.

Juan Sequeda:
Cheers, 50 episodes.

Tim Gasper:
Cheers.

Juan Sequeda:
50 episodes.

Tim Gasper:
You have some special things going on, right?

Juan Sequeda:
Well, actually, I’m cheering because tomorrow is my two-year anniversary here at data.world. I joined when data.world acquired my previous company, Capsenta, and we’ve been bringing in all this knowledge graph technology. Now, it’s super exciting to go see how we’re just taking off the rest of the world. It’s been two years. I can’t believe it. Well, the last year has been at home, and now finally, little by little, we’re getting back together, but this is super cool. Cheers for 50 episodes of Catalog and Cocktails.

Tim Gasper:
Cheers. What started off as a pandemic exercise here in a different medium, a different approach has really turned into something, so appreciate you all for being our listeners and being a part of this. We got drinks and cocktails today, right?

Juan Sequeda:
We’re drinking. We have our… Now that we’re coming back to our office, we have our master mixologist.

Tim Gasper:
Dave Griffith.

Juan Sequeda:
Dave Griffith. He made us a Liquidity Event, which is… Remember?

Tim Gasper:
Gin, elderflower, cherry liqueur and lime.

Juan Sequeda:
Delicious. Cheers to today [crosstalk 00:02:34].

Tim Gasper:
Thanks, Dave.

Juan Sequeda:
Thanks, Dave. We thought, well [crosstalk 00:02:40], that we have… At the beginning of the year, we decided to bring on guests. The first part of the season, it was just you and me. We had a couple of guests, and then we said, “Hey, how about we bring in more people just to have so many different conversations?” I think it’s just been a huge boom. What we’ve done is that we’ve summarized and categorized the different topics the last, I think, at least 20 plus episodes that we’ve just been discussing stuff. We just want to give a summary or our takeaways, and then give the takeaways of the takeaways of the takeaways of the last-

Tim Gasper:
This is the recap episode with color commentary. It’s like the sport edition of what’s happened over the last… I don’t know. How many episodes?

Juan Sequeda:
A lot.

Tim Gasper:
30 or so?

Juan Sequeda:
Probably. All right, so the way we did this is that we, I guess, divided this up, right, by people, processes and technologies.

Tim Gasper:
Well, it’s either that or don’t boil the ocean.

Juan Sequeda:
“Don’t boil the ocean” shows up in every one, I think.

Tim Gasper:
That’s the theme.

Juan Sequeda:
That’s the theme. All right, let’s talk about people first. I think one of the first things to tell the story is we’re talking about people. We’re talking about team, and the question is, “How do you start a team?” Episode 41 was with Patrick Barry about building great data teams. On this episode, it was great, because Patrick was actually just starting a new job, and he was starting a new data team. We got into this… This was perfect timing for him. The first thing they started out with is, “Hey, let’s understand the lay of the land. Let’s get to know people. Let’s get to understand the existing workflows.”

Juan Sequeda:
Don’t start by kicking down doors and criticizing and bringing in your own way. It was really important to understand how things are set. I think that’s really… If you’re starting a team from scratch, in a way, you need to understand the lay of the land.

Tim Gasper:
Exactly. I think that was a really great session because we got a really nice overview of the different personas, the different folks that are involved, and really thinking about, “How do you create success?” I think one of my favorite lines that came out of that was, “Don’t be an asshole, right? Create a safe environment. Attitude is infectious.” You want to be somewhere where you feel like everyone’s got each other’s backs, and that really applies to the data team as well.

Juan Sequeda:
What’s also important there is that you want to be able to understand the needs of your consumers, of your clients, of your data. Who are those clients? Who are those consumers, and what do they actually need? That’s always important to… When we talk about… We got to define success, but remember that success is tightly connected to who’s actually consuming the data. Who are those folks, and what do they need? Also, it’s important to be able to understand the skill sets that we have in house, and which ones do we need to bring in externally? In addition to that, it’s the technology stack.

Juan Sequeda:
This was a very interesting aspect I never thought about: it depends on which domain you are in. There are different types of technology stacks that you may want. For example, if you’re in marketing, people need to know the different technologies like Google Analytics, for example, or LinkedIn ads and so forth. Who are the people who have this expertise in these types of tech stacks or these different applications within your domain? That’s really, really important. I think understanding the domain and who’s involved in the technology there is crucial.

Tim Gasper:
Sometimes, when we join an organization, going into a management or a leadership position, or maybe as a member of a team that is just trying to grow, we’re tempted to think more from a technology perspective, right? Like, “Oh, we need Python people, or we need this kind of people,” versus thinking about like, “Well, what skills do we have, and what are the people that we have here, and do we have experts in certain areas that then can be mentors and be coaches to the people that we hire?” It’s good to think about it from a people-first perspective, not just a technology-first perspective.

Juan Sequeda:
There’s a lot to balance. You have to balance the people, balance the skill sets. Balance the tools. Balance your budget, right? For example, you may start with a lot of the free tools, and then, as revenue increases, you can upgrade. I think one of the things there is that success needs to be tied to how you’re growing revenue and how you’re lowering costs. I think, at the end of the day, this is the takeaway… One of the big takeaways I’ve had over all our conversations is that the bottom-line success of any data project is, “How is this making money for the company? How is this saving us money?”

Juan Sequeda:
We really need to go tie those things together, so when you’re starting with your team, that’s the stuff you want to go have. Think about that.

Tim Gasper:
The next episode was also related to people, also about teams and specifically how you scale, how you grow. That was with Meetesh Karia, the chief data officer of Zebra. The title of that was Data Organization: Reap what you Sow, because how you grow your organization is going to have big effects. I’ll start off with the first point, which is that he really came out with the perspective that silos aren’t bad, right? It’s not that silos are always bad and you always need to break the silos.

Tim Gasper:
The reason why they’re created is for efficiency, right? When you create a silo, you’re creating a group of expertise combined with technology, combined with fast path towards a certain goal. Where silos become a problem is when you can’t scale beyond them, and when information isn’t being passed between them.

Juan Sequeda:
This episode, I remember we… This was a live brainstorming session because it was like, “How do you start growing your team, and when do you start to centralize? When do you start to decentralize, or do you go to this centralization?”

Tim Gasper:
When do you make those moves?

Juan Sequeda:
This is a fascinating discussion here. I think one of the things that we have is when do you start to decentralize? Well, one is if you have a centralized team that doesn’t fully understand the business anymore. At that point, you probably need to go decentralize and go… We’re talking about pods. That was one thing. Another way to test there is check how long it takes to hire someone and for them to get trained. Every time it gets longer and longer, then your group is too big. You’d probably need to start decentralizing and breaking that. I thought that was a really, really interesting insight over there.

Juan Sequeda:
Part of that is because the analogy we’re making is with software: if you have a big monolithic codebase, you need to start modularizing that. Same thing within your data organization, your data teams. If you just have one gigantic team that does everything, it’s just like having one gigantic monolithic codebase. You need to start modularizing that. You could probably apply some of those same techniques to decide when you’re going to modularize.

Tim Gasper:
In addition to that, a concept that’s actually come up a few times in our episodes, but I think was especially accentuated with the one with Meetesh was around this idea of efficiency versus resiliency, and an efficient system, oftentimes, you associate that with agile and like, “Hey, let’s get from A to B as fast as possible, and iterate, iterate, iterate,” as one model. Another model is more like resilient systems. When somebody leaves the company, can we be resilient to that? When a particular technology becomes extinct or defunct, can we be resilient to that?

Tim Gasper:
As we scale, can we be resilient to the issues that come with scaling? We explored a lot of the ideas around, “Okay, how do we balance that?” Maybe when you’re a smaller company, you start with more of a focus on efficiency, but then you start to hit some walls, and you’re scaling, and you’re like, “Okay, now we need to scale out. We need more data teams. We need more pods. We need to push things to the business or vice versa, right?”

Juan Sequeda:
One thing: I’m a big aviation geek. I love how we were able to make this analogy with legacy airlines versus Southwest. Southwest is an airline that is just completely decentralized. They go from point to point, while those legacy carriers have their hubs, and they have their spokes. That’s an interesting analogy. It’s like if you start off in a completely decentralized organization, it’s like you have… Well, you’re like Southwest. Everybody goes and talks to everybody, but what’s interesting is that there is still some stuff that is… Let’s call it centralized or more standardized, so going back to the Southwest analogy, they only fly 737s.

Juan Sequeda:
That’s how they’re standardizing it. Look, we all do different things. Every team is decentralized, but we’re standardizing on the tools, on the processes that we have, or you can start thinking about having a core team, and this is your hub, and you can have different spokes that talk to your core team. Then you decide what’s core, and then each of your pods, which is probably based on the domain of what they’re doing. You’re going to have data product managers who are going to be liaisons between the different teams.

Juan Sequeda:
There’s no right or wrong way of doing this. I think that’s the point is… I think another takeaway I’m having is it depends on the culture of your organization. If you’re more a decentralized organization, or if you’re a more controlling organization, you want to figure out what is the balance that works best for you. In addition to that is you should understand what that is for your organization, and define a template, because you need to be able to grow your organization based on that template.

Juan Sequeda:
Are you very centralized? Are you all decentralized? What is that hub and spoke type of model that you want to have? Figure that out. Make that template, so you know, “Oh, look, we need to just go create new hubs, or we create another hub, another spoke and so forth.”

Tim Gasper:
We actually ended that conversation with an interesting possibility for the future and then an interesting quantitative insight. The possibility for the future was as we talked about the scaling process starting with centralized and then building out your hubs and expanding that there’s a roadmap there. It would almost be very interesting to visualize, “Hey, when do you hit those tipping points when you need to move to the next scale factor?” Then the interesting quantitative thing… I love when numbers get involved in things.

Tim Gasper:
I’m a big fan of the Pareto principle, 80-20 rule, right? Meetesh talked about threes and 10s, and that your company as you’re growing, three people, then 10 people, 30 people then 100 people, that threes and 10s become these points where systems start falling apart, processes start falling apart, and you need to refactor.

Juan Sequeda:
That was an amazing insight. I think that’s something I will never forget from now on, the threes and tens. Now, when an organization starts getting bigger, you want to have a chief data officer. Our episode with Mohammed Aaser, the chief data officer of McKinsey, was phenomenal. I think it’s one of the most listened-to, most played episodes. I love his definition: a chief data officer is a data entrepreneur. Very simple, your honest, no-BS answer right there. There are these three things you can use to decide what type of chief data officer you’re going to be.

Juan Sequeda:
You’re one who focuses on innovation. You want to go do new things with data. Or you want to focus more on the architecture. You’re making sure you’re defining the right architecture and bringing in the right tools. Or you’re more about data enablement. Your goal is to go build culture within your organization to go bring in more data. I think that those are the three types of CDOs that you can be, but at the end of the day, you are a data entrepreneur.

Tim Gasper:
Right. It seems like one of the biggest areas that Mohammed focused on was thinking about who are the consumers of the data? Who are the folks that really are going to benefit from data in your organization, and take both a use case and a persona-based approach to think about who are these consumers? What do they really need to do with that data, and how do we build momentum in the entire organization, visibility and momentum, so that we’re creating this motion towards empowering and enabling these different personas through culture, through platforms, through technologies, and so on and so forth?

Juan Sequeda:
Part of building that data culture, what is key is building relationships. If you’re realizing who are the teams, who are the people who are actually looking for data asking questions, go meet them. They’re the ones who are excited about data. They’re the ones who are going to be the head of community within your organization. At the end of the day, the CDO is just one person. They have to identify those data champions across the organization, and they’re the first ones to build a map of those data experts.

Juan Sequeda:
At the end of the day, you don’t come to me for the data. I know who are the right people who are going to set up the right pods in a way about data within their domain. I think that was crucial.

Tim Gasper:
The CDO is the guy at the top of the pyramid of the multi-level marketing, where he has to get everybody to find their friends, and get their friends to find their friends. In the end, everybody benefits. It’s like that, but for data. Let’s sit on that one for a little bit.

Juan Sequeda:
Mohammed is a person who has a lot of visibility within all types of companies and everything that’s going on out there. I really appreciate his insight on what’s next. There are three aspects. One is data as a product and knowledge graphs. I think that was one of the things that… What’s next is that we’re going to start treating data as a product. The way to go do that is using knowledge graphs, which we’re going to talk about in a sec. Second is external data. We’re going to see…

Juan Sequeda:
We already know that there’s so much data out there that we’re going to start seeing these roles such as data scouters, data hunters, and they’re the ones who are going to be helping find data, but they’re going to be tied directly to the business to help them solve problems. Third, something I really loved here was humanizing data. What is the user experience of data? I think this connects back to what a data product is: we need to make sure that we’re creating data that other people can go use, and that they’re going to enjoy, all right? Who is the Marie Kondo of data? Does this data bring me joy?

Tim Gasper:
Right. Does this spark joy?

Juan Sequeda:
There we go. Does this spark joy?

Tim Gasper:
Spark joy. The one thing that this reminds me of is that one of the themes that we’ve been noticing across this entire set of episodes is these parallels, these analogies with software: what has worked and been effective in the software world, shifting to agile, user experience paradigms and things like that, and how those things have then come to the data world and had positive impacts. DataOps is another thing, right? We’re going to talk more about that in a few minutes, but this idea of humanizing data, the UX of data, there’s so much that’s been thought of around UX for software and having good design paradigms there, but not as much on the data side, and so there’s so much opportunity there.

Juan Sequeda:
This is a huge opportunity, and we’re not… I think the way we actually take advantage of this opportunity is by bringing in people from outside of the technical sphere. I think that’s a really important thing: basically, who are your consumers of data? Who are the subject matter experts who actually need that data to go solve a problem, to go make money and save money? I think that’s a huge opportunity that we’re seeing right there. Now, to wrap up the people aspect, we also had a topic about education. We had Professor George Fletcher from the Technical University of Eindhoven talking about what’s going on with data science in universities.

Juan Sequeda:
I think one of the ways he framed it in his book is that computer science is the study of computation, the study of algorithms, of methods, of processing things. Well, data science is really… The object of study is data. I think that’s a separation. Right now we’re seeing data science being either more on the statistics side, or we’re seeing it as more multidisciplinary, but it will eventually become its own degree in a way, because your object of study there is data. I think that’s something that is really important for us to think about. It’s not just about the methods that we’re using. It’s really about studying this thing called data.

Tim Gasper:
I thought that was very interesting. There was an analogy that he put together. It was like, computer science is to algorithms as data science is to data, and the idea that as data becomes more of this core object, as you’re noting, Juan, it is going to start to become more of a first-class citizen along with the algorithms in terms of the way that the education system is put together. Are they going to call it data science? Is it going to be called something else? I know, obviously, we’re very interested in this idea of knowledge science that you talk a lot about, but that is going to continue to rise up, and universities, boot camps, all sorts of different organizations are going to continue to really proliferate and bring value to data in this way.

Juan Sequeda:
I think one of the things that we’re going to see more in data science curriculums is bringing in more of the philosophy of the mind, understanding different mental frameworks, sociology, anthropology, because these are the types of basically techniques and ways to go talk to other people that are needed. I think this is also going to give this connection to this area, which we’re calling knowledge science, knowledge engineering. There’s a lot of stuff that’s going to be changing here in the next five to 10 years, for sure, so it’s going to be really exciting to see what is the next career path around here.

Tim Gasper:
For folks very interested in the technology around data, and for folks listening to this who maybe are continuing to deepen their expertise around data, one of the things that I think George inspired through his talk was to consider learning more about sociology or anthropology or philosophy, these other fields that approach this like a framework and give you a toolkit to understand people and why they operate the way they do, because ultimately, getting value from data is an exercise in understanding what people need and how they operate.

Juan Sequeda:
Let’s talk about processes. I think if we’re going to talk about process… How do we get started here? We had this episode with Ashleigh Faith, who is from EBSCO. We’re talking about therapy, your data therapy sessions. I found this really, really interesting because when you go through therapy, there’s somebody who’s asking questions, right? They’re trying to understand what is the fear, the danger you have, to understand what actually needs to be solved. I think that’s one of the things that you want to go think about when starting: “Hey, survival and danger are probably triggers, and they’re attention grabbers, right?”

Juan Sequeda:
They may break the ice, and then from there, you can transition into things that are going to be enhancements or enablement. I think that maybe it’s one of the things that you don’t expect to do initially, “Let me go talk about all your fears,” but hey, if you figure out what you’re fearing, that’s going to be a way to start later on doing something more positive. I think somebody wrote it already in the chat: “Hey, one of the phrases that we’ve said before is brakes on the cars.” They’re there to make you go slow. Well, that’s the fear part, right? But hey, they’re actually there to enable me to go faster safely. That’s enablement.

Tim Gasper:
Well, a lot around risk and compliance is obviously regulations, but a lot of it is also emotions and concerns and fears and all these things, right? In general, when people are involved, and people are absolutely involved in data, we just talked about that for the last 20 minutes here, that you have to deal with people issues and people concerns. In that situation where you’re dealing with all these concerns, who’s the therapist? Is that the data scientist? Is that person the therapist?

Juan Sequeda:
I mean, that’s why we need to have this new role or these new skills around what we’re calling knowledge scientists, right? I think the knowledge scientist, this is the person, the role, who’s going to be in the middle between the consumers and the subject matter experts and the producers, people who know the data. They’re the ones who are going to go talk to people, and can even do some ethnographic studies, apply techniques such as card sorting, run experiments, and be quantitative and qualitative, right? I think there was one statistic that actually said that search satisfaction increases 40% if you actually talk to the end users, and you ask how they feel, what they need, and how their language is taken into account when you’re starting to do modeling or the taxonomy work.

Juan Sequeda:
I think that is something really crucial to be able to go when you’re starting is who is going to be the person or that role that’s going to be talking to people? Is it the data scientist? I don’t think so. Is it the data engineer? I don’t think so. That’s why we need to have this knowledge scientist.

Tim Gasper:
If you have an effective data team, somebody is playing this role. If you have an ineffective data team, you should think about how can you bring somebody with these skills into your process, because they could have a big impact.

Juan Sequeda:
Another aspect is, again, how do we start in order to avoid boiling the ocean? One technique that Ashleigh mentioned was to find the most important epics, so go to your Jira, where you’re keeping track of epics, and just go find which is the most important one. Then literally, there, ask what people need, what data they need, and why they need that. I think that helps to prioritize, “Oh, wait, this is an important thing,” or for some business reason, some customer is asking for it, and they need this data for it. That’s how I know what to prioritize. That’s a way you can start to avoid boiling the ocean.

Juan Sequeda:
We get started. One of the things that we’re always talking about with data is that we need to understand our data. We know that there are so many problems with data, and we’ve had so many different episodes about quality and testing and provenance and so forth. One of the episodes that we had about a specific industry was with John Lucker, and we talked about insurance data, because it is an industry that lives on data. Talk about garbage in and garbage out, right? John was telling us about different products that they had been on, and some surveys that they’ve done.

Juan Sequeda:
He said, “44% of people said that their vehicle information was 100% wrong.” That’s ridiculous.

Tim Gasper:
That’s dirty data.

Juan Sequeda:
That’s dirty data there. The issue is that it’s not just about quality. It’s not just about observability of data, but it should really be about validation. I think the way we avoid having dirty data from the beginning is to validate. One, let’s go validate. Second, let’s avoid inputting data. Can we pre-fill data with something we already have? That is something really, really important and can help us avoid dealing with dirty data.

Tim Gasper:
Agreed. I think one thing that was interesting from this conversation was this concept around confidence level. Obviously, even in the insurance industry, you see and you just heard that stat from Juan that there’s a lot of messy, dirty, complicated data, and then when you try to integrate it together, it even gets more confounded. But yet, insurance companies aren’t all folding around us, right? They’re functioning to various levels, and some of them are excelling. That’s because you can make decisions on data, even if it is dirty sometimes, so depending on your use case, quality is contextual.

Juan Sequeda:
I think this is another… I don’t know. The phrase that’s come up a couple of times is that quality is in the eye of the beholder, and what the quality is really depends on the use case that you have. When I talk to people, it’s like, “Oh, we need to have a green, a yellow, a red.” There’s this gold data, and I was like, “Wait, so we’re going to have a centralized team for data, but you want to go centralize the quality metrics? Is that…” [inaudible 00:26:21] everybody in organizations can agree that green means this thing, and yellow means this? I don’t think so. How do we know what that is? We need to go talk to the people who are consuming the data. Talk.

Juan Sequeda:
Humans need to talk to each other. We can’t automate all this stuff, right? This is why we need knowledge scientists.

Tim Gasper:
It’s a theme.

Juan Sequeda:
It’s a theme as you’ll see.

Tim Gasper:
What’s next?

Juan Sequeda:
What’s next? We keep talking about testing data, and we talked with-

Tim Gasper:
That sounds like a good topic.

Juan Sequeda:
We talked with Sam Bail about, “Do you test your data?” Tim, do you test your data?

Tim Gasper:
A little bit.

Juan Sequeda:
Do we need to test our data?

Tim Gasper:
We should more. Everybody should more.

Juan Sequeda:
There we go. That’s the thing. Everybody needs to be testing their data. I think that’s… We do this for software. Why the heck don’t we do this for data? I think one of the interesting issues is like, “Where do you test, right?” Because you can start testing everywhere, so just recall that testing your data is not just about testing the data itself, right? “Oh, here’s the quality. The dates are normalized, or there’s extra zeros or whatever.” It’s also testing the code that generates the data.

Juan Sequeda:
I think this is crucial. There are two things here: the data itself and the computation that generates it. Both need to be tested. The place to start testing is where data moves across boundaries, which usually involve transformations. Test your data, but focus on where these boundaries are being crossed.
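To make the distinction concrete, here is a minimal Python sketch of the two kinds of test described above: a unit test on the transformation code, and a check on the data each time it crosses a boundary. The function names (`normalize_date`, `transform`, `check_boundary`) and the fields (`customer_id`, `signup_date`) are hypothetical and chosen only for illustration; nothing here comes from a specific tool mentioned in the episode.

```python
from datetime import date, datetime


def normalize_date(raw: str) -> date:
    """Transformation under test: accept 'YYYY-MM-DD' or 'MM/DD/YYYY'."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {raw!r}")


def transform(rows: list[dict]) -> list[dict]:
    """The 'T' step: normalize the (hypothetical) signup_date field."""
    return [{**row, "signup_date": normalize_date(row["signup_date"])} for row in rows]


def check_boundary(rows_in: list[dict], rows_out: list[dict]) -> list[str]:
    """Data checks to run every time rows cross this boundary."""
    problems = []
    if len(rows_in) != len(rows_out):
        problems.append("row count changed across the boundary")
    for i, row in enumerate(rows_out):
        if row.get("customer_id") is None:
            problems.append(f"row {i}: missing customer_id")
        if not isinstance(row.get("signup_date"), date):
            problems.append(f"row {i}: signup_date is not a date")
    return problems


# Testing the code: the transformation handles both input formats.
assert normalize_date("03/05/2021") == date(2021, 3, 5)
assert normalize_date("2021-03-05") == date(2021, 3, 5)

# Testing the data: run the checks on a batch as it crosses the boundary.
rows_in = [
    {"customer_id": 1, "signup_date": "2021-03-05"},
    {"customer_id": None, "signup_date": "03/06/2021"},
]
print(check_boundary(rows_in, transform(rows_in)))
```

The code test runs once, when the transformation changes; the data check runs on every batch, which is why it is written to return a list of problems rather than stop at the first one.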

Tim Gasper:
I think that’s interesting, because a lot of folks, I think, tend to think of testing more as at the end like, “Oh, is this dashboard broken, or oh, what is the null percentage or accuracy percentage at the very end?” You should think of it as when the E happens, when the T happens, when the L happens, those are boundaries getting crossed, right? Can you test those things? I think one other thing that was interesting in our conversation with her is she talked a little bit about technology. If you’re a fan of open source, she called out the DAG stack, D-A-G, which stood for dbt, Airflow and Great Expectations, Great Expectations being an open source tool around data quality, around data testing.

Juan Sequeda:
Quality.

Tim Gasper:
I thought that was cool and something that for open source fans out there, I think, is going to resonate.

Juan Sequeda:
How do we start? Again, think about the last time something broke, and start there, and then iterate from that. I think another good piece of advice that Sam said is the reason why you have domain experts on your staff is because they know better than you about that particular domain, so take advantage of those subject matter experts within your organization. More on data and trust and stuff. We also talked to the folks from Monte Carlo data with Lior Gavish about it, right? It’s like, there’s just so much stuff going around data quality, and why is this cool again?

Juan Sequeda:
I mean, there’s been so many data quality tools forever. All the traditional old school tools are there. Why is this-

Tim Gasper:
It’s a similar story to catalogs, right? There was an old way of catalogs, and then the new way of catalogs is happening. There was this old way of quality tools, and now there’s this new way. Well, I mean, obviously, data testing is one topic, but they are really focused on data trust, right? Do you have issues, and being able to trust your data, and how do you trust your data? What are the factors that allow you to trust it? One of the things that he talked about is if the data is right, but the organization doesn’t believe it, that’s bad, so how do we make sure that data products are being delivered with high reliability, reliability being this idea of percentage of data downtime?

Tim Gasper:
Are you actually tracking your data downtime? How often are people not getting what they need out of data? It’s not just about the data set not being available. It’s about it not being useful and applicable.
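One way to start tracking data downtime as a percentage is sketched below in Python. This is only an illustration under assumptions: the incident list is invented, incidents are assumed not to overlap, and what counts as an incident (unavailable, stale, or simply not useful) is something each team has to decide for itself.

```python
from datetime import datetime


def downtime_percentage(incidents, window_start, window_end):
    """Percent of the reporting window during which the data was not usable.

    Assumes incidents are (start, end) pairs that do not overlap each other.
    """
    window_seconds = (window_end - window_start).total_seconds()
    down_seconds = 0.0
    for start, end in incidents:
        # Clip each incident to the reporting window.
        start = max(start, window_start)
        end = min(end, window_end)
        if end > start:
            down_seconds += (end - start).total_seconds()
    return 100.0 * down_seconds / window_seconds


# Hypothetical 30-day window with two incidents: 12 hours and 24 hours.
window_start = datetime(2021, 6, 1)
window_end = datetime(2021, 7, 1)
incidents = [
    (datetime(2021, 6, 3, 8), datetime(2021, 6, 3, 20)),
    (datetime(2021, 6, 17), datetime(2021, 6, 18)),
]

pct = downtime_percentage(incidents, window_start, window_end)
print(f"data downtime: {pct:.1f}%")
```

Here 36 hours of incidents over a 720-hour window works out to 5% downtime.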

Juan Sequeda:
That’s the point, because you could have something out there, but people are not using it. You need to understand how people are using that. You also need to understand how people are consuming these trust signals. It goes back again. For one audience, for one use case, trust is different from something else, and this data has so many dimensions in here that we need to consider. I think another thing that we talked about was threats. Almost you can consider data quality as being more like cybersecurity, depending on your data if it can have threats or not.

Juan Sequeda:
Another aspect we talked about is pipelines. This is something that’s worrying me a little bit, and we’ll talk a bit about the modern data stack, but it’s that we’re now starting to go plug in a bunch of tools in different places, and data’s moving all over the place. There’s a lot of places where things can break. There’s a lot of pipelines, and they’re getting more and more complex, which means that we need to keep observing more and more. Honestly, I’m like, “We’re democratizing and making all these tools self-service, and why?”

Juan Sequeda:
Now, maybe we’re having too many of them, so that we’re going to end up having not data debt, not tech debt, but pipeline debt. Something we’re going to talk about in a second, too, is integration debt. There’s just more and more debt all over the place. Nevertheless, this is why we need to start thinking more about observability, about what’s going on.

Tim Gasper:
I mean, the data sprawl is real, and it’s really becoming hard to keep track of it all. I think one last comment on Lior’s talk is that trust is something that you gain. I think that that’s a hard concept, because so much of what we’ve looked at as part of this series is agility and resilience and moving quickly, and spreading your bets, and don’t boil the ocean, but do something that can be decentralized over time. Then there’s this counterweight, this counterbalance, which is that if you blow trust, it’s hard to regain that trust, and so there’s a balance here of moving quickly, agility, but then also moving carefully and making sure that maybe back to that analogy about brakes, don’t wait until the car crashes five times to put in the brakes.

Tim Gasper:
Maybe you should put in the brakes at the beginning, right? Start simple, but put in the brakes at the beginning.

Juan Sequeda:
Exactly, and just the way it applies in life, it applies in data too, trust. Talking about trust, another topic is provenance. Where does your data come from? This is a conversation we had with Professor Deborah McGuinness from RPI, who is a world-leading artificial intelligence researcher, about provenance. What is provenance? Again, her honest, no BS answer: it’s the who, what, where, when, why of your data. When we say the proof is in the pudding, really, the proof is in the provenance. If we’re thinking about, “Hey, where does this come from? Well, I don’t trust this. Tell me more about this.”

Juan Sequeda:
It’s provenance. You need to keep track of your provenance and where this is. This is connected to data lineage. The question is, “Well, I can keep track of everything. How much do I keep track of?” Use cases, we need to… Yes, we can do everything. Storage is cheap. I mean, you think about if I have X amount of data, I want to have provenance over my data, that can be much bigger than the amount of data that I have. How much should I store and track of provenance?

Tim Gasper:
You could go overboard on provenance or around lineage-

Juan Sequeda:
You can go overboard.

Tim Gasper:
… which are obviously two closely related topics. Her point was that compute and storage is cheap, but be thoughtful, right? If your data is 99% provenance exhaust, and 1% real data, maybe you’ve gone overboard a little bit. Be thoughtful about your approach to provenance, but at the same time, you probably need to be doing more than you’re doing today, because to the point that you just made earlier, we’re seeing so much data sprawl now, so much tool sprawl. It’s a concern. You have to figure out how you can track this stuff.

Juan Sequeda:
There’s an opportunity to do more than just say this thing was derived from this other thing. You can be more specific about the relationship between these two things. This really depends on your domain. There are already standards for this stuff like PROV-O, P-R-O-V. There’s a standard for representing provenance, so these are things that we should be thinking about and reusing. It’s a very simple model that can be extended. We’ve done so much about trust, testing and provenance, but when you have all this stuff together, we need to put it in some framework, and that’s where DataOps comes in.
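To make the provenance model a little more concrete, here is a small Python sketch that records provenance as plain triples using term names from the W3C PROV vocabulary (`prov:Entity`, `prov:Activity`, `prov:Agent`, `prov:wasDerivedFrom`, `prov:wasGeneratedBy`, `prov:used`, `prov:wasAssociatedWith`). The `ex:` identifiers and the `upstream` helper are made up for illustration, and a real setup would more likely use an RDF library than a Python list.

```python
# Provenance as (subject, predicate, object) triples. All ex: names are hypothetical.
triples = [
    ("ex:raw_orders", "rdf:type", "prov:Entity"),
    ("ex:clean_orders", "rdf:type", "prov:Entity"),
    ("ex:revenue_report", "rdf:type", "prov:Entity"),
    ("ex:nightly_job", "rdf:type", "prov:Activity"),
    ("ex:data_team", "rdf:type", "prov:Agent"),
    # What came from what (the "derived from" relationship).
    ("ex:clean_orders", "prov:wasDerivedFrom", "ex:raw_orders"),
    ("ex:revenue_report", "prov:wasDerivedFrom", "ex:clean_orders"),
    # Being more specific: which activity produced it, what it used, who ran it.
    ("ex:clean_orders", "prov:wasGeneratedBy", "ex:nightly_job"),
    ("ex:nightly_job", "prov:used", "ex:raw_orders"),
    ("ex:nightly_job", "prov:wasAssociatedWith", "ex:data_team"),
]


def upstream(entity, triples):
    """Follow prov:wasDerivedFrom links back to every source of an entity."""
    found = []
    frontier = [entity]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "prov:wasDerivedFrom" and o not in found:
                found.append(o)
                frontier.append(o)
    return found


sources = upstream("ex:revenue_report", triples)
print(sources)
```

Asking where the report comes from walks the derivation chain back through the cleaned orders to the raw orders, which is the "tell me more about this" question in practice.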

Juan Sequeda:
That was-

Tim Gasper:
Data ops.

Juan Sequeda:
This was a conversation we had with Chris Bergh from DataKitchen, who is also the author of the DataOps Manifesto, and-

Tim Gasper:
He gave us a little bit of a no BS kind of [crosstalk 00:35:07].

Juan Sequeda:
His honest, no BS definition: DataOps is so I can have a sane life.

Tim Gasper:
Then at the same time, he said that it was a misnomer, that it’s actually misstated, because it’s more about your code acting on the data than it is about the data itself, right?

Juan Sequeda:
This is fascinating because we have different guests, and we’ve had these conversations. Different people come back to the same thing, and we’ll see it also with the data mesh conversation: it’s not about just the data, it’s about the code, right? We talked about the data science, and we talked about computer scientists, the algorithms of computation and the data. We need to keep track of these two things. This is really, really important, not just, “Oh, I need to make sure that this table that I have has all the right dimensions and so forth.” It’s much more than that. It’s your code.

Tim Gasper:
It’s your code and your data. One of the things I really like about Chris and his message around data ops is he tries to keep it simple. One of the things that really keeps it simple is: learn to love your errors, and keep track of your errors. How do you get started with data ops? People are like, “Oh, I don’t know. I gotta learn about it. It’s so complicated like circles and infinity diagrams and stuff.” It’s like, “No. No. No.” Do you have problems? Do you keep track of your problems?

Juan Sequeda:
Write them down.

Tim Gasper:
Write them down, and then which ones are the worst problems? Write a test, and then repeat, and then repeat, and then repeat.

Juan Sequeda:
Then talk to other people. Again, humans talking to humans, right? “Hey, look at all these problems that have occurred. Can we improve them? Can we write tests?” How do you actually start? Write a test, and put it in GitHub. It doesn’t have to be that sophisticated. You don’t need to go buy a brand new tool, whatever, to go do DataOps. You can just go write simple tests, document it somewhere, share with other people, write tests, put in GitHub, go execute, automate as much as you can, and take responsibility and ownership of it.
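The "write them down, pick the worst, write a test, repeat" loop can be sketched in a few lines of Python. Everything here is hypothetical: the problem log, the field name `customer_id`, and the check itself are invented to show the shape of the idea, not taken from the episode.

```python
from collections import Counter

# Step 1: write the problems down (a hypothetical log of past incidents).
problem_log = [
    "null customer_id in orders",
    "duplicate order rows",
    "null customer_id in orders",
    "dates in mixed formats",
    "null customer_id in orders",
]

# Step 2: find the worst one; here "worst" simply means most frequent.
worst, count = Counter(problem_log).most_common(1)[0]


# Step 3: write a simple check for that problem and keep it in version control.
def no_null_customer_id(rows):
    """Passes only if every row has a customer_id."""
    return all(row.get("customer_id") is not None for row in rows)


print(f"most frequent problem: {worst} ({count} times)")
print(no_null_customer_id([{"customer_id": 7}, {"customer_id": None}]))
```

Then repeat with the next problem on the list; the log itself is also something to share with other people.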

Juan Sequeda:
I think this is going to be the trend is we need to have ownership of the data. Who is responsible? Who’s accountable? Who gets promoted, because amazing things happen with the data, and it’s reliable, everybody loves it? Who’s accountable for it if things don’t go as well for them? We need to take responsibility for that.

Tim Gasper:
That’s something that goes beyond roles, right? We talked a lot about roles, and a lot of people talk about roles, right? Whether it’s the data product manager, the particular data producer, the data engineer, or the IT person who’s in charge of that system, whoever it might be, think about the RACI, the responsible, accountable, et cetera around your data, and really, who is responsible? If it’s unclear, there’s a conversation that needs to happen. Who is responsible?

Juan Sequeda:
What’s next? We talked about two different aspects of the future of data management. One of them is data centric. This was with Dave McComb from Semantic Arts. Data centric is something that we’re seeing more and more. I love his book Software Wasteland. I always say this, it should be mandatory reading for every data professional in the world, Software Wasteland. Go read that book. The next book is The Data-Centric Revolution. What is data centric? It’s that you have a single, simple, extensible data model within your organization, something you can’t buy. It requires discipline, but you need to start thinking with this mentality.

Juan Sequeda:
I think one of the things that I love about how Dave expresses that, we talked about what is the problem, and this is what his book Software Wasteland goes into. It’s like, “Look, every time you build or you buy something, or you rent something like SaaS, which is what every company does today, you’re buying another data model.” There’s something else that you have to go manage, you have to keep track of. Every single thing you go buy, there’s an additional extra cost about that stuff, which is connected to that data model.

Juan Sequeda:
Why don’t people get this stuff? I mean, just think about how many applications are within the organization. Every single one has a different data model, and that’s why we’re in this big data mess we’re in.

Tim Gasper:
A lot of people don’t think about this enough, and this has been a theme, especially recently in our episodes is this idea of models and being more thoughtful and thinking about models and how they come together. When people do things like, “Oh, I’ve got my HubSpot and my Salesforce and my Zendesk and my this, and my that, and my this.” You’ve got all these different applications. Oftentimes, you think of application infrastructure and data infrastructure as being separate from each other, right? But actually, every single application you have is inherently part of your data infrastructure, bringing its own model, and then leading to the next topic, which is integration debt.

Juan Sequeda:
Integration debt. We talked about data debt. We talked about technical debt. We talked about software, but it’s all about integration debt. At the end of the day, the technical debt is within your own apps, but the integration is what we don’t recognize because it’s all the stuff that’s connecting all these apps together, and we just think about it as, “Oh, it’s just the software,” but no, it’s actually extra work to go do and keep maintaining in addition to our data, just so our data can come together. How do we address this stuff? Well, knowledge graphs, right?

Juan Sequeda:
This all comes in by using semantics. We have to use semantics wisely, and be model-driven everything. Try to have 90% plus of your code with no application code. That would be an ideal world. I genuinely believe that we can do that. We’re just so tied in this software wasteland that it is very hard to get out of it, and it’s this discipline, and you need to have a commitment. You think about the world, and [inaudible 00:40:45] want to be resilient for the next hundreds of years.

Juan Sequeda:
For any new startup, any new company coming out, and they have the opportunity to start from scratch, please, please, please go read and go follow these data centric principles, because that is going to make your life so much easier for the rest of your life. Please do that.

Tim Gasper:
We promise he’s not paying us any affiliate fees or anything like that.

Juan Sequeda:
No. No.

Tim Gasper:
These two books are really going to change your life, and make you think about things differently, and it’s bold. It’s bold because it’s saying, “Hey, we can take a different approach to this. Be model driven, and be semantics driven.”

Juan Sequeda:
How you get started is think big and start small. One thing I really love about what Dave said is what digital transformation is: it’s the business taking the budget away from IT. Hey, go pilot something. Get some traction, and show it. You need to show it. Go take a question that you know people are trying to go answer, or that takes too much time, where you have to go to multiple systems and different spreadsheets. Go show that if you did this in this data centric approach, using a knowledge graph, for example, you can solve that immediately versus your traditional old school approach.

Tim Gasper:
Don’t just pilot the tech. Pilot the value.

Juan Sequeda:
Yes, value the values. What does success mean? Connect it to the business. At the end of the day, what makes me money? What saves me money? Then another big thing that’s connected here is data mesh. Data mesh is this thing that… I think I’m in a data mesh conversation twice a day.

Tim Gasper:
Would you say you’re a fan of data mesh?

Juan Sequeda:
I am a fan of data mesh, and I am a super fan of Zhamak. She is awesome.

Tim Gasper:
She’s a really great speaker.

Juan Sequeda:
She’s a fantastic speaker. Just go on YouTube. Find any presentation. You will love how she presents things in such a clear, crisp and succinct manner. What is data mesh? It’s not a thing. It’s not just an architecture. You can’t buy it. It’s an approach based on decentralization. Honestly, it’s this idea of a vision of an ideal better future where you’re breaking the problem into smaller pieces, where your data moves to the source, the people who actually understand the domain, and they take ownership of that.

Juan Sequeda:
That’s really what it is.

Tim Gasper:
They’re the experts, right?

Juan Sequeda:
They’re the experts.

Tim Gasper:
They know that model for that application or for that domain, right?

Juan Sequeda:
I will call BS on any vendor saying that they sell a data mesh. If you see a vendor that they’re stating that they sell data mesh, that is BS.

Tim Gasper:
What about data fabrics? Is that BS too?

Juan Sequeda:
We can get into that one in a bit.

Tim Gasper:
Check out that episode, data fabric versus data mesh, but that’s for another time.

Juan Sequeda:
So many things that we think about data mesh, I think, again, go back into this balance of data centralization and decentralization. One of the things that I love that Zhamak said was that data has a heartbeat. I think this is something where we need to make sure that data is always alive. How do we keep it alive? The code that generates that data, and we want to be able to have data product managers. We want to be able to have… The data is owned by the domain. Basically, you’ll have… All the tools are out there right now that you would need, right?

Juan Sequeda:
You need catalogs, or you need quality tools. You may need some federation virtualization technology to go do this. At the end of the day, the technology is there. It’s how you assemble it. It’s more about that culture of how you want to go balance centralization, decentralization.

Tim Gasper:
I like that she talked about data products. One of the core things that I think she’s still trying to define, because I think it’s a hard question is this idea of what should be included in the data product? Her proposal is that it’s not just the data itself. It’s data plus the associated compute, the associated policy, and then actually the interface points like, “How do you use it, right? Is it a streaming interface? Is it something you can download? Is it something that is ETL or something else?”

Tim Gasper:
I think it’s very interesting. It’s interesting to think about how our data stack and our data pipelines… You can actually draw circles around things, both at the compute and data level as well as the policy and process level, and those become the data products.
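One way to picture "drawing a circle" around data, compute, policy and interfaces is a small container type. The Python sketch below is purely illustrative: the `DataProduct` class, its field names and the `drop_cancelled` function are assumptions made up for this example, not a definition taken from the data mesh material discussed here.

```python
from dataclasses import dataclass, field
from typing import Callable

Rows = list[dict]


@dataclass
class DataProduct:
    """Hypothetical container: the data plus its compute, policy and interfaces."""
    name: str
    owner_domain: str                # the domain team that owns the product
    data: Rows                       # the data itself
    compute: Callable[[Rows], Rows]  # the code that produces what consumers see
    policy: dict                     # e.g. who may read it, retention rules
    interfaces: list = field(default_factory=list)  # e.g. "download", "stream"

    def serve(self, interface: str) -> Rows:
        """Return the computed output, but only through a declared interface."""
        if interface not in self.interfaces:
            raise ValueError(f"{self.name} does not offer a {interface!r} interface")
        return self.compute(self.data)


def drop_cancelled(rows: Rows) -> Rows:
    """Example compute step owned by the domain team."""
    return [r for r in rows if r["status"] != "cancelled"]


orders = DataProduct(
    name="orders",
    owner_domain="sales",
    data=[{"id": 1, "status": "shipped"}, {"id": 2, "status": "cancelled"}],
    compute=drop_cancelled,
    policy={"read": "all-employees"},
    interfaces=["download"],
)

served = orders.serve("download")
print(served)
```

The point of the sketch is only that the four pieces travel together and have one owner; asking this product for a streaming interface it does not declare raises an error rather than silently working.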

Juan Sequeda:
One of the things is that you want your data product to be awesome, and you want that to be balanced with incentives. The owners of your data should get a bonus if people like it, if they use it, if it’s been connected with other things, right? They get reviews and feedback, and they address those things. You should really think about bonusing people to make sure that they own a product that people love, and that’s the best type of data product out there.

Tim Gasper:
It’s the final stretch.

Juan Sequeda:
We’re getting there. This is going to be a long episode, because all episodes are usually every 30 minutes, now 40 minutes. This is going to be a long one, but-

Tim Gasper:
There’s been too much good, insightful stuff. That’s why we’ll… This recap episode is a doubleheader.

Juan Sequeda:
All right, technology, let’s talk about technology. Now, we obviously have to talk about the modern data stack, and this is-

Tim Gasper:
That’s the thing.

Juan Sequeda:
That’s the thing. We actually recently had an episode with Brandon Chen from Fivetran, and honest, no BS. What is the modern data? What makes a data stack modern?

Tim Gasper:
Marketing.

Juan Sequeda:
Marketing? Well-

Tim Gasper:
No, just kidding.

Juan Sequeda:
One thing, it’s cloud based. One thing, it has a nice fancy UI, but I mean, this was a decently honest, no BS, which was-

Tim Gasper:
You got to the core, right?

Juan Sequeda:
It’s like, “Okay, it better make your life easier.” How does it make your life easier is that you’re not installing it, right? That’s where it’s cloud based, right? You want to be able to get to value faster. The litmus test is that it needs to be also self service, therefore… As much as possible, it can be self-service. The amount of consultants that you need to go implement different technology stacks are going to reduce dramatically to the point that they’re not going to be consultants. They’re actually going to be more advisors in how to go implement that stuff.

Juan Sequeda:
I think that’s how you realize that this is a modern data stack is the consultants turn into advisors of how to go implement.

Tim Gasper:
Right, so you get this faster time to value on these different technology things. You get more self service, and in general, usually, in exchange for giving you more of an as a service model, as well as faster time to value, you may be giving up some features, right? You’re doing a standardized approach, but you think about like, “Oh, interesting, there’s an old school data quality solution, or the modern data stack data quality solution.” That modern data stack solution might be a little less expensive, missing a couple of features. But overall, it’s faster to implement. It’s as a service, and then maybe it’s got some different features that are valuable and innovative in a different way.

Tim Gasper:
That makes it interesting, and it’s also a little subjective.

Juan Sequeda:
I think the bar for what is modern is going to rise. We’ll see how that changes.

Tim Gasper:
Companies that were modern today may not be modern tomorrow. That’s just the way it works.

Juan Sequeda:
Be modern tomorrow. One of the other things that I realized is that there’s no modern modeling tool. I think that’s a big fail. An opportunity out there is to go, “Who’s the modern modeling tool [inaudible 00:48:02]?” Then one of the other big popular tools inside of the modern data stack is dbt. We had a great opportunity to go chat with Drew Banin from Fishtown Analytics, from dbt, to talk about, “What’s this big rave about transformation stuff?” I really love that, “Hey, we just want to go empower people who know SQL, and let them be self service, because these transformations are declared. Just let them write them in SQL.”

Juan Sequeda:
I think that’s where… It’s a very simple thing that we’re now… I can’t believe it didn’t happen before, because you want things to be as declarative as possible.

Tim Gasper:
One of the things that I… We actually got into it a little bit in our conversation with him was this idea that like, “Well, there’s all these no code tools and things like that trying to limit the amount that SQL shows up in our life,” but maybe actually, more people need to learn SQL, right? Maybe that’s actually a skill that needs to become more predominant, and every analyst who knows how to do a thing or two with Excel should be doing SQL, and becoming what he called and what their company calls the analytics engineer.

Juan Sequeda:
Engineer. I think this is an interesting overlap between analytics engineers and the knowledge scientists that we’ve been talking about. The analytics engineers, they know the domain very well, and they may not know SQL, or they do a little bit of SQL, and that’s enough for them to become an analytics engineer right there.

Tim Gasper:
They’re both a translator as well as the person using dbt to be the transformer.

Juan Sequeda:
Exactly. They understand the domain, and they can actually go implement those transformations for that domain. Another part of the technology we’ve talked a lot about is knowledge graphs. We had a fantastic conversation with the folks from Wunderman Thompson Data, who are now called Choreograph. Michael Murray and Bret Harper, their president and their CDO, talking about identity graphs, knowledge graphs. What’s the next thing about customer 360? In a nutshell, a knowledge graph is just a way to go integrate data and knowledge at scale. We’re able to make connections between things.

Juan Sequeda:
In particular, this is not just the next chapter in data management. This is really the next book on data management, and it’s data and knowledge management all connected together. I think that’s the groundbreaking thing is that traditionally, we always think about data, but we need to think about what are these data points and how they’re connected, these connections, these relationships. This is what represents the knowledge within an organization that we need to elevate to be first class citizens.

Tim Gasper:
This is the way that you represent that data centric architecture, and have an engine that empowers that. I think that one of the things that was exciting about what he talked about is how the technology has been evolving a lot in the knowledge graph space to become more accessible now to more companies such as their own, where now, knowledge graphs aren’t just the realm of the FAANG companies, the Facebooks and Netflixes, to do recommendations or something like that. It’s something that can be very applicable and very valuable, and drive outcomes in a much broader set of enterprises.

Juan Sequeda:
Again, how do you start? Well, guess what? Don’t boil the ocean. I mean, how do you not boil the ocean? Well, let’s go follow your KPIs. You want to design things to touch the market, and how we’re going to mobilize this.

Tim Gasper:
Be metrics driven.

Juan Sequeda:
Focus on what is the relevant data and what is the most impactful data. I think you want to be able to work with the right partner, somebody who’s agile, not just the vendor because at the end of the day, this is a marathon, not a sprint.

Tim Gasper:
Graph is not a panacea, right? It doesn’t solve everything. Don’t think like, “I’m going to throw away all my relational stuff, and move to graphs.” That’s not the answer. The answer is going to be a hybrid approach.

Juan Sequeda:
Definitely. Talking about more technology, a lot of the focus for us has been about structured data, but we did have a fantastic conversation with Joe Hilger from Enterprise Knowledge, talking about knowledge management, which has been associated more with unstructured data, with how is that connected with structured data. I think we start talking about NLP, right? How do we get structure out of the unstructured sources talking about taxonomies as a first starting point to connect the world of the unstructured folks with the structured folks?

Juan Sequeda:
Because, hey, we need to start talking about the same words like, “What’s our business glossary? What’s our taxonomy?” This is a way for us to start bridging these two communities, that they don’t talk to each other, and they should be talking to each other.

Tim Gasper:
The value of the ontologist and some people maybe who are saying they’re ontologists, but they aren’t, all right, but they’re valuable. Ontologists can be very valuable to your organization.

Juan Sequeda:
I think one of the important things is that a lot of the work that people do for unstructured data and taxonomies has been for doing search, and you can bring that same type of approach of taxonomies and ontologies to structured data, and have that same type of search within the data. I think we look at what Google does for Google search, and they provide you answers using their knowledge graph. You can do that same thing with your structured data by bringing the same concepts that have been applied for unstructured data.

Juan Sequeda:
There’s so much to be done within these two communities. They need to be integrated.

Tim Gasper:
Those two things have to be bridged together, the content world and the data world, the structured side and the unstructured side. They need to work together, and you want to be able to use things like NLP, natural language query, and other types of technologies on both, and it can all work together.

Juan Sequeda:
One thing that we talked about is where does this work live, in which department? Is it the CDO office? Is there a new chief knowledge officer, whatever? But he’s like, “Remember, there is already a CIO,” and that I does not mean infrastructure. It really means information, so we should… We’ve already… Infrastructure shouldn’t be a big thing. We should be focusing on information, and combine your unstructured and structured, your knowledge and data together. That’s what the CIO should be focusing on.

Tim Gasper:
That’s a call to action. If you’re a CIO or you know one, you’re not the chief infrastructure officer. You should be the chief information officer, and that office is going to evolve. As things move to the cloud, they move to SaaS, it’s going to come back to knowledge.

Juan Sequeda:
Now, final part on technology is we talked a lot about business intelligence. The first episode of the year was with Peter Bailis from Sisu Data, and talking about, “Hey, what’s next for business intelligence?” Our latest episode was with Ashley Kramer, the chief product officer of Sisense, again, about the future of BI. One of the things that happens is that you only ask why when something bad happens, and then it’s too late. I think that’s one of the things that Peter was saying is that the future here is about AI, business intelligence being combined with machine learning, and it’s about really being predictive.

Juan Sequeda:
It’s like, “Don’t tell me what happened yesterday. I know.” You need to figure out what’s coming up next. I think that’s one of the most important things. Second is we need to be able to connect everything with KPIs and measurements, right? This is something that was a nice… I mean, we had these conversations at the beginning of the year, and just now at the end of the year, and they really align about that. For example, Ashley was saying, “We really need to be able to put a number to it, right?” We always talk about how the way not to boil the ocean is to go pick a use case, right?

Juan Sequeda:
Again, boiling the ocean, boiling the ocean, don’t boil the ocean, don’t boil the ocean. Pick a use case. Pick a use case, use case, use case, but actually put a number to it. Let’s go quantify. Put a KPI, and what is the number? This is obviously hard to go do, but an example she said was, “Let’s go eliminate headcount.” If I were able to go do eliminate-

Tim Gasper:
Repurpose that count.

Juan Sequeda:
… eliminate the headcounts, and go repurpose it. Yes, there you go. Thank you for that clarification.

Tim Gasper:
No, I think that’s a big deal there in terms of getting quantitative. As one final point here, it’s that even though Peter and then Ashley had slightly different perspectives, really, these different angles on AI and what sort of things AI was going to be doing, they both shared a very similar vision, which is that you shouldn’t have any idea that you’re using analytics, that the user experience around analytics and around BI is going to continue to become more and more just insights. The insights come to you. They show up, and you can find them seamlessly.

Tim Gasper:
That’s going to be a really good thing, and it’s going to continue to bring data more and more to every single person in the organization. This whole trend towards democratization is not a fluke. It’s not just something that’s a fad, right? It’s something that’s here for real, and it’s going to be hard, because it’s going to really put a lot of stress on knowledge management, on governance, but it’s the right place to go.

Juan Sequeda:
Whoa, that was-

Tim Gasper:
So?

Juan Sequeda:
… five, six months of conversation right there.

Tim Gasper:
That’s a lot of insights right there.

Juan Sequeda:
What did we take away from all these takeaways?

Tim Gasper:
The takeaway of the takeaways.

Juan Sequeda:
The takeaway of the TTTTTT.

Tim Gasper:
Kind of like what’s our vision based on this, right?

Juan Sequeda:
What’s our vision? What’s our vision? I’m in such a lucky position together with you, Tim, is that having all these conversations has really helped us form how we think the future of enterprise data and knowledge management needs to evolve. I think that’s something I would start saying now. It’s not just about data management. It’s about data and knowledge management going together. I’m going to put them in three things. This is how I’m seeing the world. One, we need to have a balance of being decentralized and centralized. The way to go achieve this is you want to have a centralized core of excellence of your data and knowledge.

Juan Sequeda:
That is the group who is going to define the core data models, right? Remember what data centric is? This very simple, extensible data model, that is the stuff within an organization. If you’re an ecommerce company, it’s orders, products, customers, order lines. It’s very simple. That should be managed by the core Center of Excellence, and then you let every domain, the sales department, the marketing department, the shipping department, the finance department, let them go extend that stuff, because they’re the experts of the domain. Each domain will have their own data team.

Juan Sequeda:
Now, this is an organization you can build teams that can be shared across, that’s where you’ll… That’s the balance between centralization and decentralization. I think part of that is that you want to be able to have what is a team. You can have data product managers. You have knowledge scientists. Your data team will have data engineers, will have data scientists themselves, too. It’s a combination of folks who are going to be within your domain. They’re going to understand that business, and then you’re going to have liaisons between the centralized and the decentralized hubs.

Juan Sequeda:
In the centralized hub, those liaisons are going to understand what’s going on across the different hubs: where the friction is, so we can keep track of how the feedback is going around.

Tim Gasper:
Interesting.

Juan Sequeda:
Part of that also is that this gives you the opportunity within your organization to have members of your teams jump between different domains, and that enables… People get excited, because I don’t want to live… I want to learn about other stuff, and this is great. If, hey, you’re the knowledge scientist working in the marketing domain, and now you just moved to the sales domain, folks are going to be excited, because, hey, we’re bringing in somebody who’s going to be on our team who knows more about this other department that we don’t talk to, that we should talk to more.

Tim Gasper:
You can cross pollinate the expertise.

Juan Sequeda:
Cross pollinate the expertise, exactly. That’s the balance. Understanding this balance with centralization and decentralization, so I think-

Tim Gasper:
What about tools and tech? How does that fit in?

Juan Sequeda:
I think we always talk… Before decentralizing, you start connecting things together. We talk about connections. We’re talking about a graph. We’re talking about connecting data and knowledge together. This is where the knowledge graph comes in. At the end of the day, I think one of the fundamental technologies to combine data and metadata, knowledge and ontologies and taxonomies and all your schemas and all that stuff, is knowledge graphs. I always say that your first knowledge graph should be of your data catalog, right? It’s what are the tables, the databases, the columns, the stewards, the business terms? How are all these things connected?
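
As a rough illustration of a catalog treated as a graph, the sketch below stores catalog facts as subject, relationship, object triples in plain Python and walks the connections. The database, table, column, steward, and business-term names, and the relationship labels, are all made up for the example. A real knowledge graph would typically use a graph database or RDF tooling rather than a list of tuples.

```python
from collections import defaultdict

# Hypothetical catalog facts as (subject, relationship, object) triples.
triples = [
    ("sales_db", "has_table", "orders"),
    ("orders", "has_column", "orders.customer_id"),
    ("orders", "has_column", "orders.total"),
    ("orders", "stewarded_by", "alice"),
    ("orders.total", "defined_by_term", "Order Revenue"),
]


def build_index(facts):
    """Index triples by subject so outgoing connections are easy to follow."""
    index = defaultdict(list)
    for subject, relationship, obj in facts:
        index[subject].append((relationship, obj))
    return index


def reachable(index, start):
    """Return everything connected to `start` by following outgoing edges."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for _, obj in index.get(node, []):
            if obj not in seen:
                seen.add(obj)
                stack.append(obj)
    return seen


index = build_index(triples)
connected = reachable(index, "sales_db")
```

Starting from the database, the walk reaches the table, its columns, its steward, and the business term attached to a column, which is the "how are all these things connected" question in miniature.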

Juan Sequeda:
That’s your first graph of knowledge right there. I think the knowledge graph will be a complete centerpiece of technology around that. Then finally, this is… I mean, the spirit of the data mesh is we need to treat data as a product. Every single domain needs to be in charge of delivering those data products, understand who the consumers of those data products are, and be responsible for taking those requirements. Just like when you go shop for things on Amazon or whatever, people can give reviews and give feedback, so somebody needs to be there, answering that feedback, cataloging all that stuff, making sure that in the next version of the data, we respond to that.

Juan Sequeda:
If there’s urgency, we can go solve problems immediately. I think we’re actually going to have two… There’s two types of data. There’s your raw data, and then there’s your data products. Your raw data has an audience: your data producers, your data engineers, your data stewards. They’re going to actually have a catalog of the raw data. But then the product team, the data team, they’re the ones who can understand the requirements to transform it into data products, and the audience of those data products is going to be your consumers, your data analysts, your data scientists, and so forth.

Juan Sequeda:
They’re going to have another catalog. There’s no need for a data analyst or a business user to go use the catalog of the raw data. They want to understand it. They want to go use a catalog to go discover those data products. I think there’s a phrase that came up once: a data consumer will say to a data product manager, “Is your data fit for my purpose?” The data product manager will say back to that data consumer, “Is your purpose fit for my data?” I think that’s an important balance that we need to go have. Those are my takeaways of the takeaways of the last four or five months, the last 50 episodes.

Tim Gasper:
Balance decentralization and centralization to essentially achieve the data mesh vision that’s right for you, for your scale of organization, right? Leverage a knowledge graph in some way, whether it’s just starting with your catalog, or it’s something bigger as you go forward. Then third, data products become the center. Somebody has to play this role of data product manager. Part of what they’re doing is maybe Marie Kondoing the junk, and turning it into something that can spark joy, and then they’re producing these products that can be fit for certain purposes. If the purpose isn’t a fit, then you figure out the right approach so those people are able to get their use case done.

Juan Sequeda:
Exactly.

Tim Gasper:
It’s a recipe.

Juan Sequeda:
Well, Tim, this is it, 50 episodes in.

Tim Gasper:
50 episodes. We can’t wait to launch our season two in a little bit here. But you know what, honestly, we’re going to take a little break.

Juan Sequeda:
We’ll take a little break. We will have-

Tim Gasper:
We will have a little brain vacation.

Juan Sequeda:
We will have some surprises going on over the next two months, and at the beginning of August, we will be back. Thank you. Thank you. Thank you, everyone, for being loyal listeners and live attendees. This is a… Never in my life did I think I was going to become a podcaster.

Tim Gasper:
No, man.

Juan Sequeda:
Here we are.

Tim Gasper:
It’s happened. Please, don’t leave us alone over the next couple of months here. Ping us on Twitter at Tim Gasper, at Juan Sequeda. You can also find us on email. I’m tim.gasper@data.world.

Juan Sequeda:
I’m juan@data.world.

Tim Gasper:
Let’s keep the conversations going. Who should we be inviting? Who should we be talking to? What topics do you want to see? Let’s do that.

Juan Sequeda:
Remember, Catalog and Cocktails, an honest, no BS, non-salesy conversation about enterprise data and knowledge management. Thank you.

Tim Gasper:
Cheers, everyone.