Data Contracts and Shifting Responsibility Left with Andrew Jones

Tim Gasper [00:00:00] Hello everyone. Welcome, it is time for Catalog and Cocktails presented by data.world. It's your honest, no-BS, non-salesy conversation about enterprise data management with tasty beverages in hand. I'm Tim Gasper, a longtime data nerd, product guy, customer guy at data.world, joined by co-host Juan Sequeda.

Juan Sequeda [00:00:19] Hey Tim, I'm Juan Sequeda, principal scientist and head of the AI Lab at data.world. As always, it's a pleasure. It is Wednesday, middle of the week, towards the end of the day, and really late in the day where our guest is today. I'm super excited to finally have Andrew Jones, a principal engineer known as the inventor of data contracts. We've seen his name and data contracts all over the place. I finally had a chance to meet him in person at Big Data London, and I'm super happy you're here on the podcast. Andrew, how are you doing?

Andrew Jones [00:00:50] Yeah, I'm really good. Thank you. Yeah, excited to be here.

Juan Sequeda [00:00:53] Awesome. So let's kick it off. What are we drinking and what are we toasting for today?

Andrew Jones [00:00:59] So I have a Belgian-style beer called a saison. It's brewed about five miles away by a Belgian who moved to the UK and started a brewery. What I like about it is that it's quite a small brewery, and I brew my own beer, or I do when I have some time. I went on the brewery tour there, and it felt like this could be my next career; it's just what I do, but bigger. So yeah, one day when I'm tired of data and data contracts, brewing is my next career, I think.

Tim Gasper [00:01:30] Nice.

Juan Sequeda [00:01:31] Actually, you're not the first person. One of my best friends, Daniel, moved to Uruguay. He's a software engineer, and he was like, "I just want to go do beer stuff." He was traveling around, thought it was a great country, and literally just moved there, and he's going to do beer stuff.

Tim Gasper [00:01:47] Mainly in Uruguay.

Juan Sequeda [00:01:48] He's still working in tech for a while, he knew that. Yeah, so you're not the only one thinking about that stuff, so yeah.

Tim Gasper [00:01:54] That's so cool.

Juan Sequeda [00:01:55] Tim, how about you?

Tim Gasper [00:01:58] I'm keeping it simple today. I'm actually in the data.world office today, so I had to raid the alcohol cart that we definitely don't have in our office, but here's a little Johnnie Walker Red to accompany me today.

Juan Sequeda [00:02:10] That's why we need to improve that cart, because Johnnie Walker Red is meh.

Tim Gasper [00:02:15] Yeah, the cart that doesn't exist, I definitely want more of a full bar so we can actually create full cocktails and stuff like that. So something we got to do. What about you, Juan? I think you've got a special cocktail.

Juan Sequeda [00:02:26] I finally have a really good cocktail this time. I was with some friends this weekend, and my buddy, who's an archeologist, came up with this thing called The Speeding Bullet. It's 2/3 bourbon and 1/3 Aperol with some bitters, so it's a type of Old Fashioned, but with no syrup. This is fantastic, I can tell you. What are we cheering for? What are we toasting for, Andrew?

Andrew Jones [00:02:50] So I thought we would toast to everyone who helps share and promote ideas. It sounds a bit like I'm sucking up, but I'm not: things like podcasts, Medium posts, Substack, LinkedIn. We're so lucky at the moment; anyone around the world can help contribute to these ideas, comment on LinkedIn or other social networks, whatever, and help ideas evolve and move really quickly, and things like this podcast help as well. So yeah, a toast to everyone who helps share ideas, helps promote ideas, gets involved, and tries to move things forward.

Juan Sequeda [00:03:23] I love that, that's an awesome thing to toast to, thanks to everybody who's contributing.

Tim Gasper [00:03:29] Yeah.

Andrew Jones [00:03:30] Thanks so much.

Tim Gasper [00:03:30] We have a great data community, and everybody helps each other out, so I really appreciate it.

Andrew Jones [00:03:34] Have to appreciate sometimes.

Juan Sequeda [00:03:35] All right, so we got our funny warmup question here, courtesy of ChatGPT, which is, so does your spouse understand what a data contract is, or do they think you're secretly a lawyer?

Andrew Jones [00:03:49] So no, my wife does not know what a data contract is; she has not read my book. She offered to help proofread it, but I didn't want to put her through that. So I don't think she knows; she probably does think it's some sort of lawyer thing. Yeah, when I first started, I kept calling it data contracts internally. I was always told by my manager, "Choose a better name. It sounds like you've got to sign it in blood, it sounds too legal." I was like, "Yeah, I'll come up with a better name later. Let's just go with it for now." Four years later we're still calling it that, and now everyone's calling it that. So yeah, sorry about that, but no, my wife does not know what I do.

Juan Sequeda [00:04:25] No, I think if you're using the word contract, people think, "Okay, what are you? You work in data, you're a lawyer?" Some data lawyer thing or whatever.

Andrew Jones [00:04:35] Yeah, something like that [inaudible].

Tim Gasper [00:04:36] It can be confusing. Yeah, we like to borrow terms in the data industry from other things. My wife does weaving, and I was trying to tell her about data mesh the other day. She was trying to apply weaving analogies and things like that, and I was like, "No. Well, maybe." Anyways.

Juan Sequeda [00:04:53] Well, I mean, there's warehouse too. I think all these terms have been shared. But all right, let's kick it off; there's so much to go through. All right, honest, no BS: what is a data contract?

Andrew Jones [00:05:05] So a data contract is an agreement between those who generate data and those who consume data, and that agreement sets expectations about what to expect from the data. So it's got a schema, it might have SLOs, it should have the owner, things like that. That's all it is really, but you can then use that agreement to do quite a lot of things. You can build an interface from it, and you can assign responsibilities from it, which helps you shift things left as well. You can do quite a lot once you have some agreement, communication, and an interface: you can apply governance to it, you can apply information to it. There's so much you can do once you start documenting it and using it in that way.
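To make the elements Andrew lists concrete (a schema, an owner, optional SLOs), here is a minimal sketch of a data contract as a Python dataclass. The structure, the field names, the `orders` example, and the SLO keys are hypothetical assumptions for illustration, not Andrew's format or any published data contract standard.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Field:
    """One column in the contract's schema (hypothetical structure)."""
    name: str
    type: str                 # logical type, e.g. "string", "float", "timestamp"
    required: bool = True
    description: str = ""


@dataclass(frozen=True)
class DataContract:
    """A hypothetical minimal data contract: schema plus owner, version and SLOs."""
    name: str
    owner: str                # team accountable for the data: the producer, not the consumer
    version: str              # semantic version of the contract, e.g. "1.0.0"
    fields: Tuple[Field, ...]
    slos: Dict[str, str] = field(default_factory=dict)  # e.g. {"freshness": "1h"}

    def field_names(self) -> List[str]:
        return [f.name for f in self.fields]


# Hypothetical example contract, owned by the team that generates the data.
orders_contract = DataContract(
    name="orders",
    owner="checkout-team",
    version="1.0.0",
    fields=(
        Field("order_id", "string", description="Unique order identifier"),
        Field("amount", "float", description="Order total"),
        Field("discount_rate", "float", required=False),
    ),
    slos={"freshness": "1h"},
)

print(orders_contract.owner, orders_contract.version, orders_contract.field_names())
```

The dataclass form only keeps the sketch self-contained; the same information could just as well live in a YAML or JSON file in the producer's repository.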

Juan Sequeda [00:05:52] The title of this episode is Shifting Responsibility to the Left, and we hear a lot about that. What does that actually mean? And if you keep shifting left, eventually you get to the beginning of things, or can you continue shifting left? What is the spectrum of left?

Andrew Jones [00:06:13] Yeah, so shifting left is the reason why I came up with data contracts. The problem I was trying to solve was a problem many of us in data have: we work with upstream data, often getting it from ELT processes or change data capture (CDC) processes. We are sucking data out of databases, and the upstream people, the upstream engineers, change the databases and everything breaks downstream, and then we have to go and fix it. I say fix it; we normally work around it by putting in more [inaudible] ETL and things like that. I was in meetings with people consuming data weekly, and I kept hearing about the same problem again and again and again. And I just thought, "Well, there must be a better way to do this. This can't be it, this can't be right." I was thinking, "Well, the only people who can control how the data changes are the people who generate it, so they need to take responsibility for that." But we can't just tell them, "You're responsible for all this data, and you can't change the database." So we need some way of enabling them to provide good-quality data, by providing an interface, like we do with APIs. That comes from my background: the first half of my career was more in software engineering, the second half more in data engineering and data platforms. As a software engineer, you'd never expect to build on top of another service's entire database and have it go well; it never will. We'd always have some sort of interface there. And I just felt we didn't have that interface with data, and that was maybe the root cause of the problem. So let's allow them to provide the interface, let's let them own the interface, and let's build on top of that with confidence. That's really where it came from. So when I talk about shifting left, I'm talking about shifting left all the way up to those who generate data. Only they can really generate better-quality data; it's their data.

Tim Gasper [00:08:06] Why the term data contract, why does that resonate and why is that really applicable here?

Andrew Jones [00:08:12] Yes, that came, again, from APIs. Often when you have an API, you talk about a contract between the provider and the consumer, and that's where it came from. I thought, "Well, I want the same thing for data," and called it a data contract; it sounded nicer than data API. That's where the term came from: it's something you can depend upon, so you can build on it with confidence, and it sets expectations. So yeah, maybe not the best word in terms of how strong it sounds, but it seems to have resonated really well with a lot of people in data, so we're stuck with it now.

Tim Gasper [00:08:56] Yeah. Well, it seems to have resonated a lot. I know a lot of folks are talking about data contracts, and it's unique enough and memorable enough that it hits that important sweet spot of, "Oh, I wonder what this is, and I want to understand how it can help me bring better practices to my organization." One thing I know we want to talk about is what is a data contract and what is not a data contract. Maybe I'll start with the positive, and Juan, you can take the flip side. Juan and I have this framework we call the data product ABCs: accountability, boundaries, and the letter C happens to be contracts. But we always say contracts and expectations, because sometimes it's more of a contract, and sometimes it's more just an expectation. Is that a contract? I don't know. This is where words get hard. So let's start with what is a data contract, and then I'll let Juan drive the other side.

Juan Sequeda [00:09:57] I'd love to go through some examples of what would a simple one look like, and then how would it get more complex or more expressive or something?

Andrew Jones [00:10:08] Yeah, that's a good question. To start off simple, I think a contract is anything that describes the data, sets expectations on the data, and has basic metadata, such as the owner and things like that. That's a contract.

Juan Sequeda [00:10:24] Hold on, let me just pause there for a second. So by that definition, a schema describes the data. So is a schema a contract?

Andrew Jones [00:10:33] No, I wouldn't say it is on its own, because it doesn't set any expectations around data, it doesn't give an owner to the data, it doesn't set any responsibilities for data. It's just describing what data looks like, but it's not doing anything to prevent that from changing, it's not doing anything else.

Juan Sequeda [00:10:52] If I look at a SQL DDL that says it has to have these data types and these constraints, it's telling me what to expect, and the database constraints will make sure the data keeps its integrity. So-

Andrew Jones [00:11:05] Yeah, but I could drop that tomorrow, and it's gone. I can alter columns, and although it's got constraints, I can remove the constraints. Somebody has to be managing that, but there's nothing helping with change management there. [inaudible]

Juan Sequeda [00:11:20] This is a very simple example, but there's a small nuance in what you just said that's really important. It's like, "Yeah, schema, but I can change the schema tomorrow." So what, right? [inaudible]

Tim Gasper [00:11:30] Yeah, that seems to be the litmus test here.

Juan Sequeda [00:11:33] So in that sense then, a schema is part of a data contract?

Andrew Jones [00:11:39] Yeah, it has to be part of it. You have to know how to consume that data and how to create that data, so you need some sort of schema. A schema is basically the foundational data quality check: at least I know what's in there and the types. And with a data contract, I know it's not going to break and change overnight. That's part of a data contract. So yeah, a schema with some change management around it and some expectations around it is probably all you need for a data contract. I used to get [inaudible].

Juan Sequeda [00:12:11] What would you argue are the minimal change management or expectations that should be associated to it?

Andrew Jones [00:12:17] Well, I shouldn't expect a breaking change without some kind of migration; that'd be the minimum. That migration could be, "I'm changing it tomorrow, update the code tonight," or it can just break, or it could be something a bit nicer, with a bit more process around it: "I'm going to break this schema for these reasons. There's a new schema I'm publishing alongside it, and you've got three months to move over to it." Again, imagine you're consuming from an API. Say you're building on top of the Slack API, for example; people build businesses on Slack APIs. And Slack is like, "We're not going to change our API overnight and break those businesses." That's bad for them, and it's bad for their customers. They might change their API in future, but it's got a version associated with it, and they'll give some sort of migration path. Any kind of interface is the same. If you depend on pandas or some other Python library, it's the same sort of thing: you've got a version, and a migration path when there's a major version change, things like that. So actually, a version is probably something you need in a simple data contract. That's what I missed earlier, but version is probably a critical part.
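The gentler migration path Andrew describes, where a new major version is published alongside the old one and the old one is retired after a fixed window, can be sketched as follows. This is a minimal illustration under assumptions: the 90-day window stands in for his "three months", and `PublishedVersion`, `deprecate`, and `status` are hypothetical names, not part of any tool mentioned in the episode.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class PublishedVersion:
    """A major version of a contract that consumers can pin to (hypothetical)."""
    major: int
    published: date
    sunset: Optional[date] = None   # None means no retirement is scheduled


def deprecate(old: PublishedVersion, new_published: date, window_days: int = 90) -> PublishedVersion:
    """Schedule retirement of `old` window_days after its replacement is published."""
    return PublishedVersion(old.major, old.published, new_published + timedelta(days=window_days))


def status(version: PublishedVersion, today: date) -> str:
    """What a consumer pinned to `version` should be told on `today`."""
    if version.sunset is None:
        return "supported"
    if today < version.sunset:
        return f"deprecated: migrate within {(version.sunset - today).days} days"
    return "retired"


v1 = PublishedVersion(major=1, published=date(2023, 1, 1))
v2 = PublishedVersion(major=2, published=date(2023, 6, 1))
v1 = deprecate(v1, v2.published)   # v1 and v2 are now published side by side

print(status(v1, date(2023, 7, 1)))   # deprecated: migrate within 60 days
print(status(v2, date(2023, 7, 1)))   # supported
print(status(v1, date(2023, 9, 1)))   # retired
```

The design choice mirrors the Slack API example: consumers are never broken overnight, because the old version keeps being published until its sunset date.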

Tim Gasper [00:13:27] Yeah. That word really jumped out at me. As you started to talk about change and migration, expectations around how fast things change, and having a path so that something doesn't break you, versioning very much came to mind. It sounds like you ended there with, "Versioning ends up being a pretty important part of a data contract," and I'm assuming there might be some best practices here, like a major version increment meaning a breaking change, or something like that.

Andrew Jones [00:13:59] Yeah, exactly. And this is all stuff that people in software engineering have been doing for a long time, and that we also do in data engineering if we're building libraries and things like that. You've got semantic versioning, with major versions and minor versions. And you've got public interfaces and private interfaces you can build on. If you build on a private interface, it's not going to work very well; there's no guarantee it's going to work. Build on a public interface, and you get more guarantees and more expectations: it's going to work, and it's not going to break overnight. If it does change, there'll be a major-version migration path. It's all the same thing really; it's all best practice whenever you're building on anything. If you want to be reliable, you have an interface to build on. We haven't been doing that with data for a long time, forever maybe. We have been building on top of people's internal databases, a private interface, basically a no-code API. And I feel that was the root cause of most of our data-quality problems, and that's what I wanted to fix with data contracts.
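The semantic-versioning discipline Andrew borrows from software engineering can be made mechanical. Below is a minimal sketch, assuming a simplified policy: removing a field or changing its type is a breaking change that needs a new major version, while adding a field needs only a minor version. That policy, the dict-based schema representation, and the function names are assumptions for illustration; real compatibility rules (for example, whether adding a required field counts as breaking) vary by tool and serialization format.

```python
from typing import Dict

Schema = Dict[str, str]  # field name -> logical type (hypothetical representation)


def required_bump(old: Schema, new: Schema) -> str:
    """Classify a schema change under an assumed semantic-versioning policy."""
    removed = old.keys() - new.keys()
    retyped = {name for name in old.keys() & new.keys() if old[name] != new[name]}
    added = new.keys() - old.keys()
    if removed or retyped:
        return "major"      # breaks existing consumers: needs a migration path
    if added:
        return "minor"      # additive: existing consumers keep working
    return "none"


def next_version(current: str, bump: str) -> str:
    """Apply a bump to a MAJOR.MINOR.PATCH version string."""
    major, minor, _patch = (int(part) for part in current.split("."))
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    return current


v1 = {"order_id": "string", "amount": "float"}
with_currency = {**v1, "currency": "string"}               # adds a field
amount_as_int = {"order_id": "string", "amount": "int"}    # changes a type

print(next_version("1.0.0", required_bump(v1, with_currency)))   # 1.1.0
print(next_version("1.0.0", required_bump(v1, amount_as_int)))   # 2.0.0
```

A check like this could run in CI in the producer's repository, flagging a change whose declared contract version doesn't match the bump the schema change requires.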

Juan Sequeda [00:14:58] I love how we've gotten into a lot of what data contracts are. I want to unpack this more and get into more details, but first let's call BS on some stuff. What are you seeing out there that's being called a data contract, but you would say, "Red flag, calling BS on that"?

Andrew Jones [00:15:20] So I think the places where people are using the term data contracts and have not gone as far as I would like are where they're not shifting left: it's just schemas, or it's just helping them define their quality checks. I mean, those are good things to do, but they're not shifting things left, and they're not really improving the quality of data. They're making it easier to do checks, observability, things like that. But for me that's not solving the root problem, the problem I want to solve with data contracts. It's not doing anything to shift left, and it's not doing anything for the quality of the data. So that's one thing I think some people are doing and calling a data contract that I'm less keen on. Another would be where people don't assign ownership; they're not shifting to the left. Those kinds of things, I think, are not really a data contract. Or where it's basically just a schema with a bit of schema management: if you haven't got anything else in there, no version and no way to manage change, that's not a data contract either.

Juan Sequeda [00:16:39] So it seems like, if we think of a spectrum from right to left, the data engineering teams are more on the right, and the software teams, the ones who actually created the software, the original source of where the data's coming in, are further to the left. So does that mean that data... Who takes responsibility for these data contracts? Is it now the software engineering teams, or is it the data engineering teams? Because if you're truly shifting to the left, the data teams are not really part of it; it should be the software engineering team.

Andrew Jones [00:17:19] Yeah, I mean, that's what I feel. I think only the software engineering teams can affect the data they're producing. What we can do downstream in data engineering is work around it; we can try to transform the data, but we can't really improve its quality. Say we need the data to be more timely: we can't make it more timely downstream, we're bound by what's generated. Say we need a certain field to be populated and it wasn't captured upstream: we can't populate it downstream. We can try to infer it, but that only gets us so far. If we want to improve data quality, it has to be done by those who generate the data. That doesn't mean data consumers, the people in data, are out of the picture; they're still part of it. They are there to provide requirements and incentives. Why do they need quality data, and what are they going to do with it? How is that going to build value for the business? That's still important, and that needs to be done. But look at it the other way: if they were to define the contract and say, "Hey, software engineers, meet my contract," what software engineer is going to be incentivized to do that? Why would they, and how would you get them to do that?

Juan Sequeda [00:18:31] Answer your own question.

Andrew Jones [00:18:33] Well, you just wouldn't. There's a concept in software engineering called consumer-driven contracts where, as a consumer, you provide a contract and the software engineer has to meet it. It's not niche, but it's not that well adopted; it's something you might do in a microservice architecture to help with testing and things like that. But you never start there. You always start with APIs: producers own APIs, and they provide APIs to the rest of the business. That's just the easiest thing to do, the simplest thing to do. And I think it's the same for data; I can't see why it should be much different. Why are we not generating data with any discipline? Why are we just sucking it out of databases and thinking we can build on it reliably? Maybe the argument was, "We didn't need to do it reliably, we're just generating dashboards and stuff like that." And maybe that was a fair argument. I'm not sure it was, because we still spent a lot of money building those dashboards, and I think we could have done it cheaper if we had put a bit more effort in upfront. But even if that was a fair argument then, it's not a fair argument now. Now that data is being fed back into product teams and into ML models, driving product features and revenue. It's no longer okay, I think, to do that with little or no discipline at the start. You have to do it the right way, like we do in software, or it's not going to go very well.

Tim Gasper [00:19:54] Yeah, otherwise you're just depending on the humans to hopefully do the right stuff and your system's hopefully not to fail, right? This is interesting.

Andrew Jones [00:20:01] Yeah, we all know systems fail, right?

Tim Gasper [00:20:03] Mm-hmm.

Andrew Jones [00:20:03] Systems always fail. Humans all want to do the right things. So when you speak to software engineers, explain the problems, and say, "Hey, I've been building off this stream from your database, and when you change the database, it breaks everything," they're like, "Well, yeah, but I need to change my database for reasons: performance reasons, new features. I need to change my database; I won't give up that autonomy." You're like, "Okay, that's fine. Can you provide me some other interface?" And they're like, "Yeah, I can see why that's helpful." Same as with APIs. So it's not that hard to sell to them, and then it just comes down to incentives and prioritization and things like that, which is the same as any other prioritization argument. But for some reason, in many organizations, we just don't feel like we can talk to the software engineering team. There's no communication between those consuming data and those generating data; there are so many layers and so many different teams in between. We just never challenged the assumption that we couldn't ask for better data. That's what we did with data contracts: I tried to change that assumption. And it went pretty well, easier than I thought it would go. Then I started building some interfaces around that, and data contracts, and it's gone from there really. And yeah, I think that's what you need to do if you want to depend on data.

Tim Gasper [00:21:15] Yeah. And going back one more time to what are not data contracts: you talked about how, to be a data contract, it needs to improve data quality and help with the shift left. One topic that I think has some intersection with data contracts, but I suspect is not a data contract itself, is data testing. Maybe you're using things like Soda or Great Expectations, or other types of tool sets, where you're saying, "I've got various assertions about what I expect the data..." and these tests pass or fail. Is that a data contract, or is that part of a data contract?

Andrew Jones [00:22:02] I think it can be part of it; I think you can have checks in the data contract. And they can run early, before the data is published, so you can localize the problem and have the alert sent to the data generators. I think you still need checks downstream as well, because you're never going to predict all the possible ways it might fail; in software, things are always going to go wrong. You might have a field covered by a data contract that you think is always going to be between zero and one, but you never had a check for it, and one day it's 100, and you're like, "I don't know why that is, but that's going to cause things to break." So checks and observability are still important, but they shouldn't be the first attempt to catch data-quality issues. The checks could be part of the data contract. What I've seen people do is have checks in their contract and then generate libraries, or generate maybe [inaudible]. They generate libraries that software engineers use to publish the data, and those libraries also run the checks. If the checks fail, the software engineers get an alert, and they can fix it before it goes beyond their service. So you're reducing the impact of any failure: it hasn't gone into your ETL or your dashboards for your users to notice; it's very localized. Again, as any software engineer will tell you, the cheapest place to catch issues is as early as possible. So if you didn't catch it in your CI checks or in your integration tests, you might as well try to catch it in your service before it goes to all the other services. Software engineers have worked this out, and I would challenge the assumption that data is in any way different from what software engineers have been doing.
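Here is a minimal sketch of the pattern Andrew describes, where checks declared in the contract run inside the producer's publishing library, so a bad record triggers an alert to the producing team instead of flowing into ETL and dashboards. The check names, the `publish` function, and the in-memory `sink` (standing in for a real Kafka or Pub/Sub producer) are hypothetical; the 0-to-1 field mirrors his example. A generated library would presumably derive `CHECKS` from the contract file rather than hard-code them.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]

# Hypothetical checks declared in the contract: description -> predicate on a record.
CHECKS: Dict[str, Callable[[Record], bool]] = {
    "order_id is present": lambda r: bool(r.get("order_id")),
    "discount_rate is between 0 and 1": lambda r: 0.0 <= r.get("discount_rate", 0.0) <= 1.0,
}


def publish(record: Record, sink: List[Record], alert: Callable[[str], None]) -> bool:
    """Run the contract's checks in the producing service before the record leaves it.

    On failure the producing team is alerted and nothing reaches downstream consumers.
    """
    failures = [name for name, check in CHECKS.items() if not check(record)]
    if failures:
        alert(f"contract violation for order {record.get('order_id')!r}: {', '.join(failures)}")
        return False
    sink.append(record)  # stand-in for a real Kafka or Pub/Sub producer
    return True


published: List[Record] = []
alerts: List[str] = []

publish({"order_id": "A1", "discount_rate": 0.2}, published, alerts.append)
publish({"order_id": "A2", "discount_rate": 100}, published, alerts.append)

print(len(published))  # 1
print(alerts[0])       # contract violation for order 'A2': discount_rate is between 0 and 1
```

As Andrew notes, this doesn't replace downstream checks and observability; it just moves the first line of defence to where the failure is cheapest to fix.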

Tim Gasper [00:24:03] Yeah, it seems like the software engineering analogy goes pretty far here, because even in software engineering, when you depend on certain assertions coming from something to the left of you, the smart engineering thing to do is still to have some validation in your service, to confirm that the contract is being complied with before you go and break a bunch of stuff downstream.

Andrew Jones [00:24:32] Yeah, exactly. As they say, you never trust your inputs, so you always have to design [inaudible] with that in mind. But how far you go with this depends on how important the data is, and how important what you're building on the data is. I'm not saying you should go this far with every dataset and build all your checks into all your data contracts [inaudible]. There's a cost to doing all of this, a cost in time and a cost in money. If it's important, if it's driving revenue, if it's a key part of your product feature set, then it's worth doing all of that. If it's less important, maybe do some of it. If it's so unimportant that you're doing none of it, it's probably not even worth generating the data.

Tim Gasper [00:25:15] Yeah, that's interesting. So one other thing that comes to mind here, Andrew, is, going back to APIs, APIs often have Swagger documentation and SDKs and things like that, there's things that tend to wrap around APIs, right?

Andrew Jones [00:25:33] Mm-hmm.

Tim Gasper [00:25:36] Where do data contracts go, where do they live, and is there stuff that's supposed to wrap around them?

Andrew Jones [00:25:44] Yes, I think data contracts should live in a similar place to the APIs. If you're asking software engineers to create data contracts, they need to be able to create them where they'd most expect them to be. So if they're creating their APIs in a Git repo [inaudible] service, or they're defining their infrastructure there, wherever they're working, they should be able to create data contracts in that same area. If you ask them to go and create a data contract in a web UI somewhere else, or in some different system, it's a harder sell. You're adding quite a lot of friction there, and you're separating the code that generates the data from where the data is defined. And [inaudible], I think almost definitely, if it's defined somewhere else. So if you're targeting the data generators as the people you want to own and be responsible for the data contract, they need to be able to define the contract where they are working.

Juan Sequeda [00:26:48] It's an agreement; the data contract is an agreement. The point I want to make here is that it's not just the software engineer doing it; there are other people involved, because again, it's people, process, technology. So I guess the contract itself is going to be some sort of code that lives inside GitHub? I'm trying to get very specific here. But there are other people who need to look at it and say, "Yeah, that's exactly it. You understood that correctly. Good. Let's go to these checks." Are we now expecting people who don't use GitHub to get into GitHub? How would all that agreement actually occur, and where would it happen?

Andrew Jones [00:27:30] Yes, I think for the most part the agreement happens before it's defined in code and in GitHub. Say you're creating a new service that's going to generate some data, and you know it's going to be important for some reporting reasons. Maybe talk to your BI team and say, "Hey, you need to do some reporting around this. We're generating data. What do you need to generate the reporting you've been asked to create?" Have a conversation around that, maybe some sort of [inaudible] document, however you want to do that. And then the code is the last bit: "We've agreed on this, and now we're implementing it. And if we get things wrong or missed something, we can evolve it over time. It's not set in stone." It's just that evolution is managed. So a lot of that happens before it's in code, but once it's in code, you're right, people need to be able to find it and discover it. So it doesn't just live in, say, GitHub, but that's the source of truth; that's where it's defined. From there you can very easily move it to other places. What I've done in the past is convert my data contract into an OpenAPI version of the data contract, and then use that to publish to a data catalog. We use a thing called Backstage from Spotify; it's a catalog that's got our APIs in it, our services in it, and our data in it as well. So it's all in one place. But you can convert it into any kind of format you like. We convert to Protobuf, for example, and configure Pub/Sub topics and things like that, and you could do the same with Kafka. So you can always convert your data contract into any format and ingest it into a data catalog, into LookML, into whatever you like, because it's machine-readable as well as human-readable. And it's not that hard to do.
It's an afternoon to convert it into an API [inaudible], or a couple of days to convert it to LookML. These are things we've done. So yeah, it's a great way to provide that metadata to other services.
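Because the contract is machine-readable, converting it is mostly a mapping exercise. Here is a minimal sketch that renders a contract as a JSON Schema (draft 2020-12) document; OpenAPI 3.1 schema objects are based on that draft, so output like this could be placed in an OpenAPI document as a component schema. The type mapping, the `x-owner`/`x-version` extension keywords, the example fields, and the function name are assumptions; this is not the converter Andrew used, and Protobuf or LookML targets would each need their own mapping.

```python
import json
from typing import Any, Dict, List

# Assumed mapping from the contract's logical types to JSON Schema types.
TYPE_MAP = {"string": "string", "int": "integer", "float": "number",
            "bool": "boolean", "timestamp": "string"}


def to_json_schema(name: str, owner: str, version: str,
                   fields: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Render a (hypothetical) contract as a JSON Schema document."""
    properties: Dict[str, Any] = {}
    for f in fields:
        prop: Dict[str, Any] = {"type": TYPE_MAP[f["type"]]}
        if f["type"] == "timestamp":
            prop["format"] = "date-time"
        if f.get("description"):
            prop["description"] = f["description"]
        properties[f["name"]] = prop
    return {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "title": name,
        "type": "object",
        "properties": properties,
        "required": [f["name"] for f in fields if f.get("required", True)],
        # Contract metadata carried as extension keywords (naming is an assumption).
        "x-owner": owner,
        "x-version": version,
    }


contract_fields = [
    {"name": "order_id", "type": "string", "description": "Unique order identifier"},
    {"name": "amount", "type": "float"},
    {"name": "created_at", "type": "timestamp", "required": False},
]
schema = to_json_schema("orders", "checkout-team", "1.1.0", contract_fields)
print(json.dumps(schema, indent=2))
```

The contract in the producer's repository stays the single source of truth; outputs like this are generated from it and pushed to catalogs or other tools.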

Tim Gasper [00:29:37] That makes sense. So it sounds like in general there's the source of truth for the data contract itself, which is ideally going to be code, because it's embedded with what you've developed for your data pipeline or your data services or whatever that is, right?

Andrew Jones [00:29:58] Mm-hmm.

Tim Gasper [00:29:58] But then if you're trying to create discoverability around it, then some place like a catalog or whatever other means that you're using to create discoverability around data and analytic artifacts and things like that can make a lot of sense.

Andrew Jones [00:30:12] Yeah, exactly. You don't want to end up in a place where data is siloed. Particularly if you're thinking about data mesh in the future, you probably do want to start moving towards a more decentralized model for this data, where it's owned by different people. And that's fine, but then you need to make sure it's not siloed. It might be centralized, it might be isolated, but it's not siloed: you can still discover it, you still know how to query it, you can still join it. And this is a scenario where we are very lucky again, because we've got great data warehouses. In something like BigQuery, you can isolate things in datasets and GCP projects, and we can still query across them with no cost at all. So you can isolate for ownership and responsibility without siloing the data, and without the cost of moving that data again to make it available to other people. You could do the same with Kafka and Pub/Sub and things like that. So we're very lucky to have tooling that allows us to do that. And we can use data contracts to implement that isolation by ownership, allow discoverability by interfacing with other systems, and implement data governance and all that sort of stuff. Once you have metadata that describes the data, it becomes quite easy really to build on top of that and do quite interesting things with the data.

Tim Gasper [00:31:40] Interesting. And all of this discussion around code makes me think that although there are some best practices here, there are a lot of decisions to be made, and I'm sure a lot of companies that have implemented data contract approaches have had to invent this for themselves. I know some of the stuff you've been working on is to help people navigate some of this. But there are also vendors now starting to say, "Oh, we're a data contracts platform," or a toolkit or something like that. Can you tell us a little bit about the vendor landscape around data contracts? Why might you want to use a tool? Is that useful, or is it a little more experimental, like, "Let's see where this goes," right?

Andrew Jones [00:32:30] Yeah, I think we're still early in the data contracts journey, and I think not everyone needs to be building their own infrastructure for data contracts. So it's good that people are looking at how they can solve this problem more generally for other people, and I'm excited to see what they're building and how successful they are. I also think it's interesting that some people are thinking about creating a data contract standard, with some vendors involved in that, so that contracts can act as an interchange format for data. I was giving an example earlier where we could turn a data contract into something that a catalog could ingest and make available, or [inaudible] ingest it and start doing quality checks on it, things like that. So that's quite interesting. With some of the new vendors who are building data contract tooling, though, a question I would ask is, "Who are they expecting to use this tooling? Is it the data generators? Are we shifting left to software engineers? Or is it aimed at the data engineers and the data people? And if it's aimed at the data people, how successful will it be at actually changing the culture of an organization to one where you have shifted that responsibility left and you are applying discipline to the data that's being generated?" So we'll just see how that evolves.
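To make the "interchange format" idea concrete, here is a minimal Python sketch, assuming a contract is held as a plain dict with a name, an owner, a version and typed fields. The contract keys, the `to_catalog_entry` function and the output payload shape are all hypothetical; a real catalog or a future contract standard would define its own import format.

```python
import json

# Hypothetical contract; in practice it would be loaded from a file in the
# producing team's repository.
CONTRACT = {
    "name": "orders",
    "owner": "checkout-team",
    "version": "1.2.0",
    "fields": [
        {"name": "order_id", "type": "string", "description": "Unique order identifier"},
        {"name": "total_pence", "type": "integer", "description": "Order total in pence"},
    ],
}


def to_catalog_entry(contract: dict) -> str:
    """Translate a contract into a hypothetical catalog-ingestion payload (JSON text)."""
    entry = {
        "asset": contract["name"],
        "owner": contract["owner"],
        "contract_version": contract["version"],
        "columns": [
            {
                "column": field["name"],
                "type": field["type"],
                "description": field.get("description", ""),
            }
            for field in contract["fields"]
        ],
    }
    return json.dumps(entry, indent=2)


print(to_catalog_entry(CONTRACT))
```

Because the contract stays the single source of truth, the catalog entry can be regenerated whenever the contract changes instead of being maintained by hand.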

Juan Sequeda [00:34:06] So okay, let's get very honest, no BS, around some of these things. We're talking about the solutions here, and how complicated or simple these solutions should be. As we were talking about before, first of all, the contract itself needs to be defined and live somewhere; it's a piece of code or whatever. And we can ask, "Is there a standard for that?" all the way down to the syntax: "Is this in YAML or JSON or whatever?" It exists and it's living in GitHub or wherever. Now I need to use that contract and execute it somewhere, and I think you described this before: having an interface to drive that data through. And what you're arguing is that the interface should be shifted left, and if it's truly a data contract, it should be something that software engineering is doing. It could be a Kafka topic, Pub/Sub. If you do it as a table in a data warehouse, then you're not shifting left enough. So how complicated is this? You're saying that we should generalize this infrastructure, but are we just generalizing something that's pretty simple, where there's really not much to do? Or if we do that generalization, should it be more on the software engineering stack, versus what we're seeing, vendors and tools putting things in the data stack? Because I'm like, "If you're doing it in the data stack, that's not left enough. So it's not really a data contract." Anyway, I'll stop ranting.

Andrew Jones [00:35:51] Yeah, so maybe I'm making it sound more complicated than it is, because I think it's quite simple to build this tooling. The first time I did this, we did a spike and an MVP in a couple of weeks. In my book there's one chapter on implementing a simple data contract, creating a BigQuery table and putting a schema on it, I think, as well. And it's 15 pages, just one chapter. Usually you'd source some tooling to build on top of, but it is not that complicated. The way I like to build it, and the way I did in those examples, is on top of infrastructure-as-code platforms. There's one called Pulumi, which is in my book, because I had to pick one. At the company I work at, we had an in-house one, so we built on top of that. So you shouldn't build it all from scratch, but it's not that hard: a YAML file and a bit of Python code, then build interfaces. And then you've really got a data contract, and you can write to that. Then you can take it as far as you like: you can build software libraries to make it easy to publish, you can integrate a data catalog, you can do data governance, anonymization, privacy and stuff like that. Those are all things you can build on top, but for the simplest thing, like I said, there's one chapter in my book, 15 pages, and it walks you through a minimal viable data contract platform.
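As a rough illustration of "a YAML file and a bit of Python," here is a minimal sketch of checking records against a contract definition. It is not the implementation from the book: the field names, the `validate` function and the contract layout are hypothetical, and the contract is written as JSON only so the sketch runs with the standard library (with PyYAML installed, `yaml.safe_load` on an equivalent YAML document yields the same dict).

```python
import json

# Hypothetical contract owned by the team that generates the data.
CONTRACT_JSON = """
{
  "name": "customer_signups",
  "owner": "accounts-team",
  "fields": {
    "customer_id": {"type": "string", "required": true},
    "email": {"type": "string", "required": true},
    "referral_code": {"type": "string", "required": false},
    "age": {"type": "integer", "required": false}
  }
}
"""

# Map contract type names onto the Python types used for the checks.
TYPE_MAP = {"string": str, "integer": int, "boolean": bool}


def validate(record: dict, contract: dict) -> list[str]:
    """Return a list of contract violations for one record (empty means valid)."""
    errors = []
    fields = contract["fields"]
    for name, spec in fields.items():
        if record.get(name) is None:
            if spec["required"]:
                errors.append(f"missing required field: {name}")
            continue
        expected = TYPE_MAP[spec["type"]]
        value = record[name]
        # bool is a subclass of int in Python, so reject it explicitly for integers.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            errors.append(f"{name}: expected {spec['type']}, got {type(value).__name__}")
    # Fields the contract does not declare are also violations.
    for name in record:
        if name not in fields:
            errors.append(f"field not in contract: {name}")
    return errors


contract = json.loads(CONTRACT_JSON)
print(validate({"customer_id": "c1", "email": "a@example.com"}, contract))  # []
print(validate({"customer_id": 7, "plan": "pro"}, contract))
```

In a fuller platform the same contract file would also drive the interfaces Andrew mentions, such as creating the table or topic with its schema, rather than only validating records.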

Juan Sequeda [00:37:17] All right, we're always looking for quotes for a [inaudible]. This is probably a long quote, but something like, "Data contract implementation: it's a 15-page chapter in my book."

Andrew Jones [00:37:32] Yeah, I'll get that printed.

Juan Sequeda [00:37:37] Would you get that T-shirt? If I printed it, would you want that T-shirt?

Andrew Jones [00:37:40] For sure, definitely.

Juan Sequeda [00:37:42] Yeah. Well, I mean again, honest, no BS, I mean this is the type of stuff that we got to go talk about. You got to be honest, no BS about these things and-

Tim Gasper [00:37:48] And it's easy to overcomplicate stuff, right?

Andrew Jones [00:37:50] Yeah, I think the tooling is easy; the tech's all there, we've got great second [inaudible]. So we've got great tech. The harder part is how you incentivize software engineers, how you get things prioritized, how you actually go about shifting left, how you assign responsibility. That's all the hard part, and that's where I've spent most of my time over the last few years: talking to all those different people, explaining the problems we're having and why we need data to be better, because we're using it to build things that drive revenue and align with our company goals, all that sort of thing. It's not particularly hard, it just takes effort. It's something that a lot of us data people don't do very often, but we can do it, and once you get that buy-in, you can get it prioritized and you can get the software engineers to do the work.

Juan Sequeda [00:38:44] Let me get into what you just said: "If we get buy-in." So let's talk about the buy-in, the people, the incentives. All right, go.

Andrew Jones [00:38:57] Yeah, so all of this assumes that data is important to your organization, or that you want to use data for something important. And that's probably true of many organizations. There's probably something in your goals, in your vision, in your business goals that says, "We'll deploy an ML model that does this and drives this product feature," or, "We'll use data to drive some product feature." Or maybe your software engineering teams are using data as well, and are moving data between themselves, maybe to improve how they bill customers, or to integrate with Salesforce better so they can do growth better. I'd say in most organizations, maybe every one, data is being moved around all the time to try and meet business goals. If those business goals are important enough, then it shouldn't be that hard to get some effort prioritized upstream in the software engineering teams who generate the data, to say, "Can you generate slightly better-quality data? Because that will save us loads of time downstream." Or, "Can you generate better-quality, more reliable data so we can depend on it, so we can build this important feature that can't go down every week like our dashboard does at the moment?" And if you're in an organization that doesn't value data, trying to deploy data contracts probably isn't going to be successful. You're probably not investing too much in data in general, and if you're listening to this podcast, you're probably thinking it might not be the best job for you. But many organizations are not like that; they are investing a lot in data. We spend a lot of money on data, on data warehouses, on data hoarding, on data teams and BI people, data engineers, analytics engineers and all the different roles we've got. So doing that a bit better and a bit cheaper should get prioritized.

Juan Sequeda [00:40:58] Now, it goes back to just, "Show me the money. I'm going to invest in this if you tell me that it's going to give me a larger return than the investment."

Andrew Jones [00:41:10] Exactly.

Juan Sequeda [00:41:11] And it sounds obvious, but it's just very typical that people forget about it and they just get so excited about the tech.

Andrew Jones [00:41:17] Yeah, and I do too. I'm a techie, I'm an engineer, I could talk a lot about tech. But really, I spend most of my time not doing tech. I guess it happens to everyone: when you get more senior, you tend to move away from the tech and the code and making the code look great and efficient, and you start moving up and up and up. And then you realize it's all about, "What value are we delivering as a team? How do we show that? How do we use that to get more investment and prioritize our things?" That's the journey I've been on over the last few years through data contracts, and it's something everyone has to get to at some point if you want to change your organization and try to do things better.

Juan Sequeda [00:42:04] Tim, any final words you want to-

Tim Gasper [00:42:06] I think maybe just one last question here, which is who do you think in the organization, just think about any sort of an enterprise here, their data team, their engineering team, who in the organization is going to be the best advocate for data contracts? Is it going to be on the engineering side? Is it going to be on the analytics side? Is it a governance person? Who's in the best position to be evangelizing and pushing for this?

Andrew Jones [00:42:32] Well, that's a good question. What I did at the start was, I had this idea about data contracts, and I showed it to the data people, and obviously they loved it. They were like, "Yeah, we want better data, sounds great. Go with that." But also, before any code was written, way before that, I spoke to a lot of our engineering teams and got their buy-in. I got them involved in the solution design and made them feel like owners of this program of change I wanted to achieve. So I really got them on board, and I think they ended up becoming the best advocates, because they understand the problems, and they are software engineers talking to software engineers, not a data team talking to the product team saying, "Hey, we all need better..." They're actually within those teams, they understand the problems and why it's important, and they're there evangelizing for this thing when I'm not in the room. But even better, they might be depending on the data themselves and feel the same sort of pain, because software engineering teams also depend on data from other software engineering teams, and sometimes that isn't of great quality, and that causes problems. So I think they are good evangelizers. At some point you probably need to get leadership pulled in as well, if you're going to try and do a big program of change where you're starting to think about changing existing data, which obviously requires investment, or if you're trying to think about changing your org and maybe moving people around, or when you start getting towards doing mergers and things like that, but that's probably further down the line. At the start, though, just get software engineers involved. If they're the people you want to do the work of defining data contracts, they need to be pulled into it.

Juan Sequeda [00:44:21] So you're saying the champions who should be evangelizing data contracts should be the software engineering team?

Andrew Jones [00:44:28] The ideal ones would be-

Juan Sequeda [00:44:29] Ideal?

Andrew Jones [00:44:29] Ideal.

Juan Sequeda [00:44:30] We're talking about the ideal ones here. And actually you said another really great point here, is that the software teams also consume data from some other software teams. So just within the software world right there, that's ideally-

Andrew Jones [00:44:41] Yeah, some of the greatest successes we've had are where things organically moved to a data contract. They moved because a software engineering team needed that data, and they wanted it to be reliable because they were going to do something else with it. And it wasn't data science, it wasn't fancy AI and ML, it wasn't dashboards, it was just, "We're moving data between these services, and we need to move it in batch. We're using the data platform tooling to do that because it's the most suitable for this particular use case. And actually, we want a new schema, and we want SLOs around it, because what we're building on top of it is important." So we used data contracts.

Juan Sequeda [00:45:27] I was actually not expecting this answer. I was expecting somebody on the data side or whatever. And I have to say this is a refreshing answer, because my main takeaway from our discussion today is that data contracts are more about the software teams and not just the data teams, to the point that I would say we need to change the name: call it the software data contract, or something like the software data agreement. I think that would be a more appropriate title or label here, because it's truly shift-left. Because what I'm seeing, just talking to people and all the blah, blah, blah you hear when they say data contracts, is that of everybody I talk to who talks about data contracts, none of them are software engineering folks. And they're not even talking to the software engineering folks, right?

Andrew Jones [00:46:21] Yeah.

Juan Sequeda [00:46:21] And then just because it has the word data contract, then it ends up being more on the consumer side, right?

Tim Gasper [00:46:27] Yeah.

Juan Sequeda [00:46:27] It's really the things that you have argued that are not really data contracts.

Tim Gasper [00:46:33] There's a lot of people talking about data contracts who are like, "The BI dashboards want the data warehouse to provide some more reliability," or something like that, right?

Andrew Jones [00:46:43] Yeah. And to be fair, that's okay. Data people produce data too, and they should have a contract around that. So it's not bad. It's good, but you're still quite far downstream [inaudible]. You can't make data better at that late stage; you can only make it worse.

Juan Sequeda [00:47:05] Not making any enemies here-

Andrew Jones [00:47:08] No, I don't like... So I guess these are my opinions, and people have different opinions. If people think the best thing they can do is not at the data [inaudible] but a level or two behind that, and they think, "That's as far as we can get," and they want to prove that out, that's good; that's still better than what we've got now. I'm either ambitious or optimistic, fully optimistic, but I think we can go further, and I have done that. And I don't see what would prevent enough people from doing that.

Juan Sequeda [00:47:43] I applaud you, Andrew, for really pushing the boundaries and showing the community that we can do more and should strive for excellence.

Andrew Jones [00:47:56] Yeah, I think first of all, what we're doing now is so important, it's driving revenue.

Juan Sequeda [00:48:02] And that, drive revenue. Yes, show me the money, as I always say.

Andrew Jones [00:48:07] Exactly. And if it's not that important, then don't go that far. Maybe go to those levels that other people are thinking of going to, to at least make things better for yourself. But most organizations are going that far, they are trying to drive revenue from their data, so then it's worth doing it better.

Juan Sequeda [00:48:30] Yeah. Tim and I are backchanneling here. Tim, just come out with your comment there.

Tim Gasper [00:48:35] Yeah. My backchannel comment to Juan here was, common sense, yet a lot of people don't listen to it.

Juan Sequeda [00:48:44] Right. Anyway, I told him we could keep talking; I've got a bunch of stuff I want to keep chatting about. Looking forward to our next conversation, hopefully in London, having some beers. Hopefully your home-brewed beers; would love to do that. But all right, next, our AI Minute. You've got one minute to rant about whatever you want about AI. Go.

Andrew Jones [00:49:06] So I'll do something slightly different, and it's not so much a rant; again, a bit like what I did earlier, it's more appreciation. I remember the internet coming along and playing Counter-Strike on dial-up and things like that, but I didn't really appreciate it; I was too young. I remember the App Store coming along, and the iPhone, and I wasn't really in a position to take advantage. But seeing all the things we're doing with AI and the amount of change that could come from it, good or bad, and being in a position where I understand what's going on and I'm relatively close to it (I'm not a data scientist, but I understand roughly what's going on), it's a pretty unique opportunity just to be part of that. So maybe stepping back and realizing that we're lucky to be part of this big change, no matter how it turns out, is a nice thought.

Juan Sequeda [00:49:54] 1000%. This is what I tell everybody: "We are in 1992, when the web came out."

Andrew Jones [00:50:04] Yeah, exactly.

Tim Gasper [00:50:06] ChatGPT is like AOL. It's like, "There's so much more to come here."

Juan Sequeda [00:50:11] I have my iPhone here with me, this is version 15. This has evolved a lot, this stuff is going to evolve.

Andrew Jones [00:50:19] Yeah. And we take our phones for granted now, but 15 years from now, what will ML be doing for us? It's amazing to be part of it in a full way.

Juan Sequeda [00:50:32] All right, we've got our lightning-round questions, so I'm going to kick it off. Number one: is the discipline of thinking about data contracts and expectations more important than the data contracts themselves?

Andrew Jones [00:50:44] Yes.

Juan Sequeda [00:50:48] All right.

Tim Gasper [00:50:48] All right, second question, are data contracts a mandatory or a required part of data mesh?

Andrew Jones [00:50:57] Yes, they are.

Juan Sequeda [00:50:59] Right. Are data contracts more valuable for tech companies where a lot of the upstream data is generated by software engineering?

Andrew Jones [00:51:09] Yes, probably. It's actually quite hard to do data contracts when you don't control the generation of data, because you can't shift it far left. I'm not saying I've worked that out yet, so yes.

Juan Sequeda [00:51:21] Okay, so here's an interesting point: if the data contracts do live in the true data engineering world, if you have no control of... I mean, you shift as far left as possible, but it's like, "Well, I just got my data from Salesforce, I can't change that."

Andrew Jones [00:51:39] Yeah, you're not going to make Salesforce give you better-quality data. You're small; you're not going to have that influence over them. But what you can do is, you might have a Salesforce admin, and they might be creating customer objects, and they can own the data contract. That's as far left as you can go. At least they can start setting expectations around the data, but you obviously can't put code in Salesforce, so you have to think about a different solution. That's something I haven't really worked out a solution for yet, but that's probably where a lot of the stuff people are building is good for those kinds of things.

Tim Gasper [00:52:13] That's great, go as far left as you can go. All right, final question here. Think about the maturity of a company and their data stack and their data strategy, are data contracts something that organizations should leverage from the start, or is it really more of an advanced thing?

Andrew Jones [00:52:32] Yeah, I get asked this quite a lot actually, because I was three years into building a data platform at a company, and then I said, "Okay, now onto data contracts," and tried to change direction a bit. If I could go back in time, I would've done it from day one. I think it's a lot easier to do it at the start; migrating data to data contracts is a huge effort. Yeah, it's very difficult. But not just that: data platforms are much more effective when you build them on data contracts. You can do a lot of things much more easily, like governance, or even simple tasks like backups. You can build all those types of things with data contracts as well. We didn't get time to talk about those things, but you can build a whole data platform around the idea of data contracts. Also, starting from day one sets your culture, because your data contract says that the data is owned by the data generators. And culture is probably the hardest thing to change when you're a three-, six-, or 10-year-old company; it's a lot harder to change the culture in any way. So yeah, do it as early as possible, I would say. As soon as your data becomes important, as soon as you start using data for something fairly important, you probably want to start putting a data contract around it.

Tim Gasper [00:54:00] That's a great recommendation and takeaway, because for a lot of folks, they may already be on their journey, and so now it's much more of a change management and technology change issue. But for those lucky few listeners out there who are getting to start fresh or build a separate stack or something like that, think about how you can leverage data contracts, or the discipline of them, sooner. You're going to be happier from a governance standpoint, from a usability standpoint and much, much more. Oh my goodness, [inaudible] episode. We've got some takeaways, huh?

Juan Sequeda [00:54:33] All right, Tim, take us away with your takeaway.

Tim Gasper [00:54:35] All right, so we started with, honest, no BS, what is a data contract? And you said that it's an agreement between those who produce data and those who consume it. It could include SLOs, SLAs, owners, etc. The key is the management of the responsibilities around it, which allows you to have dependability and understanding, and shifting responsibility to the left. So think of a river of data, a river of information: it's coming from the left and flowing to the right, and you want to shift to the left. And what does it mean to shift responsibility to the left? Well, this is the whole reason for data contracts. You described the problem: we're bringing in all this data from all these different systems, something breaks, and we have to fix it. We need some way to have reliability, predictability and guarantees around these things, or else we get that age-old problem of data quality, which is the data engineer getting woken up at three o'clock on a Sunday morning because they have to go fix the data pipeline. No, we don't want to live in that world. We don't want to live in the world where that dashboard goes to the board and it's the wrong answer. So what I think is interesting here is the software engineering practices that you recommend. A lot of this inspiration comes from software engineering, where you have guarantees around the information that's being passed through an API, or that's coming out of a database. Usually you don't build your services to talk directly to the database. You have a service that sits in front of it and interacts with it, and it provides these kinds of guarantees around user experience, ownership and responsibility. We need to bring that to the world of data, and that's what data contracts are trying to do. And what are data contracts? Well, anything that describes the data and sets expectations around the data.
And I think we had some interesting explorations around what is and isn't a data contract, because a schema, for example, might have some assertions like, "This is a string," and certain ways that you can access that data, but if there are never any expectations set, then you're not really creating a data contract around it. It's part of a data contract, but you haven't gone the full way. So I thought that was super interesting. Juan, I'll pass it to you. What about your takeaways?
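The distinction Tim draws, a schema on its own versus a schema plus expectations, can be sketched in a few lines of Python. The document layout, the `freshness_slo_minutes` key and both helper functions are hypothetical, chosen only to show an owner and an SLO sitting alongside the schema; they are not taken from any particular contract specification.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical contract: the schema says what the data looks like; the
# expectations (an owner and a freshness SLO) are what make it a contract.
CONTRACT = {
    "schema": {"order_id": "string", "created_at": "timestamp"},
    "expectations": {
        "owner": "checkout-team",
        "freshness_slo_minutes": 60,  # data must be at most an hour old
    },
}


def is_contract(doc: dict) -> bool:
    """A schema alone is not a contract: also require an owner and a freshness SLO."""
    exp = doc.get("expectations", {})
    return "schema" in doc and bool(exp.get("owner")) and "freshness_slo_minutes" in exp


def freshness_ok(last_loaded: datetime, now: datetime, doc: dict) -> bool:
    """Check the freshness expectation against the time of the latest load."""
    limit = timedelta(minutes=doc["expectations"]["freshness_slo_minutes"])
    return now - last_loaded <= limit


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(is_contract(CONTRACT))                                     # True
print(is_contract({"schema": CONTRACT["schema"]}))               # False
print(freshness_ok(now - timedelta(minutes=30), now, CONTRACT))  # True
print(freshness_ok(now - timedelta(hours=3), now, CONTRACT))     # False
```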

Juan Sequeda [00:56:46] Yeah. Well, then what are not data contracts? Or what's being called a data contract but really isn't? Your position is that it's just not going far enough, where it doesn't really result in truly shifting left, and you're not really effectively improving data quality. So, for example, data tests can be part of data contracts, because you want checks to ensure that the data is compliant with the contract, but a data test alone isn't a contract. You should be catching issues in integration checks and CI/CD checks, and then adding data tests in case those didn't catch it. And you can provide notifications, they can be localized, but at the end of the day, data testing by itself is not a data contract; it's part of one. And really, thinking about shifting left, it's about pushing it to the software engineering team. For example, if you want data to come in in a more timely way, the data teams can't do that on their own. They really have to push that to the software teams to do it. The data teams will provide the requirements, and we need to work out the incentives there. I think one of the big issues here is that there's a lack of communication between the different teams, and so many layers in between. Another aspect is where these contracts live. If the software engineering teams are actually creating these data contracts, they should exist in a place where those teams work, so GitHub, because if you ask them to do it somewhere else, they're not going to do it. We also talked during the lightning-round questions about how, if you're not building the software that generates the data yourself, you push it as far left as you can, which is wherever that data is coming in from your different sources.
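To illustrate catching problems in CI/CD rather than downstream, here is a minimal Python sketch of a check that compares a proposed contract with the current one and reports backwards-incompatible changes. The rules (removed fields, type changes, a required field becoming optional) and every name in it are assumptions for illustration, not a standard compatibility policy.

```python
# Hypothetical contract versions: OLD is on the main branch, NEW is the
# change proposed in a pull request.
OLD = {
    "order_id": {"type": "string", "required": True},
    "total_pence": {"type": "integer", "required": True},
    "coupon": {"type": "string", "required": False},
}
NEW = {
    "order_id": {"type": "string", "required": False},    # no longer required
    "total_pence": {"type": "string", "required": True},  # type changed
    "channel": {"type": "string", "required": False},     # new field: allowed
}


def breaking_changes(old: dict, new: dict) -> list[str]:
    """List changes that could break existing consumers of the data."""
    problems = []
    for name, spec in old.items():
        if name not in new:
            problems.append(f"removed field: {name}")
            continue
        if new[name]["type"] != spec["type"]:
            problems.append(f"type change on {name}: {spec['type']} -> {new[name]['type']}")
        if spec["required"] and not new[name]["required"]:
            problems.append(f"{name} is no longer required")
    return problems


problems = breaking_changes(OLD, NEW)
for problem in problems:
    print(f"BREAKING: {problem}")

# A CI job would exit with this status (e.g. sys.exit(exit_code)) to block the merge.
exit_code = 1 if problems else 0
print("exit code:", exit_code)
```

New fields are deliberately not flagged, since existing consumers can ignore them; data tests on the warehouse side then act as the backstop Juan describes, for anything this check misses.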
And then if you want these contracts to be discoverable, this is where data catalogs come in, because you can define these data contracts and push them into data catalogs so they're discoverable and other people can use them or understand what's out there. Talking about vendors: yeah, it's still very early, but why create something brand new and bespoke? We have to leverage these tools, but just remind ourselves that this doesn't have to be too complicated. Go read the chapter in Andrew's book; it's just 15 pages long and it describes the implementation. Incentives: how do we get people to adopt data contracts? Bottom line, show me the money. Every organization has some kind of goal: they're trying to accomplish X, they're building Y, they're improving metric Z. If the business goal is important enough, then it shouldn't be hard to get folks upstream, typically the software teams in this case, to agree to provide that higher-quality data, because it's more dependable data to serve that goal. And at the end, who should be the champions, the evangelizers, of these data contracts? Ideally the software teams, because they're the ones who are actually going to be implementing this stuff. And of course, from there you go to leadership. How did we do?

Andrew Jones [00:59:24] That was great. Yeah, perfect.

Juan Sequeda [00:59:27] Anything we missed?

Andrew Jones [00:59:28] No, I mean there are always things to expand on; we could talk about this for hours and hours.

Juan Sequeda [00:59:32] All right. Well, to wrap up, let's throw it back to you, three final questions. What's your advice? Who should we invite next? And what resources do you follow?

Andrew Jones [00:59:42] Yeah, I think my advice, kind of related to what we're speaking about, is to always try to remember why you're trying to do something. We talked about data contracts, so why are you trying to deploy them? What is your aim? What do you want to achieve? What is it helping your organization achieve? Who else is needed to achieve it? What else is needed to achieve it? And then if you need people, in our case data and software engineers, just go and speak to them, get them on board, and explain to them why it's important. Again, I'm an optimist, but I think most people want the best outcome. And if you get the right people in the room to talk about the problems, you'll get the best outcome. Those right people are data people, software people, PMs. Get those kinds of people in the room and you'll get a good outcome.

Juan Sequeda [01:00:39] This is kind of common sense, but again, what Tim was saying, we sometimes forget about it. So I appreciate you bringing these things up.

Andrew Jones [01:00:45] Keep it simple really, right?

Tim Gasper [01:00:47] Yeah, keep it simple. And I think with data contracts, sometimes people think, "Oh, the point of a data contract is so you don't have to talk to people," or something like that. It's like, "No, it's so you do talk to people," so you bring each other together.

Andrew Jones [01:00:59] Yeah, it always comes down to people. People more than technology, always.

Juan Sequeda [01:01:04] People. Who should we invite next?

Andrew Jones [01:01:07] So I think someone I've been on a couple of panels with recently, who talks really well about data contracts, data products and data mesh, is someone called Amy Raygada, who works for the Swiss Marketplace Group and lives in Germany. She talks really well about these kinds of things, with different opinions from mine sometimes, which is obviously good. But yeah, I love the way she talks about data products in particular, as well as data contracts and data mesh.

Juan Sequeda [01:01:36] Cool. And then, what resources do you follow? I mean, one, you have a book.

Tim Gasper [01:01:43] Check out the book.

Juan Sequeda [01:01:45] We're the non-salesy podcast, but this is all about education, and it's a book, so yeah, get your book for sure.

Andrew Jones [01:01:54] Yeah, I do a bit of everything really. I buy a lot of books, and I read some of them. I subscribe to a lot of email newsletters. I don't read them all the time, but I like to browse them. What I really like, though, is the in-person stuff, now that it's back again. I'm lucky that I live close to London, so there are lots of meetups, and they're really good. What I particularly like about them is that you get people at all different stages of their careers to talk and have conversations with. So yeah, I like those in particular, local meetups and smaller meetups, as well as the big conferences. So a bit of everything, really. And podcasts as well, obviously, less so because [inaudible] as much as I used to, but they're still great, and a more entertaining version of events.

Tim Gasper [01:02:52] Podcasts are always fun. And where can people find your book? Is it on Amazon?

Andrew Jones [01:02:56] Yeah, it's everywhere you can get books: Amazon, or a subscription service if you have one. If you go to data-contracts.com, you'll find the links to everywhere you can get it. So yeah, if you want to know more or you want to implement it in a chapter... Yeah, I love it. You should check it out. And do let me know what you think if you do happen to read it.

Tim Gasper [01:03:20] data-contracts.com.

Juan Sequeda [01:03:22] All right. Andrew, this has been a pleasure. Thank you so much for being on the podcast. We truly, truly appreciate it. We got some really honest, no-BS takes on this. So we've got an Andrew T-shirt coming up soon. We're going to do this one day, Tim.

Tim Gasper [01:03:38] We've got 156 or so T-shirts we have to design here.

Juan Sequeda [01:03:41] All right. Andrew, cheers. Thank you so much for everything.

Andrew Jones [01:03:46] Cheers. Thanks for having me. It's been great fun. Thank you.

Tim Gasper [01:03:48] Cheers, Andrew.
