About this episode

Open source software is one of the largest and fastest growing segments within the data landscape. And if you’re implementing DataOps practices or considering data mesh, openness and flexibility are key architectural principles.

This week, Juan and Tim are joined by Denise Gosnell, CDO of DataStax, to talk about the business of open source and how community-centric data applications are reshaping the enterprise.

Special Guests:

Denise Gosnell

CDO, DataStax

This episode features
  • A glimpse into the future of open source data
  • The difference between open source and open core
  • What open source tool best personifies your personality?
Key takeaways
  • We need not just data observability, but data traceability
  • We should be pushing back on how our data is actually being used
  • Figure out what empathy looks like for yourself and for others

Episode Transcript

Tim Gasper:
Welcome. It’s Wednesday, once again, it’s time for Catalog and Cocktails, your honest, no BS, non-salesy conversation about enterprise data management. I am a longtime data nerd and product guy, Tim Gasper and joined by my co-host, Juan.

Juan Sequeda:
Hey, Tim. I’m Juan Sequeda, I’m the principal scientist at data.world and as always, it’s Wednesday, middle of the week, end of the Wednesday and trying to take a break and chat about data, and all the fun stuff that we always do. Today, we’re going to continue our conversation a little bit like last week, which was about graphs. It’s going to be graphs and more than graphs and it’s going to be about open source, and we have an awesome guest for this topic, which is Denise Gosnell. Denise is the chief data officer of DataStax. Denise, how are you?

Denise Gosnell:
Well, I mean, it’s Wednesday and I absolutely love, Juan and Tim, what you guys are doing here. So this is quite a fabulous end to the Wednesday because I’m on the East Coast, so I’m doing great.

Juan Sequeda:
Awesome. Denise and I, we’ve kind of known each other for a long time. Really, we meet at conferences. Well, we met at conferences but-

Denise Gosnell:
Yeah, we did. Yeah.

Juan Sequeda:
Denise did her PhD in computer science, a lot of work in graphs. She has a book on graphs. So if there’s somebody who knows about graphs, who knows about data, who knows about open source, I think Denise is on the top of that list in the world and I’m just so excited to have you here today and let’s go chat about all of them.

Denise Gosnell:
Yeah, thanks. Yeah, I mean, I’m pretty sure, you’ve been incredibly busy too, with your recent book out on Knowledge Graphs. So between the two of us, we’ve got it covered.

Juan Sequeda:
Exactly. So what are we drinking and what are we toasting for today?

Denise Gosnell:
All right, so this is critical. I’m drinking LaCroix, but it’s the best flavor of LaCroix. This is Apricot LaCroix, which is clearly the top flavor, and I am toasting to new approaches and new strategies. Juan, we were talking about this, this is the one thing I had bookmarked on my calendar after my sabbatical. I just had a nice three-month break, where I disconnected. I was doing a ton of hiking. We can talk about that if you want but yeah, just new approaches and new ideas. That’s what I’m toasting to, today.

Juan Sequeda:
Awesome. Tim, how about you?

Tim Gasper:
I am today, drinking just a regular old fashioned but I got a special set of bitters in it. These Workhorse Rye Salted Cacao Bitters, and they’re pretty tasty. So it’s salted cacao and you know what, first of all, I’ll cheers to new approaches. That’s awesome. I will also cheers to … actually one of my co-workers is going on a sabbatical starting next week and that sounds awesome. So I’m going to cheers to sabbaticals.

Denise Gosnell:
Absolutely.

Juan Sequeda:
Well, I think we’re going to be toasting to sabbaticals because I would love to take a sabbatical in a couple of years, but yeah, I want to follow in your footsteps, Denise. I’m having … I just made this up, as always. I’ll figure out what’s in my refrigerator, my bar: jalapeno, passion fruit syrup, cucumber vodka and I think one of the best sparkling waters now, which is Aha Cucumber and Strawberry. I think this beats your Apricot LaCroix, Denise.

Denise Gosnell:
That exact flavor is number two in my house. So, I’m with you.

Tim Gasper:
You have that too?

Juan Sequeda:
All right, well, cheers to sabbaticals and to really interesting sparkling waters. All right, so we got our warm up question today. This is what open source tool best personifies your personality?

Denise Gosnell:
Yeah. Well, I mean, I can’t answer Graph because that’s going to be like way too obvious. So I’m going to go with Apache Airflow and I’m going with Apache Airflow because it’s connecting data sources together for managing your data, your data tasks, et cetera. I mean, it doesn’t love anything more than connected data. I, of course, love connecting data so yeah, I’m going with Apache Airflow, and I freaking love it.

Juan Sequeda:
How about you, Tim?

Tim Gasper:
Is Jupyter open source?

Denise Gosnell:
It’s a great question.

Tim Gasper:
I know Zeppelin is, so I’m going to say Jupyter, even if it’s maybe not open source, potentially. I think it is. Maybe it’s not Apache. Maybe that’s the thing, because it’s analytical, it’s collaborative and it’s not super fancy. It’s no frills. It’s practical, right? So I’m Jupyter.

Juan Sequeda:
I’m going to go to my root. So Apache Jena and Jena is the RDF semantic framework which also has a graph database so it’s about having the database, right? It could be all about connected data too. We also have this framework, right? It gives you this way, how to go establish, how to best manage all your data, so there we go, all about connections here.

Tim Gasper:
I love it. Hey, we can get some real work done. We got Jupyter here, we got a graph database, we got Airflow. Man, we could do some stuff.

Juan Sequeda:
All right, so-

Denise Gosnell:
This is getting way too serious now, guys. No. No, real work.

Juan Sequeda:
All right, so it’s time to get a little bit more serious, but honest and no BS. So Denise, honest, no BS, what is this future of open source and data going to look like? Is it going to be just Apache Jena and Airflow and Jupyter Notebooks or what?

Denise Gosnell:
That is. The three of us are ready to take the future. That’s all we need. Okay, so what does the future of open source data look like? I mean, if I had to really pick one central theme, I would love for it to be much more transparent and I’m thinking … when I say transparent, I’m saying transparency across the full data stack, all the way from just being able to understand how you fit into these algorithms, when you’re a consumer of different data services or people you know are using your data, all the way down to having transparency and observability into how they’re performing. So, to me, the future of open source data is really going to be at the heart of transparency. For example, I’m thinking, even transparency all the way into how recommendation algorithms work for say, your favorite social media site.

Denise Gosnell:
Because these algorithms, the way that they are right now, they’re really convergent. So as a consumer of these sites, the way that they work, it’s kind of pushing people to the same type of content or the same set of content. So when I say convergent, that’s what I mean and it just would be really nice if it was transparent as to how it was working and it was transparent to how … or to like which sets of content are becoming very popular and then maybe even transparent, where you could change how content is delivered to you, so you can get out of purposely being put into a silo. So when I’m talking about transparency at the end, I’m already thinking at that level and then, as all of us being engineers and data practitioners, it would be really great if it was a lot easier to see the basic performance metrics of any data API, be it like scalability or its throughput, or just the payloads that are failing when it’s trying to go through.

Denise Gosnell:
I mean, just transparency all around, I think our entire system right now is very opaque into how data is used and I would just love for it to start to be a little bit more observable.

Juan Sequeda:
So I like how you’re going with this on, transparency goes into so many different aspects, right? So the consumer is accessing some data, whatever, through some open source tool, right? It could be and I want to understand how that works, so transparency over there all the way down to like the actual storage engines and figure out how that works and what’s going on underneath the hood. So we don’t have this level of transparency right now, is that your point that we’re lacking that transparency, right now?

Denise Gosnell:
That we’re lacking it right now, absolutely, to the place you just mentioned, all the way down to the database but then all the way up to the end consumer, understanding how their data is being used, how to remove their data from it or even being able to see how your content is being put in front of them and maybe creating a divisive environment or putting them into a silo that they don’t want to be. Transparency all the way from the top to the bottom. Make sense?

Tim Gasper:
So not just observability and transparency, or transparency in terms of observability, but lineage also seems to be something you’re talking about.

Denise Gosnell:
Yes.

Tim Gasper:
There’s maybe technical lineage in one respect but also sort of the business lineage in another.

Denise Gosnell:
Absolutely, yes and you just hit on another like soapbox topic for me and it’s data provenance and not having transparency into the provenance of some data. When we talk about data provenance, we’re just talking about where it came from, where it’s going, who owns it, how it changed, all of that. So Tim, absolutely.

Juan Sequeda:
I think part of this whole transparency we want to get is also on the privacy aspect, right? So what we want to know is how our data is being used … I mean, in the provenance, where is it coming from and where is it going to go to? So what are right now the open source tools or systems out there that are focusing on this type of observability on privacy, on transparency?

Denise Gosnell:
That’s a really great question and the only thing I’m thinking of right now is what Charity Majors is doing at Honeycomb. What they’re going after for creating observability more into the lower part of the stack with observability in the API performance or different back end engineering items. That’s the only one I’m really thinking of at the moment. On the other side of the transparency debate or transparency argument, I think we run into it all the time. It’s whenever you go to a website and all of a sudden now, because of GDPR and other compliance issues, there’s that pop-up that’s like, “Are you okay with cookies?” We’re all like, “Yeah, yeah, yeah, we’re fine,” but if you happen to be the person who’s not, Juan, when was the last time you tried to look at your cookie settings on a website and then change them?

Juan Sequeda:
Yeah, I don’t do this.

Denise Gosnell:
It’s incredibly difficult because there’s just not an easy way to figure out how cookies are tracking your information, where it’s going and the different ways that you can limit how much they have to track, maybe even just the minimal, and if you select only minimal cookies, what does that still mean? There’s not even transparency at that aspect, and I know you asked about tools, and I don’t really know if I have a ton of answers there because maybe that’s the future of what we need to create for our market. Yeah, when I think of transparency, I first think of Honeycomb and what they’re doing in the back end, and then I think of the day to day consumer problem when we have no idea what these cookies are tracking about ourselves.

Juan Sequeda:
So is this about creating new tools because there’s a lack of them, or do existing tools need to be able to go add more transparency to them, or is it both? I mean, how much do you want to go boil the ocean or not? That’s what I’m thinking.

Denise Gosnell:
Yeah. I mean, the answer at the end of the day is going to end up being both and I see fewer tools for the latter problem I described. I see fewer tools that are educating consumers on the transparency of their data usage, because as we’re having this conversation, I’m thinking about Grafana. I mean, that’s a really great tool for observability into your back end systems and performance metrics, et cetera. There’s probably plenty more. I don’t know if anyone is on Twitter, on YouTube or LinkedIn, because we can see your comments. If any of you guys are thinking of some, let us know and put them in because we can see them here. I don’t know, Tim, Juan, do you guys know of any tools that are solving the problem on the consumer side, making it transparent to where data is going?

Tim Gasper:
There’s some open source libraries and things like that, that are interesting around sort of certain types of scanning and things like that. For example, I’ve been looking a lot into sort of PII-related sensitive data scanning and things like that and like, there’s some interesting libraries and things out there but there isn’t a ton in terms of big projects. In terms of observability, you got stuff like Great Expectations, for example, which is a little more on the testing side, right?

Denise Gosnell:
Okay.

Tim Gasper:
In terms of like data monitoring and data observability, it just seems like in general like lineage observability. I mean, you’ve got some open source metadata things out there, like Amundsen and things like that. It just seems like data quality, observability, transparency, it has tended not to turn into a lot of open source tooling, really. At least not in a flexible broad way, right, like … Well, there’s like Atlas and that sort of project on the Hadoop side, but I don’t know if that’s really going a lot of places right now, right?

Denise Gosnell:
Maybe that’s why this is the future of data, right? So, if people want to invest in creating some tools that companies and teams are going to pick up and want to use right away, I would heavily bet that this area is going to be fast growing in the next five years or so.

Juan Sequeda:
So let me go take this to kind of the area of who’s going to be the user of these tools. So if they don’t exist, they don’t … these open source tools, right, observability, right, do they not exist because people haven’t realized that we need them or is it because, “Why spend my time on this? There’s already commercial tools out there.” At the end of the day, who are the consumers of these open source tools because maybe, for enterprises who actually have that problem, kind of starting with that premise, they’re not going to go use open source for that. They’re just going to have a commercial tool or a vendor around it.

Denise Gosnell:
I mean, that’s a great question and happy to kind of toss some ideas around.

Juan Sequeda:
Go.

Denise Gosnell:
The first place I would look would be just the space of enforcing GDPR. So let’s say that you as a company … well, if you are supporting customers in the EU, you better be doing this, but you have to have pathways and ways to enable any user to request all their information and delete it, right. That’s GDPR. I think one of the first ways that we could create more observability around that question would be a tool for your data teams at a company. It would be a way to more easily track and have the ability to have that provenance information readily available so that you can answer that question much more quickly, because if I understand it now, it’s mainly customized pipelines that people are making in order to answer those questions. I mean, Tim, Juan, I don’t know if you guys have any other experience there.

Tim Gasper:
No, I think that’s right. I mean, I think one of the things that I’m excited about and I’m curious if you’re thinking the same way is like, I love that things like DBT are getting a lot more traction in the community, because obviously that at its core is an open source project, right?

Denise Gosnell:
Mm-hmm (affirmative).

Tim Gasper:
So, as people start to build more open source componentry into their stack, into what they’re doing from a data pipeline perspective, I feel like that’s going to lend itself more towards this ecosystem expanding, because in the end, it’d be nice if all these things could really work well together and maybe this kind of spawns more in that area. I don’t know if that sounds right or not.

Denise Gosnell:
Yeah, I think that DBT would need … I mean, correct me if I’m wrong, I’m not a deep DBT user but I think that we would need maybe traceability, maybe we should go with that word. That needs to be a new word: observability of systems is going to be essentially traceability of data, in our little metaphor there. I think that DBT would really benefit if it had some type of way to understand just quickly your full lineage of a payload. I don’t know if that’s something it offers but when I’m thinking about transparency and visibility into data lineage or data provenance, I’m kind of thinking that there would be a way to understand every component of the system that that piece of data has traced through, and that’s essentially, at the end of the day, the question you need to answer for GDPR. For this customer, I need a quick map of everywhere that their data has touched.

Juan Sequeda:
Yeah, I think there’s a project like OpenLineage, right? I think our friends from Datakin, I think those are the ones who’ve been kind of pushing this open source … or kind of like, I call it, almost a familiar schema to be able to go define open lineage about all your jobs and stuff, but actually, it’s interesting that our conversation is heading more into like the interoperability of metadata, because at the end of the day, the metadata, it’s going to explain to us how all this real data is moving around, right? I think one of the things that we really need is, I want to have a way, an open source or open approach to be able to connect all my different tools around. At the end of the day, you’re not going to have one tool, one silver bullet, that’s going to go do everything for you, right?

Juan Sequeda:
All the way from, let’s go integrate data. I mean, you’ll have all these different things, right? You have … I mean, look at the modern data stack where you have your ETL tools, your T, your DBT, right, you got your reverse ETL, known vendors and you got your data warehouses. You got your Airflows, you got a bunch of open source and commercial tools and there’s just so much stuff that you need to go pick and choose from, and how do I go connect all this stuff together? The way you connect it right now, it’s writing custom code or you have, “This thing connects only to these three tools and so forth.” So I want to have just an open way to say, “Look, I’m going to plug in a data quality tool. I’m going to plug in a PII tool. Tomorrow, I’m probably going to swap it or not, or whatever, I just want to have that interoperable way of bringing any type of system that deals with metadata.”

Juan Sequeda:
That open approach is what’s going to be able to let me go do that traceability. So I think it’s … I don’t think it’s more about having another software system. I think it’s more about having an open approach to kind of describe metadata. That’s my position now.

Denise Gosnell:
Yeah, and I mean, knowledge graphs solve that perfectly, don’t they Juan?

Juan Sequeda:
Well, yes, I believe so.

Denise Gosnell:
Yes, they do.

Juan Sequeda:
I mean, this is why I always argue that your first knowledge graph better be of your metadata. You want to be able to go connect all your metadata together, so I know … not just that I know what’s out there from what sources I have, but also how things connect and flow together and then later on, you can go do more stuff with the rest of your data. But then, when organizations think about connected data, in reality, they need to understand how their metadata is connected and actually the people, and that sort of thing. It’s not just about the bits in your system. It’s actually the people within your organization because at the end of the day, when we talk about GDPR, it’s because there’s a government who’s asking for this. There’s a person who’s asking for this. It’s people, it’s a sociotechnical phenomenon. It’s not just about technology.

Denise Gosnell:
Absolutely, yeah.

Tim Gasper:
Yeah.

Denise Gosnell:
I think … so the original question that we were chasing, Juan, which I thought was a really good one, was who is going to be the consumer of this open source tech to create transparency? We kind of dove into … potentially what the world of data engineers and data operators, like the data teams inside these companies, what that could look like. I’m going back to this point because some of the comments coming in here from LinkedIn are starting to highlight what type of transparency people want on the other side. Someone here is talking about who can solve this problem, but I just want to know who’s collected cookies about my browser behavior? Absolutely. I think that’s a really great comment. I think that’s Rodney, who brought that in. So thank you for that, because that type of solution is absolutely the type of transparency into your data usage that people just want to know.

Denise Gosnell:
One more step after that is, I want to easily be able to go to this site and understand how to control and change those preferences. Now, that’s not exactly a data transparency and observability problem, but that is definitely an information architecture issue that is out there right now, that is right at the edge of the data community.

Juan Sequeda:
We have a … I can see, he’s a LinkedIn user, right? “Not another metadata system, but a metadata ecosystem where I can place and replace the tools I want, plug, play and plug.”

Denise Gosnell:
Absolutely.

Juan Sequeda:
You nailed it, that’s exactly what I was thinking about. It’s not a system, right? It’s an ecosystem. I want to be able to have an ability to … that connective tissue where I can go plug and play any type of system that’s going to be using my data and understand how the data is being used. It’s like, you need to go connect that metadata. So I’m liking where we’re heading. I was not thinking about transparency kind of being the future of open source and data but I think, if it’s transparency, it’s about, “Let’s make our metadata a first class citizen and make it interoperable and not just go buy and go get all these tools and plug them together by hand wiring and writing code and all that stuff.”

Tim Gasper:
Yeah, it seems like maybe we’ve got some of our foundation now, right? We’ve got our databases and our data warehouses. We have some of our data pipelining. We’ve got our orchestration, with things like Airflow and Dagster and things like that, right? So maybe, as you’re kind of noting, Denise, the next frontier here is really more the meta layer, right? It’s the observability, the transparency, the lineage. It’s understanding, and the more that enterprises and companies can understand their own data, perhaps, the better that they’ll actually be able to help consumers be able to understand what the hell is being tracked about them and things like that, because if a company can’t answer the question themselves, how are they going to then actually serve the consumer in this situation?

Denise Gosnell:
Yeah, absolutely, Tim and to the person, I can’t see who it is on this one, but there was someone who commented and said, “Are there enough people scared enough by this today?” Tim, to your point, I think that there is a growing motivation to make this more tractable, for inside businesses and eventually to consumers and I think we’re getting more pull today from the consumer side and that’s where you’re starting to see the explainable AI movement really come into play with people wanting to understand, what is this algorithm doing for me and how do I really fit in. That ties back to when we were first talking about this and I was doing a really poor job of just describing how these recommendation algorithms, they converge content or they converge your experience to sets of content, instead of continually refreshing it to a more diverse, maybe, collection of the full sampling.

Denise Gosnell:
We feel that anytime we’re on the internet today, when it feels like it’s incredibly tense to have a conversation because it’s painted a picture that there’s side A versus side B and that just happens to be a byproduct of algorithms that are maybe optimized to put people for … optimized for click through rates and optimized for time on page. It creates this silo and division that actually might not exist in the real world, but that’s a part of our digital experience now. So yeah, Tim, I just thought that your commentary on that really fit in with where explainable AI is being pulled from us, the data practitioners and tool builders, by the consumers.

Juan Sequeda:
Is there any open source explainable AI system, whatever that would even mean? I’m curious.

Denise Gosnell:
I think that’s going to be Tim, Denise and Juan Co 2026.

Tim Gasper:
Yeah, is there a really good explainable AI period right now?

Denise Gosnell:
Is any AI explainable anywhere? Just give us one example of explainable AI.

Juan Sequeda:
Well, symbolic AI like rule based AI is definitely going to be explainable, right? Now, we have all these black boxes and stuff. I think that’s another thing that we need to kind of … just as practitioners as community, it’s like, “Okay, we get all this AI stuff and we’re using it on my data,” right? Okay, I understand how the data flowed but then suddenly, the data flowed into this box and something came out of that. So I understood that it came to this box and came out. What happened in that box? We literally have this beautiful flow of how things are going and suddenly, just disappeared and then something came out. We don’t know what happened, so I think, if we’re thinking about this transparency, AI is taking this big part of it, we’re going to just, “Oh, dot dot dot magic happens and then continue,” and what, are we just going to buy that?

Juan Sequeda:
Is that okay? I think the consumers are really … They need to go push on this and say, “I want to know more about this, tell me more?”

Denise Gosnell:
I do, I do think that they need to push on it and that’s, I want to … Yes, like all the claps to what you just said and I think that the first area that we hit so often, and have that muscle memory of just saying, “Oh, I accept. I accept.” The first way we run into this every day is with cookies. Tell me what you’re doing with my data, with the collecting of these cookies and just give me an answer for that, because it’s an experience that is in my face every day and you should have a solution for it. Let’s just start the conversation there, because I think that’s one of the first mass endpoints that we can get some pull going in the community for making things a little bit more explainable. Yeah.

Tim Gasper:
Yeah, I think this is an interesting topic. I think at some point, we should set up a follow-up where all we do is just talk about AI, explainable AI, because there’s so much interesting stuff to unpack there.

Denise Gosnell:
Absolutely.

Tim Gasper:
Just to take us in a slightly different direction, as we think about all these different open source tools in this ecosystem, there’s a lot out there. There’s so much to really absorb as a developer that wants to either contribute to these things or wants to use these different things, as you’re thinking about this community and all these things that are out there, and even from your own experiences working with these tools, how do you start? How do you learn? Where do you learn?

Denise Gosnell:
Well, how to start is a very simple answer from my perspective and it’s to start with some empathy. To start with empathy for the developers for any person around this massive loop of data and it’s important to start with empathy, because we’re all showing up with really hard problems to solve and other things going on, and that’s just going to create more productive conversations for solving these problems. So that’s where I would start, Tim. So what was the second part of your question?

Tim Gasper:
How do you learn? How do you really build your skills and your focus in this area?

Denise Gosnell:
Yeah, absolutely. So, this is coming from a biased perspective because it’s one of the ways I like to learn. If you want to get involved in any area here, pick a data problem to solve and really diving in end to end to solve that problem is going to be one of the best ways to get experience and to understand very deeply how certain tools are useful in solving problems or not, in solving certain problems. So, when you pick up a certain business initiative, and you go to solve it end to end, you really, really become deeply familiar with all these topics we’re talking about when it comes to data provenance and tracking your data through the system. What Juan was mentioning for having a really solid solution for tracking metadata. So the first place to start is to solve a real problem.

Denise Gosnell:
My second piece of advice on top of that is to pick a problem to solve, or to align yourself with a problem to solve, that has business value. Too many times, I’ve seen really brilliant, just super kick ass cool solutions to data problems by some of the most unique data scientists never see the light of day. It was primarily because they weren’t going after a problem that was relevant today or in the near future. So all of their work kind of just got shelved and no one likes that feeling, kind of going back to the idea of empathy. So Tim, those are my two pieces of advice. If you want to get started, first, pick a problem, solve it end to end with a team so you are deeply familiar with what that’s like. Two, make sure that problem that you’re working on is aligned to an area of business that is really going to be furthering whichever business initiative that you’re working on.

Tim Gasper:
You’re saying don’t just pick a random tool because you heard a lot about it and it sounds really exciting, and you want to build a playground of all these cool things and how they can connect together. You’re saying we shouldn’t do that.

Denise Gosnell:
I mean, to start, it’s like, this is day zero and day one, maybe not.

Tim Gasper:
I love that. One of the recurring themes here is start with use cases, so this is more confirmation of that.

Juan Sequeda:
I think this is great advice because if you’re … so many people get frustrated, because I’m doing all this stuff and it’s cool, but it doesn’t make an impact. I mean, if that’s your frustration right there, then hey, you got to be honest about it, is that you have to go understand where is that business? What is the business problem and understand how the business needs it and that’s your impact. The impact is going to be towards the business. Otherwise, you’re just going to be like, “Okay, here’s this cool researchy product. Here’s the next thing. Squirrel, here’s the next thing. Squirrel,” and then you’re just going off and doing squirrel, squirrel, squirrel and like, okay, you learn a bunch of things but what’s the impact you’re going to do?

Juan Sequeda:
Maybe you learn and that’s fine, but then people want … I mean, at the end of the day, you want to go do something where you say, “Hey, I helped other people.” I mean, let it be, I’m helping the company make more money or I’m working on a nonprofit approach and doing something else. So I think we really need to go tie it to the end, what is the final need? I think this is really important, that we forget about it and we just focus again on, “This is cool tech.”

Denise Gosnell:
Absolutely and I mean, imagine that feeling of the first time you had code that was getting run in production. It’s like the coolest thing ever. It’s like, “Oh, I did that,” and you start from there and you help empower the next generation of engineers and brilliant minds to be contributing to problems where you can lead them to that moment. It’s going to make so many other aspects of this conversation a lot easier.

Juan Sequeda:
I remember when … I’ll be very honest, I have not contributed to open source. Yeah, I’ll be honest about that.

Tim Gasper:
Shame on you.

Juan Sequeda:
I don’t know, I started contributing code to, like, my own code base, stuff that goes with the company. I was focusing on, well, I’m building a company and have to code it. For me, it’s like, “Yeah, my code is in production,” but then I’m not going to go sleep because hopefully that code is working, because people actually go use it and I don’t want people to be calling tomorrow at 7:00 in the morning asking, why did something break? So it’s fascinating but at the same time, it doesn’t let me sleep.

Denise Gosnell:
Well, that’s the beautiful pitch for more observable data systems, by the way.

Juan Sequeda:
I love this, so great. So we got to focus more on transparent observability, so we can actually sleep comfortably at night.

Denise Gosnell:
Absolutely. No one loves to be on PagerDuty.

Juan Sequeda:
Yeah, this reminds me of one of our episodes with Chris Bergh about DataOps. We were talking like, why do DataOps? So I can actually have a weekend and not worry about work, right? That type of stuff.

Denise Gosnell:
Absolutely. Absolutely. Yeah, whoever ends up on that rotation to have to monitor systems, just again, apply a lot of empathy because no one really wants to be dedicating their midnight to 6 AM hours to that kind of work.

Juan Sequeda:
I love it. Empathy, empathy.

Denise Gosnell:
Yup.

Juan Sequeda:
So, let’s go kind of change a little bit of topics here because one of the things that we’re both passionate about is graphs.

Denise Gosnell:
Absolutely.

Juan Sequeda:
We’ve got to talk about graphs for one thing. One thing I had noted here is, we both love graphs and everything, but there’s this other thing that we hear a lot is GraphQL.

Denise Gosnell:
Yeah.

Juan Sequeda:
So one of the annoying things that I … something that really annoys me is this thing called GraphQL, that has nothing to do about graphs. So where are we seeing kind of GraphQL now, within the whole kind of open source and data community?

Denise Gosnell:
Yeah and I’m going to start with what is the difference?

Juan Sequeda:
Please, please.

Denise Gosnell:
Yeah. We’ve got some folks listening and we’re saying graph and now we’re just adding a QL on the end and it’s different. Right, so I think up until now, when you’ve heard Juan and I talking about graphs, we have been talking about a specific way to look at data, a specific data structure. It’s all about the relationships in the data, just like it’s all about relationships between people. That’s a graph. When we’re talking about GraphQL, we are talking about a way that developers can query databases from their apps. Two totally different ideas. We’ve got a way data is structured with relationships. That’s graphs. GraphQL is a way to query data from an app. So that’s … just to kind of level set I want to start there. Juan-

Juan Sequeda:
Which could be in a graph or, probably, whatever other type of storage.

Denise Gosnell:
Yeah, GraphQL works with all types of storage layers at the bottom. So it could be graph but not always. Actually, most often not.
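
To make the distinction concrete, here is a minimal Python sketch. It is not a real GraphQL server: the resolver functions, table contents and names are all hypothetical, and the GraphQL query appears only as a comment. It shows a graph as a data structure next to a GraphQL-style request answered from storage that is not a graph at all.

```python
# A graph is a data structure: things plus the relationships between them.
# (Hypothetical people, stored as an adjacency mapping.)
friendships = {
    "ana": {"ben", "cho"},
    "ben": {"ana"},
    "cho": {"ana"},
}

# GraphQL is a way for an app to ask for data. A query such as
#   { customer(id: "c1") { name orders { total } } }
# is answered by resolver functions, and those can sit on any storage.
# Here the storage is two plain tables (lists of rows), not a graph.
CUSTOMERS = [{"id": "c1", "name": "Ada"}]
ORDERS = [
    {"customer_id": "c1", "total": 30},
    {"customer_id": "c1", "total": 12},
]


def resolve_orders(customer_id):
    """Resolver for the `orders` field: filter the orders table."""
    return [{"total": o["total"]} for o in ORDERS if o["customer_id"] == customer_id]


def resolve_customer(customer_id):
    """Resolver for the `customer` field: look up the row, then nest orders."""
    row = next(c for c in CUSTOMERS if c["id"] == customer_id)
    return {"name": row["name"], "orders": resolve_orders(customer_id)}


if __name__ == "__main__":
    print(sorted(friendships["ana"]))   # relationships in a graph
    print(resolve_customer("c1"))       # GraphQL-shaped answer from tables
```

The response is shaped like the query, which is the part app developers tend to like, while the data underneath stays in ordinary tables.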

Juan Sequeda:
Not a graph. Exactly, which is a weird thing. So where are we going with graph, where is the open source community going off with GraphQL because you see it all over the place. Is this taking over the world or not and is it just passing through or-

Denise Gosnell:
I think it’s a safe bet that it is absolutely going to take over the world. I mean, for lack of a better analogy, it’s going to be the SQL of the next gen of how you work with apps and work with data in your apps. Absolutely. If that hasn’t been said before we should TM that because I think that’s a really good way to explain it. It’s the next SQL of working with data.

Tim Gasper:
That’s a big deal.

Juan Sequeda:
That’s a big statement.

Denise Gosnell:
Yeah, I would think so, and where it’s going in the open source world is one of my most favorite, well, inventions, or I don’t know what the right word is, but GraphQL recently came out with data federation, which I’m pretty stoked about because now, it’s starting to bleed into doing neighborhoody graph things with your data. So now GraphQL is starting to try and get really close to graph structured data functionality, but where GraphQL is going with federation is something I’m pretty excited about. What you’re able to do with federation is you can write your GraphQL statement. So you’re an app developer, you’re using GraphQL to query your databases and with federation, you can query your customers and then your products in separate databases. One could be a SQL database, one could be a document DB. As long as your resolvers are working, you can just do all of this.

Denise Gosnell:
So you can query different databases and then federate that information back together, because it was related in some way, like a customer made a purchase of this product, so it’s very interesting to see how GraphQL and data federation specifically is pushing the bounds of what the GraphQL query language can do up to graphy shaped problems. So I’m very much looking forward to seeing that continue to evolve. Have you worked with data Federation yet, Juan or Tim?

Juan Sequeda:
Yeah, we do. We don’t want to get salesy in here but-

Tim Gasper:
Yeah, in the context of our platform but not in the context of GraphQL, unfortunately.
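
The customers-and-products example Denise describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses plain functions rather than any real federation product (it is not the Apollo or Stargate API), an in-memory SQLite database stands in for the SQL source, a dictionary stands in for the document database, and every table, field and function name is hypothetical.

```python
import sqlite3

# Source 1: a SQL database holding customers and what they purchased.
sql_db = sqlite3.connect(":memory:")
sql_db.executescript(
    """
    CREATE TABLE customers (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE purchases (customer_id TEXT, product_id TEXT);
    INSERT INTO customers VALUES ('c1', 'Ada');
    INSERT INTO purchases VALUES ('c1', 'p1'), ('c1', 'p2');
    """
)

# Source 2: a stand-in for a document database, keyed by product id.
product_docs = {
    "p1": {"title": "Keyboard", "tags": ["hardware"]},
    "p2": {"title": "Graph book", "tags": ["books", "graphs"]},
}


def resolve_customer(customer_id):
    """Resolver backed by the SQL source."""
    row = sql_db.execute(
        "SELECT id, name FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    return {"id": row[0], "name": row[1]}


def resolve_purchased_products(customer_id):
    """Resolver that follows the purchase relationship into the document source."""
    rows = sql_db.execute(
        "SELECT product_id FROM purchases WHERE customer_id = ? ORDER BY product_id",
        (customer_id,),
    ).fetchall()
    return [product_docs[product_id]["title"] for (product_id,) in rows]


def federated_customer(customer_id):
    """Stitch both sources into one response, as a federated query would."""
    customer = resolve_customer(customer_id)
    customer["products"] = resolve_purchased_products(customer_id)
    return customer


if __name__ == "__main__":
    print(federated_customer("c1"))
```

The caller sees one answer, while each resolver only knows about its own store. The trade-off Juan raises next still applies: every hop between stores is a network round trip in a real deployment.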

Juan Sequeda:
Again, I love how we’re just going off on different topics and this is cool. I think for the Federation for me, comes in combination with also data virtualization, right? So you want to kind of keep your data where it is, that’s why we want to go virtualize and federate, you want to go combine queries about this. They come from different sources. I see like the use cases around this are more kind of from the development perspective, because you have physics, you got to move data across the wire, right? For some use cases it’s going to be fine. For privacy reasons, maybe I need to keep the data as it is and yes, you got to wait extra seconds, and that’s fine. That’s the price we’re going to pay for privacy.

Juan Sequeda:
Maybe for development, that’s great because I can quickly see how I’m going to go integrate data, but A, I won’t be able to keep my SLAs. I’m going to have to … after I figure it out, I’m going to go materialize it, but I think that’s how I’m going to see federation and virtualization is more for like quick and dirty things, for development and then, depending on some of the use cases, you can move that to production, but it’s probably just more for like developments and see how it’s going on. That’s my perspective. I don’t know. What do you think?

Denise Gosnell:
I mean, yes. I mean, that’s where federation is right now, today … kind of more in development, let’s see how to use it. So yeah, Juan, there’s not-

Juan Sequeda:
Where are the open source … Can I get back to open source? What are the open source federation tools systems?

Denise Gosnell:
Yeah, so I don’t want to get salesy either. The only open source federation tool I know is one that I contribute to through my work here at DataStax and that would be Stargate, but I would love to know more if other folks out there know of other data federation open source tools. I know Apollo is working on some, of course. Just to answer a question from someone that just came in on LinkedIn, it says, “Wait, GraphQL isn’t just another graph query language?” and that is correct. GraphQL is not another graph database query language, yet.

Tim Gasper:
Just on that topic of virtualization and federation really quick, is would you all consider things like Presto and … what else is out there, Drill and those types of things are those federation? I think they’re definitely query engines. I guess it’s all kind of a blur now. It’s all kind of-

Juan Sequeda:
Yeah, federation, you have distributed queries, right? So Presto is a distributed SQL query engine. How do we all see this?

Denise Gosnell:
What does it mean to be a distributed SQL query engine? What is that?

Juan Sequeda:
Is that you’re federating queries?

Tim Gasper:
Virtualization and federation. I don’t know. Maybe it’s all the same stuff. Just different word choices.

Denise Gosnell:
Okay, it sounds like this is the next topic of emerging cool tools where … we would like some more clarity. So we should put a bookmark on this one. Yeah.

Tim Gasper:
I like that.

Juan Sequeda:
Give me a little bit of license here on the salesy part, but because you’re open source, I’d love to hear more about Stargate, because I’m not familiar that much with it.

Denise Gosnell:
Yeah. Stargate, just think of it as like an API platform, an open source API platform that gives you the ability to work with your data, whatever endpoint way that you would like, REST, GraphQL, gRPC, you name it. It’s an open source API platform that gives you any style of endpoint you want.

Juan Sequeda:
So, it uses GraphQL too?

Denise Gosnell:
It does. Yup. Yup. I mean, right now, being that this is something that is coming out of DataStax, the main supporting back end is Cassandra, no surprise. Another open source database coming from DataStax as well. I don’t know, it’s really changed the way that developers are accessing their data that needs to be incredibly, highly available at the Cassandra level and it’s really interesting to see how people are wanting to use data in Cassandra in a different way. The idea of just putting a GraphQL endpoint on top of Cassandra tables, it was kind of like, “Oh, let’s see what people do with that.” It’s been pretty interesting and you can federate data there too, but we’re getting way too salesy even for my liking, so let’s bring it back, let’s bring it back, let’s bring it back.

Tim Gasper:
So I have a question now on GraphQL, all right?

Denise Gosnell:
Okay.

Tim Gasper:
Maybe to take it back a little bit here. So, I’ve noticed that a lot of companies are starting to add GraphQL endpoints to things. For example, I know that like Monte Carlo data has like a GraphQL endpoint in sort of the observability side. Tableau has a GraphQL endpoint. These things are kind of about more on the sort of the API side in terms of accessing the data and being able to do it in a way that reflects and respects the relationships, more on things like that. So that’s interesting, right? I’m curious, Denise, if you have a perspective on this or Juan, you end up having more of the perspective on this, it’s like, so you’ve got GraphQL and you’ve got like graph databases and graph query engines and things like that. Are these things converging in the future? Are they kind of just parallel things to each other, they just both happen to have the word graph in their name?

Denise Gosnell:
I’ve got tons of opinions, Juan though if-

Juan Sequeda:
No, well, what’s going through my head right now is that given the whole popularity of GraphQL and what Denise is just telling us, that they’re kind of going and doing more graphy things, I’m actually hoping that people are going to get interested more in graph databases because of what GraphQL is trying to go do. This is literally going through my head right now in the last 10 minutes because I didn’t even know that GraphQL is doing more of these graphy things. So Denise, I’ll throw it back to you.

Denise Gosnell:
Yeah, I’m really excited about where GraphQL is going and I’m closely following it and trying to contribute where I can. Right, so Tim, to your question, yes, they are converging. The first place they’re converging right now is obviously by the shared name. Facebook’s original reason for inventing GraphQL was to do what the graph nerds like Juan and I call neighborhood queries, which is basically like, for me or something I care about, tell me everything that’s immediately connected to me. So all of my first relationships, my immediate friends, and the GraphQL query language, it does solve those problems, where it’s going to be able to hit one table and then query one layer out. So that’s where GraphQL has kind of started.

Denise Gosnell:
So with the name graph, and the ability to do the one neighborhood out types of queries, that’s in the intersection between graph databases, graph query languages and GraphQL today. When they brought in federation, it was giving you the ability to do that same behavior for this really cool thing, tell me everything connected to it, but look outside my database and that’s where federation comes into play. Now, maybe Juan and I and some other graph folks can put our heads together, because I would love to see this simple language, the way that you use the language. I would love to see it extend into actual graph style queries, like paths and hierarchies but the way that the language is specified now, it’s not possible to do that.
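
The gap between a neighborhood query and a path query is easy to see in code. Below is a minimal Python sketch over a made-up follower graph; the names and the helper functions `neighborhood` and `shortest_path` are hypothetical and do not come from any GraphQL or graph database library. The first function is the one-hop lookup described above; the second needs a traversal (here a breadth-first search) because the number of hops is not known in advance.

```python
from collections import deque

# Hypothetical "follows" relationships, stored as an adjacency list.
follows = {
    "ana": ["ben", "cho"],
    "ben": ["dev"],
    "cho": [],
    "dev": ["eli"],
    "eli": [],
}


def neighborhood(graph, start):
    """One hop out: everything immediately connected to `start`."""
    return list(graph.get(start, []))


def shortest_path(graph, start, goal):
    """Path query: breadth-first search, returns a list of nodes or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


if __name__ == "__main__":
    print(neighborhood(follows, "ana"))
    print(shortest_path(follows, "ana", "eli"))
```

A fixed-shape request can express the first function directly. The second has no fixed depth, which is one way to read the comment that paths and hierarchies do not fit the language as it is specified today.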

Denise Gosnell:
When you start to look at how that would work technically, things get really hairy really fast, but we cannot deny that developers are pulling us in the direction of query languages that are easier to use, easier to understand and just work within their existing app stacks. So, Tim, I hope that answers your question. I could go on for way too long about GraphQL. I’m pretty-

Juan Sequeda:
No, I think that was actually very helpful because obviously, you’ve got like SPARQL and you’ve got like the GQL workstream, and things like that, that are happening on this side and then, you’ve got GraphQL, and it just seems like … I mean, things will go the way they go, right? I mean, that’s how this sort of space works, right? As things get mature and as more people use them, they expand but it just seems like, GraphQL queries could be running on the graph back end, and if you want SPARQL to be running, and it to be connected to something that an API is doing, I mean, it just seems like in general like, I just love that the graph related stuff is continuing to get more and more adoption regardless of how it’s doing. So I mean, that’s awesome.

Denise Gosnell:
Absolutely. Tim or Juan, did you guys watch the Olympics and notice, Salesforce’s new commercial?

Juan Sequeda:
I did not.

Tim Gasper:
No.

Denise Gosnell:
Yeah, so all over NBC during the Olympics was this Salesforce commercial or maybe it’s my version of the Olympics because they’re giving me targeted ads, because I’m in fact, I don’t know-

Juan Sequeda:
Because you’re clicking on all the depth on all the cookies, right or something like that.

Denise Gosnell:
Yes, but what Salesforce is doing, I’m really excited about this new product that they started advertising during the Olympics. I’m a big sports nerd, so that’s where this part is coming into my life, but Salesforce has this new thing called Customer 360 for your sales opportunity, and then they bring in all the connected information about it, like who works with that company, et cetera, all into one tool. I mean, plug out to Salesforce, I guess because I was like, “Yes, that is the very beginning of a graph problem to be able to bring in information and connect all the important entities together,” and major hats off if they are using GraphQL data federation to do it, so let us know.

Juan Sequeda:
One of the things you just mentioned is that GraphQL, I think, is something developers kind of enjoy, because it’s kind of easy … it has a good developer experience. What’s your position about the developer experience of graph query languages then?

Denise Gosnell:
I hope I’ve said enough. Let’s just say the fact that I had to write a 400 page book to explain how to use graph query languages, to solve graph problems is 300 pages too long.

Tim Gasper:
I wanted to get there.

Denise Gosnell:
Look, I mean, my heart is always going to go out to connected data. I don’t know what it is about me as a person. I can’t change it. I love it, and I also am probably one of like the two individuals on the planet who likes Gremlin, the graph query language. I’m okay being in that bucket too. I had to go through the incredibly steep learning curve to figure out how to use Gremlin and I am beyond honored that I got to pour my heart and brains into the 400 pages to try and teach other people. I just don’t think that … that’s a lot, that’s a lot of information to try and pick up a new tool that is very, very needed if you’re going to have more explainable models in the future. Namely, graph databases are absolutely a part of the explainable AI movement, because it helps it make much more transparent sense as to why things are the way they are in a model.

Denise Gosnell:
Back to your question about graph query languages, they’re really hard and we haven’t converged on a good solution yet and as someone squarely in the Gremlin camp, I’m more than happy to admit that. I love Gremlin forever, but I just don’t think it’s the answer for wide adoption of graph database query languages. I don’t know Juan, what do you think, you’re-

Juan Sequeda:
So I’ll be honest, I think Gremlin is such a complicated language. I come from the RDF graph camp and I will also acknowledge that SPARQL is very verbose and annoying with all those URIs and all those things. They’re there for a reason. I mean, honestly, hats off to the Neo4j folks and Cypher because they really designed the syntax of a language to be that … the ASCII art stuff and I think this makes it cute and kind of easier to understand. I think, yeah, hats off to them for that and I know that they’re the ones who are kind of influencing a lot of this new GQL standard and I’m glad it’s more of the Cypher and it’s not kind of the Gremlin or SPARQL or any of that stuff. I mean, SPARQL is the standard itself but yeah. I think there’s a lot of … we need to go do more research or work about developer empathy.

Juan Sequeda:
It’s like, “Hey, why are we creating this thing? How do we make whose life better?” I think if you look at it, like GraphQL was made in a way because somebody had a very specific use case. This was super specific, I want to know everything around this one thing here. So how about we create a language or a new infrastructure, whatever to go do accomplish that problem. People are going to complain, “Oh, but you can’t do this and this,” and this is like, “Yeah, we can’t because that’s not the problem we’re solving, but you’re thinking about the problems already.” That’s fine. I’m just thinking about the potholes in front of me and not like all the stuff that’s in the sky, and maybe later on, they’re going to run into issues but at the end, I think at some point, we start figuring how to go converge.

Juan Sequeda:
So, yeah, I think hats off also to GraphQL, because people are really … I see that adoption and it’s solving problems and people are being more productive around that.

Denise Gosnell:
Absolutely. Yeah and I love that you brought up Cypher, because one anecdote is that the first time I realized I wrote something close to a Cypher statement is when I was trying to describe the relationships in my data in Slack to somebody else and I was writing out the ASCII art of how things are related. I was like, “Oh, wait, that’s pretty much a Cypher query.” So hats off, of course, for designing the … since their design is very intuitive, so people are thinking those.

Tim Gasper:
Well, you’re thinking it very quickly translated into almost what the query would have been.

Denise Gosnell:
Yeah.
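
For readers who have not seen the ASCII art being discussed, here is a small Python sketch that prints relationships in the arrow-and-parentheses shape that Cypher patterns use. The function name and the example relationships are hypothetical, and the output is only the pattern part of a query, not a complete Cypher statement.

```python
def as_ascii_art(source, relationship, target):
    """Render one relationship in the (a)-[:REL]->(b) shape Cypher patterns use."""
    return f"({source})-[:{relationship}]->({target})"


# Hypothetical relationships, the kind you might type into a chat message.
edges = [
    ("customer", "PURCHASED", "product"),
    ("product", "SOLD_BY", "vendor"),
]

if __name__ == "__main__":
    for edge in edges:
        print(as_ascii_art(*edge))
```

Written this way, a sketch of how the data is related is already most of the way to the pattern a query would match on.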

Tim Gasper:
So, we’ve passed the 10 minute warning mark here.

Denise Gosnell:
Okay.

Tim Gasper:
I think actually, what might make sense is for us to switch into the lightning round, where we ask some quick yes or no questions and walk you through. So Juan, you want to start us off?

Juan Sequeda:
Yeah. The question I already have is not a good one, because I already know what the answer is.

Denise Gosnell:
Maybe everyone else might not know, so we could still do it.

Juan Sequeda:
All right. Well, will GraphQL take over the world of query languages?

Denise Gosnell:
Yes.

Juan Sequeda:
That was the obvious answer, right?

Tim Gasper:
For those of you that are just tuning in, check out the recording about 30 minutes earlier. All right, I’ll do the next question here. So we ended up not talking a bunch about personas today but are data engineers, the new data scientist?

Denise Gosnell:
Yes. I want to change my answer, but I’m going to go with yes.

Juan Sequeda:
No, no, no. I like this, I like this, I like this.

Tim Gasper:
First instinct but then a second guess.

Denise Gosnell:
Yeah.

Juan Sequeda:
All right. Next question.

Denise Gosnell:
The context of my first instinct was like in-demand job, right? I feel like five years ago, data scientists were super in-demand and now that demand is shifting to data engineering. They do two different things, though.

Juan Sequeda:
All right. Next question. Metadata related tooling is going to be the fastest growing segment in open source in the next five years.

Denise Gosnell:
No. No.

Juan Sequeda:
Wait, after all that talk that we’ve had about transparency, you’re saying no.

Denise Gosnell:
I am saying no because I don’t think that it’s going to start with metadata.

Juan Sequeda:
How’s it going to start … How is the transparency part going to start then?

Denise Gosnell:
I think that the transparency is going to start with systems to begin with. It’s going to start with information movement through systems. Do you define that as metadata?

Juan Sequeda:
Yes. I would.

Denise Gosnell:
Well, then yes. We need a definition on what is meant by metadata.

Juan Sequeda:
My god, data about data, right?

Tim Gasper:
Semantics are important here. That’s interesting. I mean, that brings up an interesting conversation point. All right, last question here. Will open source data tool adoption grow faster than proprietary data tool adoption in the next five years?

Denise Gosnell:
I would love for it to, but I’m going to say no because of security and compliance reasons.

Juan Sequeda:
That was-

Tim Gasper:
Interesting answer. I like it.

Denise Gosnell:
Your entire opening pitch was just debated at the very end in the lightning round, or not debated, it was negated at the end of the lightning round.

Juan Sequeda:
All right. Well, one final comment here I see from Rodney, which I know Rodney … just shout out to Rodney. He’s always been a live listener here. So great to see you here. Well, quote, unquote, see you. “If Cypher could only do federated query,” and I completely agree with that, so if any Neo4j folks are listening, which they hopefully are, because last week, we actually had Emil on the podcast. Yeah, Neo4j folks, go get federated queries. Everybody is doing this stuff. SPARQL supports it. Look, GraphQL is now doing it. You guys better do it, come on, come on. You guys are lagging.

Tim Gasper:
Should we send him a text message, is this on your roadmap?

Juan Sequeda:
I’m going to text Emil right now saying, hey … All right, well, Tim, take us away with takeaways, Tim?

Tim Gasper:
Absolutely. So I mean, this is such a great conversation and I’ve got so many good notes here. I think one of my biggest takeaways was around … when we were talking about developers and how they can leverage open source better and how they can learn and how they should put it to use. You talked about focusing on a real problem and really focusing on use cases is the center point about where you’re applying this technology, which tools you’re grabbing for. I mean, it’s so common sense but yet so often we see in the industry, people not approaching it that way and it’s the wrong way if you want to learn, and if you want to be impactful and if you want that feeling of having impact.

Tim Gasper:
So I love that and I love this idea of developer empathy as being the center of the learning of the adoption of sort of the participation in the open source community, how tools can help. I think that’s a great term and it deserves to be on a T-shirt, and everybody should be wearing that. What about you Juan?

Juan Sequeda:
My main takeaway is transparency. So how we always start this conversation, honest, no BS: what’s next with open source and data is, let’s go be transparent about it. And transparency across the entire spectrum, right? From how you’re consuming the data and where it comes from, to how the data is going through systems, the ups and downs, the downtimes and all that stuff. Understand how your recommendation algorithms work, right? That black box, we need to be able to understand what that is. The kind of aha moment I’m having right now is about how to enable that transparency. It’s metadata, but it’s really this entire ecosystem of metadata, and one of the comments on the chat was, not another metadata system but a metadata ecosystem where I can place and replace the tools I want, plug, play and unplug.

Juan Sequeda:
I think that for me is like the main nugget out of this conversation. I think that’s where we need to go head towards. Denise, I’m going to throw it back to you, two final questions, what’s your advice about data, about life, whatever and who should we invite next?

Denise Gosnell:
Yeah, so my advice about life is take a sabbatical. Well, in all honesty, the take a sabbatical comment is coming from the theme that Tim was just recapping with developer empathy. If you want to be able to have empathy for other people, first start with understanding what empathy looks like for yourself. I think a lot of people forget to make sure that they are capable of monitoring their own ups and downs sometimes and it can be really hard to be empathetic for others when they’re going through the myriad of open source software tools that are out there and how to use them, when you don’t fully understand what that looks like. So that’s my advice. And who I would invite: Holden. I’m going to make sure that I send a tweet out to him. We’ve got to get Holden on here.

Juan Sequeda:
Awesome. Denise, this was a fantastic conversation. Thank you so much for your time. I’m looking forward to sharing this with so many people. This is great.

Denise Gosnell:
Yeah, this is quite entertaining. Thank you guys so much for having me and letting me come over with just my sparkling water beverage.

Juan Sequeda:
And I got mine over here.

Tim Gasper:
Absolutely. Beverage is welcome.

Juan Sequeda:
Next week is Kirk Borne, I think we-

Denise Gosnell:
I love Kirk.

Juan Sequeda:
Kirk is just an amazing guy to just go listen to and we’re going to be-

Tim Gasper:
We’re going to try to demystify and debunk some buzzwords in the data space, right?

Denise Gosnell:
Absolutely. Put metadata on that list because apparently-

Juan Sequeda:
There we go. We got some more. Yeah, definitely. We got like, I think industry 4.0 is what we want to go talk about. We got a couple of that. It’s going to be fun conversation.

Denise Gosnell:
Awesome. Awesome.

Juan Sequeda:
Happy Wednesday. Thanks Denise.

Denise Gosnell:
Happy Wednesday, thank you all.

Tim Gasper:
Cheers.

Denise Gosnell:
Cheers.
