About this episode

The data landscape has evolved substantially over the last decade. We’ve gone from data lakes to data hubs to lake houses. We see data represented as documents, columns, graphs, and time series. So how might this evolution continue over the next ten years?

To help us ponder this question, Juan and Tim are bringing in Emil Eifrem, CEO of Neo4j. We’ll take a look at how we got to this point, and what new data management challenges and opportunities await.

Special Guests:

Emil Eifrem

CEO and Co-founder, Neo4j

This episode features
  • A look at database categories that may spring up over the next decade
  • Positives and negatives of database standardization
  • What’s changed the most about the hosts/guest over the past 10 years?
Key takeaways
  • There are very few domains which are inherently table oriented
  • Data Lake → data scientist → code centric
  • Cloud DW → data analyst → SQL centric

Episode Transcript

Tim Gasper:
It’s time once again for Catalog and Cocktails. This is your weekly live honest, no BS, non-salesy conversation about enterprise data management with tasty beverages in hand. I’m Tim Gasper, long-time data nerd and product guy at data.world joined by Juan.

Juan Sequeda:
Hey Tim. I’m Juan Sequeda, principal scientist at data.world. And it’s Wednesday, middle of the week, time to take a little break and have a drink and chat about data in an honest no BS way. And today, we have an awesome guest because all our guests are awesome [crosstalk 00:00:36] I’m really, really excited because we’re going to go talk about so much stuff about data and graphs and who else would be one of the best people in the world to go talk about this than Emil Eifrem, the CEO of Neo4j. Emil, how are you doing?

Tim Gasper:
Welcome.

Emil Eifrem:
I am glorious. Thanks, guys. Looking forward to this.

Juan Sequeda:
All right. Awesome. Awesome. So, if you don’t know who Emil is, you’ve probably been living under a rock. So, I think you’re a man who does not need any introductions. But let’s just go start off with our discussion here about telling and toasting. Tell us what we’re drinking and what are we toasting for. Emil, you want to start off?

Emil Eifrem:
Yeah. So, I’m drinking Jämtlands IPA, which is my favorite Swedish beer. For those of you who can actually see the visual, I’m basically sitting in what looks like a sauna right now. It’s midnight here in Sweden, an hour before midnight. And it looks like a sauna. I’m here in Sweden. And this is my favorite local micro-brewery, Jämtlands Brewery. Jämtland is a region in the northern part of Sweden. I’m drinking their IPA today.

Juan Sequeda:
Awesome. Well, I look forward to going to Sweden one day. And then hopefully, we’ll meet up and have beers together over there.

Tim Gasper:
And thank you for joining us so late. It feels like a nice sort of evening ambiance you’ve got going on.

Emil Eifrem:
I think it’s perfect because I can legitimately drink this without being too decadent. It’s 11:00 PM, whereas, I guess, I think when you recorded with DJ, the guest from last week, it was probably morning for him or something like that.

Juan Sequeda:
Hey, it’s always 5:00 PM somewhere, right?

Tim Gasper:
It’s hard on the West Coasters when they got a drink at two o’clock in the afternoon, right?

Emil Eifrem:
Exactly.

Juan Sequeda:
How about you, Tim? What are you drinking today?

Tim Gasper:
I am drinking a Vesper Martini. I’m running a little low on some of my liquors. I got some gin. I had some vodka and some lemon and a little bit of Lillet Blanc. So, that’s what I’m drinking today and a wine glass because I have no martini glasses. I got to upgrade my game a little bit.

Juan Sequeda:
Well, I’m having a vodka soda, but not any type of vodka. It’s one of my favorite vodkas that you get in Mexico. So, it’s a Smirnoff spicy tamarind vodka with grape fruit soda. And that’s what I’m having here.

Tim Gasper:
Spicy [crosstalk 00:03:02].

Juan Sequeda:
So, cheers. I’m going to kick off toasting because I finally launched my book I’ve been working on for years. And it is now out there. You can go get the book Designing and Building Enterprise Knowledge Graphs, written together with my co-author, Ora Lassila from Amazon. So, I’m cheering to that: years in the making and finally done. Cheers.

Tim Gasper:
Awesome.

Emil Eifrem:
Cheers.

Tim Gasper:
Congrats, Juan.

Emil Eifrem:
Amazing. Well done.

Juan Sequeda:
So, we got our warm-up question here which is what has changed the most about yourself over the past 10 years? Who wants to start? You’re on the spot here.

Emil Eifrem:
That’s hard one. How about you go first because that one was hard.

Tim Gasper:
All right. Yeah. When I first saw this question about 10 minutes ago, I was like, “Oh gosh. What am I going to say?” I decided that I’m going to say, “My forehead wrinkles.” They’ve deepened quite nicely over the last 10 years. It really gives me a weathered kind of distinguished look, I would say. And you know what I’ve noticed over the last 10 years? I like whiskey way more than vodka now. That has evolved quite a bit for me.

Juan Sequeda:
How about you, Emil? You have one now?

Emil Eifrem:
Oh, man. Look, I mean 10 years, 10 years ago, I just started Neo4j. And I was single. And now, I’m married. I have three kids, I guess, four kids including the company, Neo4j. What has changed the most? I feel like freaking everything has changed except for my love for graphs that one still remains. But probably my ability to go for days, if not weeks, if not months at end with very little sleep.

Emil Eifrem:
I used to be one of those people who I didn’t have to sleep a lot. But I think probably around the time when I crossed into my 40s, now, I need a solid six, seven hours per night. Otherwise, I fall over.

Juan Sequeda:
I’m going to steal a couple of years. The sleep thing is the same. 10 years ago, I was in the middle of my PhD. And I was like, “Ah, you sleep four or five hours, and that’s it.” Now, I sleep eight hours, like minimum eight hours. And that has changed my life. And so, my life recommendation is go sleep eight hours. It’s a good thing. And I think also my appreciation for wine has dramatically increased in the last 10 years. So, that’s me. Hey, so, let’s go kick this off. Emil, honest, no BS question. This whole data landscape, it’s bonkers. It’s freaking crazy. In your perspective because you’ve been looking at this for over a decade now, how did we get here? And where are we going? Let’s go start with this, and we got more things to go dive in.

Emil Eifrem:
Yeah. That is one broad question. But maybe let’s split it into how did we get here. And then later on, we do it like where is it going. So, I guess at the high level, I kind of joined this world as a vendor as it were at the cusp of NoSQL. Neo4j was founded back in 2007. We had a couple of years. And then NoSQL happened summer of ’09. And prior to that, I was a user of data products. But not a producer of them as it were.

Emil Eifrem:
And I think kind of walking into the previous decade, there was this explosion of experimentation. I feel like it got kicked off by Amazon’s Dynamo paper. And then, just a few months after that, Google’s Bigtable paper. And just the observation that the big web giants, they didn’t run on the relational database. And so, that created this massive divergence. There’s a site called DB-Engines that I’m sure many people who listen to this podcast would be familiar with. But they track a bunch of signals around database projects like tweets and Stack Overflow questions and Google searches and stuff like that.

Emil Eifrem:
And they’re now tracking over 350 databases. And when I grew up as a professional developer in the mid-90s, there’s a handful to choose from, Sybase, Informix, Oracle, DB2, whatever, MySQL.

Juan Sequeda:
Don’t forget [crosstalk 00:07:43].

Emil Eifrem:
Yeah. That’s right. And, honestly, they’re like 98.7% the same product. And that’s a scientific statement. It’s like basically the same product. And then on the margin, there’s a little bit. So, it was basically a vendor choice. And then fast forward to kind of the early parts of the previous decade, all of a sudden, it’s like key value store this, and document database that, and graph database this and time series database. There’s just an explosion of choice. And I think there’s a number of reasons why that happened. First of all, I think that the only time when you can take a new database and bring it to the market is when you have a platform shift of a kind. If we think back to the shift from mainframe to client server, that was really what built Oracle and what enabled Oracle to be built on the relational database.

Emil Eifrem:
And then, we had kind of client server to web. And that’s where MySQL was born. And if you look at kind of where we are today, I think there’s two broad platform shifts, the shift to the cloud and then the shift to mobile. And mobile hasn’t really given birth to a new database. Couchbase tried for a while. Realm tried until they got acquired by Mongo. But there really isn’t like a new kind of “this is the database for the mobile platform.”

Emil Eifrem:
But, of course, the cloud shift has enabled a bunch of them. So, I think that’s a big driver. And maybe, let me pause this [inaudible 00:09:14] model. But that’s one. I think there’s three, four other key drivers that caused this explosion of choice in the early part of the previous decade. But let me pause there and see if you guys even agree this far. Also, pause to drink some of this delicious beer.

Juan Sequeda:
Well, I think it’s an interesting statement. And what I agree with is that at the beginning, it was just always these five, six databases. And that’s it. I think, remember the LAMP stack: Linux, Apache, MySQL, PHP. I mean that’s how everybody got involved… That was a simple, easy, free way to go get involved with databases. And that was it. And then one day, I’m growing up. I get out of MySQL. And I use either SQL Server or Oracle. And that was it. I mean even DB2. You use DB2 because IBM made you go to it. So, that’s a really good observation. It was just a vendor choice. You just had this one thing that all these vendors sell. They sold the exact same hammer; it just happened to have a different label on it.

Juan Sequeda:
But the kind of the whole NoSQL thing is people started getting kind of annoyed with SQL. First of all, it was NoSQL. And then, people like, “Well, it’s not only SQL.” And then, kind of the whole what’s so ironic is that now we put SQL layers on top of the NoSQL. Now, everything has a JDBC driver and stuff like that. So, at some point and a lot of people I do talk to is like, “Well, well SQL’s still going to go prevail. It’s not going to go anywhere.”

Juan Sequeda:
But then, we have 350 databases. So, is that true or not? I mean how the heck did we get to 350 different databases? And what are those categories? I mean this is ridiculous. Do we really need all those different types of databases?

Tim Gasper:
Has the specialization been a good thing?

Emil Eifrem:
It’s a good question. And so, how did we get here? And then is specialization [inaudible 00:11:08]. I think there’s probably three components that I think of. One is kind of an enabling force. And I think the shift to the cloud is one of them. That introduced new architecture patterns, things like microservices and containerization, which shifted us more away from this big monolith where, if you wanted to switch out your database for whatever reason and you have a massive monolithic type architecture, it’s more costly. It’s harder.

Emil Eifrem:
So, the value has to be bigger in order to even switch out your database. So, that’s all kind of on the enabling side, as I think of it. And then, I think of like there’s a pressure side or maybe there’s, I sometimes think of a supply side. And then, there’s a demand side or a value side.

Emil Eifrem:
On the pressure side then, it’s just… I mean I kind of hate the term big data. I always hated it. But just the proliferation of like massive amounts of data, and it’s driven by all these sensors that we carry around in our pockets. So, just more and more data that is more and more complex that exerts huge amount of pressure against the existing model which, at the time, was just the relational database. So, think of that kind of on the pressure side. So, that’s the second bucket of forces. The first one was the enabling one.

Emil Eifrem:
And then, the third bucket of forces, I think, is on the value side or the demand side. And here’s where things like AI and machine learning come in which is, of course, like this massive secular shift which goes even beyond technology. And that’s it. That’s a broader societal shift. And that I think speaks to the value of using different types of datastores.

Emil Eifrem:
So, I think those are the three driving forces and a couple of examples in each of them. I do think the specialization served us as an industry well and maybe more so the experimentation because when you have a chance to get orders of magnitude betterness, for some definition of betterness like more agile software development, developer productivity, or performance, or scalability, it just unlocks new things.

Emil Eifrem:
And if you can get that for your application or a slice of your application, coming back to the microservices piece, it doesn’t have to be for the entire application. But if the slice of your application can be orders of magnitude better, all of a sudden, you can leapfrog the competition, create much more value for your users. So, I think that was just really valuable for the industry at large.

Tim Gasper:
It’s like as you know as you’re stating, I don’t love the term either. But as big data happened as unstructured data and in its various forms, fast data streaming data, geo data. I mean you think of sort of the specialization of data and the quantity and the complexity, and all these things. It’s like this environment made specialization become really beneficial because even though, obviously, SQL was this hammer that you could use to hit a lot of different things with, people were like, “Well, but it’s not really great at search.” And it’s not really great at working with JSON documents. And it’s not really great at dealing with events. It’s good for transactions.

Tim Gasper:
And so, you had this kind of explosion. And, obviously, we’ll talk about graph as well. I think that’s super interesting. And just as a reminder to our audience because I know that we’ve got a lot of folks listening right now, feel free to jump in with questions, and we’re starting to see them pop up on the screen here. So, definitely, feel free to add your questions live.

Emil Eifrem:
Yeah. And then, I think there’s this tension because in theory with what we just said, if a component of your application can become a 10X or a 1000X better, just let’s just use performance because it’s such a strong thing, a thousand times faster. Why wouldn’t you do that all over the place? Well there’s this tension in the real world because you don’t want to introduce 13 new data stores in your application because that’s just a management nightmare, a skill set nightmare. It’s just all kinds of problems.

Emil Eifrem:
And so, then, there’s this tension. Maybe using exactly one for the entirety of my company is not the right thing to do. That’s kind of the world we lived in in the ’90s and the early 2000s. But we’re not going to use each and every one of the 350 either. And I think as an industry, we’re still figuring that out. How many is it appropriate to use? And I also think that… Go ahead, Juan.

Juan Sequeda:
On that vein of which to use in the spectrum, this is really interesting. I want to go navigate the spectrum. The first reaction I have is, “Well, the two sides are OLTP and OLAP.” I got transactional data. And I need to go manage transactional data, manage analytical data. And then within each one, you have a bunch of more specializations. I would argue that for the type of transaction applications that we do, it’s always the relational, always will be kind of the winner there, I believe. I want to hear what you think about this.

Juan Sequeda:
On the analytics, I think that’s where things are going to see much… We’re seeing more special types of specializations. How do you see this space right now between OLTP and OLAP kind of before where we’re going on this?

Emil Eifrem:
So, I think success when I go to conferences and I participate in panels, I think panels are generally the lamest and most boring part of a conference unless there’s a disagreement on the panel. So, I’m going to carry that philosophy forward to a podcast and say, “I completely disagree with you, Juan.”

Juan Sequeda:
On what?

Emil Eifrem:
On both of those statements. So, I do agree with the split between kind of analytical and transactional. But I actually think we’re going to see a convergence. And maybe, this is a segue to speaking about kind of where the world is going. If we talked about kind of where we’ve been, okay, where do we believe it’s going? And at least the way that I think of it, I also think of it as this broader split between operational data stores.

Emil Eifrem:
So, these are used by developers to build applications. The applications use the data store. That’s an operational data store OLTP. And then, analytical data store which stores historical data. And it’s used by data scientists really to air quotes build machine learning, build AI or for data analysts to build reports really.

Emil Eifrem:
And if we start on the right-hand side, on the analytical data store side, it’s the system of record for history. That’s kind of what they’re doing. And the question is what are you using that history for? And the way that I look at that landscape today, we’ve had for maybe two, three, maybe four years, we’ve had two product categories. One is let’s call it the data lake dynasty. So, this is really led by Databricks. And the primary user here is the data scientist. And it’s very code-centric.

Emil Eifrem:
So, that’s kind of one category. And then, the other category is let’s call it the cloud data warehouse. So, this is Teradata in the cloud maybe. And the primary user here is a data analyst rather than data scientist. And it’s more SQL-centric rather than code centric. So, those have been kind of the two paradigms. And the leader here then is Snowflake.

Emil Eifrem:
And, of course, all the global cloud platforms have implementations of this, for example, Redshift and other things like that. So, that’s kind of that part. And what’s very clear right now is that those two are converging into one. If you look at the features that Snowflake is launching right now, it is squarely targeted to the data scientist persona. And they’re very code-centric.

Emil Eifrem:
And some of that is core features in Snowflake itself. Some of that is deep integrations with the cloud platforms. But it’s very much targeted to the sweet spot of the Databricks use case. And if you look at what Databricks is doing with what they call the data lakehouse pattern, that’s adding schemas on top of the data lake. And you’re a world-class expert at this, Juan: the schema gives them an opportunity to build a query language, build a query planner.

Emil Eifrem:
And so, all of a sudden, they can add SQL on top of that which is squarely targeted to the Snowflake kind of sweet spot use case. So, I actually think they’re going to converge. And I think the stable state here is actually one category on the analytical data store side. So, let me pause there. I’d love to talk about the operational side as well. But that’s actually how I see that one kind of over time.

Tim Gasper:
So, the data warehouse is becoming the data ware lake. And the data lake is becoming the data lakehouse. And maybe, it’ll get a little wary in there as well. Those things are converging. Your structured data wants to be able to support more unstructured data. Your more unstructured or code-centric approach wants to be able to handle these analytical applications, your Tableaus, and your things like that. You want those to be able to work well with Databricks too. So, those are kind of converging is what you’re saying.

Juan Sequeda:
Nothing controversial here. I think I completely agree with this. [crosstalk 00:21:09]

Emil Eifrem:
I tried to be disagreeable.

Juan Sequeda:
No. So, I think [crosstalk 00:21:13].

Tim Gasper:
[crosstalk 00:21:13] the transactional side, it sounds maybe there’s a little disagreement.

Juan Sequeda:
I think that’s where it’ll get interesting. But I think a very valuable kind of insight here is the data lake, the Databricks, that’s for the data scientists. It’s code-centric. The cloud data warehouse is for the analysts. It’s SQL-centric. I think the cloud data warehouses like Snowflake, they’re just taking your traditional old-school Oracles that people have been doing. And they’re like, “Put it in the cloud,” the storage [crosstalk 00:21:36].

Emil Eifrem:
Store data in the cloud.

Juan Sequeda:
Air quotes, putting it in the cloud. I mean at the end of the day, putting things in the cloud is supposed to make things easier, better for the business. Yes. Fully agree with that fact. Fact. Scientific fact. All right. So, let’s go into some controversy. What’s on this operational side?

Emil Eifrem:
So, I think that on the operational side, I think that there’s four segments emerging here. I think the relational database is underlying it all, and we can talk later on how I think it’s going to unfold in terms of relative weighting of the different segments. But the database market today, it’s the biggest market in all of enterprise software. It’s a $50-billion market. And it depends a little bit on who you’re asking.

Emil Eifrem:
But if you talk to the Gartners and the IDCs and the Forresters of the world, it’s expected to grow to a hundred billion over the next four to five years, either 2024 or 2025. So, that’s $50 billion of incremental market cap. And that’s real spend by enterprises primarily all around the world. And everyone is agreeing that most of that, if not almost all of that, is coming from the new. That is not Oracle.

Emil Eifrem:
And for those of us who’ve kind of studied the database market, five, seven years ago, it was a mature market growing at 5 to 7% per year. And now, the growth has skyrocketed. And it’s all driven by these new segments. So, what are the new segments that at least I see? I think there’s four. One is what I call document plus, plus. But again air quotes. That’s just my name for it.

Emil Eifrem:
And this is the old kind of document category spearheaded by Mongo. But what’s very clear to us at least here at Neo4j which is when we’re out talking to customers for a single project, it’s a zero-sum game between Mongo and Couchbase obviously because they’re a document database. But actually also Redis Labs, with Redis, and DataStax with Cassandra.

Emil Eifrem:
So, they are competing for the same architectural slot in the same projects even though they have a little bit of a different… They have different data models. But still they solve the same problem. That’s actually the definition of a product category. So, I think they are the same product category led by Mongo. That’s document plus, plus.

Juan Sequeda:
So, the problem that they’re solving is to just be able to go set up an application or product very quickly, and one single, I guess, application silo another database. But if my goal is to go create an application as fast as possible and scalable and all that stuff, just use a document database.

Emil Eifrem:
That’s exactly right. So, that’s the first one, document plus, plus. The second one is graph. And there’s no ordering between them. That’s just how I’m thinking about it. The second one is graph. And, obviously, I made my career. I’m massively optimistic about graphs. And we’ll talk more about that later, and we can talk about the relative weighting between the categories. But that’s the second one.

Emil Eifrem:
The third one is time series which is clearly differentiated. Now, Mongo just launched time series functionality actually a couple of weeks ago to great fanfare. I’m not an expert at time series databases. But I’m looking at that and everything. And it feels like when all the then database vendors added graph as a feature, which MongoDB did in 3.4 with the graph lookup operator. DataStax acquired Aurelius and added a layer on top of Cassandra for graphs. And Cosmos tried to do that from Microsoft on top of their document database. None of that worked.

Emil Eifrem:
That all went sideways and didn’t end up working. And when I’m looking at what Mongo is doing with time series, it feels very much like that. I think that’s going to be won by InfluxDB or Timescale or one of the purpose-built native time series databases. So, that’s the third category in my mind.

Emil Eifrem:
And then, the fourth one is NewSQL. So, this is basically, in my mind, kind of Spanner, Google Spanner-inspired, massively scalable cloud native relational databases. So, this is the CockroachDBs or the Yugabytes of the world. So, those four categories, document plus, plus, graph, time series and NewSQL, are going to be the majority of the growth in the database space over the next four or five years. And that’s what I’m personally excited about.

Tim Gasper:
So, you feel based on your statements around sort of the time series segment, as you’re looking at the space, you’re kind of seeing these four specializations as being there’s their own optimization that happens within that specialization that when you start to do things like, say, “Hey, I’ve got my MongoDB, and I’m going to throw my graph on top of it, or I’m going to throw my time series on top of it,” or you say I’m a time series database. And, oh by the way, you can store regular objects in me or something like that you’re now breaking what feels to be these sort of operational optimization areas.

Emil Eifrem:
Yes. I’m looking at it from two axes. One is value to the end user. And the other one is how hard is it to bolt on to an existing system? And I think time series, for example, is very valuable for the end user. It is really hard to bolt on to an existing kernel. For example, at Neo4j, if we would try to do time series really well, we would have to do so many awkward things with our kernel. And I’m talking very physical things. How do you lay out data on disk? How do you structure it in memory that in reality it would be one of those classic engineering tradeoffs where we wouldn’t be as good for graphs anymore?

Emil Eifrem:
Now, we can all fake it. These are all isomorphic models which is a fancy way of saying, “You can serialize data from one form to the other without losing data.” And so, you can always fake it. I can fake time series and model it in the graph and stuff like that. But that’s not what I’m talking about. I’m talking about doing it with real performance and real scale. [crosstalk 00:28:22]

Tim Gasper:
Go ahead, Juan. Yeah.

Juan Sequeda:
No. If you just want to have a check box, everybody can do [crosstalk 00:28:29]. But no. You’re talking about who really own it. Yeah. Sorry, Tim. [crosstalk 00:28:32].

Tim Gasper:
And not just sort of having a company own these things. So, for example, let’s say MongoDB buys Influx or something like that. Just because it’s part of the portfolio doesn’t mean that it’s all part of the same database.

Emil Eifrem:
Exactly.

Tim Gasper:
I mean is this good for application developers? When you think about, “Hey, I’m going to build this application,” it’s a relatively complex application. And it has time series elements. It has document elements. One thing we haven’t talked about I’m curious which bucket you feel like it falls into is like search where the idea of Elasticsearch and things like that. Is this a good thing? Sometimes, you need to combine two, three, four things together to solve your problem. Maybe, that’s okay. That’s the new reality.

Emil Eifrem:
I think the goodness has to be evaluated with the business lens. What’s the value of the problem that we’re solving? If this allows us to solve a business problem that was previously intractable, that you couldn’t solve, and it’s really valuable to the business, and you can do it because you can compose these different back-ends, then I think that’s a fantastic thing. But if you purely look at it entirely just from a technical perspective, it probably isn’t because there’s more moving parts. And that’s more complexity for the developer. But if it’s in pursuit of something that is very, very valuable, I think that’s what makes up for it.

Juan Sequeda:
So, I want to keep graphs separate right now because we’re going to dive more into that. So, the use cases are kind of the reasons why you would go for document, time series, and NewSQL and, I guess, graph too. You talk about documents, saying, “Well, if I have a single application, I want to develop fast and have a very scalable solution: document.” But my guess is if I’ve got more, what, sensor data or stuff, I would use more time series.

Juan Sequeda:
So, when would you want to go use any of these four categories you’re talking about because, technically, let’s take time series out. You can come up with any application say, “Well, you can give reasons you should use document for that.” And you can also give good reasons why you should use a graph for that. So, how would you even select out on that?

Emil Eifrem:
I think it’s back to the shape of the data and the shape of the workload, the queries on that data. And if the data, to your point, is, I don’t know, stock ticker data, where once every second we’re reporting the value of the stock, that feels very time series. [inaudible 00:31:12], you look at that. It’s like, “Well, what we want to know is kind of the average of the past minute, hour, day.” Wow, that’s a very kind of time series workload.

Emil Eifrem:
Then, it fits really well with that. Hey, all my developers, they already serialized my data as JSON. Man, that seems to fit really well in a document database. And so, I think it’s back to it’s the shape of the data and the workload on it.
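
As a rough illustration of the "average of the past minute, hour, day" workload Emil describes, here is a minimal Python sketch of a trailing average over per-second stock ticks. The names `Tick` and `TrailingAverage` and the integer-seconds timestamp are assumptions made for this example, not anything from the episode or from a real time series database; a purpose-built store would also optimize on-disk and in-memory layout, which this sketch ignores.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Tick:
    """One price observation (hypothetical shape for this sketch)."""
    ts: int        # seconds since some arbitrary start
    price: float


class TrailingAverage:
    """Average price over the most recent `window_seconds` seconds."""

    def __init__(self, window_seconds: int) -> None:
        self.window_seconds = window_seconds
        self._ticks = deque()   # ticks currently inside the window, oldest first
        self._total = 0.0       # running sum of the prices in the window

    def add(self, tick: Tick) -> float:
        """Record a tick (assumed to arrive in time order) and return the current average."""
        self._ticks.append(tick)
        self._total += tick.price
        # Evict ticks that have fallen out of the trailing window.
        while self._ticks[0].ts <= tick.ts - self.window_seconds:
            old = self._ticks.popleft()
            self._total -= old.price
        return self._total / len(self._ticks)


# Usage: a two-second window over three ticks, one per second.
window = TrailingAverage(window_seconds=2)
for tick in (Tick(0, 10.0), Tick(1, 20.0), Tick(2, 30.0)):
    print(tick.ts, window.add(tick))
```

The point of the sketch is the shape of the workload: data arrives ordered by time, and the queries are aggregates over recent windows, which is the pattern a time series store is built around.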

Juan Sequeda:
Agreed. So, when would you use SQL because we’re all just [inaudible 00:31:53] trend the normal transactions or… I mean because all these transactions you can store also in the document. You can store and all these things. So, I’m now curious to see is where do you see SQL [crosstalk 00:32:05].

Tim Gasper:
Is SQL just an old habit dying hard?

Emil Eifrem:
I think there’s some of that. One thing you said before, Juan, was that you believe or maybe you just kind of wanted to be provocative saying that SQL will be around forever and will be the dominant paradigm. And I actually don’t see it that way. I think that right now, it is. But if you look at the number of applications being written with SQL as a backend today versus 10 years ago, it is dramatically different.

Emil Eifrem:
And my prediction is that 10 years from now, it’s going to look a lot different too. And I think it’s because if you set aside kind of the analytical, if you set aside reporting and things like that, there’s very few domains that are inherently table oriented. And the way that I think about this is, at least back when I was technical, which I no longer am, back when I used to write code for a living as a consultant, you would build a new system for someone like in the enterprise. Day one, you get into a room with tons of whiteboards. You have a bunch of domain experts, and you just say, “All right. Tell me about your world. Tell me about the domain. Tell me about the application.”

Emil Eifrem:
You started whiteboarding that. And they started talking about pensions or insurance or whatever the domain that they were experts at. Almost never did they end up intuitively drawing tables on those whiteboards. They might. If it’s for example, a payroll system, it’s first name, last name, I don’t know, role, salary, something like that. Okay. That’s an intuitively tabular data set.

Emil Eifrem:
But most of the time, it’s like we have a shopping cart. Inside of that shopping cart, we have order items. Those order items refer to a product. That product, it’s a book which sits in the product hierarchy. It might be science fiction book which is a fiction book which is a book. That very infrequently ended up being tables.
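
[Editor's sketch: the whiteboard example above (cart, order items, product, product hierarchy) can be written down directly as nodes and named relationships. This is a minimal in-memory illustration, not how Neo4j or any other graph database stores data; the `Graph` class, the relationship names, and the sample product are assumptions made up for the example.]

```python
from collections import defaultdict


class Graph:
    """A tiny directed graph with named relationships (illustrative only)."""

    def __init__(self):
        self.edges = defaultdict(list)  # node -> [(relationship, target)]

    def relate(self, source, relationship, target):
        self.edges[source].append((relationship, target))

    def follow(self, source, relationship):
        return [t for r, t in self.edges[source] if r == relationship]

    def ancestors(self, node, relationship):
        """Walk a single-parent hierarchy upward; assumes it has no cycles."""
        chain = []
        parents = self.follow(node, relationship)
        while parents:
            chain.append(parents[0])
            parents = self.follow(parents[0], relationship)
        return chain


g = Graph()
g.relate("cart:1", "CONTAINS", "item:1")
g.relate("item:1", "REFERS_TO", "product:some-novel")  # hypothetical product
g.relate("product:some-novel", "IS_A", "Science Fiction Book")
g.relate("Science Fiction Book", "IS_A", "Fiction Book")
g.relate("Fiction Book", "IS_A", "Book")

# "What kinds of things are in this cart?" is a walk along relationships.
product = g.follow(g.follow("cart:1", "CONTAINS")[0], "REFERS_TO")[0]
hierarchy = g.ancestors(product, "IS_A")
```

[In a tabular model the same question would typically need a join per hop plus a recursive query for the hierarchy, which is the mismatch Emil is pointing at.]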

Emil Eifrem:
And so, if I think about kind of just all the applications being written out there, I think there’s a mismatch, a cognitive mismatch, between the building blocks exposed by the relational database. Now, it’ll still be around when we retire, screw it. When the three of us, when we die, the relational database will still be around in the storing information, that kind of stuff. It’s not going anywhere. Mainstream, yeah, whatever. Mainframes are still around.

Emil Eifrem:
And COBOL is probably the most popular programming language if you start looking or something ridiculous like that. And so, it’ll still be around. But if I look at new applications, I don’t think most of them will use a relational database at the back.

Juan Sequeda:
So, this is going to be this good segue to graph. But before we get there, a couple things. I saw a talk by Bob Muglia, the former CEO of Snowflake. And it’s all about relation always wins. And he’s actually talking about the relational knowledge graph. And there’s a good friends of the company-

Emil Eifrem:
Relational.ai.

Juan Sequeda:
Relational AI. They’re about the relational knowledge graph and stuff. So, I’m now thinking we need to have a panel where you put… You guys here. This would be a fun one. So, that’s one thing. So, I agree with you that relation will always exist. I mean it’s not going to go away. But I think there’s always going to be the type of workload, the type of… There’s types of data that will always be tabular.

Juan Sequeda:
So, for example, I agree that we’ll have, oh, an order has an order line. And order line has a product. The product goes into a shopping cart, all that stuff. But when you go into the details of, what is an order what is an order line, that’s just tabular data. I mean at the end, I just want all this stuff. It’s just the relationships between the main things end up being more, or you draw them on the whiteboard as a graph. That’s one thing.

Juan Sequeda:
So, I think when people say, “Graph,” why graph if I can do this in relational? You can do whatever you want in a Turing-complete language. I don’t care about that. At the end of the day, it’s not about one or the other. I think it’s going to be about a convergence of we got to understand how it’s not graph or relational. It’s graph and relational. And I think there are pieces of data that are naturally going to be tabular, but the relationships between them, again, relationships, connectedness. I think that’s how I see it. It needs to be a combination of those two things.

Juan Sequeda:
And the other part is that when you start doing the analytics, you start doing I want to do an average, that’s tables at the end of the day. It’s like I just want to go sum everything in this column. So I think one is how I’m building applications and storing the data for those applications, and what’s the best way of managing that. And I definitely agree that the graphs is the way to go do this because that’s how you think about it in your head. That’s what you do on the whiteboard.

Juan Sequeda:
But when you’re actually using the data to do some sort of analytics, for some sort of analytics or kind of traditional old-school analytics, still reporting on tables. Now, we’re taking it to the next level, and you want to go do things that are in graphs and stuff which takes us to the topic about graphs which, up to now, we’ve been talking for half an hour and haven’t touched on graphs on purpose. Get into graphs now because, heck, that’s your life. That’s my life.

Tim Gasper:
The main course, huh?

Emil Eifrem:
Yes.

Juan Sequeda:
My main course. They’re in the main course now. Graphs, where should we start taking?

Tim Gasper:
Take us [crosstalk 00:37:57].

Emil Eifrem:
[crosstalk 00:37:57] so much amazingness to talk about, isn’t it?

Juan Sequeda:
Okay. So, let’s start why are you fascinated by graphs? Why did you decide to go focus your entire world life company, everything around graphs?

Emil Eifrem:
So, a couple of things, well, very precisely a couple of things, two things. One is comes back to maybe dovetailing off the conversation we just had. As a developer, I just found that to be the most intuitive model to express most domains. For those of you listening in to this podcast who is not doing it, commuting, whatever, just audio only, if you’re at a computer right now, go to images.google.com. Search for domain model which is the way to express an application when you’re really building a new application. Search for domain model.

Emil Eifrem:
What you’ll see there is a page full of example domain models. Keep scrolling. Every single one of them you’ll see is a graph, every single one of them. Go to the Wikipedia page for domain model. Right there, it has a US healthcare one. You know it’s US healthcare because it’s very complex. And it’s a big graph.

Emil Eifrem:
I think all applications are graphs. It’s objects connected to other objects. And they’re connected in various ways. And I just found that to be the most friction-free way to translate the domain model into something that a database could operate on.

Emil Eifrem:
So, that was kind of the first one. The second one was the observation, and I guess this is a podcast about data and beer.

Juan Sequeda:
And cocktails.

Emil Eifrem:
And cocktails. Sorry. Cheers. And the question is what is data. This is the second reason. What is data? Data is actually a fairly abstract concept. You ask people, “What is data?” To me, data describes the world. This gap is the real world. And what is very, very clear in terms of one of the most secular trends in the universe at least on our planet is the world is becoming increasingly connected.

Emil Eifrem:
I say on a podcast recorded from three locations in the world, I sit here with two phones with two AirPods. My car is 150 plus computers embedded connected in various ways. There’s four sim cards connected to the internet. Everything is becoming more and more connected. And if you add those two things together, data describes the real world. The real world is becoming more connected.

Emil Eifrem:
There’s going to be more and more connected data in the world. And graph databases are the most amazing piece of technology for figuring out how things are connected. And if that’s true, then the drivers, the reason to use a graph database is just going to be more and more every day. And so, you add those two things together. That’s what ultimately made me so excited why we created Neo4j the product, and ultimately Neo4j the company.

Tim Gasper:
Before we get too much into the graph tech side, I mean obviously, you’ve been doing this for a while. And you just talked about sort of the inspiration that led to it. Let’s start with the business use cases. So, what would you say is the business value of graph? You’re trying to convince execs to invest in graph database technology. What use cases are you pointing to? What ROI are you pointing to?

Emil Eifrem:
Yeah. I think when you talk to execs, I think it’s two levels depending on where they sit in the organization. If it’s absolute top level like board level, CEO level, CIO level, it’s aligning to massive broad trends. It’s back to, and I apologize on a no BS podcast for using this term. But it’s back to digital transformation. You’re shifting all your business to become digital [crosstalk 00:42:29].

Tim Gasper:
We’ll bring that out later.

Emil Eifrem:
Yeah. Exactly. And you need a platform for that. And you know graphs built some of the most amazing billion, trillion dollar companies on the planet, Facebook, Google. That is all the underpinning of that was graph technology. We give you that for the normal enterprise, not for the web companies.

Emil Eifrem:
So, that’s kind of the highest level history like boardroom level. The more common one is for, let’s call it, mid-level line of business execs. And then it’s about taking a very specific business problem. It’s not talking about the technical stuff at all. None of what we’ve talked about so far in the podcast. It’s saying, “Dear director of risk at a big bank. You know what? Your fraud detection software today can capture a lot of bad guys. But you know what they can’t capture? It’s if you have a number of transactions, none of which are anomalous. But they’re connected in a way that is anomalous,” which is, for example, a fraud ring.

Emil Eifrem:
The only way you can do that is being able to operate on connected data. And the only way you can do that is by using a graph database. So, let’s have a conversation. Let’s spin up a project. Give me, and I’ll make up some numbers, a million for my database, five million for the project, six to nine months down the line. We’re going to augment your existing fraud detection solution with a way to capture fraud rings which will increase your ability to detect fraud by 3 to 5%. So, that’s a very business level-type thing.
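
[Editor's sketch: one simple way to picture the fraud ring idea, where no single account looks anomalous but the connections between them do, is to group accounts that share identifiers and flag large groups. The `find_rings` function, the `min_size` threshold, and the sample accounts below are hypothetical and made up for illustration; production fraud detection would use far richer signals and, per Emil's pitch, a graph database rather than in-memory Python.]

```python
from collections import defaultdict, deque


def find_rings(account_identifiers, min_size=3):
    """Return groups of accounts linked, directly or transitively, by shared
    identifiers (phone, address, device, ...). A breadth-first search over
    the account/identifier graph; groups below min_size are ignored."""
    by_identifier = defaultdict(set)
    for account, identifiers in account_identifiers.items():
        for identifier in identifiers:
            by_identifier[identifier].add(account)

    seen = set()
    rings = []
    for start in account_identifiers:
        if start in seen:
            continue
        component = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            account = queue.popleft()
            component.add(account)
            for identifier in account_identifiers[account]:
                for neighbor in by_identifier[identifier]:
                    if neighbor not in seen:
                        seen.add(neighbor)
                        queue.append(neighbor)
        if len(component) >= min_size:
            rings.append(component)
    return rings


# Made-up data: a and b share an address, b and c share a device,
# so a, b, and c are connected even though a and c share nothing directly.
accounts = {
    "acct_a": {"phone:555-0101", "addr:1 Main St"},
    "acct_b": {"addr:1 Main St", "device:x1"},
    "acct_c": {"device:x1"},
    "acct_d": {"phone:555-0199"},
}
rings = find_rings(accounts)
```

[Each account here would pass a per-record check; only the traversal across shared identifiers surfaces the group of three.]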

Emil Eifrem:
And then, it’s the same for retail recommend like real-time recommendations, for example, in retail or identity and access management or you just go down the list of the kind of the classic business problems that graphs solve today of which there are N amount today and N-plus one tomorrow and N-plus two the day after tomorrow. This is more and more because the world is becoming more connected and more and more business problems require or at least benefit from being able to operate on connected data.

Tim Gasper:
I love that way of approaching it. And I know Juan and I in our notes here are like, “Yes,” because we agree with a lot of how you’re sort of pitching that. And what you just said there, graph database could be replaced with X. And X could be whatever is providing that value. And I love that you know you’re resonating. You’re speaking the language to business people, if you can replace whatever it is with some other word or phrase and really it’s about the business value.

Emil Eifrem:
That’s exactly right.

Juan Sequeda:
So, we’re seeing here on the chat kind of some of the technical questions that I knew we were going to go get anyways, the whole convergence of different graph, the stuff. And I think I’m going to bet that we’re here on the same page on this one which is it’s not about the RDF versus the property graph. At the end of the day, these things will also converge. Well, that’s my position. I’m curious what your position is right now.

Emil Eifrem:
Yeah. So, this is probably an area. I mentioned when we talked before, you’d school me on this topic. But I think we probably do disagree here. I think RDF, yeah, I think so. I think RDF was a massive source of inspiration for me. And before we built Neo4j, I spent a lot of time in RDF land because, finally, someone saw the world as a graph.

Emil Eifrem:
And it just clicked with me on such a deep level. The concept was fantastic. The implementation was horrible. It was so clear that every single software that I bumped into, every API that I bumped into was written by someone whose primary deliverable was an academic paper. And as a side effect, they had to write some software which meant that the APIs are… whatever, and stuff like that.

Emil Eifrem:
And as a developer, I just felt tortured. There’s so much friction between me and that beautiful graph data model. But you had to go through all this pile of crap to get there. And-

Juan Sequeda:
Yeah. That was 15 years ago. That was 15 years ago, for sure.

Emil Eifrem:
Yeah. So, the question then is how much has it improved. And I think it probably has improved significantly. But still, I feel like the massive benefit of the property graph model and why graph databases took off, I will immodestly like immodestly to say that it is thanks to property graphs. Not thanks to [crosstalk 00:47:28].

Juan Sequeda:
I know. I publicly say this is thanks a lot not just to the Neo4j and the property graph. Also, to your gigantic marketing machine, and in your grassroots approach to this. I mean that was beautiful. That’s what [crosstalk 00:47:46].

Emil Eifrem:
Neo4j [crosstalk 00:47:48] and others. Neo4j and other vendors in the property graph space. But that real focus on the developer and developer productivity where I would rate us C or something like that and something like Mongo or MySQL is A-minus or B-plus or something like that. I think there’s a lot that we can improve here by the way.

Emil Eifrem:
But I think RDF was never even a C. And I think that held people back from what I think is this very intuitive and powerful model. So, will they converge then? Maybe. I don’t know. There’s parts of me that is nervous about RDF’s prominence because I just have this scar tissue of developers wanting to get to graph data encountering RDF, having such an awful experience that they end up getting turned off.

Juan Sequeda:
So, I would argue that I think the position that you’re talking about was definitely spot on for things that were 15, 10 years ago. I think it’s changed a lot in the last five years. And we don’t have time to go into. I think that’s another type of podcast episode we can go have. But I am very happy with the work that, for example, Jesus Barrasa, from your team is doing on the note.

Juan Sequeda:
I mean that’s just showing you how that stuff is converging. And then, you have things like Neptune who are doing both SPARQL and Gremlin. And they have Cypher, openCypher support.

Emil Eifrem:
I love Cypher.

Juan Sequeda:
I think that’s how I’m seeing more and more of these tools. I mean you’re seeing other databases. I mean Anzo and Stardog, they’re also supporting some sort of the property graph. Why could they do that better? Yes, there’s biases, of course. But I’m just I’m seeing these kind of convergences going on and also in the sole standards and the whole GQL standards in the property graph schema working group which I’m the chair of right now.

Juan Sequeda:
We see how you bring in all those things that the RDF world has done in. And, yeah. We’re converging to it. Actually, there’s a knowledge graph book coming out or from O’Reilly, from Amy, and Jesus. And I actually reviewed it and wrote the foreword for it. And I loved it. And it’s just showing more of the convergence about it. So, I don’t know if you’ve read my foreword yet. But hopefully, you will soon.

Emil Eifrem:
I’ve not read your foreword yet. But there’s another enterprise knowledge graph book coming out too, not to be forgotten.

Juan Sequeda:
There’s one that came out-

Emil Eifrem:
[crosstalk 00:50:23] Yeah. Exactly.

Juan Sequeda:
All right. I think we’re getting kind of on our producers telling us that we got to look at the time right now. And we got some kind of wrap up stuff we want to go do. Emil, we could just sit here and talk for hours. And I just can’t wait for us to go. We should probably do this again. We have to do a graph update or the next version of this-

Tim Gasper:
Redux.

Juan Sequeda:
Yes, and focus exclusively on graphs. But let’s go into our little lightning round which you have no idea we’re going to go ask you. So-

Emil Eifrem:
I have no idea.

Juan Sequeda:
All right.

Emil Eifrem:
So, I’m drinking great beer. So, I’m not afraid. I will charge ahead.

Juan Sequeda:
All right. So, first question, so, many people know SQL in the world. Will there be a day that same amount of people actually know a graph query language?

Emil Eifrem:
Absolutely. There’s no doubt in my mind.

Juan Sequeda:
All right.

Tim Gasper:
Nice. Direct. All right. Second question-

Emil Eifrem:
I thought it was a yes-or-no question. I’m happy to expand on it. But [crosstalk 00:51:27].

Tim Gasper:
You can add a little color. Yeah. Feel free to add a little color.

Emil Eifrem:
No. It’s back to the trends we talked about before. Everything is becoming more connected. So, the performance and scalability driver of graph databases which was the primary driver of graph database adoption in the previous decade and will remain a really important driver in this decade. But it’s not going to be the primary one. But that driver, just that pain will increase as the world is becoming more connected. Data sets are becoming more connected. Your competition is going to use connected data queries which means you’re going to have to do it because, otherwise, they’re going to be the Google, and you’re going to be the AltaVista if you can’t operate on connected data.

Emil Eifrem:
So, that along with the other driver of this just being a domain, a better fit for most domain models means that I actually think there’s an opportunity. It’s not guaranteed. It won’t happen naturally. But if we, at Neo4j and other graph vendors, play their cards right, I think there’s an opportunity for graph databases to be the first database that developers reach for when they’re building a new application.

Juan Sequeda:
I love this. You want to be the Google or AltaVista. Go, Tim [crosstalk 00:52:38].

Tim Gasper:
Well, I think about the 80s and how so that’s so popular, the retro thing. Maybe AltaVista is kind of cool now. It’s retro. All right. I got a question for you. And then I think we may need to move forward just because of time constraints. So, will the majority of new business applications 10 years from now, so majority, over 50% be built on graph databases at the core?

Emil Eifrem:
Yes. And that’s what I just said.

Juan Sequeda:
All right. Let’s do quick on this one. This is a funny one. Did you really come up with the property graph model on a plane to Mumbai?

Emil Eifrem:
Yeah. So, this is where kind of marketing simplifies things. So, I first drew the model literally on a cocktail napkin with nodes and relationships and key-value pairs on both. But then, we had an intense week with that team where we really refreshed it, and all that kind of stuff. But that usually gets lost in kind of the simplified marketing version and then in parallel folks like Marko Rodriguez had built this out without having seen our stuff. He built it out for his stuff which ultimately led to TinkerPop, a couple of iterations down the line.

Emil Eifrem:
But yes, that first on that flight to Mumbai, I was like, “All right. I need to stay ahead. These guys are really smart. I need to think a little bit. What are we trying to build there?” And I drew it on a cocktail napkin what people today call the property graph model.
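
[Editor's sketch: the napkin description, nodes and relationships with key-value pairs on both, can be written down in a few lines. This is only an illustration of the shape of the model, not Neo4j's implementation or API; the class names, the relationship `type` field, and the sample values are assumptions added for the example.]

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    properties: dict = field(default_factory=dict)  # key-value pairs on the node


@dataclass
class Relationship:
    start: Node
    type: str  # a name for the relationship (an assumption beyond the quote)
    end: Node
    properties: dict = field(default_factory=dict)  # key-value pairs on the edge


# Illustrative data only.
emil = Node({"name": "Emil"})
neo4j = Node({"name": "Neo4j"})
founded = Relationship(emil, "FOUNDED", neo4j, {"role": "co-founder"})
```

[The detail that distinguishes this from a plain node-and-edge graph is that the relationship itself carries properties, here the `role`.]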

Juan Sequeda:
All right. Well, hey, takeaway time, Tim. TTT, Tim takes us away with takeaways.

Tim Gasper:
Yeah. I mean there’s so much good stuff. I mean we say that every single week. We’re very blessed with our speakers. I mean one of my big takeaways is around your assessment of the landscape thinking about sort of the divergence of the transactional or operational database space, but how these specializations are really useful. And then, on the analytics side how you’ve got sort of the data lake and the data warehouse. But these things are converging.

Tim Gasper:
But ultimately, all of these things are tools in the tool chest. You have to make the right choices to optimize for your particular business use case. So, I think that’s very helpful. And what about you, Juan?

Juan Sequeda:
You’re very concrete, the four things you’re seeing, document plus, plus, Mongo, CouchDB, graphs, time series, NewSQL. That’s where the world is going. How do you know which one to go select? Understand the shape of your data and your query workload. Back to you. A very quick and short advice, two things, two questions. One, what’s your advice about anything, life, data? Second, who should we invite next?

Emil Eifrem:
To advice generically, like any advice?

Juan Sequeda:
Any advice.

Tim Gasper:
Anything you want.

Emil Eifrem:
Oh, my god. Yeah. That’s very broad. Well, okay. So, let me maybe segment it in three different ways then. Advice to, I bet we have other kind of fellow startup founders, entrepreneurs, that type of an audience. And one of the things is, first of all, ignore advice. I’ve gotten a lot of advice. I’ve listened to some of it. I’ve ignored most of it. So, that’s going to be the first one.

Emil Eifrem:
The one thing though that always is true is there’s no way for you dear fellow startup founder to over index on understanding the customer. That’s the one thing. And when I say customer, I mean the actual user of your product, not the economic buyer. Yes, they’re valuable. Not their boss, yes, they’re valuable. Not freaking procurement, yes, they’re valuable. But understand the actual user of your product. There’s no way you can spend too much time doing that. That’s the one advice for startup founders.

Emil Eifrem:
And then I think general work advice, surround yourself with the… I mean you’re the average of the five people you spend the most time with. So, find the brightest people no matter what the role is, no matter what the pay is. Just find the brightest, smartest, most well intended people that you can.

Emil Eifrem:
I think general life advice, I sometimes get asked this these days, as if I had any clue what I’m doing. But I think the one thing I will say though is I feel people are really under indexing on choosing their life partner. I feel like I find really thoughtful people who are like they take such considerable blah, blah, blah. And then, they marry the person in Sweden. They happen to be dating when they’re in their early 30s. And probably in the US, it’s earlier, mid-20s.

Emil Eifrem:
What are you doing? This is probably the number one decision that will impact your life the most, is your life partner. Man, don’t take that decision lightly. My improvised three slices of advice since you gave me a very broad [crosstalk 00:57:35].

Tim Gasper:
Be choosy, everyone. [crosstalk 00:57:37]. Just because you hit 33 doesn’t mean you have to marry.

Emil Eifrem:
Exactly.

Juan Sequeda:
All right. 30 seconds, who should we invite next?

Emil Eifrem:
You should invite Jeff Jonas from Senzing. If you haven’t had him on, you know that I listened to many episodes of this podcast even before you invited me. I haven’t listened to all of them. So, I don’t think you’ve had him on. But I don’t know. But Jeff Jonas founded a company called Senzing. He’s one of the main thought leaders, if not the main thought leader in ER, Entity Resolution.

Emil Eifrem:
I think he’s the only IBM fellow to ever leave IBM. I think that’s true. And he’s just one of the smartest people ever in data. Get Jeff on the show.

Juan Sequeda:
Awesome. Well, a pleasure. Thank you so much. This was an amazing conversation. I think we have to go do this again. Just focus 100% on graphs on that next time. Cheers. Thank you. Enjoy Sweden. Enjoy your beer. And next week, we’re going to have Denise Gosnell who’s the CDO of DataStax. I talked to her today, and she sends a lot of best regards and going to continue the conversation about graphs and data and open [crosstalk 00:58:55].

Emil Eifrem:
Another graph enthusiast.

Juan Sequeda:
There we go.

Emil Eifrem:
Fantastic.

Tim Gasper:
Databases for the win.

Juan Sequeda:
Cheers.

Tim Gasper:
Cheers.
