About this episode

With the hype of graph databases and knowledge graphs, a common (mis)practice is to quickly migrate your existing siloed data into a graph database. But be careful! You may just be bringing the complexity of your silos into the graph.

Join Tim, Juan and guest Jans Aasman from Franz Inc, the makers of AllegroGraph, for a conversation on why your graph-based machine learning and 360 projects should start with data modeling.

Special Guests:

Jans Aasman

CEO, Franz Inc.

This episode features
  • Data modeling approaches you should consider
  • Tips to avoid data modeling pitfalls 
  • If you could be a top model for any product/brand, what would it be and why?
Key takeaways
  • It’s “terrible” to start creating an ontology without knowing the application
  • Intelligent people make the schemas… this is not easy
  • Modeling is human problem solving!

Transcript

Tim Gasper:
It’s Wednesday once again, and it’s time for Catalog and Cocktails. My name’s Tim Gasper. I’m a longtime data nerd and product guy at data.world. Joined by Juan.

Juan Sequeda:
Hey Tim, I’m Juan Sequeda, principal scientist here at data.world. And it’s Wednesday, middle of the week, 4:00 PM Central. We’re live and always ready to take a break and chat about data. And this week we have a special guest and a special topic, a topic that I bring up a lot, always asking, “Why haven’t we talked about this?” I’m super excited to be able to chat about data modeling with Jans Aasman, who’s the CEO of Franz, the makers of AllegroGraph. For those who know Franz, there is a long history of what Franz has been doing in the AI space for many, many decades. He’s a pioneer in AI, semantic technologies and graph databases. Jans, how are you?

Jans Aasman:
I’m fine. Thank you. [crosstalk 00:00:56].

Juan Sequeda:
We’re excited to have you here. So let’s kick it off with our Tell and Toast. So what are you drinking and what are we toasting for?

Jans Aasman:
Well, I’m drinking a maple syrup old fashioned with, well, the regular bourbon and maple syrup and some bitters. That’s my almost daily drink, so I’m happy to do it here, although it’s a little bit early here in the San Francisco Bay Area, but okay.

Juan Sequeda:
[crosstalk 00:01:27].

Tim Gasper:
I like the little maple in the old fashioned. That was actually my drink, I think, two weeks ago. I did a maple old fashioned.

Jans Aasman:
Oh, well, the best one I had was in Brooklyn, where they had bacon-infused maple syrup and then smoke through it. That was the best cocktail I ever had in my life, but too much work to make. I’m looking for bacon-infused maple syrup, so if you guys have a pointer, I would love it.

Juan Sequeda:
All right, [crosstalk 00:01:56].

Tim Gasper:
[crosstalk 00:01:56] interesting. I’ll look on Amazon. That’s where-

Jans Aasman:
I tried it. It’s hard to find.

Tim Gasper:
Right.

Juan Sequeda:
How about you Tim, what are you drinking today?

Tim Gasper:
I am drinking a Sazerac. I have some absinthe, and you know what? I don’t have any other cocktails that I ever use absinthe for. So every once in a while, I’m like, “I should have another Sazerac,” because you do the absinthe rinse. This one I’m actually doing on the rocks. So that’s my cocktail for today.

Juan Sequeda:
Well, I had a bottle of white wine open. It was… I forget. I think it was just a Sauvignon Blanc. I had a little bit of gin left and I had cucumbers, so I looked it up and just made a white wine sangria/gimlet with cucumbers. And there’s lime in here. So cheers. What are we toasting to today?

Tim Gasper:
Cheers.

Jans Aasman:
Me? Oh, well, you always toast the things that are nearest to you and freshest in your memory. So my son just got married, that’s one. I just finished your book, so maybe I should toast your book.

Tim Gasper:
Oh, that’s awesome.

Jans Aasman:
But actually, I was thinking of toasting the artificial intelligence at Wells Fargo, because this morning, actually just before this meeting, I got a phone call that they had found a whole string of fraudulent transactions, and they were asking me, “Did you go to an In-N-Out Burger?” Which we don’t have where I work, and, “Did you buy clothes at Nordstrom?” No, I didn’t. So I had to cut up my… But I’m always so interested to see how that AI works, because they hardly ever get it wrong. And I do sometimes buy something at an In-N-Out Burger, so it’s not that weird, but I’d love to see the algorithms behind that.

Tim Gasper:
Yeah. It’s pretty cool how often that works.

Jans Aasman:
So let me toast to the AI at Wells Fargo.

Tim Gasper:
Cheers to all of that.

Juan Sequeda:
Cheers to all that, and really, cheers, thanks for buying the book and reading it. I’d love to get your comments on it. So we have our warm-up question of the day: if you could be the top model for any product or brand in the world, what would it be and why?

Jans Aasman:
Well, it would be for Trek electric bikes.

Tim Gasper:
Hmm.

Juan Sequeda:
And why?

Jans Aasman:
Well, I’m Dutch and I had my first car when I was 28. I did everything on my bike: driving to work, to college, everything. And then finally I got a job too far away, got a car, and gained way too many kilos. And then recently we moved the office from Oakland to a little town in California called Lafayette, and it’s about six miles from my home. We are on the top of a hill, so I didn’t want to do it on a regular bike, but these electric bikes are fantastic. You still have to work hard, but at least you get your exercise and you can be outside in nature on your bike. So I love my bike. I’ve only had it three weeks. I mean, we’re just back in the office, by the way, definitely [crosstalk 00:05:08]-

Tim Gasper:
I’ve never been on an electric bike before. I’ve been on an electric scooter, but never an electric bike. It seems cool.

Juan Sequeda:
This was good propaganda for it. I need to go try one out.

Jans Aasman:
Yeah, yeah. [crosstalk 00:05:21].

Tim Gasper:
[crosstalk 00:05:21] good modeling.

Juan Sequeda:
How about you, Tim? What would you be the model for?

Tim Gasper:
So I was thinking two potential answers here. One of them is Tesla, not because I would make Tesla look cool, but because it would make me look pretty cool. And could you imagine? You don’t even have to drive the car. You can just be like… because the car is going to drive itself. The other answer I was thinking about had more to do with my day job. I was like, “I could be the model for Atlassian, for Jira.” And I could be like, “Are you a product manager? Do you need to fill out lots of issue tickets? Well, I’ve got the project management software for you.”

Juan Sequeda:
Hey, there we go. And if you’re listening, you just got a free ad right there. That was awesome. My quick answer, and I carry one all the time, is a Yeti. I love Yetis. They’re just cool. I mean, I love them. Yeah.

Tim Gasper:
It’s robust.

Juan Sequeda:
Yeah, they’re always robust. They always keep your water really cool, and they look cool too. And I got my little sticker here. This is me, [inaudible] me as a scientist. So, all right. Well, enough of this chit chat, let’s get into some real business here. The question I want to kick off with is: why don’t we talk about data modeling? Why isn’t data modeling a thing? We talk about all this other stuff about data, but we never talk about data modeling. Why is that?

Jans Aasman:
Well, obviously when we do projects with customers, we always start with data modeling, but let me give you another answer. I’m a psychologist. I accidentally got into technology and became a CEO, but I’m still at heart a psychologist. I read psychology literature all the time and I’m deeply interested in cognitive science. You have to imagine that when people make schemas for relational databases… Oh, where are you guys going? Then they… It’s very intelligent people that put these schemas together, but they don’t really care whether other people can read the schema. So they use abbreviations. Actually, it’s wonderfully well described in your book, by the way, Juan: all the ways people add complexity to schemas. And so now people want a very easy method to untangle the craziness of the schemas that people made in relational databases.

Jans Aasman:
But the problem is that so much human intelligence, most of the time undocumented, went into making that schema. And now people hope that there’s an easy tool to untangle that again, but you need the same amount of human intelligence to do that, maybe even more. It’s like reverse engineering sometimes, especially if you don’t know that new enterprise data warehouse that you suddenly have to get data out of. So that’s one of the things about data modeling: it’s too complicated and you can’t make it systematic. I also taught data modeling. I did Jacobson’s object-oriented software engineering when I was teaching at the university in Delft, and that’s actually the most important modeling technique that I ever learned. Starting with the models: the stakeholders, what is the use case? It’s the interaction model with the analytical model.

Jans Aasman:
And even now when I help people who want to do modeling and take the data that they have in their silos and put it into a knowledge graph, I say, “Okay, the first thing I want you to do is take a really deep breath and forget all about Protégé and TopBraid Composer and ontologies and OWL and all of that.” If you are a software engineer and you did object-oriented software engineering, you’ve got everything you’ll ever need. The other thing I never do is start with top-down, very complex logical models. Basically, if I’m doing a course or helping a customer get into the modeling for their knowledge graph, for example, I just ask them in general, “Okay. What are the top 10 questions you want to answer with your new knowledge graph or the system you’re building?”

Jans Aasman:
I really find it terrible to see people start building an ontology without knowing what the application is going to be, or the data model. The data model is always a function of the questions you want to answer, although I will get back to that later, if you don’t forget to ask me about that. So I start with that, and then I force people to just write instance data on a whiteboard, just how they intuitively think. So I take people and I let them write these instances, and they usually find it a very easy and fun exercise to do.

Jans Aasman:
It’s just fun to say, okay, what would my model look like? And then I can help them a little bit… But then you get an initial model, and in most cases it’s actually really easy to reverse engineer that into an ontology. And of course you can use that same simple instance model to see if the top 10 questions, I mean, they never get more than three, can actually be answered with that model, and whether I actually have the data in my silos to do that. But I always start bottom up. What do you want to ask? What do you think the data could look like as instances? And then go all the way up to the formal model. And I think that’s actually way more logical than what is sometimes taught in universities, where you build this grandiose model starting with a thing and then slowly go down to a particular transaction object.
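A minimal Python sketch of the bottom-up exercise described above: write a few instances the way you would on a whiteboard, collect the properties each type actually uses, and check whether a question can be answered from them. The instance data, the type and property names, and the helpers `derive_schema` and `can_answer` are hypothetical illustrations, not part of any tool mentioned in the episode.

```python
from collections import defaultdict

# Hypothetical instance data, written the way one might sketch it on a
# whiteboard before any ontology exists.
instances = [
    {"id": "patient1", "type": "Patient", "name": "Ann", "hasEncounter": "enc1"},
    {"id": "patient2", "type": "Patient", "name": "Bob"},
    {"id": "enc1", "type": "Encounter", "start": "2021-03-01", "diagnosis": "E11.9"},
]


def derive_schema(rows):
    """Collect, for each type, the set of properties its instances use."""
    schema = defaultdict(set)
    for row in rows:
        for prop in row:
            if prop not in ("id", "type"):
                schema[row["type"]].add(prop)
    return dict(schema)


def can_answer(schema, needed):
    """True if every (type, property) pair a question relies on is present."""
    return all(prop in schema.get(cls, set()) for cls, prop in needed)


schema = derive_schema(instances)
# Question: "Which diagnoses did each patient receive during an encounter?"
question = [("Patient", "hasEncounter"), ("Encounter", "diagnosis")]
print(sorted(schema["Patient"]))     # ['hasEncounter', 'name']
print(can_answer(schema, question))  # True
```

The derived structure is only a starting point; turning it into a formal ontology is still the human step described above.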

Tim Gasper:
Can you go into a little bit more detail what do you mean when you say bottom up?

Jans Aasman:
Bottom up-

Juan Sequeda:
And just to interrupt here and go ahead. We’re now talking about bottom up and top down. So I want your definitions of bottom up modeling, top down modeling and why the bottom up is the way you go and why apparently, you do not recommend top down.

Jans Aasman:
Okay. Well, it’s also something that comes up in… So I’m a Lisp person. I mean, we have a Lisp company here and we sell Lisp compilers. And there are some people who start with a top-level function definition, then they do the top three steps, then they go to the first step and break that into three steps, top down, really trying to make the tree. Whereas what is way more natural, if you’re in a particular domain, is to write tiny sub-functions that you think you’re going to need. You try them out in your language. You can see if the lowest-level functions actually work. And then what Lisp programmers do is build something like a domain language: a language that’s very specific to the domain you’re trying to solve. And then you can go back to the top level to express what you want to solve in that domain language that you built.

Jans Aasman:
So it’s like an interplay between bottom up and top down. I’m not arguing against top down or bottom up; I’m just saying it’s always going back and forth between: okay, what happens at the lowest level, and how do I look at it from the top level? And how does that… But again, it’s a human problem-solving process. Doing it totally top down is something that probably only Java programmers can do.

Juan Sequeda:
So one of the things that you-

Jans Aasman:
I hope this… Yeah, if you want to go into this discussion, I’d love to talk about this.

Juan Sequeda:
No, but this is interesting too. You’re saying that if you’re a programmer, like a Java programmer, you’d probably be thinking top down, but if you’re thinking about just data and integrating data, then it looks like you’d be thinking bottom up. This is an interesting perspective. I never thought about it that way.

Tim Gasper:
Well, yeah. And the interplay between how a software developer thinks about architecting this versus a data engineer or data architect: top down being like, “Oh, we’re doing Kimball and Ross, let me…” versus more of a bottom-up approach, which you sometimes see with structured data but more often with semi-structured data, where lots of documents are coming in and you’re like, “Oh, let’s try this schema. Oh, that didn’t quite work. Let’s try this instead.” It’s interesting.

Juan Sequeda:
And one of the things that you’re bringing up here is that the common thread around all of this is people. It seems to me that we don’t talk about data modeling that much because it involves humans, and we want to automate everything. We want everything fast, and bringing in humans is going to make things slower, and we won’t agree. So do we just need a shift? Well, I do think we need a shift. We need to stop thinking that it’s just pure technology and that we can automate things and do things as fast as possible; we need to bring in humans and understand this balance. And data modeling is key to that. Data modeling is representing what we have in our brains. It’s immaterial stuff, and I want to be able to somehow take that immaterial stuff and make it material, somewhat tangible. I think that’s the process, and that’s not easy. I guess that’s why we don’t do it, because it’s not easy. What do you think?

Jans Aasman:
Well, that’s what I started with. Modeling is human problem solving, with everything that comes with human problem solving. Part of it is symbolic, part of it is based on experience, part of it could be explained by neural networks. But it’s a very complex human activity, and I have not seen any technology that could help. I mean, all these beautiful UI-based systems that can automatically do the mappings: they all work for the first 70%. I don’t even want to go to 80. And you described it in your book too, Juan. And then there is always programming involved for combinations of objects: if this is in the object, then we want to go there, and if that is in the object, we go there. So suddenly, with this beautiful tool that you built, you have to add programming, you have to add JavaScript or Java or whatever else to your ETL tool. Suddenly it’s a very complicated thing, and then you’re back to programming anyway.

Jans Aasman:
So I’m radically in favor of just using programming for data modeling. Actually, here I’m making a distinction between data modeling and ETL, but they’re closely related, of course.

Juan Sequeda:
Well, so this is an interesting aspect, because when we think about ETL… I mean, now we’re seeing a lot of very popular conversations around the T, the transformations. This is where your dbt is and stuff. But at the end, I am moving something from a source to a target, and that target needs to be modeled, but somehow the modeling gets embedded in that transformation. And the people who are writing those transformations, are they talking to other people? I mean, this is the open question. And it scares me at the end of the day, because you just have a bunch of models that are, again, it’s hard. You want to go talk to the humans, but nobody’s talking to the humans, they’re just…

Juan Sequeda:
And some of the engineers are writing these transformations. So technically there is modeling going on, but it’s this one person or these people who most probably, I’m going to bet, are not talking to the end users. Hopefully I’m wrong, but I mean, it’s very common. You see this stuff: people just do transformations and these mappings, and they’re not really talking to the end users.

Jans Aasman:
Well, we at our company promote this model of entity event modeling very hard, where most enterprise applications can be modeled as one very important entity, and then almost everything that happens to or with that entity is like a transaction, something with a temporal aspect to it. That works for banks. We’ve done a big use case in healthcare. We’re doing it at call centers and in other places. And it’s always the same. So let’s take healthcare. You have an enterprise data warehouse, you have data streams coming in, you have HL7 FHIR streams coming in. And how do you model that [inaudible 00:18:44] something that is very simple?

Jans Aasman:
So we just say: radically simplify your model. Just have one entity; in healthcare, it would be the person. And then everything else becomes an event. So a diagnosis becomes an event, a test, a procedure, whatever else you can think of. In the healthcare application that we helped with, we have 350 types of events. One event type could be a diagnosis, with again 20,000 values for an ICD-9 or ICD-10 code, but there are still just 350 core events. So now in the hospital, they have a UI system where on the left-hand side, you do the inspection of your relational database, enterprise data warehouse, maybe other things. On the right-hand side, you just have dropdowns with the 350 core events you can have. And you can make a very easy mapping from the table and column in the relational database, or the other technology, into the knowledge graph.
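As a rough illustration of this mapping step, the Python sketch below turns rows from relational tables into events attached to a single core entity, the person. The table and column names, the short set of event types (standing in for the roughly 350 mentioned), and the `MAPPING` structure are all hypothetical; the sketch does not reflect the hospital’s actual UI or AllegroGraph’s API.

```python
from datetime import date

# A stand-in for the list of core event types (the project described had about 350).
CORE_EVENT_TYPES = {"Diagnosis", "Procedure", "MedicationOrder", "InpatientEncounter"}

# Hypothetical mapping agreed on by a relational expert and a knowledge-graph
# expert: source table -> (event type, time column, value column).
MAPPING = {
    "dx_table": ("Diagnosis", "dx_date", "icd10_code"),
    "proc_table": ("Procedure", "proc_date", "proc_code"),
}


def rows_to_events(table, rows, entity_column="patient_id"):
    """Convert rows of one source table into entity-event dictionaries."""
    event_type, time_col, value_col = MAPPING[table]
    if event_type not in CORE_EVENT_TYPES:
        raise ValueError(f"unknown core event type: {event_type}")
    return [
        {
            "entity": row[entity_column],  # the single core entity: the person
            "type": event_type,
            "start": row[time_col],
            "value": row[value_col],  # e.g. one of the many ICD-10 codes
        }
        for row in rows
    ]


rows = [{"patient_id": "p1", "dx_date": date(2021, 3, 1), "icd10_code": "E11.9"}]
events = rows_to_events("dx_table", rows)
```

Because the target side is a small, fixed set of event types, each mapping entry is a short decision, which is what makes the table-by-table pairing of two experts workable.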

Jans Aasman:
But the issue is the person who knows the knowledge graph can’t do it on his own, and the person who knows these relational databases or Epic or any of the other systems can’t do it on his own either. So basically what they found is they need a team of someone who knows these relational databases and someone who knows the data model of the knowledge graph, and have them work together. And they claim they can do about one table per 20 minutes in the case of healthcare. Does that make sense? Does it even answer your question? But the point is, we dramatically simplified that modeling by just saying, well, everything is an event. It doesn’t always work, I know. I can give you a lot of examples where it doesn’t work, but for most enterprise applications, it does work.

Jans Aasman:
And then with the modeling, you take away half the complexity, because the target… You started talking about your target ontology, where you didn’t use the word ontology. There’s the target, and then it’s just picking from all your columns and what have you into that very simple model. And then you can use R2RML or anything else you want. Does it make sense? You guys look a little bit confused now.

Tim Gasper:
No, I think we’re thinking about what you’re saying here and it makes a lot of sense. And I think we’re digesting it a little bit. I don’t know, Juan, do you have any follow up questions to that or?

Juan Sequeda:
Yeah. So it’s interesting, you’re saying an approach for modeling because I’m always thinking about how do we get people to go model? Because a concern I always have is people end up boiling the ocean and they-

Jans Aasman:
What do you call modeling? I mean, how do you define modeling?

Juan Sequeda:
Okay, let’s talk about semantics here. For me, modeling is being able to define a schema, and that schema is something that is going to represent how an end consumer, a person, is actually going to be able to understand the data. I think traditionally, what we’ve seen is that you create the models for the application and you think about them as the requirements for the application. You have the conceptual model, which does represent what the end users think about the world. But the moment that model turns into the physical schema, it’s created to support the application. And then you have this concept which physically gets horizontally partitioned, vertically partitioned because of query workloads, whatever.

Juan Sequeda:
And then that conceptual model gets completely disconnected from the physical model, and it gets thrown away or is just a PDF that’s five years old, whatever. And then when somebody needs to look at that database schema to understand what it means, there’s a model in there, but what the heck does this mean? That’s the application-centric view of the data. I’m like, “I want to be able to create…” Ideally, all of this would’ve been represented in a model that the end users could always access. So I think there are always different kinds of models, and ideally… I think that’s the cool thing about having graphs: if you think about it, the way you would model things in a graph is actually the way you would query them. And if you’re doing it in a relational world, not… I mean, it could be at stage one, but at some point, you have so many things that you need to optimize or that’s [crosstalk 00:23:11]-

Tim Gasper:
Yeah. There’s usually other concerns.

Jans Aasman:
Yeah. Well, I’m a big believer in Dave McComb’s data-centric approach. And the thing-

Juan Sequeda:
I think all of us are. And I always have Dave McComb’s book right in front of me, The Data-Centric Revolution.

Jans Aasman:
It’s probably behind me somewhere.

Tim Gasper:
Yeah, I usually have mine too. I have some boring governance books over here instead though.

Jans Aasman:
All right. Well, that’s the other thing: for the main knowledge graph we’re building, the dream is always, can we have only one data representation for any kind of analytics that you want to do? That is the ultimate dream: not building, for every query, a new data [inaudible 00:23:57] or a new weird extension to your model, but can I make one model for most of the queries that I need to do? Of course, it doesn’t always work, but most of the time, we can make it work with the modeling that we do.

Tim Gasper:
Do you see the entity event model as being tied to that? When you think about data-centric knowledge and entity event models, are these things connected, in the sense that your data-centric foundation might be based on this type of approach?

Jans Aasman:
Well, the answer is yes, that’s easy. It’s specifically built to support many different types of use cases, although our entity event approach usually covers 90% of the data, and then there’s 10% of the data that is just impossible to shard. In healthcare, it’s the 180 taxonomies and ontologies that we use. There’s no way you can shard that. So we have a model to deal with that in this approach. And for almost every application, I can come up with the 10% that cannot be sharded. So you need a mix of the entity event approach together with something that’s more of what we traditionally call knowledge, if that makes sense.
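One way to read the split described here, per-entity event data that can be partitioned versus a shared body of taxonomies that cannot, is as hash-based sharding by entity, with the knowledge base made available to every shard. The Python sketch below illustrates that idea under those assumptions; the function names and data layout are hypothetical and are not AllegroGraph’s actual sharding mechanism.

```python
import zlib


def shard_for(entity_id, num_shards):
    """Pick a shard for an entity; crc32 is stable across runs, unlike hash() on str."""
    return zlib.crc32(entity_id.encode("utf-8")) % num_shards


def partition(events, knowledge_base, num_shards):
    """Place every event of one entity in the same shard and give each shard
    access to the same (unsharded) knowledge base, e.g. the taxonomies."""
    shards = [{"events": [], "knowledge": knowledge_base} for _ in range(num_shards)]
    for event in events:
        shards[shard_for(event["entity"], num_shards)]["events"].append(event)
    return shards


events = [
    {"entity": "p1", "type": "Diagnosis"},
    {"entity": "p2", "type": "Procedure"},
    {"entity": "p1", "type": "MedicationOrder"},
]
taxonomy = {"E11.9": "Type 2 diabetes mellitus without complications"}
shards = partition(events, taxonomy, num_shards=4)
```

Keeping an entity’s events together means a per-entity question, such as one patient’s history, can be answered within a single shard, while the shared taxonomies stay reachable from each shard.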

Juan Sequeda:
This is interesting… So are you saying that, of the things in the world you think about, 90% can be represented through this entity event model?

Jans Aasman:
Yeah.

Juan Sequeda:
So can you [crosstalk 00:25:35]-

Jans Aasman:
Everything that has time in it. Everything that has time in it. Well, that has a temporal-

Juan Sequeda:
That’s a good question: does everything have time? Because maybe…

Jans Aasman:
Well, I mean, again, if I look at the taxonomies in healthcare, there’s no time, although sometimes there should be, because something comes before something else. But in any other interesting application, whether it’s in healthcare, telecom, the banking industry or call centers, everything that’s interesting is temporal. What happened? Everything is about what happened. And in every application that we do, what I see is, “Okay, can I predict the behavior of this entity?” I mean, that’s all we want: to predict the behavior. Is he going to buy something? Is he going to die? What is he going to say?

Juan Sequeda:
[crosstalk 00:26:25].

Jans Aasman:
It’s always about-

Tim Gasper:
A lot of times entities have time aspects as well, like a customer. But a customer when? They were a customer starting from this date and then they stopped being a customer at that date.

Juan Sequeda:
Yeah. I remember working on this. You can ask these philosophical questions: is the one of today the same as the one of yesterday?

Jans Aasman:
Not in the hospital, because what we see is that people change their name, they change their social security number, they change their gender, they definitely change their weight, and they even change their height. I mean, it’s unbelievable. Even what you think of as properties change over time.

Juan Sequeda:
Okay. So-

Jans Aasman:
And of course, because that gets too expensive for querying, we have the trace of measurements, or of when we established that this was your name, but in theory, we can always go back in history to every previous gender you had, or whatever.
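To make the idea of properties that change over time concrete, here is a minimal Python sketch that records every value of a property together with the date from which it holds, instead of overwriting it, so both the current value and any earlier value can be looked up. The `PropertyHistory` class and the sample names are hypothetical.

```python
import bisect
from datetime import date


class PropertyHistory:
    """Stores (valid_from, value) pairs sorted by date instead of overwriting."""

    def __init__(self):
        self._dates = []
        self._values = []

    def record(self, valid_from, value):
        i = bisect.bisect_right(self._dates, valid_from)
        self._dates.insert(i, valid_from)
        self._values.insert(i, value)

    def as_of(self, when):
        """Value that held on `when`, or None if nothing was recorded yet."""
        i = bisect.bisect_right(self._dates, when)
        return self._values[i - 1] if i else None

    def current(self):
        return self._values[-1] if self._values else None


name = PropertyHistory()
name.record(date(2018, 6, 1), "Ann Jones")
name.record(date(2010, 1, 1), "Ann Smith")  # out-of-order inserts are fine
print(name.as_of(date(2015, 1, 1)))  # Ann Smith
print(name.current())                # Ann Jones
```

Everyday queries can use `current()`, while `as_of` keeps the full history available for looking back.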

Juan Sequeda:
Yeah. I love what Dean is saying here; it’s so true: you can never step in the same river twice.

Jans Aasman:
Yeah. [inaudible 00:27:33]. Yeah, it’s very confusing to look at these questions at the same time that I’m looking at you and trying to listen.

Tim Gasper:
A lot of multitasking, right?

Jans Aasman:
Yeah.

Juan Sequeda:
We’ve been talking a lot about this entity event model, but we haven’t really described it. Can you give us your definition or explain how that works?

Jans Aasman:
And do that with words, right?

Juan Sequeda:
With words.

Tim Gasper:
If you need to gesture or something like that, that’s fine.

Jans Aasman:
Okay. To begin with, when I think about… I’ve completely given up on thinking of triples as just triples. I think in terms of the old-fashioned AI frame-based systems, the early versions of object-oriented systems, where an object is just a set of triples with the same subject. And we actually call them a set of triples, but I really, really think in terms of objects to begin with. So now, what are the events that I talk about? Let’s take healthcare. I might have an inpatient encounter: I go to the hospital and I check in, and then, say, four hours later, or 40 days later, I check out. That was one event. Now that event is an object with a start time, an end time and a type, and then a few other key-value pairs that might describe the event.
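The frame-style view described here, where an object is just the set of triples sharing a subject, can be sketched in a few lines of Python. The triples and predicate names below are hypothetical sample data, and values are kept in lists because a predicate may have several values.

```python
from collections import defaultdict

triples = [
    ("enc1", "type", "InpatientEncounter"),
    ("enc1", "startTime", "2021-03-01T08:00"),
    ("enc1", "endTime", "2021-03-01T12:00"),
    ("dx1", "type", "Diagnosis"),
    ("dx1", "code", "E11.9"),
]


def triples_to_objects(triples):
    """Group triples by subject: each subject becomes one object (a 'frame')."""
    objects = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        objects[subject][predicate].append(obj)
    # Convert the nested defaultdicts to plain dicts.
    return {s: dict(props) for s, props in objects.items()}


objects = triples_to_objects(triples)
print(objects["enc1"]["type"])  # ['InpatientEncounter']
```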

Jans Aasman:
But the event also has sub-events. I went to this specialist, and then this specialist made this particular diagnosis. So the diagnosis is an event where, again, you usually don’t have an end time. It’s just: this is the time of this particular diagnosis. And then you get something prescribed or you get a particular procedure, but again, the symptom, the procedure or the medication order are just, again, objects with a time (not always an end time), a type and some key-value pairs. The shape of the objects is always the same: an object with a type, a start time, an end time and a few other things that make that event a little bit different. The shape is always that same simple object, and you look at it as a temporal object.

Jans Aasman:
In the hospital in the Bronx, they use multiple enterprise data warehouses because they bought up eight hospitals over time, so you can imagine the chaos that gives. But in just one enterprise data warehouse, they have 250 ways to describe time. They have the inpatient encounter begin time and the inpatient encounter end time, and so on. It’s fairly systematic, but still a human being has to remember 250 ways of representing time. If you have the entity event model, you are guaranteed that there’s a start time; not always an end time, but there’s always a start time. That makes it really simple. There’s always a type, and then, based on what kind of event you have, there will be some other properties.
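The uniform event shape described above (a type, an always-present start time, an optional end time, a few key-value pairs and optional sub-events) could be sketched as a Python dataclass. The class, the table and column names in `TIME_COLUMNS`, and the helper `event_from_row` are hypothetical; they only illustrate how many source-specific time columns might collapse into a single `start`/`end` pair.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Event:
    type: str
    start: datetime                 # always present
    end: Optional[datetime] = None  # point events such as a diagnosis have none
    props: dict = field(default_factory=dict)
    sub_events: list = field(default_factory=list)

    def walk(self):
        """Yield this event and all nested sub-events, depth first."""
        yield self
        for sub in self.sub_events:
            yield from sub.walk()


# Hypothetical source-specific time columns, normalized to start/end.
TIME_COLUMNS = {
    "inpatient_encounter": ("inpatient_encounter_begin_time", "inpatient_encounter_end_time"),
}


def event_from_row(table, event_type, row):
    start_col, end_col = TIME_COLUMNS[table]
    return Event(event_type, row[start_col], row.get(end_col))


stay = event_from_row("inpatient_encounter", "InpatientEncounter", {
    "inpatient_encounter_begin_time": datetime(2021, 3, 1, 8),
    "inpatient_encounter_end_time": datetime(2021, 3, 1, 12),
})
visit = Event("SpecialistVisit", datetime(2021, 3, 1, 9))
visit.sub_events.append(Event("Diagnosis", datetime(2021, 3, 1, 9, 30), props={"icd10": "E11.9"}))
stay.sub_events.append(visit)
print([e.type for e in stay.walk()])  # ['InpatientEncounter', 'SpecialistVisit', 'Diagnosis']
```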

Juan Sequeda:
But is this the natural, again, I’m using this word natural, the natural way of thinking about things? Because, I mean, the classic example is where you have a customer, the customer places an order, the order has an order line, the order line has a product, the order was shipped to an [inaudible 00:30:55]. That’s how we all think about it; that’s how we draw it on the whiteboard. You take your business users, they understand that.

Jans Aasman:
Well, so in a call center, you’re trying to sell… We had to decide what the entity is that we really care about. Well, that’s the customer you’re trying to sell to. A call center might work for big clients like Oracle, Cisco or Amazon, trying to sell cloud services. They do campaigns that last a certain amount of time where they’re trying to sell something, and ultimately sales agents sell something to the end users. Ultimately, we decided that the end user is the one you want to know everything about, so that is the core entity. And then every interaction with that customer, whether you sell something, whether you got an appointment with them, whatever interaction you can imagine, became an event.

Jans Aasman:
We start with that, and then there will again be the 10% that you can’t shard. We put that into a knowledge base, and we federate these event shards with these knowledge bases. And that model works for call centers. It works for a bank that we work with. It worked in healthcare. Right now, I’m working with the FAA, and we’re looking at maintenance and incidents and all kinds of other things that happen to aircraft. Easy: the aircraft is the core entity, and then every maintenance, every incident, every repair, every inventory with respect to an aircraft becomes an event that can be attached to the core entity. And again, I can do prediction of behavior. In every application that I see our customers work on, the only thing people care about is: what is my entity going to do next, and what’s going to happen to my entity?

Tim Gasper:
So what has it done? What is it doing? What will it do?

Jans Aasman:
Can I understand the behavior? Can I classify the behavior? Can I really understand the causality in this? I mean, everything human beings do is trying to figure out, okay, why? What is the cause? What causes what? And if you can’t look at your data as a series of events, you can never say anything about causality. You can only maybe say something about correlation.

Tim Gasper:
So entity event modeling is something that I’m familiar with at a high level, but not deeply. So I appreciate you going into it more and showing that it can apply to different industries and different use cases. One of the things that I wonder: obviously I’m learning about and getting stronger in graph and knowledge graphs and things like that, but my background’s much more in the relational and then the big data world. Does entity event modeling have applicability or [inaudible 00:34:05] areas in the relational world too? And what would that be? Do you look at things like time series databases, or is that not necessarily relational like event modeling? You think about CDPs and trying to do marketing on customer-oriented events and things like that in a data warehouse. Are those concepts similar or are they different? I’m curious, for the more relationally minded people, how do you compare those things?

Jans Aasman:
Yeah. There’s a wonderful article that I found on the internet, which I sometimes show in talks, about the difference between regular modeling and event-based modeling in relational databases. You can do the same thing in a relational database. It’s nothing special, actually. And the person who describes it says, well, if you do the regular modeling, then you always know what the state of the system is, but you don’t know how you got there. And if you use an event approach, then you know how you got there, but if you want to know the current state, you might have to do a lot of extra computation to figure out what your current state is, given everything that happened. So with that approach, you have to do a lot of work to classify the current state of your patient, customer, or whatever else. So, that’s the trade-off.

Juan Sequeda:
So counting is super hard? If I want to count how many patients I have, it’s easier in one and harder in the other, or?

Jans Aasman:
Yeah, a traditional relational database can instantly tell you the state of your inventory. But if you were to only record everything coming in and going out, then you would actually have to count, unless you spend a little bit of extra time with a trigger or something else to compute how much you actually have right now. Does that make sense?
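The state-versus-event trade-off can be shown with the inventory example. A minimal Python sketch, assuming a made-up list of stock movements: replaying the event log recovers the current count but must scan every event, while a running total updated on every write plays the role of the trigger mentioned above. The class and item names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StockMovement:
    """An event: items coming in (positive delta) or going out (negative)."""
    sku: str
    delta: int


class Inventory:
    def __init__(self) -> None:
        self.log: list[StockMovement] = []   # full history: how we got here
        self.on_hand: dict[str, int] = {}    # materialized current state

    def record(self, movement: StockMovement) -> None:
        # Append to the event log, then update the running total, much like
        # a database trigger would, so current state is available instantly.
        self.log.append(movement)
        self.on_hand[movement.sku] = self.on_hand.get(movement.sku, 0) + movement.delta

    def replay(self, sku: str) -> int:
        # Without the materialized total, current state must be recomputed
        # by scanning every event recorded for this item.
        return sum(m.delta for m in self.log if m.sku == sku)


inv = Inventory()
for m in [StockMovement("widget", 10), StockMovement("widget", -3),
          StockMovement("gadget", 5), StockMovement("widget", -2)]:
    inv.record(m)

print(inv.on_hand["widget"], inv.replay("widget"))  # 5 5
```

The log answers “how did we get here?”; the running total answers “where are we now?” at the cost of extra work on every write.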

Juan Sequeda:
[crosstalk 00:35:59].

Jans Aasman:
So it’s just two different approaches. But if you are interested not in the current state, but in what your entity is going to do next, or what’s going to happen to your entity, well, then you’d better keep everything that has happened so far. Now, Juan, you brought up another interesting topic, because in time series databases… Well, time DB and other… So this is another thing that I see as a future of knowledge graphs and something we’re also working on. If you have thousands of events per entity and you do a SPARQL query, like find someone who first had diabetes, and then got this particular procedure, and then used this particular medication, and then got really bad, now you have to look over two million patients trying to match this series. And that really is very slow.
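To make the shape of that query concrete, here is a plain-Python illustration (not SPARQL, and not how any particular graph database executes it) of matching an ordered event pattern per patient. The patient IDs and event codes are hypothetical. Note that it touches every event of every patient, which is exactly the full scan that becomes slow at millions of patients.

```python
from datetime import date

# Hypothetical event data: patient id -> list of (date, event code).
events = {
    "p1": [(date(2019, 1, 5), "diagnosis:diabetes"),
           (date(2019, 6, 1), "procedure:X"),
           (date(2019, 7, 3), "medication:Y"),
           (date(2020, 2, 9), "outcome:deterioration")],
    "p2": [(date(2019, 3, 2), "medication:Y"),
           (date(2019, 4, 8), "diagnosis:diabetes"),
           (date(2019, 9, 9), "procedure:X")],
}

PATTERN = ["diagnosis:diabetes", "procedure:X", "medication:Y", "outcome:deterioration"]


def matches_in_order(timeline, pattern):
    """True if the pattern's events occur in this order (other events may sit in between)."""
    step = 0
    for _, code in sorted(timeline):      # chronological order
        if step < len(pattern) and code == pattern[step]:
            step += 1                      # greedily advance to the next pattern step
    return step == len(pattern)


# A linear scan over every patient: cost grows with patients x events.
hits = [pid for pid, tl in events.items() if matches_in_order(tl, PATTERN)]
print(hits)  # ['p1']
```

Patient `p2` fails because the medication came before the diagnosis, so the order, not just the presence, of events decides the match.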

Jans Aasman:
So you really have to shard your data and create specialized indices for temporal data; otherwise you can’t do your queries really well, but even then… So that is the one end. That is the problem with graph databases and temporal data: if you put it all in one big graph database and then do very complex temporal queries, you’d better make sure that you optimize your queries and your technology for that. And then you have the people who say, “Well, why don’t I put everything in time DB, in a temporal database?” The problem is that a temporal database is actually just one big matrix. It’s just like a [inaudible 00:37:32], but in a database. And certain things work really well. But the problem is, I could look at the series of events for one patient, but that one patient shares the same doctor, the department where he was, the medications, the drugs, the medical treatments. So even if it’s a long series of events, it’s still a very, very complex graph because of all the interconnections that you have. Does that make sense?
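A minimal Python sketch of the mitigation described above, sharding by entity and keeping a temporal index per entity, while shared entities (doctors, drugs, departments) stay in a separate knowledge base. This is our illustration under stated assumptions, not how AllegroGraph or any other product implements it: the CRC32-based shard function, the shard count, the in-memory sorted lists used as the “index”, and the `knowledge_base` contents are all hypothetical.

```python
import bisect
from collections import defaultdict
from datetime import date
from zlib import crc32

NUM_SHARDS = 4  # hypothetical shard count


def shard_for(patient_id: str) -> int:
    # Stable hash so all of one patient's events land on the same shard.
    return crc32(patient_id.encode()) % NUM_SHARDS


class Shard:
    def __init__(self) -> None:
        # Per-patient temporal index: parallel lists kept sorted by date.
        self.dates = defaultdict(list)
        self.codes = defaultdict(list)

    def add(self, patient_id: str, when: date, code: str) -> None:
        i = bisect.bisect_right(self.dates[patient_id], when)
        self.dates[patient_id].insert(i, when)
        self.codes[patient_id].insert(i, code)

    def between(self, patient_id: str, start: date, end: date) -> list:
        """Events for one patient with start <= date <= end, via binary search."""
        ds = self.dates.get(patient_id, [])
        cs = self.codes.get(patient_id, [])
        lo = bisect.bisect_left(ds, start)
        hi = bisect.bisect_right(ds, end)
        return list(zip(ds[lo:hi], cs[lo:hi]))


# Shared entities are not split across shards; a separate (hypothetical)
# knowledge base is consulted alongside whichever shard holds the patient.
knowledge_base = {"medication:Y": {"class": "hypothetical drug class"}}

shards = [Shard() for _ in range(NUM_SHARDS)]


def add_event(patient_id: str, when: date, code: str) -> None:
    shards[shard_for(patient_id)].add(patient_id, when, code)


def events_between(patient_id: str, start: date, end: date) -> list:
    return shards[shard_for(patient_id)].between(patient_id, start, end)


add_event("p1", date(2019, 7, 3), "medication:Y")
add_event("p1", date(2019, 1, 5), "diagnosis:diabetes")
add_event("p1", date(2020, 2, 9), "outcome:deterioration")

for when, code in events_between("p1", date(2019, 1, 1), date(2019, 12, 31)):
    print(when, code, knowledge_base.get(code, {}))
```

Sharding by entity keeps each patient’s timeline local, and the sorted index turns a time-window lookup into a binary search; the interconnections Jans mentions are why the shared entities live outside the shards.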

Juan Sequeda:
Yeah, no, no, you’re making me think and connect a lot of the dots from a past conversation we had with Emil, the CEO of Neo4j, a couple weeks ago here on the podcast. I’m going to hold that question for our lightning round. But I want to shift gears a little bit and go back to how we get started. So for folks who are not thinking about data modeling with-

Jans Aasman:
Started with what?

Juan Sequeda:
With data modeling. I think that we’re always doing some sort of data modeling. It’s just not explicit, or it’s not a first-class citizen. Let me start with that. I’d say that I’ve observed that a lot. People will eventually do it; it’s implicit in some stuff. Do you agree with that, or do you think that it always happens explicitly, but just a small number of people know about it? What are your thoughts?

Jans Aasman:
Well, as soon as you put anything in any kind of database, you do data modeling. The thing is-

Juan Sequeda:
Do they even know what they’re doing? Because this is-

Jans Aasman:
Okay. So I recently wrote… Well, I was interviewed by someone, and I said that we never get phone calls from people under 35. And then the interviewer made it look like I didn’t like to work with young people, but I mean, the most wonderful thing in my life is actually working with young people. But if you ask a group of programmers in an enterprise, “Hey, I need to solve this problem. Solve it for me,” then people say, “Oh, what is the coolest graph database on the planet right now?” “Oh, that is this one.” “All right. So, where’s the data?” “Oh, let me just put the data from here and there in this particular graph database, and within three months I’ll have something that solves the problem.”

Jans Aasman:
Everyone’s happy. The programmer delivered on time and gets a raise. The manager is happy because his manager asked him to solve a problem and now he’s solved it. But the issue is that no one in this entire process thought about, oh, but how does this work for the entire enterprise? Let me say it differently: there was no data-centric thinking anywhere in the process. But if you believe in data-centric processing or a data-centric architecture for your company, you’d better think really, really hard about your data model. So again, if you don’t care about data-centric modeling or data-centric architecture, then you can do whatever you want; it doesn’t matter. You can solve the problem.

Jans Aasman:
Any young programmer, and all programmers too, by the way, can solve a problem. It’s just fun. There’s nice programming, a little bit of hacking, and everyone’s happy. But if you want to take the data-centric approach, then you get people… I see Dean here, well, I see some other people here. I mean, then data modeling becomes almost the most important thing. Let’s fix it for the entire enterprise, for every new application we’re ever going to need. I want to have a data model that really works. Does that make sense?

Juan Sequeda:
Yeah.

Jans Aasman:
That’s the core. So yeah. I mean, but I see everything. I see the idea.

Juan Sequeda:
Yeah. Actually, Rodney here is pointing out that if you focus just on the applications, you end up with point-to-point solutions: I created this application, and eventually it has to go talk to this other one, so you’re just doing all these point-to-point things. And I think also if your goal is to just focus on this one particular problem and solve it, then yeah, you’re probably not going to think about sophisticated modeling and how it could be reused for all the use cases. But if you really want to start thinking for your organization, if I’m going to put energy, time, and money into creating data and I want that to be reusable, I need to start talking to other people to figure out how they are thinking about the world. How could this possibly be reused?

Juan Sequeda:
So it’s this balance I always talk about: I need to be efficient, but at the same time, I want to do work that can enable a resilient organization, knowing that I can support known use cases, but hopefully also unknown use cases I have no idea about. And I think if you want to be resilient as an organization, you should really be thinking about data modeling from the beginning. But there’s also a balance: you don’t want to boil the ocean and just spend all your time doing data models, because we see that time and time again. It’s like, well, you just spent a lot of time and all you have to show is these really complicated boxes and lines and this UML, a big sheet of paper and [crosstalk 00:42:57]-

Tim Gasper:
Yeah. How do we avoid this becoming an effort to achieve perfection and sort of never getting there? Because we know that’s an impossible destination. I mean, obviously graph helps, but how do we approach this, graph or not?

Jans Aasman:
Well, I think this has been figured out for healthcare. My colleague [Percy Mihalji 00:43:36] at Montefiore came up with this event model for healthcare, and that is a… I mean, if I look at that list of all the particular kinds of events you can have, you could maybe have even more events, but that’s it. That’s a series of things. But for a bank, then… Well, I mean, you should really invite someone from Wells Fargo, or Dean here, and talk about… Has Dean already been here?

Juan Sequeda:
Yeah. So we’ve actually had Dean as a guest before. We’ve actually had Dave McComb also as a guest [crosstalk 00:44:17].

Jans Aasman:
You should find David Newman, because he’s trying to do this for the bank: okay, what are the core objects, independent of any IT system that I have? What are the core objects for my bank? And then he starts with that. And you don’t have to fill in every attribute. As long as you’ve got 80% of the attributes right, you’ve got a good start. But I mean, Tim, it’s a good question. It’s always the balance between a totally perfect model, where you didn’t know what you were going to use it for, and a quick hack that solved the problem. So, that’s the difficulty.

Juan Sequeda:
Here’s a LinkedIn user who I’m going to guess is Mark Kitson, because he has some privacy settings on his LinkedIn account; that’s why the name doesn’t show up when we see it here. He just said: perfection per use case. There is no universal perfection unless you’ve boiled the ocean. I agree with that comment.

Jans Aasman:
So, but does that mean that you disagree with me? And if so, what is it that you would disagree with?

Juan Sequeda:
All right. Mark or LinkedIn user, if you’re listening, here’s the answer. But hey, so let’s take it to the next segment here, our lightning rounds. So we got some yes or no questions for you.

Jans Aasman:
Oh, it’s already 2:45. [inaudible 00:45:44].

Juan Sequeda:
We could keep talking here for a long time. So, all right. Knowledge graphs will make data modeling sexy and a topic of conversation. Yes or no.

Jans Aasman:
Yes.

Juan Sequeda:
All right.

Tim Gasper:
Here’s the next one. Data modeling will become something that business users can do. Yes or no.

Jans Aasman:
Yes.

Juan Sequeda:
So another one: Emil from Neo4j, who was on the podcast a couple weeks ago, mentioned time series, NewSQL, graph, and document as the four main database types. Will this entity event model and data-centric architecture overlay all of that?

Jans Aasman:
Yep.

Juan Sequeda:
Wow. Very, very specific.

Tim Gasper:
I want to break the lightning round here for a second and ask: can you add a sentence to that?

Jans Aasman:
Emil is quoting from a Gartner report that… Well, not quoting; I mean, he probably helped influence it, but there was a very interesting Gartner article where someone argued that businesses always move toward flexibility. That’s why [inaudible 00:47:04] would be such a great success. That’s probably where graph databases… And he said, “Ultimately, the only databases left standing in 50 years’ time will be something like a graph database and a document database. And because time is so important, a time database.” That’s it. Yeah. But if you think that evolution is going along the line of flexibility, then that would be the ultimate answer. Relational databases are not the answer to flexibility. You probably all agree. And so a combination of time series, document, and graphs is probably the kind of database you’re going to see.

Tim Gasper:
So those are the three key tools you need, and then you layer your knowledge graph on top, right?

Jans Aasman:
Yeah. Yeah.

Juan Sequeda:
This is making me… We need to have another discussion on these multi-model databases that apparently say they can do all that. We need to go figure that one out. That’s going to be an interesting, honest, no-BS discussion. Do we need to have four different databases, or just one that rules them all?

Jans Aasman:
Document and graph is easy, but document and time, that’s going to be a little bit harder.

Tim Gasper:
Yeah. Some of these things are a little-

Jans Aasman:
Sorry, graph and temporal, that’s hard work.

Tim Gasper:
… a little harder to reconcile some of these things.

Juan Sequeda:
Tim, you broke the lightning round [crosstalk 00:48:24].

Tim Gasper:
That’s okay. All right. Last question.

Juan Sequeda:
Last one.

Tim Gasper:
All right. Every data team should be talking more about their data model. Yes or no.

Jans Aasman:
Yes.

Tim Gasper:
That was an easy one, but we want to reinforce it. Data teams. Talk about your data model.

Juan Sequeda:
All right. So Tim and I, we always do our takeaways and we got a lot of notes here. So TTT, Tim, Take it away with Takeaways.

Tim Gasper:
Take it away. Here we go. Let’s take it away. All right. So I like that you said modeling is human problem solving. I think that sometimes people think of modeling as some sort of technical challenge, or something that only very special people can talk about and implement. And I think thinking of modeling as human problem solving is a great framework. I look forward to using that going forward. And I really like that you said: don’t start with top-down logical models; instead, really take a bottom-up approach. And you asked, what are the top 10 questions you want to answer? And how do you start in a specific domain and work your way across from there? I think those are really good points. I mean, we’ve talked about not boiling the ocean in the context of analytics and in the context of governance, but it applies to the world of modeling in much the same way. And it’s a very important approach. It’s amazing how many legs that phrasing really has. What about you, Juan?

Juan Sequeda:
Well, the main, main takeaway is entity event modeling. This is something that we really need to start thinking more about. At the end of the day, our goal is to understand the behavior of that entity, the thing that my business cares the most about. And that’s not just something static; it’s everything that has happened to it, because I want to be able to predict that behavior. Another interesting takeaway is the 90-10 rule: you said that 90% of the [inaudible 00:50:31] you can work with using the entity event approach, and the rest is knowledge, which can live in this other knowledge base around it. That was a very interesting comment, and I need to start looking more into that… I trust that you’re right that it’s 90%. I just never thought about it that way. I want to start thinking about it more.

Juan Sequeda:
And the other one is what we were saying, connecting it with what I talk about, efficiency versus resilience: hey, if your goal is just to do something very quickly, you’ll probably… I mean, first of all, you always do data modeling. If you’re working with data, you always do some sort of data modeling. But if you’re just doing something very efficiently, very quickly, to solve the problem, you’re not spending any energy on or focusing on the data model. Okay, you solved that problem, but that’s not going to be easily reusable.

Juan Sequeda:
But if your goal is to be resilient, your goal is to be able to invest and make sure that I have good ROI on the data investments I’m making, you need to start thinking about data modeling, because that’s how you’re really going to build a resilient infrastructure and a resilient organization. So I think if you are spending time on data modeling, you are on the path to being resilient. If you are not spending time on data modeling, you’re being efficient. It’s not a bad thing, but you’re probably not being resilient. And depending on who you are or what type of organization you are, that’s the balance you’re trying to figure out. So that was a very interesting takeaway for me. And I’m throwing it back to you, Jans. So, our final segment of advice: one, what’s your advice about data, life, whatever? And second, who should we invite next?

Jans Aasman:
How many questions was that?

Juan Sequeda:
What’s your advice? Who should we invite next?

Jans Aasman:
My advice is, if you are in a particular domain and you’re overwhelmed with information that is extremely diverse, then think about building a knowledge graph. Can I add a little bit of color to that?

Juan Sequeda:
Please.

Jans Aasman:
I talked this weekend to two different people in completely different domains who, by the way, had both never heard of a knowledge graph. So we all think in this audience, knowledge graph, knowledge graph, but I mean, most people have no clue what you’re talking about. One was looking at climate change and how you can create a new world where people divest their money from oil and gas companies and put it into other things. How do you do it in such a way that it becomes easily accepted? How will investors accept it? And I asked, so where do you get your knowledge? Where do you store successful policies and all of that? And she said, “Well, everywhere. And it’s so hard to find.” And so I had to explain what a knowledge graph was.

Jans Aasman:
And another guy talked about… He’s doing economics research on psychedelics in healthcare. How can you use psychedelics instead of other medications? And again, he has 12 spreadsheets where he’s trying to keep things together. He said, “Jans, I can’t do it anymore.” And so I had to give him a whole lecture about knowledge graphs and how I could help him with that. So it was a fun discussion, but yeah, I see so many places now where a knowledge graph would help. Both cases, by the way, were not really entity event models. Those were more like static knowledge, what is where and how, more like knowledge bases. And then my advice: God, I’m really, really deeply concerned about the political divide in this world. And so I always have to think of the Dalai Lama: kindness is my religion. So please be kind.

Juan Sequeda:
I love that. Please be kind. And who should we invite next?

Jans Aasman:
David Newman from Wells Fargo maybe, or Percy Mihalji from Montefiore Hospital, or Shannon Copeland; she is in the world of call centers, but a big, big fan of knowledge graphs. All three of them are in a particular domain and really committed to the world of knowledge graphs. But again, there are so many people in our domain now that you can talk to.

Juan Sequeda:
That’s true.

Jans Aasman:
I think David Newman would probably be the best, if Dean didn’t already cover everything that David is doing.

Juan Sequeda:
Well, Jans, this has been a fantastic conversation, philosophical and practical in so many different aspects. Thank you so much. We really appreciate it. Cheers [crosstalk 00:55:25]. Quick reminder: the data.world summit is next week, on September 29th. It’s a free virtual event with awesome presenters, including us; Tim and I will be there. Our agenda has a lot of the folks who have actually been on the podcast: [Sham Digani 00:55:40], Dean [Alerman 00:55:42], Barr Moses, and Doug Laney, who will be a guest pretty soon. We’re going to be talking about data mesh, DataOps, data product managers, data governance, and knowledge graphs. And if you enjoy Catalog and Cocktails, you’re going to enjoy our data.world summit, which takes the same honest, no-BS approach. Cheers.

Jans Aasman:
All right. Cheers.

Juan Sequeda:
Thanks for your time. And to data modeling, cheers.

Jans Aasman:
All right.

Tim Gasper:
And be kind.

Jans Aasman:
Yes. And be kind.

Juan Sequeda:
And be kind.

Jans Aasman:
And hope to see you in Austin in January.

Juan Sequeda:
Hopefully, yes.

Jans Aasman:
Hopefully. All right.
