About this episode
Investing in knowledge graphs provides higher accuracy for LLM-powered question-answering systems. That's the conclusion of the latest research that Juan Sequeda, Dean Allemang and Bryon Jacob have recently presented. In this episode, they dive into the details of this research and explain why, to succeed in this AI world, enterprises must treat business context and semantics as first-class citizens.
Tim Gasper [00:00:00] Welcome. It's time for a very special episode of Catalog & Cocktails, presented by data.world, your honest, no-BS, non-salesy conversation about enterprise data management. We have a very non-salesy thing to talk to you about today, something very interesting around technology innovation. It's around AI, and it's around some research that we've been doing, and we've got some awesome guests here. We are seeing 3x better accuracy with large language models, with gen AI, by using knowledge graphs. And so I'm Tim Gasper, longtime product guy, customer guy at data.world, joined by Juan Sequeda. Hey, Juan.
Juan Sequeda [00:00:38] Tim, how are you doing? It's a pleasure. We're recording this on Wednesday, the day before Thanksgiving. We're not live today, but hey, we need to take a break, and we're still recording this as a podcast. Anyways, I'm super excited that we have this opportunity to go share the research that we've been doing, and my partners in crime here have been, too: Dean Allemang and Bryon Jacob. Bryon couldn't join us today because he's on vacation, but Dean is joining us. I've known Dean my entire career. He's the author of the very important book Semantic Web for the Working Ontologist and has been a semantic web and knowledge graph pioneer for decades. Dean, it is my pleasure to introduce you, having worked with you for so long. How are you doing?
Dean Allemang [00:01:27] Great, and it's great to be here. It's been a while since I've been on Catalog & Cocktails, and I'm really excited to be here, because the work that we've been doing is really exciting. I think it's some of the most important work that I've done, possibly in my whole career. This is really important stuff.
Tim Gasper [00:01:45] Yeah, this is really pushing the envelope here, cutting edge, and it's just trying to push the state-of-the-art forward, which I think is incredible here.
Juan Sequeda [00:01:56] And not only that, it's really trying to understand what's actually happening, to cut through the hype and separate the signal from the noise. And truly, we're trying to be honest and no-BS about this stuff. And this is science, and hence we're not salesy at all. So I want to give some background about the research. And the plan here is that we're actually going to go through some questions that we got through LinkedIn. So in a nutshell, some background: LLMs have changed the world in the last year, basically. And obviously one of the cool applications people are talking about is chatting with your data, being able to literally ask questions and have that interaction just like we're doing with ChatGPT. But the entire infrastructure people are setting up is mainly around text, PDFs and stuff. And when it comes to your structured, relational, your SQL databases, that has not been at the center. And when those conversations do come up, it's all about Text-to-SQL and how we can take that question and generate SQL queries over the data. And I'll recall for everybody, Text-to-SQL has been an area people have been studying for decades, and question answering itself, in the area of computer science, is probably half a century of work. So this isn't new, but LLMs have really accelerated it. But what happens is that it's very obvious that people are looking at these examples and saying, "But that's an easy question over easy data." And I think there's that disconnect. So we've been asking ourselves, what does it look like to do this question answering, this natural language, over your relational databases, but in an enterprise setting? That is really the question. We asked ourselves two research questions.
Number one is: to what extent can these large language models accurately answer natural language questions, enterprise questions, over enterprise SQL databases? And when I say enterprise questions, these are questions from day-to-day reporting all the way to metrics and KPIs. And when I talk about enterprise SQL databases, I'm talking about stuff that represents hundreds of tables in a particular domain.
Tim Gasper [00:04:11] So these are the kinds of questions that you might ask in a report, when you want to create a dashboard or something like that?
Juan Sequeda [00:04:17] Exactly. And then things that, later on, you may not be able to answer from a report, but you need some extra information beyond what the report gives you.
Dean Allemang [00:04:24] Now, a clarification, Juan, that I'd like to bring in. I was at a content management conference two weeks ago, and the way someone from content management is going to think about this is: I'm going to ask an LLM a question, it will give me an answer. And I want to point out that in the architecture Juan has been describing, with chat with your data and Text-to-SQL, he's putting an extra step in between. You're asking a question that's being turned into a query that gets you the answer. Now, to those of us who are at Catalog & Cocktails and use data and all this stuff, we might consider that obvious. When you're at a content management conference, that's not obvious. The fact that a question always turns into a query feels a little bit techie, a little bit codey to a content manager, but it's really essential to understanding how what content managers do, which is cataloging, can actually really improve how this goes. So I wanted to call that out. I know for this audience they're going to say, "Yeah, sure, Text-to-SQL, Text-to-SPARQL. We expect to create queries," because it's the data audience. But in case we have any folks in the audience who aren't data-oriented folks, like content managers, I just wanted to point that out. Okay, so back to you, Juan.
Juan Sequeda [00:05:38] That's an excellent point, I think, because it also helps us get outside of our bubble, and this podcast goes out to those people too.
Dean Allemang [00:05:43] That conference was an eyeopener for me. I gave a keynote and I got a lot of great feedback, but I understood that it was a very different audience. Well, many of the people in the room, it's a mixed room, many of the people in the room came at it from a very different perspective. And this helps me understand a lot of the questions that came in on LinkedIn as well. They're coming from a different perspective, and that's a particularly important perspective here. I think the content managers have been doing a lot of the work that's going to be really important in making LLMs work in the future. But I'm burying the lede here, so I'm going to go back-
Juan Sequeda [00:06:16] No, so what happened was around four or five months ago, when we were in Vegas for the Snowflake Summit. There was a big launch of LLMs at Snowflake and all that stuff. And then I'm starting to talk to a lot of people, users, customers, and then product folks at Snowflake. And they're like, "Look, we get it, semantics, knowledge graphs, this is needed. But one, it's not clear how that would actually work. And the other thing is, it's not clear how much it's actually going to improve. We feel it is going to improve, but how much?" So basically a lot of the product folks told us, "Hey, you guys should do a benchmark." And I'm like, "Yeah, we should." Okay, so what happens is that there are existing benchmarks for all this Text-to-SQL stuff, but one of the things is that they're very academic. They are on smaller data sets, and the questions are just a laundry list of questions. And I think enterprise folks look at that and they're like, "This is really not representative of what the real world looks like." So hence-
Dean Allemang [00:07:19] Along those lines, Juan, the data sets they use are normalized. You could actually tell what normal form they're in. In my experience, I've never encountered an enterprise data system that comes close to being analyzable in terms of normal forms. It's a little bit of this, a little bit of that, and oh yeah, we patched that on somewhere. And it's that wild, crazy space that, to my mind, is the real difference that distinguishes an enterprise data set from the ones that you see in the textbooks.
Juan Sequeda [00:07:49] So to that, the second question that we're asking ourselves is: to what extent do knowledge graphs actually improve that accuracy? And our hypothesis here is that if you have this large language model powered question answering system, a chat-with-your-data system, and it answers these enterprise questions over a knowledge graph representation of your SQL database, and you compare that with a large language model question answering system that just asks the question directly over SQL without a knowledge graph, then the accuracy of the system that uses a knowledge graph is going to be higher than the one that doesn't. Now, the question, remember, is to what extent, how much higher? That's what we don't know. So what we did is define this benchmark. It really has four parts to it. One is the enterprise SQL schema that we're using. It actually comes from the OMG, a standards organization, and it's around property and casualty insurance data. So that's a real-world insurance model that has been done. We came up with a set of questions on two spectrums. One is the spectrum of how complex a question is: easy questions being reporting, "Hey, show me how many claims there are," all the way to metrics and KPIs, "Hey, what is the total loss, or what is the loss ratio?" and so forth. And then you also want to have a complexity dimension on the schema: do I need a couple of tables, or do I need 6, 7, 8, 9 tables to answer this? And then third was to have the actual context layer explicit, to make the metadata and the semantics explicit around this. So what is the ontology? What is that semantic layer? What are the mappings that transform that, that take the source to the target? Make this very explicit so we can build the knowledge graph with that. And then finally, there's a lot of the accuracy methodology that we reuse from existing benchmarks, namely the Spider benchmark that comes from the folks at Yale University.
So that's the framework that we have for the benchmark. And the results is, oh, one more thing. The system that we built was very simple on purpose, because we wanted to define a baseline, a very simple baseline. We used GPT-4 and we used a zero-shot prompt. The zero-shot prompt specifically is: given the SQL DDL, you copy and paste the text of your DDL, write SQL for the following question. And for the knowledge graph one, it's: given this OWL ontology in Turtle syntax, you copy and paste it, write SPARQL for the following question. We're using the semantic web RDF standards here. That was it. Very simple on purpose. So just with that, the results are that the accuracy over all these questions combined was 16% if you don't use a knowledge graph. And that increased to 54% if you use a knowledge graph. That's your 3x. And then if you break it into quadrants: for easy questions over easy data, that was 25% without a knowledge graph and 70% with the knowledge graph. For more complicated questions on easy data, a smaller number of tables, that was 37% with SQL, no knowledge graph, and 67% with the knowledge graph. And then the moment you get into more complicated schemas, meaning you actually required more than five tables, the accuracy without a knowledge graph was zero, and with the knowledge graph it was 35 to 38%. So these results, for me, support the following claim: investing in knowledge graphs is going to improve the accuracy of your large language model question answering systems. And basically, look, to really succeed in this AI world that we're in now, you must treat your business context, your semantics, your metadata as first-class citizens. Knowledge graphs are a requirement for generative AI. That's the evidence we have and that's what we shared last week.
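The two zero-shot prompts Juan describes can be sketched roughly as templates like the ones below. This is an illustrative reconstruction from the conversation, not the benchmark's literal prompt text, and the names (`SQL_PROMPT`, `build_prompt`, the sample DDL) are invented here.

```python
# Illustrative zero-shot prompt templates, reconstructed from the description
# in the conversation. The benchmark's exact wording may differ.

SQL_PROMPT = (
    "Given the following SQL DDL:\n\n{context}\n\n"
    "Write a SQL query that answers the following question:\n{question}\n"
)

SPARQL_PROMPT = (
    "Given the following OWL ontology in Turtle syntax:\n\n{context}\n\n"
    "Write a SPARQL query that answers the following question:\n{question}\n"
)

def build_prompt(template: str, context: str, question: str) -> str:
    """Fill a zero-shot template with the schema (or ontology) and question."""
    return template.format(context=context, question=question)
```

The only difference between the two conditions is what gets pasted into `{context}`: raw DDL in one case, the OWL ontology in the other. Everything else, model and prompt shape, is held constant, which is what makes the comparison a controlled baseline.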
And then I shared this post on LinkedIn, and we're truly amazed at how this has been received. I've just been blown away, so many comments.
Dean Allemang [00:12:02] It's gotten more comments and reactions than any LinkedIn post I've ever made. So yeah, it's catching on.
Juan Sequeda [00:12:08] Tim, you're going to help us drive the next section here.
Tim Gasper [00:12:11] Yes. Well, before we move forward, I want to make sure that our audience here really understands the implications of this. And it's no surprise to me that this was very popular on LinkedIn, because everybody sees what's happening with ChatGPT and with LLMs. And of course, if you work in an organization, your thought process is, "Well, I want to use LLMs to tap into the knowledge and the data that we have in my own organization. How can we start to do that, and is it actually going to be helpful?" So it's no wonder that there's a lot of interest in leveraging this or taking this to the next level. Just to make sure that we're clear on the results: what this means is that for both easy questions and hard questions, for easy schemas and complex schemas, providing and leveraging a knowledge graph makes the LLM create more accurate queries. And it is especially impactful for the most complex queries, the most complex questions and the most complex schemas. Right?
Juan Sequeda [00:13:19] Definitely.
Tim Gasper [00:13:20] And in the case of complex questions, complex schemas, the Text-to-SQL didn't work at all.
Juan Sequeda [00:13:28] Exactly.
Tim Gasper [00:13:29] Yep. I think that's pretty incredible. So as we start to unpack this a little bit and what it means, I have one question for you guys. What's the approach that you took here in terms of openness and transparency? Can others extend this research? Can they unpack it? Could they leverage this benchmark as well?
Dean Allemang [00:13:49] Absolutely. That was actually the point from the very beginning, and as we go through, we'll talk about this. The vision that Juan and I had when we were putting this together is that we would do very much what the Spider folks at Yale have done: build this benchmark, publish all the pieces of it, our experiments and all the results, and then imagine that somebody from the Text-to-SQL world would come in and say, "No, the knowledge isn't the most useful thing. There's actually something you could do with an advanced EER diagram." And in fact, we got exactly such a comment back, and we said, "Great, an advanced EER diagram would help a lot. Prove it. Here's the data, here are our results, your turn." And we're not sitting here being defensive. No, this is how we as an industry, we as a scientific community, are going to figure out what on Earth is going on here. Does knowledge have to be in OWL? I doubt that. Could knowledge be encoded in an ER diagram? Probably, but let's have someone try that. Or how about plain English? I'm just going to write a paragraph in plain English and that's going to work just as well. That might be true, I don't know. But we really want to set up a community resource so that questions like that have empirical answers, not just speculation.
Tim Gasper [00:15:16] Yeah, the plain text is especially interesting. I know a lot of comments are wondering, "Hey, how can prompt engineering and passing other types of context actually take this to the next level?" Which I think is some of where y'all are thinking of taking this next, right?
Dean Allemang [00:15:31] And this is the thing that you can do empirically. It's no longer just, "Hey, I got a better idea and I'm going to go out and make my product and say that it's better." It's like, well, we can actually, as a group, test what's better, and then your better product is the one that finds the sweet spot.
Tim Gasper [00:15:49] Right. No, this is great. It's all about how we can push the knowledge of the entire data community forward. So we're actually going to leverage questions that were posted on LinkedIn and through social networks to drive the next part of this podcast interview. And let's start actually with Eunice Bragg, who asked some really good questions around the architecture for this. So Eunice said, "I'm currently working on the same topic, using knowledge graphs to improve the reasoning of LLMs, and would like to ask a couple of questions. So first of all, does the way you feed the knowledge graph into LLMs use any specific tokenization technique, or a top layer on the LLM to extract content, such as knowledge extraction or a deep KG-LLM framework?" So that's a little bit deep and technical here, but I'm sure you can unpack that. "Or does it use RAG," which is another architecture here, "or prompt engineering, which takes the knowledge graph as a schema passed in the prompt?" So maybe you could unpack these terms a little bit and the approach that was taken here.
Juan Sequeda [00:16:59] So as I was mentioning, we started on purpose with the simplest prompt engineering. You can imagine everything that we're doing here is a form of RAG. But I think people are-
Tim Gasper [00:17:13] Retrieval augmented generation?
Juan Sequeda [00:17:15] Yeah, exactly. Retrieval augmented generation. I think right now RAG is synonymous with, "Oh, there's a vector database included." There's no vector database included yet in what we've done here. But I think, as Dean was saying, we want other people to go test their approaches. So in this case, the retrieval that you're doing is actually passing that context of the schema or the ontology inside the prompt. So what we did was just prompt engineering. And as I mentioned, it's this very simple zero-shot prompt. And again, that was done on purpose to figure out the baseline. We wanted the simplest thing so we can know how to improve it. So now, everything that Eunice is bringing up, all of that should be tested. Should we have a very specific type of LLM that is specialized on knowledge graphs? Yes, somebody test that. There are existing new SQL foundation models; those should be tested too. We also brought up: what happens if your context layer is so big that it doesn't fit inside your context window? I'm using the word context twice. Your semantics, your metadata, your ontology, that's too big; it's not going to fit inside your context window. So you need to be able to pull it out from somewhere else, and you're probably going to pull it out from a vector database. How are you going to figure out which parts? That needs to be figured out too. Again, we only did a zero-shot prompt. Can we improve that prompt? All of that needs to be tested. So at this moment, again, we used the very basic prompt on purpose to figure out the baseline, and now we can start seeing people say, "Hey, I ran this benchmark again with this different type of prompt," and we should be explicit about that, and we're going to see how this accuracy can improve.
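Juan's point about a context layer that overflows the context window implies a retrieval step: select only the ontology fragments relevant to the question. The benchmark itself did not do this (and a real system would use embeddings and a vector database), but a deliberately naive keyword-overlap version, purely illustrative and with invented names, might look like:

```python
def select_fragments(question: str, fragments: list[str], budget: int) -> list[str]:
    """Naive retrieval sketch: rank ontology fragments by word overlap with
    the question, then keep as many top-ranked fragments as fit within a
    character budget (a stand-in for the LLM's context window).
    A production system would use embeddings and a vector database instead."""
    q_words = set(question.lower().split())
    ranked = sorted(
        fragments,
        key=lambda f: len(q_words & set(f.lower().split())),
        reverse=True,
    )
    chosen, used = [], 0
    for frag in ranked:
        if used + len(frag) <= budget:
            chosen.append(frag)
            used += len(frag)
    return chosen
```

The open question Juan raises is exactly the hard part this glosses over: how to decide *which* parts of a large ontology are relevant enough to spend the context budget on.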
Dean Allemang [00:19:01] One way to think about this is that this is an actual scientific experiment. You control out certain variables to make sure that those aren't impacting your measurements. And in fact, the three variables that Eunice talks about here are three of them; we quite explicitly factored them out of this. Because you don't want someone to say, "Well gee, you had a better prompt for this one than that, or you had a better LLM for this one than that, or this one used some retrieval to do something." We know those things will make it better, and that's one of the reasons why the actual results are quite modest from a production point of view. Back in May, I was playing around with this stuff and I got far better results than 50%. Why? Because I was playing around with all these variables and a whole bunch more. And not only did I do that, I was at a conference in May and I saw two demos where people did that. And I was looking at a message from my colleague, Bryon Jacob, the third author on this paper, where he was doing that. Yes, you can obviously do better than this. We controlled out these variables on purpose. I know you said this before, but it's really, I think, the thing to understand about this work: we controlled out these variables and said, "This one variable has a three-to-one effect." We invite you to control out everything else: a better ERD gives you how much? I don't know, do the experiment. What could better prompt engineering do? And so on. And then the real interest, and I am repeating myself because it's really important: how do we combine these together to find the sweet spot that gives you the best interaction of all of them? And so yeah, these are three that we thought of. We did not do them, and that was on purpose. We invite Eunice to go to our Git repository, pull down the data, put it into their system, since they're already working on it. Put this in the system and tell us: how well do these things work at improving performance?
Tim Gasper [00:20:52] Yeah, this is the scientific, collaborative approach. And I know the ultimate goal is we want to get to 100%, right? Because if it doesn't know, we don't want it to hallucinate. That's the wrong thing. So, to put it another way, Douglas Moore asked: is another way to think of this that the LLM, after using the knowledge graph, or let's just say it this way, that the LLM is three times better at Text-to-SPARQL than it is at Text-to-SQL?
Dean Allemang [00:21:24] You tipped our hand, Tim, when you misread that. The real key here, to rewrite that so the answer is yes, is that the LLM along with a knowledge layer is three times better. Is an LLM three times better at Text-to-SPARQL than Text-to-SQL? I've not done that experiment. If you did the exact same query in SPARQL and in SQL, I would actually be surprised if you saw that much difference, because I think it's the knowledge where the leverage is coming from. So obviously, yes, Douglas, you're right, but with the clarification that Tim put in here: the LLM equipped with knowledge, described in our case in OWL, is three times better than Text-to-SQL.
Tim Gasper [00:22:08] I accidentally even added in the nuance that makes it a yes question.
Dean Allemang [00:22:11] Yeah, you did. And how could you help but do it? You tipped your hand there, Tim. And we know that people are biased, and that's why you design experiments that try to get the bias out. And no one can ever get all the bias out. Maybe even just reading the question, your bias came up. And that's just human. That's not, "Hey Tim, you made a mistake." That's just the way people work. That's why we use scientific methods.
Tim Gasper [00:22:32] Well, it might actually be very interesting to see because I've heard people especially who appreciate SPARQL that say things like, " SPARQL is a little bit closer to natural language than SQL is." And so I actually-
Dean Allemang [00:22:47] I'm one of those people. I would say SPARQL is easier than SQL. And there's Leigh Dodds' fabulous little article on teaching RDF to a six-year-old. Yeah, it's easy to teach this stuff to people who haven't been confused by all the SQL stuff, but that's a whole different topic. We won't talk about that today.
Tim Gasper [00:23:06] No worries.
Juan Sequeda [00:23:06] One thing people ask a lot is: why is this happening? Well, it's actually hard to explain the why, because we don't know what's going on inside these things. We can only speculate, and maybe we can come up with different hypotheses and experiments to figure that out. A speculation I have is that in the graph, in the way you model things, you make the triple like an English sentence: the subject, the predicate, the object. The predicate is the relationship, and you're making it explicit. While in SQL, for example, a foreign key represents that relationship and it is implicit; I don't know what the words are. The way you make that relationship is by making sure that these two columns match up: either they have the same column name, or you have to know that they reference each other even though they have different names. So they're not explicit; they're implicit. So I suspect that the knowledge graph gets higher accuracy because I'm making the language more explicit in how I'm designing the model here, the model in the sense of the data model, while in SQL, it's not explicit.
Dean Allemang [00:24:18] And actually, Juan, in JSON you have the same problem, the dual problem: you've got the name of the relationship, but not necessarily the type of the other end. So there are two things you can have: how am I related to it, and what is it? SQL's really good at the "what is it?": it's in a certain table. But I don't know how I'm related to it. JSON's really good at "how am I related to it?": it's got a field name. Neither has both. OWL and other knowledge representation languages go out of their way to do both.
Juan Sequeda [00:24:46] I never thought about it that way.
Tim Gasper [00:24:48] Interesting. And just to piggyback on what you said, Juan, the example I thought of in my head was joining a customer table to an order table. In SQL, that just might be "join this key to that key," but the actual verb that connects all of that is: a customer made this order, or purchased this order, right?
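Tim's customer/order example can be made concrete. In the relational form below the relationship is only a key equality buried in a constraint, while in the triple the verb is part of the data itself. The table, column, and property names here (`orders`, `customer_id`, `ex:placedBy`) are invented for illustration.

```python
# The same fact, expressed two ways. All names are made up for illustration.

# Relational: the customer-order relationship is implicit. It lives in a
# foreign-key constraint, and no verb naming the relationship appears anywhere.
ddl = """
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(id)
);
"""

# Graph: the relationship is an explicit, named predicate in the data.
triple = ("ex:order42", "ex:placedBy", "ex:customer7")
subject, predicate, obj = triple

# The verb "placedBy" is first-class information an LLM can read directly;
# from the DDL it would have to infer the relationship from column names
# and the foreign-key constraint.
```

This is the mechanical version of Juan's speculation: the graph representation hands the model the word for the relationship, while the schema only hints at it.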
Juan Sequeda [00:25:10] Yeah. And then things that you can improve later on are, well, what are the different ways that people describe this, and so forth? And then you can talk about synonyms, and you can add this extra metadata, extra context you want to add to it. Context is important.
Tim Gasper [00:25:23] Context is critical. And as we think about this number, this 54.2% number, Dr. Ju here asks, "Great work. Question: is 54.2% considered state-of-the-art, and is that practical for production usage?"
Juan Sequeda [00:25:47] So a couple of things here. One is, if you look at the large body of work on Text-to-SQL, and look at Yale's Spider benchmark, they're talking about accuracies of 90% and such in their benchmark. It was like-
Tim Gasper [00:26:03] Very high numbers out there.
Juan Sequeda [00:26:04] High numbers, but we're presenting another experiment, and we're saying, "Well, if you're only doing SQL, it's really 16, and with the knowledge graph it's 54." So with these results, people should be asking the question: what's really going on? We're providing some different numbers. And science is a social process, so we need more people to start asking more questions about what's going on. So my point is, I think the state of the art is just changing. We think it's one thing, but maybe it's another, and we're not in agreement right now. So whatever has been stated as the state of the art, that 90, it's not that. It's lower. We're presenting this number of 54. We have the evidence to back it up, so probably it's 54 with the knowledge graph. Now, the follow-up question: is it practical for production usage? This is a fantastic question, and something that has come up multiple times with folks: "Is this good enough? Is this accurate enough for production?" Well, what I've got to ask is: what does accurate enough mean? I'm sure everybody's going to be tempted to say, "100%. It has to be 100% or they can't use it." And then two things come to mind. One is: for questions today, how are they being answered today? And can you stand up and say that's 100% accurate? Do we know? Do you feel comfortable saying that's 100% accurate? I don't know. Maybe you do, but maybe somebody gave you a report and you're trusting the person who gave you that report. And does that person know how that was reported?
Tim Gasper [00:27:35] You ask Sally, "Hey Sally, how many sales did we have last month?" And she says, "$500,000."
Juan Sequeda [00:27:42] Yeah. Exactly.
Dean Allemang [00:27:43] How did Sally figure that out? This is why we have provenance in data catalogs. This is why we have provenance in all sorts of systems. You got this report, and maybe Sally made a perfectly sensible computation based upon the wrong table, or based-
Juan Sequeda [00:27:58] Or on an assumption that someone else gave her.
Dean Allemang [00:28:00] Yeah. You have a whole chain of provenance; that is how you get your trust.
Juan Sequeda [00:28:05] So I think this is one of the things we need to go ask ourselves. Now, we're obviously not comfortable; 54%, no, this needs to be higher. Remember that by different quadrants, it's different. The other thing we need to think about is that with LLMs, we need to strive for them to be accurate, but LLMs and this generative AI technology are also making us more productive. So what we really need to figure out and work on, which is stuff that we're doing next in the lab, is understanding when the system can't answer a question. So it's like, "Hey, I know with very high accuracy that this is going to be correct, you can trust this." But there are some other questions where I say, "You know what? I don't know. I'm not even going to tell you an answer. I'm going to tell you I don't know." I think those are things we should start looking at, and that's how we make progress on how this can be usable. So again, it's not just always a binary thing. I think also the way to think about it is not just: you give the LLM an input, you expect the final output, and you're done. No, there are more pieces of the puzzle when you start thinking about it.
Tim Gasper [00:29:04] Yeah. Dean and Juan, just quickly, what do you hypothesize are the things that are going to drive this the fastest towards 100%? Maybe starting with you, Dean.
Dean Allemang [00:29:17] There's some low hanging fruit, and Juan and I actually had a meeting about this. They're basically the same ones that Eunice talked about, or at least two of them are. Well, prompt engineering, it's pretty straightforward. And actually one that Eunice didn't mention, I realize now, is multi-shot. So for example, an awful lot of the errors happened because of dialects. For SQL actually, this is a bit harder; there are lots of different dialects of SQL. And even though SPARQL is a standard, some of the utility functions, like differences between dates and times, are not standard. And often the LLM guesses the wrong one. This isn't even hallucinating. It just said, "Well, I know three or four ways to do this. I don't know which one you're using."
Tim Gasper [00:29:59] This seems popular or something, a common way to do it.
Dean Allemang [00:30:02] So you could prompt engineer that away by saying, "Here's my dialect," and giving it some stuff. But now you're filling up your context window, so you have to get to some RAG to be able to deal with that. Or in that particular case, both of the query engines we were using gave very detailed error messages back. Both of them said, "You said 'date diff', did you mean 'xsd:dateDiff'?" or whatever the one it uses is. And that's actually pretty good at figuring out what you meant. Feeding that kind of sentence back for a second shot might actually even close the gap between SQL and SPARQL in our examples, because SQL actually suffered from this more, since it's not standardized. So this helps you get over the fact that it's not standardized. This is why we're doing experiments. I just speculated something; I have no idea really if that's right or not. But that's clearly going to bring both numbers way up. And then prompt engineering can do some more. So Eunice talked about a handful of these things. Those are what I consider the low hanging fruit. We might have to start getting into some real combinations of these things, where you fine-tune an LLM. Personally, I think fine-tuning an LLM is probably going to have less bang for the buck than the other ones. But again, I'm really just speculating here. My lowest hanging fruit are multi-shot and prompt engineering, and then RAG is not going to give better accuracy, it's going to give better scale. What happens when your enterprise data set is really large? You need it. If I were to bring in an ontology like FIBO, the Financial Industry Business Ontology, I can't fit that into a single prompt window, even the new expanded prompt windows that came out a couple of weeks ago. I was trying to squeeze FIBO down into a small enough piece.
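The second-shot idea Dean describes, feeding the query engine's error message back to the model, can be sketched as a small prompt builder. This is an illustration of the technique, not anything the benchmark did (it used a single zero-shot prompt); the function name and wording are invented here.

```python
def build_second_shot(question: str, failed_query: str, error_msg: str) -> str:
    """Compose a follow-up prompt that feeds a query engine's error message
    back to the model for a second attempt -- e.g. a dialect mismatch such
    as using a date-difference function the engine doesn't recognize.
    Sketch only: the benchmark itself stopped after one zero-shot prompt."""
    return (
        f"The question was: {question}\n"
        f"Your previous query failed:\n{failed_query}\n"
        f"The query engine reported this error:\n{error_msg}\n"
        "Rewrite the query to fix this error."
    )
```

In a real multi-shot loop you would run the generated query, catch the engine's error, call something like this to build the retry prompt, and send it back to the LLM, repeating up to some attempt limit.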
Tim Gasper [00:31:52] There's only so much context that you can shove into the window.
Dean Allemang [00:31:54] Only so much. And that's where you've got to go to some technique like RAG.
Juan Sequeda [00:31:58] And even then you don't know, if you add more context, more space, how the accuracy is going to change-
Dean Allemang [00:32:03] Can it actually use it? Yeah. I know that for me as a human, if you say, "Oh sure, Dean, here is, boom, an encyclopedia. Now, in the same amount of time, answer the questions." It's like, "Ah, could you possibly give me a better index on this encyclopedia?" And that's exactly what RAG does. So yeah, there's going to be diminishing returns at some point. But to my mind, multi-shot and prompt engineering are probably your two lowest hanging fruit. First of all, they're very easy to try, but I also think they're going to give you good bang for your buck.
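The "better index on the encyclopedia" framing of RAG boils down to ranking fragments of a large ontology by relevance and sending only the top few to the model. A toy sketch: real systems use embeddings, but here a simple word-overlap score stands in for the retriever, and the fragments are hypothetical one-line class descriptions.

```python
def retrieve_fragments(question, fragments, budget=2):
    """Rank ontology fragments by word overlap with the question.

    Returns only the `budget` best fragments, i.e. what fits in the
    context window, instead of the whole ontology.
    """
    q_words = set(question.lower().split())
    scored = sorted(
        fragments,
        key=lambda f: len(q_words & set(f.lower().split())),
        reverse=True,  # most overlapping fragment first; sort is stable on ties
    )
    return scored[:budget]
```

Swapping the overlap score for cosine similarity over embeddings gives the usual vector-database version, but the shape of the technique is the same: retrieve, then prompt.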
Juan Sequeda [00:32:36] I think also some just good old-fashioned static analysis. There are a lot of basic computational techniques that can be applied to this. So again, we just have a baseline and now it's time to go improve it. And the entire community should be working on this and go improve this.
Tim Gasper [00:32:52] Love it. Juan, anything you'd add to what Dean said?
Juan Sequeda [00:32:55] Nope, that was it.
Tim Gasper [00:32:58] All right. No, I think that's great. I'm looking forward to seeing the next iteration of experimentation here. Ryan Dolly asks, "What's the difference between a knowledge graph and a semantic layer?" It seems like people are tripping up on this. A lot of people are talking about the semantic layer lately. What do you guys think about that?
Juan Sequeda [00:33:14] So my quick one, and then I'll hand it to Dean. So the honest, no-BS thing here is that the semantic layer is the ontology. That's it. It's an ontology. I think the word semantic layer is getting popularized through marketing from folks like dbt, and the semantic layer I think coming in from BusinessObjects and stuff, but it is really just an ontology, a representation of knowledge in something that is more connected to the business. That's what it is. So I now use the words ontology and semantic layer interchangeably. The knowledge graph is both: it's the semantic layer plus the data, and in the knowledge graph that data is represented as a graph. It could be physically represented as a graph or it can be virtualized. And actually, in the experiments that we did, it was virtualized, so the data was still in SQL and the knowledge graph was virtualized. So even though we're talking about graphs, that graph query was then translated to SQL. And that's just the standard graph virtualization, semantic graph virtualization technology. All right, so ontology, semantic layer, same shit.
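The graph virtualization Juan describes, where the data stays in SQL and a mapping rewrites each graph pattern into a SQL query, can be illustrated with a toy rewriter. The `MAPPINGS` table and property name below are hypothetical, just enough to show the shape of an ontology-to-relational mapping.

```python
# Hypothetical mapping from an ontology property to the relational
# table/columns where its subjects and objects physically live.
MAPPINGS = {
    "hasCost": {"table": "parts", "subject": "part_id", "object": "price"},
}

def triple_pattern_to_sql(prop):
    """Rewrite the graph pattern `?s <prop> ?o` into SQL via the mapping.

    This is the core move of semantic graph virtualization: the user
    queries at the ontology level, and the mapping decides which table
    and columns the engine actually hits.
    """
    m = MAPPINGS[prop]
    return f"SELECT {m['subject']} AS s, {m['object']} AS o FROM {m['table']}"
```

Production systems (e.g. R2RML-based ones) compose many such rewrites and join them, but each triple pattern bottoms out in a rewrite like this one.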
Dean Allemang [00:34:16] Yeah, pretty much. Because I've been using different words for this for ages. At one point, I actually think even on this podcast, I used the term knowledge level, which goes all the way back to Newell's seminal work back in the 80s, where that's how we talked about it. But it's funny, we're going full circle. We had the knowledge level, then we went through semantic nets, then the semantic web, in that circle where I've done a lot of my work. And then we started to have the knowledge graph, and now we're talking about the knowledge graph being split into a semantic layer and a data layer. And I feel like pretty soon we're going to be back to Allen Newell again and talk about the knowledge level. And actually, one of the things that's funny with that history: there was a thing he called the knowledge level hypothesis, which nowadays, in some sense, is what we are testing here. The hypothesis was that there actually is something useful to say above the structure and the content of your information. We're not going to talk about data yet, just information. And that was quite controversial when he wrote about it back in the 70s or 80s. And in some sense, here we are, how many decades later? And we've got an experiment where we now have a specific definition of what Newell's knowledge level was. And we're actually calling it a semantic layer now, as part of a knowledge graph. And we're actually in a position to run an experiment to say, "Not just does the knowledge level exist," which is what the hypothesis was, "but just how useful is it?" So I tend to use the words very much the way Juan is using them: the knowledge layer or the semantic layer is the ontology or ontologies, and maybe the mappings. And then the data layer is, well, your data. Put that all together into a system that actually answers questions, and that's a knowledge graph.
Tim Gasper [00:36:00] So we're really just tripping up on semantics here?
Dean Alaman [00:36:06] No. Every time someone said that to me, I'd have retired by now.
Tim Gasper [00:36:11] Listeners, keep listening. I'm sorry for the joke. All right, so the next questions I actually want to combine together, I actually think they're related. So the first question, from Patrick Jacqueline, is, "Did you have to design and define the ontology first for the knowledge graph?" So we were just talking about the semantic layer and ontologies. Do you have to design it upfront? And then the second question here is actually from Michelle Curie, where she asks, "Where does the knowledge come from? The knowledge graph exists outside of any specific technology, it is the knowledge architecture of everything that a domain encompasses, some of which may never have been captured electronically before. How would an LLM include that which has not been captured electronically?" So do you have to design the knowledge? Where does the knowledge come from?
Juan Sequeda [00:37:02] So the whole point of the study, the conclusion of the study, is that by investing in building your knowledge graph, by investing in creating the ontology, by investing in figuring out where that knowledge comes from, you are going to increase the accuracy of these LLMs. So yes, you have to go invest and go build that. Because if you don't invest and you don't build that stuff, you won't have the accuracy.
Tim Gasper [00:37:29] Which seems a little obvious almost, right?
Juan Sequeda [00:37:31] Well, yes, because it's extra context, right?
Tim Gasper [00:37:36] Right.
Juan Sequeda [00:37:36] So basically we're saying, "Oh, if I give more context to the LLM, will it do better?" Of course it will. So yeah, all we're doing is building that context. Now, where does that context come from? I think the traditional RAG architecture with a vector database is just text. But here we're saying we actually invested in giving it much more structure, giving it the semantics and meaning of this. And we see the accuracy increase. Another experiment, as Dean was mentioning before, is maybe we don't do any of the knowledge and we just give it a bunch more text, and maybe that can improve things. We don't know. These are things we should go try. Now, the whole point here is that we really want to go do that investment. The next question is, how can I reduce that investment? And guess what? We're probably going to use LLMs for that too.
Tim Gasper [00:38:24] How do I make knowledge engineering as fast and easy as possible?
Juan Sequeda [00:38:28] Exactly. So right now, I think the big point I want everybody to take is, damn I need to invest in knowledge graphs. I need to invest in metadata, invest in semantics. That's what I want people to think about. Now, once you get that in your brain, you need to go do that. This work is evidence, you need to go do that. The next question should be, okay, great, let's go invest in it. Can we make it cheaper? Then that's the next step. Yeah, we're going to use LLMs to make that cheaper.
Dean Allemang [00:38:55] So actually, just one year ago, and I was looking it up while Juan was speaking to see if it was possibly exactly one year ago, and I couldn't quite find the right email, I was doing a proof of concept with one of our customers, building a knowledge graph. And the value proposition one year ago, and it's actually just one year ago, was: if you make this investment to build a knowledge layer or semantic layer over top of your data and do the mappings, you will be able to query your data from the business level, you'll be able to unify data from lots of different resources and build a data mesh on top of this. All of these things are advantages you will get by making this investment, and it is therefore worthwhile. And I just love the quote from my customer who actually made that investment. He said, "At the beginning of this project, I was dating knowledge graph," making this little analogy to romance, "now I'm falling in love with knowledge graph." And that's when he had an actual business rule he wanted to express in a query language. And he'd done all those mappings, and the query was, "Well, you want to know the total cost of something: X has part Y, Y has cost Z, sum Z, group by X." That's exactly how it is in the business. But what about all the different kinds of parts and all the different ways the prices are represented and all the different sources I have of this? Yeah, you did that work in the past week. That's what that mapping was for. And bam. Now, we thought that was enough value. The whole knowledge graph community and the whole data mesh community thought that value was enough to pay back your effort. Now, one year later, that seems like such a drop of value compared to the bathtub of value that we're getting with chat with your data. And that's, in terms of what this experiment is saying, "Just how much more?" Yes, it was pretty good to start with. And it's even better now.
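The business rule quoted here, "X has part Y, Y has cost Z, sum Z, group by X", is exactly a group-by aggregation. A minimal sketch over toy data, with the point being that once the mappings hide where parts and prices physically live, the rule itself is this simple:

```python
from collections import defaultdict

def total_cost(has_part, part_cost):
    """Total cost per product: X has part Y, Y has cost Z, sum Z, group by X.

    has_part:  list of (product, part) pairs (duplicate parts count again)
    part_cost: dict mapping part -> unit price
    """
    totals = defaultdict(float)
    for product, part in has_part:
        totals[product] += part_cost[part]
    return dict(totals)
```

In SPARQL or SQL over the mapped graph this becomes a one-line `SUM ... GROUP BY`; the year of mapping work is what makes the one-liner possible.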
Tim Gasper [00:40:58] Are we going to be able to use LLMs to mine our knowledge better? I know a lot of people are trying to experiment it on unstructured documents and feeding your wiki into LLMs and things like that. Catalogs. Catalogs are a place where a lot of this lives. I don't know if you guys have any thoughts or comments on where this knowledge can be mined or extracted.
Juan Sequeda [00:41:23] So first of all, you already have all the work that you're doing, your technical metadata, your catalog. All of that is context you want to go use. And then you really want to be able to extract things very quickly from people's heads. So just ask somebody a question and record it, transcribe it, whatever. And then now I have all that, and I can use an LLM to say, "Hey, here's all this input that you have, generate an ontology, a semantic layer for me, and let me go present it, let me go look at it." So I think one of the things that we're working on already is how do we go through these knowledge engineering, ontology engineering tasks and accelerate them using LLMs? That's stuff we're working on actively right now. I'm really excited about what we're going to go do with it next.
Dean Allemang [00:42:08] And a kooky idea that I think is going to be really interesting going forward, but we're not really in a place to try it yet. My good friend Elisa Kendall, along with Deborah McGuinness, wrote a book about two years ago on ontology engineering. And again, this is pre-LLM, and they are trying to make a methodology of building these ontologies and connecting them to the data, using a lot of Elisa's industrial experience and Deb's academic theory and bringing it all together. And it's a wonderful book. But one of the things that, as a practice, it lacks is: how do you know how well you have done? Now you've built an ontology. Did you do better than you did last week? Did you do better than your competitor who is also trying to build an ontology? How can you tell? And they have a couple of metrics in there, but they're not very easy to measure quantitatively. Well, now we actually have one. Drop your ontology into a chat with your data system, do the mappings, and measure its performance. We now have a way to say this ontology is better than that one. And now my design process can take that into account. Now, as a community, we aren't there yet. But that's one of the implications of this work that I'm really excited about. Elisa and Deb turned ontology engineering from an art into a methodological craft. Can we now turn it into a science? And I think we can. And I'm really excited about that.
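The quantitative metric proposed here, scoring an ontology by how well a chat-with-your-data system answers a fixed benchmark when using it, reduces to a simple accuracy measure. A sketch, where `ask` is a hypothetical hook into such a question-answering system:

```python
def ontology_accuracy(ontology, benchmark, ask):
    """Score an ontology by question-answering accuracy over a benchmark.

    benchmark: list of (question, expected_answer) pairs
    ask(ontology, question) -> the system's answer (hypothetical hook)

    Two ontologies over the same data can now be compared directly:
    the one with the higher score is, by this metric, the better one.
    """
    correct = sum(1 for question, expected in benchmark
                  if ask(ontology, question) == expected)
    return correct / len(benchmark)
```

Real benchmarks need fuzzier answer matching than equality, but the design-loop idea is intact: change the ontology, re-run, and see whether the number moves.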
Tim Gasper [00:43:45] Yeah. No, this is exciting. We hit it a little bit in our previous question exchange here around what's obvious about the research that's been done here. Dan Everett said, " Intuitively it seems obvious, but it's always good to have empirical evidence in terms of the approach here." Graham Barford mentioned, " Isn't that to be expected? Surely a SQL database does not name the relationships between tables, whereas knowledge graph does." So the-
Dean Allemang [00:44:13] Yes, we talked about it a few minutes ago. Yeah.
Tim Gasper [00:44:16] Yeah. So Dean, what are your final thoughts on this topic?
Dean Allemang [00:44:19] Well, so the thing about the obviousness of science, I'm reminded of a mathematics professor that I had when I was an undergraduate. He would have us go up to the chalkboard and prove something, and at some step he'd stop us and say, "Why is that true?" And the student would say, "Well, it's obvious." Now, rather than challenging whether it was obvious, he would just say, "Yes, it is obvious, but is it true?" And this seems like he's just being a word I shouldn't say on a podcast. And that's how I felt when I was up at the chalkboard. It's like, "You..." But then looking back decades later, it's like, "Boy, did he do me a favor." He taught me how to introspect and challenge things that I just naturally know. And you actually see this in science all the time. I don't know how many times I've seen on social networking, "Gee, there's a study that says that chiropractic actually works." Well, I knew that. I actually talked to a chiropractor who was thrilled to see that what they already knew, what all their patients know, has actually got some scientific backing. So just because something is obvious doesn't mean you don't have to prove it. And in fact, as my professor taught me, sometimes that's exactly when you do have to prove it. And I think Dan gets this, he says, "It's always good to have empirical evidence." Actually, I'm going to bump that up. You need empirical evidence, because just because it's obvious, it might not be true.
Tim Gasper [00:45:44] Yep. No, I love that. I just think about the famous experiment dropping two objects. Does gravity accelerate them at the same rate? Probably people were pretty, obviously sure it was true, but it's a good thing we confirmed and we build upon that. Hey, let's bring this to a wrap here. Closing messages and final thoughts for our audience. Dean, why don't you kick us off?
Dean Allemang [00:46:09] Okay, so the thing that I really want to make sure of, if you've just been listening with half an ear until now: this isn't our state-of-the-art, best-we-can-do. It's an experiment, a controlled experiment, where we are controlling out a whole bunch of factors, in particular things like prompt engineering, multi-shot, RAG, custom-trained LLMs, all the things that Eunice talked about at the beginning. We've controlled those out, and we are asking how much bang for the buck do we get from knowledge? That's the question we're answering. But this framework can answer that same question for anything else that you think is valuable. And that is what I think is important about this: we're starting a scientific community that can measure all those things and help us understand how knowledge and LLMs work together.
Tim Gasper [00:46:57] Well said, Dean. Juan, what about you? Take us home.
Juan Sequeda [00:47:01] So I think generative AI has already shown, as a fact, the level of productivity that the world is receiving now. So if you want to have that productivity in your enterprise, you need large language models, generative AI, and you need knowledge graphs. Knowledge graphs are a requirement for generative AI, and it's all about productivity and precision, I think. Invest in knowledge graphs, otherwise you're going to go fail at these generative AI applications. And so invest in your metadata, invest in your semantics and your business. We need to make those first-class citizens. That's what I'm asking.
Tim Gasper [00:47:36] It seems like a stack is forming here: LLM, plus, where appropriate, a vector database, plus knowledge graph, plus your catalog.
Juan Sequeda [00:47:48] And then you have all your data or your sources. There we go.
Tim Gasper [00:47:49] There's your data. Juan, Dean, I think this was hugely informative. I think our listeners are very eager to see where this goes next. If folks want to learn more about the research, understand the metrics more, maybe consume the actual research paper itself, where should they go?
Juan Sequeda [00:48:09] We just started our blog on Medium, so you can just go to, I think it's medium.com/data.world, write out data.world.
Tim Gasper [00:48:19] Data.world.
Juan Sequeda [00:48:20] And then you can find it, or just follow us on LinkedIn, and we'll always be sharing this on Twitter or X, whatever. Yeah, follow us there, and we're going to have a blog series about all of this benchmark work, and then the next one we're doing in the AI Lab.
Tim Gasper [00:48:33] Awesome. Thank you, Juan. Thank you, Dean. Listeners, hope you have a great Thanksgiving, and stay tuned next week for another episode of Catalog & Cocktails.
Dean Allemang [00:48:42] All right, thank you, Tim.
Juan Sequeda [00:48:43] Bye everybody.
Dean Allemang [00:48:43] Bye.