The future of knowledge graphs and large language models (LLMs) like ChatGPT

LLMs are already being used to generate text, translate languages, write different kinds of creative content, and answer your questions in an informative way. But what’s next for LLMs? And how can knowledge graphs be used to take them even further?

Some would frame it as a trade-off between short-term and long-term harms. In some respects, training itself isn’t the problem; the short-term harms come from how data is collected and gathered, and it’s all about the data anyway. Halting training for six months only delays the inevitable.

We’ve been talking a lot about LLMs lately. How accurate are they? Can we trust them? LLMs are generally accurate, but they can also “hallucinate”, generating plausible-sounding statements that turn out to be false (ChatGPT confidently told me that the fastest marine mammal was the peregrine falcon, which is neither marine nor a mammal). An LLM has no way of knowing whether it’s telling the truth; it just spits out plausible words, and if it were actually sentient, it would hope that you buy it, in both the literal and the figurative sense. In a way, LLMs are still word calculators.

Jaxon has been putting guard rails on LLMs by integrating them with knowledge graphs that contain data specific to the company/agency and use case. These knowledge graphs serve as a source of truth to keep the LLMs in line.
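As a rough illustration of the “source of truth” idea, here is a minimal sketch that checks a claim extracted from an LLM answer against a tiny knowledge graph. The triples, the `check_claim` helper, and the rule that a different stored value counts as a contradiction are all assumptions made for this example; this is not Jaxon’s implementation.

```python
# Toy knowledge graph stored as (subject, predicate, object) triples.
# The facts and names below are illustrative assumptions only.
KG = {
    ("peregrine falcon", "is_a", "bird"),
    ("common dolphin", "is_a", "marine mammal"),
    ("orca", "is_a", "marine mammal"),
}


def check_claim(subject, predicate, obj, kg=KG):
    """Classify a claimed triple as 'supported', 'contradicted', or 'unknown'."""
    if (subject, predicate, obj) in kg:
        return "supported"
    # Simplifying assumption: each (subject, predicate) has one valid object,
    # so a different stored object means the claim conflicts with the graph.
    known = {o for s, p, o in kg if s == subject and p == predicate}
    if known:
        return "contradicted"
    return "unknown"


# The hallucinated claim from earlier: the falcon as a marine mammal.
print(check_claim("peregrine falcon", "is_a", "marine mammal"))
print(check_claim("orca", "is_a", "marine mammal"))
```

A real system would also need a step that turns free-form LLM text into triples, which is left out here.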

Pretty much any problem can be modeled as a knowledge graph. You’re focusing on foreign keys and the relationships between tables, rather than the rows in your tables or queries. You can frame a lot of problems as graph problems, with graph traversal as a pretty straightforward solution. The traversal can then show exactly how a decision was reached, either directly or through a well-codified chain of probabilities.
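To make the traversal point concrete, here is a small sketch of a breadth-first search over labeled edges that returns the chain of relationships it followed, so the path itself serves as the explanation. The entity names, relation names, and the `find_path` helper are hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical edges in (source, relation, target) form, similar to
# foreign-key relationships between tables.
EDGES = [
    ("order_1001", "placed_by", "customer_42"),
    ("customer_42", "located_in", "region_east"),
    ("region_east", "served_by", "warehouse_7"),
    ("order_1001", "contains", "product_9"),
]


def find_path(edges, start, goal):
    """Breadth-first search; returns the list of edges followed, or None."""
    graph = {}
    for src, rel, dst in edges:
        graph.setdefault(src, []).append((rel, dst))

    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel, nxt in graph.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [(node, rel, nxt)]))
    return None  # no chain of relationships connects start to goal


# Each printed step is one link in the reasoning chain.
for step in find_path(EDGES, "order_1001", "warehouse_7"):
    print(step)
```

The edges here are directed, so a search from `warehouse_7` back to `order_1001` finds nothing; a probabilistic variant would attach a weight to each edge and combine the weights along the path.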

Let’s go down a level and start talking tactically about how we productively use these things together. LLMs have been trained on the world’s publicly available data, which is, conservatively speaking, a lot. You can fine-tune the model on a specific task, which is often tantamount to retraining (a difficult and expensive process). That has its pros and cons: it can still be heavyweight and requires some data, and it tends to trade away the generality of the LLM by narrowing it to one task.

But if you’re trying to specialize them for a general task, there are two options. One is injecting new data at the prompt level, which leads us to instruction prompting. You can take documents and create embeddings, effectively indexing every document by encoding it as a high-dimensional vector. You can then inject the most relevant documentation and verbiage right into the prompt itself, giving the model grounded material to reason from and a place to start.
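The indexing and injection steps can be sketched as follows. To stay self-contained, this example uses a toy word-count “embedding” and cosine similarity in place of a learned embedding model, so `embed`, `retrieve`, `build_prompt`, and the sample documents are all stand-ins rather than any particular library’s API.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a sparse word-count vector (assumption, not a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, documents, k=1):
    """Inject the most relevant documents into the prompt ahead of the question."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Use only this context:\n{context}\n\nQuestion: {query}"


# Hypothetical company documents.
DOCS = [
    "Refunds are processed within five business days.",
    "Support is available by email on weekdays.",
    "Shipping to the east region takes two days.",
]

print(build_prompt("How long do refunds take?", DOCS))
```

In practice the word counts would be replaced by vectors from an embedding model and the sorted list by a vector index, but the flow of embedding, ranking, and injecting stays the same.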

Two, you can also use instruction tuning. For example, if you query ChatGPT, it first tries to make sense of the query content, and if it doesn’t know something, maybe a knowledge graph, supplemental document, or Google search can help. You can get the model to find that information, inject the result into your original prompt, and then re-prompt itself. Having added that material to its own context, it can check whether it can actually answer your question.
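That lookup-and-re-prompt loop might look something like the sketch below. The `fake_llm` function is a stand-in for a real model call, and the `NEED_LOOKUP:` convention, the `FACTS` store, and `answer_with_lookup` are assumptions invented for this example.

```python
def fake_llm(prompt):
    """Stand-in for a real model: answers only if the fact is in its context."""
    if "capital_of(France) = Paris" in prompt:
        return "Paris"
    # Hypothetical convention: the model names the fact it is missing.
    return "NEED_LOOKUP: capital_of(France)"


# Stand-in for a knowledge graph, supplemental document, or search result.
FACTS = {"capital_of(France)": "Paris"}


def answer_with_lookup(question, llm=fake_llm, facts=FACTS, max_rounds=3):
    """Query the model, fetch what it says it is missing, then re-prompt."""
    prompt = question
    for _ in range(max_rounds):
        reply = llm(prompt)
        if not reply.startswith("NEED_LOOKUP:"):
            return reply  # the model could answer from its current context
        key = reply.split(":", 1)[1].strip()
        if key not in facts:
            return "I don't know."
        # Inject the retrieved fact into the original prompt and try again.
        prompt = f"Known fact: {key} = {facts[key]}\n{question}"
    return "I don't know."


print(answer_with_lookup("What is the capital of France?"))
```

The `max_rounds` cap is a design choice to keep the loop from running indefinitely when the model keeps asking for more information.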

So how does this link to Jaxon? The wave of new demo apps often brings along the idea that one model can do it all. Unfortunately, nothing has changed on the system architecture front: you still have to contend with networks of different software modules, even if some of those modules are based on machine learning. We all have the dream as software engineers of complete and true decoupling, but that isn’t always feasible, especially when your systems are probabilistic… changing one thing changes everything, and it gets complicated.

That’s part of our inspiration for Jaxon 2.0: adapting our focus on rapidly developing and prototyping systems to the new LLM explosion. We believe it’s important to get things right before you invest at full scale in tuning your models’ hyperparameters and getting everything optimized.

There are a lot of code generators appearing on the market, which are like having a junior engineer acting as a second pair of eyes and spotting problems. We want to complement that with more knowledge: what if that junior engineer were a senior architect with a PhD in machine learning? They could help you with high-level design problems, options at the system level, experiments that can start narrowing uncertainty, and tradeoffs.

The UI should be familiar to Jupyter Notebook users. We have a new cell type, which is an interaction with an LLM, and we’re training an assortment of plugin tools and crafted prompts to take advantage of them. This incorporates the data we’ve accumulated and the reasoning about system architectures, which will transform this LLM into an interface to drive reasoning and interaction against a proposed system architecture, and help you develop it, find soft spots, and identify potential experiments.

It can generate the code that will explore those different options, accounting for any restrictions, and eventually execute the task for you. You also have a log to track changes and the reasoning behind the design process, which you can learn from and reference in future projects. Don’t reinvent the wheel.
