LLMs are already being used to generate text, translate languages, write different kinds of creative content, and answer your questions in an informative way. But what’s next for LLMs? And how can knowledge graphs be used to take them even further?
Some people have reacted with, “Oh, crap. AI is going to take over the world and turn us all into paper clips because of emergent properties. We’d better pause training of larger models in particular for a while.” Others have come out and said, “No, that’s silly, both for the practical reasons of how you’re going to do that and because, are we actually on the cusp of any sort of paper-clip-turning monstrosity?”
Some would say that it’s a trade-off of short-term versus long-term harms. In some respects, it’s not training that’s the problem; it’s the short-term harms around data collection and data gathering, and it’s all about the data anyway. If we halt training for six months, that only delays the inevitable.
We’ve been talking a lot about LLMs lately. How accurate are they? Can we trust them? LLMs are generally accurate, but they can also “hallucinate”, generating plausible-sounding statements that turn out to be false (ChatGPT confidently told me that the fastest marine mammal was the peregrine falcon, which is neither marine nor a mammal). An LLM has no way of knowing whether it’s telling the truth; it just spits out plausible words, and if the thing were actually sentient, it would hope that you buy it, in both the literal and the figurative sense. In a way, they’re still word calculators.
If large language models are effectively word calculators, that suggests they can generate compelling text, and sometimes that text is useful. But as we all know from using them, they can end up hallucinating whenever they reach a fact they don’t know. They’re bullshit artists.
An LLM doesn’t care whether it’s telling the truth and has no means of knowing. It simply spits out words that seem plausible, and if the thing were actually sentient and could hope, it would hope that you buy it, in both the literal and the figurative sense.
Jaxon has been putting guardrails on LLMs by integrating them with knowledge graphs that contain data specific to the company or agency and its use case. These knowledge graphs serve as a source of truth to keep the LLMs in line.
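As a rough illustration of the "source of truth" idea, here is a minimal sketch in which an answer is passed along only if the facts it relies on appear in a knowledge graph. Everything in it is an assumption made for illustration: the `FACTS` triples, the `supported` and `guard` helpers, and the premise that the claimed triples have already been extracted from the model's answer by some other step. It is not Jaxon's implementation.

```python
# Hypothetical knowledge graph stored as (subject, relation, object) triples.
FACTS = {
    ("peregrine falcon", "is_a", "bird"),
    ("orca", "is_a", "mammal"),
    ("orca", "lives_in", "ocean"),
}

def supported(claim):
    """Return True only if every triple in the claim is present in the graph."""
    return all(triple in FACTS for triple in claim)

def guard(answer, claim):
    """Pass the answer through only if the graph backs up its claimed triples."""
    if supported(claim):
        return answer
    return "I can't verify that against the knowledge graph."

# Assumed to come from an LLM plus a separate triple-extraction step.
llm_answer = "The peregrine falcon is the fastest marine mammal."
llm_claim = [
    ("peregrine falcon", "is_a", "mammal"),
    ("peregrine falcon", "lives_in", "ocean"),
]

checked = guard(llm_answer, llm_claim)
print(checked)
```

The design choice here is deliberately conservative: anything the graph cannot confirm is withheld rather than passed along as if it were true.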
Pretty much any problem can be modeled as a knowledge graph. In relational-database terms, you’re focusing on the foreign keys and the relationships between tables, rather than the rows in your tables or queries. You can frame a lot of problems as graph problems, with graph traversal as a pretty straightforward solution. After that, the traversal can prove exactly what the decision-making process was, either directly or with a well-codified chain of probabilities.
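To make the "walk the graph and show your work" idea concrete, here is a small sketch that runs a breadth-first search over a handful of triples and returns the exact chain of relationships it followed. The entity names, relation names, and the `explain_path` helper are all hypothetical; a real knowledge graph would usually live in a graph database rather than a Python list.

```python
from collections import deque

# Hypothetical edges, each a (subject, relation, object) triple.
TRIPLES = [
    ("order_1001", "placed_by", "customer_42"),
    ("customer_42", "belongs_to", "region_east"),
    ("region_east", "served_by", "warehouse_7"),
    ("order_1001", "contains", "product_9"),
]

def build_adjacency(triples):
    """Index the triples by subject so outgoing edges can be looked up quickly."""
    adjacency = {}
    for subject, relation, obj in triples:
        adjacency.setdefault(subject, []).append((relation, obj))
    return adjacency

def explain_path(triples, start, goal):
    """Breadth-first search that returns the triples walked from start to goal, or None."""
    adjacency = build_adjacency(triples)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [(node, relation, neighbor)]))
    return None

path = explain_path(TRIPLES, "order_1001", "warehouse_7")
for step in path:
    print(step)
```

The returned list is the audit trail: each element is one relationship that was followed, so the reasoning behind the answer can be read back step by step.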
The language models themselves contain a whole lot of information. They’re trained on a ton of information from the web. One interesting observation comes up when you get inside the model and try to figure out how it’s doing what it’s doing: how is it actually doing that word-calculator work?
It’s actually kind of creating triples on the fly, as it were, inside the “black box voodoo” of the LLM, which nobody yet understands in gory detail (hence the AI halt letter). The models are actually starting to set up relation triples (subject, relation, object), which is the form of a graph.
So there’s a kind of open graph within the large language model, which is an interesting way to think about prompting. When you prompt, are you basically trying to frame a specific graph, making that graph a little less open, for the purpose of getting at a particular bit of the information inside the large language model?
Let’s go down a level and start talking tactically about how we productively use these things together. LLMs have been trained on the world’s publicly available data, which is, conservatively speaking, a lot. You can fine-tune the model on a specific task, which is often tantamount to retraining (a difficult and expensive process). That has its pros and cons: it can still be heavyweight and requires some data, and it tends to veer away from the generality of LLMs and narrow them in on one task.
But if you’re trying to specialize them for a general task, there are other options. One, there’s the injection of new data at the prompt level and after, which leads us to instruction prompting. You can take documents and create embeddings: each document is encoded as a high-dimensional vector, so you can effectively index all of your documents as points in an embedding space. You can then inject the most relevant documentation and verbiage right into the prompt itself, giving the model the material to reason from and a place to start.
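The embed, rank, and inject flow described above can be sketched in a few lines. This is a toy under stated assumptions: the `embed` function is a simple bag-of-words count standing in for a learned embedding model, the `DOCS` are made-up example documents, and `build_prompt` is a hypothetical helper, not part of any particular library.

```python
import math
from collections import Counter

# Made-up example documents standing in for a company's documentation.
DOCS = [
    "Refunds are issued within 14 days of a return request.",
    "Standard shipping takes three to five business days.",
    "Support is available by chat on weekdays.",
]

def embed(text):
    """Toy embedding: word counts. A real system would use a learned embedding model."""
    words = text.lower().replace(".", " ").replace("?", " ").split()
    return Counter(words)

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_prompt(question, docs, k=1):
    """Rank documents by similarity to the question and inject the top k into the prompt."""
    query_vector = embed(question)
    ranked = sorted(docs, key=lambda d: cosine(query_vector, embed(d)), reverse=True)
    context = "\n".join(ranked[:k])
    return f"Use only this context:\n{context}\n\nQuestion: {question}"

prompt = build_prompt("How long does shipping take?", DOCS)
print(prompt)
```

Swapping the toy `embed` for a real embedding model changes the quality of the ranking, but the shape of the pipeline stays the same: encode, compare, and paste the best matches into the prompt.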
Two, you can also use instruction tuning. For example, if you query ChatGPT, it first tries to make sense of the query content, and if it doesn’t know something, maybe a knowledge graph, supplemental document, or Google search can help. You can get it to find that information, take the result, inject it into your original prompt, and then re-prompt itself. It has added that information to its own context, so it can check whether it can actually answer your question.
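The find, inject, and re-prompt loop can be sketched as follows. The model here is a stub: `fake_llm` and its `ANSWER:` / `LOOKUP:` reply convention are invented for this example and do not correspond to any real API, and the `KNOWLEDGE` dictionary stands in for a knowledge graph, supplemental document, or web search.

```python
# Hypothetical lookup table standing in for a knowledge graph or search backend.
KNOWLEDGE = {"capital of france": "Paris is the capital of France."}

def fake_llm(prompt):
    """Stand-in for a real model call (an assumption, not a real API).

    It answers only when the needed fact is already in the prompt;
    otherwise it asks for a lookup.
    """
    if "Paris is the capital of France." in prompt:
        return "ANSWER: Paris"
    return "LOOKUP: capital of france"

def answer_with_lookup(question, llm, knowledge, max_rounds=3):
    """Re-prompt the model with retrieved facts until it answers or gives up."""
    prompt = f"Question: {question}"
    for _ in range(max_rounds):
        reply = llm(prompt)
        if reply.startswith("ANSWER:"):
            return reply[len("ANSWER:"):].strip()
        key = reply[len("LOOKUP:"):].strip()
        fact = knowledge.get(key)
        if fact is None:
            break  # nothing to inject, so stop rather than guess
        # Inject the retrieved fact into the original prompt and try again.
        prompt = f"Context: {fact}\n{prompt}"
    return "I don't know."

result = answer_with_lookup("What is the capital of France?", fake_llm, KNOWLEDGE)
print(result)
```

The `max_rounds` cap and the explicit "I don't know." fallback are design choices in this sketch: they keep the loop from running forever and from inventing an answer when the lookup comes back empty.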
There are a few cute demo apps coming out, and there’s a naive tendency to step into machine-learning-driven applications, whether consumer or business apps, with the notion that I’m just going to train my one model and all my dreams will come true. But these models aren’t magic, and in some respects they aren’t even that smart. Nothing has changed in the sense that you built a system architecture when you had complex problems: you have a system of different software modules. It’s just that now some of those software modules are based on machine learning, so they’re data-driven and probabilistic.
One of the challenges as you design these things is that there’s always more than one way to skin the cat. But when you have a system, you have to design the system; you can’t just zero in on one component. We all have the dream as software engineers of complete and true decoupling. Unfortunately, that isn’t always achievable, especially when your systems are probabilistic: changing one thing changes everything, and it gets to be a hairy design problem.
So how does this link to Jaxon? The wave of new demo apps often brings along the idea that one model can do it all. Unfortunately, nothing has changed on the system architecture front: you still have to contend with networks of different software modules, even if some of those software modules are based on machine learning. We all have the dream as software engineers of complete and true decoupling, but that isn’t always feasible, especially when your systems are probabilistic… changing one thing changes everything, and it gets complicated.
That’s part of our inspiration for Jaxon 2.0: adapting our focus on rapidly developing and prototyping systems to the new LLM explosion. We believe it’s important to get things right before you invest at full scale, tuning your models’ hyperparameters and optimizing everything.
There are a lot of code generators appearing on the market, which are like having a junior engineer acting as a second pair of eyes and spotting problems. We want to complement that with more knowledge: what if that junior engineer were a senior architect with a PhD in machine learning? They could help you with high-level design problems, options at the system level, experiments that can start narrowing uncertainty, and tradeoffs.
The UI should be familiar to Jupyter Notebook users. We have a new cell type, which is an interaction with an LLM, and we’re training an assortment of plugin tools and crafted prompts to take advantage of them. This incorporates the data we’ve accumulated and the reasoning about system architectures, which will transform this LLM into an interface to drive reasoning and interaction against a proposed system architecture, and help you develop it, find soft spots, and identify potential experiments.
I can generate the code that will explore those different options, accounting for any restrictions, and eventually execute the task for me. You also have a log to track changes and the reasoning behind the design process, which you can learn from and reference in future projects. Don’t reinvent the wheel.