Q&A: What the heck is RAG?

This article is a part of Ask Kat, a Q&A series in our AI newsletter, Deeper Learning. You ask questions about AI, then we get the answers and explain them in simple, non-jargony terms.

Aaron O'Leary
May 7th, 2024
In today’s edition, we’re touching on RAG, a buzzword you either have seen in AI circles or will notice after you read this.
Last week, we explained why AI models hallucinate, and this week we’re covering a related question submitted by our own CEO: “What’s the deal with RAG?”
You might have seen the term floating around the AI industry lately. It stands for Retrieval-Augmented Generation, and it’s a technique that AI engineers can use to improve the output of large language models (LLMs). As you now know, hallucinations ultimately result from shortfalls in the training of AI: despite all the data that models are trained on, it’s hard to pack everything in. So what if you didn’t need to pack everything in? What if, instead, a model could retrieve information to make its responses more accurate?
With RAG, AI engineers can bring external data into the process from various sources (think records in databases, document files in repositories, or APIs). We’ll skip over the nitty-gritty of how the external data is delivered for now, but the main point is that it’s converted into a library that the system can “understand” (in short, each piece of information gets a numerical representation, called an embedding, that helps the system determine what’s relevant). So when a user asks a question, the model is no longer stuck with just whatever input the user provided. The system can also search that library and combine what it finds with the person’s initial query so that the LLM can deliver contextually appropriate responses.
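To make “numerical representations” a bit more concrete, here’s a toy sketch in Python. It’s an illustration under big assumptions, not how any particular product works: real systems use embeddings learned by a model, while this one just counts words. The sample documents and the function names (`embed`, `cosine_similarity`, `most_relevant`) are made up for the example.

```python
import math
import re
from collections import Counter

# Hypothetical mini "library" of external data (made up for illustration).
LIBRARY = [
    "Employees accrue 20 days of paid time off per year.",
    "The office kitchen is restocked every Monday morning.",
    "Expense reports are due by the fifth of each month.",
]


def embed(text: str) -> Counter:
    """Toy numerical representation: a count of each lowercase word.

    Assumption: real RAG systems use learned embedding models instead.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Score how closely two word-count vectors point in the same direction."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_relevant(query: str, documents: list[str]) -> str:
    """Return the document whose vector is most similar to the query's."""
    query_vec = embed(query)
    return max(documents, key=lambda doc: cosine_similarity(query_vec, embed(doc)))


if __name__ == "__main__":
    print(most_relevant("How many days of paid time off do I get?", LIBRARY))
```

The question shares the most words with the paid-time-off document, so that’s the one the sketch picks. Swap the word counts for real embeddings and you have the basic idea behind the “retrieval” in RAG.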
For example, let’s say you’re asking an AI bot, “How much PTO do I have left?” With RAG, the system would first search the documents it has access to and retrieve your company’s policy docs and any requests you’ve made for time off this year. Then it would augment your original query with that information and deliver the combined query to the LLM for an answer.
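Here’s a rough sketch of that PTO flow in Python, assuming a very simple setup. Everything named here is hypothetical: the policy text, the time-off records, and the helper functions `retrieve_context` and `build_augmented_prompt`. The final call to the LLM is left out, since that depends on which model you use.

```python
from dataclasses import dataclass


@dataclass
class TimeOffRequest:
    employee: str
    days: int


# Hypothetical external data (made up for illustration).
POLICY_DOC = "Full-time employees receive 20 days of PTO per calendar year."
REQUESTS = [
    TimeOffRequest("kat", 3),
    TimeOffRequest("kat", 5),
    TimeOffRequest("aaron", 2),
]


def retrieve_context(employee: str) -> list[str]:
    """Retrieval step: gather the policy and this employee's requests."""
    context = [f"Policy: {POLICY_DOC}"]
    for request in REQUESTS:
        if request.employee == employee:
            context.append(f"Time-off request: {request.days} days off")
    return context


def build_augmented_prompt(question: str, context: list[str]) -> str:
    """Augmentation step: combine retrieved facts with the original question."""
    facts = "\n".join(f"- {line}" for line in context)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{facts}\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    prompt = build_augmented_prompt(
        "How much PTO do I have left?", retrieve_context("kat")
    )
    print(prompt)  # In a real system, this prompt would be sent to the LLM.
```

The point is that the LLM never has to “know” your PTO balance from training. It gets the policy and your requests handed to it alongside the question.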
As a former chef, I like to think of RAG as a cook who is making a recipe, checks his pantry, and finds he doesn’t have all the right ingredients. Instead of just swapping in oil for butter, he checks his phone and learns that vegetable oil plus ½ teaspoon of salt is a better substitute for butter.
Does RAG solve all the problems? Probably not, at least not yet. Many makers report great results with RAG, and the technique may be behind the tools you’re using now, but others have run into limitations. In many cases, those limitations are still a result of context, or lack thereof. Remember: Humans have and create a TON of context. Imagine a complex legal doc that references bits of information scattered across pages and pages, each part relating to others but not necessarily in order. That would be hard enough for a human to parse and understand, let alone a model. Check out this article from Pyq AI CEO Aman Raghuvanshi called “LLMs and the Harry Potter Problem” for deeper reading on this.
The rise of RAG. Unsurprisingly, tech startups and companies have been weaving RAG tools into their products or creating out-of-the-box RAG solutions as more AI engineers want to use the technique. For example, check out the launches of SciPhi, Linq, and Verta Super RAG.
Alternatives to RAG. Ya, you just learned about RAG, but now people are talking about a “RAG killer”? Sigh.
The TL;DR here is that new LLMs, like Meta’s Llama 3, have “long context windows,” and these are meant to help them “recall” more stuff. A context window is how many tokens (words, bits of words) an LLM can consider at once when generating a response, so long context windows should, in theory, equate to better answers: you can hand the model more material directly instead of retrieving only the relevant pieces first. Do you want to learn more about context windows in a future newsletter? Or the landscape of RAG products? Let us know in the comments!
This article first appeared in our AI newsletter, Deeper Learning. Subscribe here, and let us know what questions you have about AI in the comments!