RAG is all you need - Part 1

Ever wondered how to get ChatGPT to accurately answer questions about your private information? Whether it's bank statements, passport records, or personal journals, large language models (LLMs) like ChatGPT typically can't help you with this specific data.

· tech · 9 min read


This is the first part of a series of articles on RAG. In this series, we’ll explore the RAG framework and why I believe it’s all you need for LLM-based question-answering systems in 95% of use cases. Feel free to skip ahead to other parts in the series for more technical breakdowns.


The other day I asked ChatGPT to give me a breakdown of my family tree. To be fair, it actually gave me a helpful response:

To give you a breakdown of your family tree, I’ll need to gather some information from you. Here’s a general structure of what’s typically included in a family tree: …

It’s very clear from this response that ChatGPT doesn’t have access to my personal information - or does it 🤔? But what if I wanted to ask about something more sensitive, like my bank statements or passport records? How would ChatGPT respond then?

These are the type of use cases that most businesses will be interested in. They want to use the power of AI, or more precisely - large language models (LLMs) - to help them with natural language tasks that involve their private data. These tasks will typically include:

  • Answering questions about private data: For example, “How long does it take for my driver’s license to expire in Malawi?”
  • Generating text based on private data: For example, “Write a summary of my last 3 bank statements.”
  • Searching for information in private data: For example, “Which law case in my records, between 2010 and 2015, mentions ‘defamation’?”
  • Summarizing private data: For example, “Give me a summary of my cabinet meeting notes from last year.”

Unlocking these use cases is highly valuable for businesses. It allows top executives to get insights from their data faster, and it enables customer service teams to provide better, more consistent and more available support. However, there’s a big problem: LLMs like ChatGPT typically can’t help you with this specific data.

At best, they’ll admit ignorance; at worst, they’ll make something up that is not grounded in truth.

The African Politician

This is because, at their core, LLMs are not thoughtful, planning machines (although you can program them to be). They are optimized to predict the next word in a sentence, and the next word after that — not to understand the world or to reason about it. I sometimes liken them to African politicians who have mastered the art of saying a lot without saying anything at all!

Let’s look at the above tweet for instance. At first glance, it seems like the politician has said something meaningful. But when you look closer, you realize that they’ve said nothing at all.

The political brain has, over many years, parameterized patterns of speech concerning politics and economics, and politicians have learned to generate text that “makes sense” in these contexts. But usually, their responses are not grounded in hard facts or a deep understanding of the technicalities of the subject matter.

LLMs are similar. They’ve been trained on a lot of text data; in fact, the commercially successful ones have been trained on virtually the entirety of publicly available text on the web. Their architecture allows them to build complex patterns that let them generate text that “makes sense” in a wide variety of contexts.

Bringing it all Together

Let’s pause and reflect on a few key words that I’ve introduced without explicitly mentioning them:

  • Pre-trained Models: These are models that have been trained on a large corpus of text data. Also known as Foundation Models, they provide an excellent, well-rounded language model that can be adapted to a wide variety of natural language tasks.
  • Transformer Architecture: This is the architecture that powers most of the pre-trained models we have today. It’s a deep learning architecture that uses self-attention mechanisms to learn contextual relationships between words in a sentence.
  • Parameters: These are the weights that the model learns during training. They are what makes the model “smart” and able to generate text that “makes sense”. The more parameters a model has, the more complex patterns it can learn.
  • Tokens: These are the smallest units of text that the model can understand. They are usually words or subwords, and they are what the model uses to generate text.
  • Context Window: This is the maximum number of tokens the model can work with at once, covering both the input it is given and the text it generates. The model uses this window as its working memory to produce the next words that fulfill the task at hand.
  • Hallucination: This is when the model generates text that is not grounded in factual information. It’s the model’s way of convincing you that it knows what it’s talking about, when in fact it doesn’t.
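To make tokens and the context window concrete, here is a toy sketch. Real models use subword tokenizers (such as byte-pair encoding) rather than whitespace splitting, but the mechanics are the same: text becomes a sequence of tokens, and anything beyond the window is invisible to the model.

```python
# Toy illustration only: real tokenizers split on subwords, not whitespace.
def tokenize(text: str) -> list[str]:
    return text.split()

def fit_to_context_window(tokens: list[str], window: int) -> list[str]:
    # Tokens beyond the window are simply dropped; the model never sees them.
    return tokens[:window]

tokens = tokenize("The new Lama SuperSport model has 1.5 billion parameters")
print(len(tokens))                       # 9 tokens under this naive scheme
print(fit_to_context_window(tokens, 4))  # ['The', 'new', 'Lama', 'SuperSport']
```

Notice how a small window silently truncates the input — one reason long documents can’t just be pasted into a prompt wholesale.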

Armed with this new knowledge, you should now be able to understand when a technical article says something like:

The new Lama SuperSport model has 1.5 billion parameters, and it’s based on a variation of the Transformer Architecture. It has a context window of 1024 tokens, and it’s been trained on a mixture of public and sports-related proprietary data to reduce hallucination.

A Better Politician

We all hope for a future where our politicians are, at the very least, deeply knowledgeable about the subjects they speak on. We hope the same for our AI models: deep knowledge of the technicalities of the subjects they talk about. Obviously we demand more than just talk, but action as well. In our analogy, Tool Calling is the LLM’s way of taking action based on the information it has. But that’s a topic for another day.

Doesn’t that just inspire a lot more confidence!

Obviously this is not a political article, far from it. I just felt the analogy would bring home some of the more technical concepts introduced in this article. It’s clear that to achieve this level of accuracy in answering domain-specific questions, we need to provide the model with domain-specific data. There are two popular ways to do this: Fine-tuning and Retrieval Augmented Generation.

Comparing the Two Approaches

In the early days of LLMs, the industry seemed to converge on the idea that fine-tuning was the best way to adapt these models to new tasks. Fine-tuning involves taking a pre-trained model and training it on a smaller dataset that is specific to the task at hand. This allows the model to learn the patterns in the new data and adapt to the new task.

It is like taking a politician who has been trained in general politics and economics, and then giving them a crash course in a specific area like health or education. They will be able to generate text that is more accurate and relevant to the new domain.

However, you can imagine that this approach has its limitations. For one, fine-tuning requires a lot of data. If you don’t have enough, the model will not be able to learn the patterns of the new domain. This is like giving a politician a crash course in a new domain but only a few hours to learn it: they will not absorb the patterns of the new domain and will not give very meaningful responses the next time they are at a press conference. In many ways, this can be even worse than a generic response, because it gives the illusion of knowledge where there is none.

Secondly, the time, expertise, and computational resources required to fine-tune a model can be prohibitive. This is like having to hire a team of experts to train a politician in a new domain: it’s not something that can be done quickly or easily. Besides, I don’t think our politicians would be very happy about going through a crash course every time they need to answer questions from a new audience.

Enter Retrieval Augmented Generation

Over time, it became clear that retrieval was a more efficient way to adapt LLMs to new knowledge. Retrieval Augmented Generation is like giving the LLM an external source of information that it can pull from every time it needs to generate a response. This external source can be a database or a knowledge graph, but more often than not it involves documents stored in a search index that can be queried for relevance and fed into the model’s context window for generation.
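The retrieve-then-generate loop can be sketched in a few lines. This is a minimal illustration, not a production system: the retriever here is naive keyword overlap (real systems use a search index or vector store), and the final `llm(prompt)` call is a hypothetical placeholder for whatever model you use.

```python
# Minimal RAG sketch: retrieve relevant documents, then build a grounded prompt.
documents = [
    "A Malawian driver's licence is valid for five years from the date of issue.",
    "Cabinet meeting notes, March: the budget review was postponed.",
    "Bank statement, January: closing balance of 250,000 MWK.",
]

def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    # Naive retriever: rank documents by word overlap with the question.
    q_words = set(question.lower().split())
    scored = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(question: str, context_docs: list[str]) -> str:
    # The retrieved text is injected into the model's context window.
    context = "\n".join(context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

question = "How long is my driver's licence valid in Malawi?"
prompt = build_prompt(question, retrieve(question, documents))
# answer = llm(prompt)  # hypothetical call to your LLM of choice
print(prompt)
```

The key point is that the model never needs retraining: new knowledge is supplied at query time through the prompt.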

This is like giving our politician a team of experts who can speak to him through an earpiece every time he needs to answer a question. The politician can then use the information provided by the experts to give a more accurate and relevant response in near real-time. Smart, right?

Not only is this approach less computationally expensive, it also produces more accurate and relevant responses, because the model can pull from a large corpus of information specific to the task at hand. It’s like the experts feeding the politician the most up-to-date and relevant information on the subject.

Conclusion

In the next part of this series, we will explore how to use Retrieval Augmented Generation to build a question answering system. We will also explore the most common RAG technique: Vector Search. This is a technique that uses vector representations of the question and the documents to retrieve the most relevant ones from a search index.
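As a small taste of what’s coming, here is the core idea behind vector search: texts are embedded as vectors, and relevance is measured by the cosine of the angle between them. The vectors below are made-up toy values; in practice they come from an embedding model.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Cosine of the angle between two vectors: 1.0 means same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

question_vec = [0.9, 0.1, 0.0]            # hypothetical embedding of a question
doc_vecs = {
    "passport records": [0.8, 0.2, 0.1],  # similar direction -> high score
    "meeting notes":    [0.0, 0.1, 0.9],  # different direction -> low score
}
best = max(doc_vecs, key=lambda d: cosine_similarity(question_vec, doc_vecs[d]))
print(best)  # "passport records"
```

The document whose vector points in roughly the same direction as the question’s vector wins — that, at heart, is vector search.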


Reach out to me on X if you have any questions or comments. I would love to hear from you about this article or any other topic you would like me to write about.

