RAG Explained: How to Give Your LLM a Brain Upgrade

Abstract Algorithms · 4 min read

TL;DR

RAG (Retrieval-Augmented Generation) stops LLMs from making stuff up. It works by first searching a private database for facts (Retrieval) and then pasting those facts into the prompt for the LLM to use (Augmented Generation). It's like giving your AI an open-book exam.


1. The Problem: Why LLMs Need Help

Standard LLMs have two major flaws:

  1. Hallucinations: They confidently invent facts because they are just predicting the next likely word, not querying a knowledge base.
  2. Knowledge Cutoff: Their knowledge is frozen at the time of training. They don't know about recent events or your company's private data.

Example (Without RAG):

You: "What were our company's Q3 earnings?"
LLM: "As a large language model, I don't have access to your private financial data..." (Useless.)


2. The Solution: Retrieval-Augmented Generation (RAG)

The "No-Jargon" Explanation: Imagine an Open-Book Exam.

  • Standard LLM: A student taking a closed-book exam, relying only on what they memorized months ago.
  • RAG LLM: A student who can bring the textbook to the exam. Before answering a question, they look up the relevant page.

RAG connects the LLM to a live, external knowledge source.

The RAG Workflow

It's a two-step process: Retrieval, then Generation.

Step 1: Retrieval (Find the Textbook Page)

  1. User asks a question: "How do I reset my password?"
  2. Embed the question: Turn the question into a vector (a list of numbers) that represents its meaning.
  3. Vector Search: Search your private database (e.g., company wiki, PDFs) to find text chunks with similar vectors.
  4. Retrieve: Pull the top 3-5 most relevant chunks.
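The four retrieval steps above can be sketched in a few lines of Python. This is a toy, assuming an in-memory index and a bag-of-words `embed` as a stand-in for a real embedding model (such as text-embedding-ada-002):

```python
import numpy as np

# Step 0 (done once, offline): the knowledge base, pre-split into chunks.
chunks = [
    "To reset your password, go to Settings > Security > Reset Password.",
    "The company holiday schedule is published every January.",
    "Expense reports must be filed within 30 days.",
]

def tokenize(text):
    return [w.strip(".,?>") for w in text.lower().split() if w.strip(".,?>")]

# Vocabulary over the corpus; a real system replaces all of this
# with a learned embedding model.
vocab = sorted({w for c in chunks for w in tokenize(c)})

def embed(text):
    """Toy 'embedding': a normalized word-count vector over the vocab."""
    vec = np.zeros(len(vocab))
    for w in tokenize(text):
        if w in vocab:
            vec[vocab.index(w)] += 1
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# Index every chunk once, up front.
index = [(c, embed(c)) for c in chunks]

def retrieve(question, k=3):
    """Steps 1-4: embed the question, score every chunk, return the top-k."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: float(q @ item[1]), reverse=True)
    return [c for c, _ in ranked[:k]]

print(retrieve("How do I reset my password?", k=1)[0])
```

In production the index lives in a vector database and `embed` is an API call, but the shape of the pipeline is the same.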

Step 2: Augmentation & Generation (Answer the Question)

  1. Build a new prompt: Combine the original question with the facts you just found.

    Context: "To reset your password, go to Settings > Security > Reset Password."
    
    Question: "How do I reset my password?"
    
    Answer based only on the context above:
    
  2. Send to LLM: The LLM uses the provided context to generate a factual, grounded answer.
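The augmentation step is really just string formatting. A minimal sketch (`build_prompt` is a made-up helper name, not a library function):

```python
def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Splice the retrieved facts into the prompt as grounding context."""
    context = "\n".join(retrieved_chunks)
    return (
        f"Context: {context}\n\n"
        f"Question: {question}\n\n"
        "Answer based only on the context above:"
    )

prompt = build_prompt(
    "How do I reset my password?",
    ["To reset your password, go to Settings > Security > Reset Password."],
)
print(prompt)
```

This `prompt` string, not the bare question, is what gets sent to the LLM via your provider's chat API.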

3. Deep Dive: How Vector Search Works

How do we "search for meaning"? We use Vector Embeddings and Cosine Similarity.

The Concept

  1. Indexing: We use an embedding model (like text-embedding-ada-002) to convert every paragraph of our documents into a high-dimensional vector.
  2. Storage: We store these vectors in a specialized Vector Database (e.g., Pinecone, Chroma, Weaviate).
  3. Querying: When a user asks a question, we embed their query into the same vector space.
  4. Similarity Search: We find the vectors in the database that are "closest" to the query vector.

The Math: Cosine Similarity

Instead of measuring distance, we measure the angle between vectors. A smaller angle means a more similar meaning.

$$ \text{Cosine Similarity} = \frac{A \cdot B}{\|A\| \|B\|} $$

  • Result = 1: Vectors point in the same direction (Identical meaning).
  • Result = 0: Vectors are perpendicular (Unrelated).
  • Result = -1: Vectors point in opposite directions (Opposite meaning).

Toy Example:

| Text | Vector (Simplified 2D) |
| --- | --- |
| "How to reset password" (Query) | [0.9, 0.1] |
| "Password reset guide" (Doc A) | [0.8, 0.2] |
| "Company holiday schedule" (Doc B) | [-0.1, 0.9] |

The angle between the Query and Doc A is very small (high similarity). The angle between the Query and Doc B is large (low similarity). The system retrieves Doc A.
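You can check the toy numbers directly with a few lines of Python implementing the formula above:

```python
import numpy as np

def cosine_similarity(a, b):
    """cos(theta) = (A . B) / (||A|| * ||B||)"""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = [0.9, 0.1]   # "How to reset password"
doc_a = [0.8, 0.2]   # "Password reset guide"
doc_b = [-0.1, 0.9]  # "Company holiday schedule"

print(cosine_similarity(query, doc_a))  # ~0.99: nearly the same direction
print(cosine_similarity(query, doc_b))  # 0.0: perpendicular, unrelated
```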


4. RAG vs. Fine-Tuning: What's the Difference?

| Feature | RAG | Fine-Tuning |
| --- | --- | --- |
| Goal | Injecting knowledge | Teaching a skill/style |
| How | Adds a database | Updates model weights |
| Data | Easy to add/delete | Requires retraining |
| Cost | Cheap (API calls) | Expensive (GPU time) |
| Use Case | "Answer questions from this PDF" | "Act like a sarcastic pirate" |

Summary & Key Takeaways

  • RAG = Retrieval (Search) + Generation (Answer).
  • It solves Hallucinations and Knowledge Cutoff by grounding the LLM in facts.
  • The core technology is Vector Search (using Embeddings and Cosine Similarity).
  • Use RAG to inject knowledge. Use Fine-Tuning to change behavior.

Practice Quiz: Test Your Knowledge

  1. Scenario: You want to build a chatbot that can answer questions about your company's internal, private documents. Which is the best approach?

    • A) Hope the LLM already knows the data.
    • B) Use RAG to connect the LLM to a vector database of your documents.
    • C) Ask the user to copy-paste the documents into the chat.
  2. Scenario: What is the primary mathematical operation used in the "Retrieval" step of RAG to find relevant documents?

    • A) Matrix Multiplication
    • B) Cosine Similarity
    • C) Standard Deviation
  3. Scenario: You want to make an LLM write in the style of Shakespeare. What is the best approach?

    • A) RAG with a database of Shakespeare's plays.
    • B) Fine-Tuning the model on a dataset of Shakespeare's works.
    • C) Prompting "Please act like Shakespeare."

(Answers: 1-B, 2-B, 3-B)

Written by Abstract Algorithms (@abstractalgorithms)