
LLM Terms You Should Know: A Helpful Glossary


TLDR: The world of Generative AI is full of jargon. This post is your dictionary. Whether you are a developer, a researcher, or just curious, use this guide to decode the language of Large Language Models.


A

Agent

An AI system that uses an LLM as its "brain" to perform actions. Instead of just answering a question, an Agent can browse the web, execute code, or use APIs to complete a goal.

  • Example: AutoGPT, BabyAGI.
  • Usage: "I built an agent that reads my emails and automatically adds events to my Google Calendar."

Alignment

The process of ensuring an AI's behavior matches human values and intent.

  • Example: Preventing the model from generating instructions on how to build a bomb.
  • See also: RLHF.

Attention Mechanism

The core mathematical operation in Transformers that allows the model to focus on different parts of the input sentence when generating a response.

  • Analogy: When reading the sentence "The bank of the river," attention helps the model understand "bank" refers to land, not money, by looking at "river."
  • Math: $$ \text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V $$
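
A minimal NumPy sketch of that formula (single head, no batching or masking), just to show the moving parts:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # how strongly each query attends to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the keys
    return weights @ V                                # weighted mix of the value vectors

Q = K = V = np.random.randn(3, 4)                     # 3 tokens, 4-dimensional vectors
print(scaled_dot_product_attention(Q, K, V).shape)    # (3, 4)
```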

Auto-Regressive

A property of models (like GPT) that generate text one word at a time, using the previously generated words as input for the next one.

  • Example: To generate "I love AI", it generates "I", then feeds "I" back in to generate "love", then feeds "I love" back in to generate "AI".
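
The loop itself is simple; here is a sketch with a toy `next_token` lookup standing in for the model:

```python
def next_token(context: tuple) -> str:
    # Toy stand-in for an LLM's "predict the next token" step.
    lookup = {(): "I", ("I",): "love", ("I", "love"): "AI"}
    return lookup.get(context, "<eos>")

tokens = []
while True:
    tok = next_token(tuple(tokens))   # condition on everything generated so far
    if tok == "<eos>":
        break
    tokens.append(tok)                # feed the new token back in as input

print(" ".join(tokens))               # I love AI
```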

B

Beam Search

A decoding strategy where the model explores multiple possible "paths" of future words simultaneously and keeps the most likely ones, rather than just picking the single best word at each step.

  • Usage: Used in translation to ensure the whole sentence makes sense grammatically, rather than just translating word-for-word.
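
A toy implementation over a hand-made next-token table (the probabilities are invented purely to drive the search):

```python
import math

# P(next token | previous token) — made-up numbers for illustration only.
PROBS = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a":   {"cat": 0.9, "dog": 0.1},
    "cat": {"<eos>": 1.0},
    "dog": {"<eos>": 1.0},
}

def beam_search(beam_width=2, steps=3):
    beams = [(["<s>"], 0.0)]                              # (sequence, log-probability)
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            for tok, p in PROBS.get(seq[-1], {}).items():
                candidates.append((seq + [tok], score + math.log(p)))
        if not candidates:
            break
        # Keep only the `beam_width` most likely partial sequences.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams

print(beam_search())   # the two highest-probability complete sequences
```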

Bias

Systematic errors or prejudices in the model's output, often inherited from the training data.

  • Example: If a model is trained on old books, it might assume "Doctor" is always male and "Nurse" is always female.

C

Chain-of-Thought (CoT)

A prompting technique where you ask the model to "think step-by-step" before giving the final answer. This significantly improves performance on math and logic tasks.

  • Example:
    • Standard Prompt: "What is 23 * 4?" -> "92" (Might be wrong).
    • CoT Prompt: "Think step by step." -> "20 × 4 is 80. 3 × 4 is 12. 80 + 12 is 92." (More reliable).
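
In code, the difference is just an extra instruction appended to the prompt; `complete` below is a placeholder for whatever LLM call you use:

```python
question = "A bat and a ball cost $1.10. The bat costs $1.00 more than the ball. How much is the ball?"

standard_prompt = question
cot_prompt = question + "\nThink step by step, then state the final answer."

# answer = complete(cot_prompt)  # placeholder for a real LLM call; the CoT version
#                                # tends to work through the algebra instead of guessing "$0.10".
print(cot_prompt)
```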

Context Window

The maximum amount of text (measured in tokens) the model can "remember" or process at one time.

  • Example: GPT-4 has a context window of up to 128k tokens (approx. 300 pages of text).
  • Usage: If you paste a whole book into a model with a small context window, it will "forget" the beginning of the book.
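
A rough sketch of what "forgetting" means in practice: anything beyond the window is cut before the model ever sees it. Keeping the most recent tokens, as below, is one common truncation strategy, not a universal rule:

```python
def fit_to_context(tokens: list, window: int) -> list:
    # Drop the oldest tokens so the input fits inside the model's window.
    return tokens[-window:]

book = [f"tok{i}" for i in range(200_000)]   # a long book as a token list
kept = fit_to_context(book, 128_000)
print(len(kept), kept[0])                    # 128000 tok72000 — the opening chapters are gone
```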

E

Embedding

A vector (list of numbers) that represents the meaning of a piece of text. Words with similar meanings have embeddings that are close together in mathematical space.

  • Example: $$ \vec{v}(\text{King}) - \vec{v}(\text{Man}) + \vec{v}(\text{Woman}) \approx \vec{v}(\text{Queen}) $$
  • Usage: Used in RAG to find relevant documents.
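
A toy example with made-up 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions), showing how similarity is measured:

```python
import numpy as np

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Invented vectors, chosen so the classic analogy works out.
king  = np.array([0.9, 0.8, 0.1])
man   = np.array([0.1, 0.9, 0.0])
woman = np.array([0.1, 0.3, 0.7])
queen = np.array([0.9, 0.2, 0.8])

print(cosine_similarity(king - man + woman, queen))   # 1.0 — the analogy direction lands on "queen"
print(cosine_similarity(king, woman))                 # much lower
```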

Emergent Properties

Capabilities that appear in large models that were not present in smaller ones, such as the ability to write code or solve riddles, despite not being explicitly trained to do so.

  • Example: A model trained only to predict the next word suddenly learning how to translate English to French.

F

Fine-Tuning

The process of taking a pre-trained base model and training it further on a specific dataset to specialize it for a task.

  • Example: Taking a generic GPT model and training it on 10,000 medical records so it becomes a "Medical Assistant."
  • Math: Updates weights $\theta$ to minimize loss on the specific dataset $D_{fine}$.
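
A minimal sketch of that weight update on a toy linear "model", assuming plain gradient descent on a mean-squared-error loss. Real fine-tuning applies the same idea at a vastly larger scale:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4,))                        # pretend these are the pre-trained weights θ
X_fine = rng.normal(size=(32, 4))                # small task-specific dataset D_fine
y_fine = X_fine @ np.array([1.0, -2.0, 0.5, 0.0])

lr = 0.05
for _ in range(200):
    pred = X_fine @ W
    grad = 2 * X_fine.T @ (pred - y_fine) / len(X_fine)   # gradient of the mean-squared error
    W -= lr * grad                                        # nudge θ to fit the new data

print(np.round(W, 2))   # ≈ [ 1. -2.  0.5  0. ] — the weights have specialized to the task
```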

Few-Shot Prompting

Giving the model a few examples of what you want it to do inside the prompt.

  • Example:
    • Prompt: "Translate to French. Hello -> Bonjour. Dog -> Chien. Cat -> ?"
    • Output: "Chat"
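
Under the hood this is just string assembly: show the pattern, then leave a blank for the model to complete.

```python
examples = [("Hello", "Bonjour"), ("Dog", "Chien")]

prompt = "Translate to French.\n"
for english, french in examples:
    prompt += f"{english} -> {french}\n"
prompt += "Cat -> "

print(prompt)   # the model continues the pattern with "Chat"
```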

G

Generative AI

A class of AI models that can create new content (text, images, audio) rather than just classifying existing data.

  • Contrast: Discriminative AI (Classifies "Is this a cat?") vs. Generative AI (Draws a cat).

Grounding

Connecting the model's generation to verifiable sources or real-world data to prevent hallucinations.

  • Usage: "This chatbot is grounded in the Wikipedia API, so it provides citations."

H

Hallucination

When an LLM confidently generates false or nonsensical information.

  • Cause: The model is predicting the most likely next word based on patterns, not checking facts against a database.
  • Example: Asking "Who was the President of the US in 1600?" and the model inventing a name instead of saying "The US didn't exist."

I

Inference

The process of using a trained model to generate predictions (i.e., actually using ChatGPT to chat).

  • Usage: "Training takes months, but inference takes milliseconds."

L

Latent Space

The abstract multi-dimensional space where the model stores its internal representation of concepts.

  • Analogy: Imagine a 3D map where "Happy" is close to "Joy" but far from "Sad".

LLM (Large Language Model)

A neural network with billions of parameters trained on massive amounts of text data.

  • Example: GPT-4, Claude 3, Llama 3.

LoRA (Low-Rank Adaptation)

A technique for fine-tuning LLMs efficiently. Instead of updating all weights, it trains small "adapter" layers, requiring much less GPU memory.

  • Math: $$ W_{\text{new}} = W_{\text{old}} + \Delta W $$ where $\Delta W$ is decomposed into the product of two small matrices, $A$ and $B$.
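
A NumPy sketch of why this saves memory: instead of learning a full $d \times d$ update, you learn two skinny matrices (sizes here are illustrative):

```python
import numpy as np

d, r = 1024, 8                          # model dimension vs. tiny adapter rank
rng = np.random.default_rng(0)

W_old = rng.normal(size=(d, d))         # frozen pre-trained weights
A = rng.normal(size=(d, r)) * 0.01      # trainable adapter
B = np.zeros((r, d))                    # trainable adapter, starts at zero so ΔW starts at 0

W_new = W_old + A @ B                   # low-rank update ΔW = A·B

full_params, lora_params = d * d, 2 * d * r
print(f"{lora_params:,} trainable values instead of {full_params:,} ({lora_params / full_params:.1%})")
```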

M

Model Collapse

A theoretical risk where future AI models trained on AI-generated data become progressively dumber and less diverse, as they lose touch with the original human distribution.

  • Analogy: Making a photocopy of a photocopy of a photocopy. Eventually, the image becomes blurry.

Multimodal

A model that can understand and generate multiple types of media, such as text, images, and audio.

  • Example: GPT-4o can see a picture of your fridge and suggest recipes.

P

Parameter

The internal variables (weights and biases) learned by the model during training. Roughly equivalent to the "synapses" in a brain.

  • Scale: GPT-4 is estimated to have over 1 trillion parameters.
  • Math: $$ y = Wx + b $$ Here, $W$ and $b$ are parameters.
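
In code, the parameters of a single linear layer are simply the entries of $W$ and $b$; training is the process of nudging these numbers:

```python
import numpy as np

W = np.random.randn(3, 4)     # weight matrix — learned
b = np.random.randn(3)        # bias vector — learned
x = np.random.randn(4)        # input — not a parameter

y = W @ x + b
print(W.size + b.size)        # 15 parameters in this tiny layer; GPT-4 reportedly has ~10^12
```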

Pre-training

The first and most expensive phase of training, where the model learns language patterns from massive datasets (the internet) without specific instructions.

  • Goal: Learn "What is language?" and "What are facts?"

Prompt Engineering

The art of crafting inputs (prompts) to guide the model to produce the best possible output.

  • Example: Adding "Act as a senior Python developer" to get better code.

Q

Quantization

Reducing the precision of the model's weights (e.g., from 16-bit floats to 4-bit integers) to make the model smaller and faster with minimal loss in quality.

  • Usage: Running a 70B parameter model on a consumer laptop.
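
A sketch of the simplest variant, symmetric int8 quantization, in NumPy (real schemes are more elaborate, but the idea is the same):

```python
import numpy as np

def quantize_int8(weights):
    """Map float weights onto 256 integer levels (symmetric int8 quantization)."""
    scale = np.abs(weights).max() / 127
    q = np.round(weights / scale).astype(np.int8)    # 1 byte per weight instead of 2–4
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale              # approximate weights recovered at runtime

w = np.random.randn(1_000).astype(np.float32)
q, scale = quantize_int8(w)
print("max reconstruction error:", float(np.abs(w - dequantize(q, scale)).max()))
```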

R

RAG (Retrieval-Augmented Generation)

A technique where the model retrieves relevant facts from an external database (like your company wiki) and uses them to answer a question. This reduces hallucinations.

  • Process: User Query -> Search Database -> Paste Results into Prompt -> LLM Answer.
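
A toy end-to-end sketch of that pipeline. Real systems retrieve with embedding similarity over a vector database; simple word overlap is used here only to keep the example self-contained:

```python
DOCS = [
    "Our refund policy allows returns within 30 days of purchase.",
    "The office wifi password is rotated every Monday.",
]

def retrieve(question: str) -> str:
    # Stand-in for embedding search: pick the document sharing the most words.
    q_words = set(question.lower().split())
    return max(DOCS, key=lambda d: len(q_words & set(d.lower().split())))

def build_prompt(question: str) -> str:
    context = retrieve(question)                 # Search "database"
    return (                                     # Paste results into prompt
        f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    )

print(build_prompt("How many days do I have to return a purchase?"))
# The assembled prompt is what finally goes to the LLM.
```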

RLHF (Reinforcement Learning from Human Feedback)

The training step used to align models like ChatGPT. Humans rank model outputs, a reward model is trained to predict those ratings, and the LLM is then tuned to maximize that learned reward.

  • Math: Uses PPO (Proximal Policy Optimization) to maximize Reward $R$.

T

Temperature

A setting that controls the "randomness" of the model's output.

  • Low (0.0): Deterministic, focused, repetitive. Best for coding.
  • High (1.0): Creative, random, prone to errors. Best for poetry.
  • Math: $$ P_i = \frac{\exp(z_i / T)}{\sum \exp(z_j / T)} $$
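
A NumPy sketch of that formula, showing how the same raw scores turn into a sharp or a flat distribution:

```python
import numpy as np

def token_probabilities(logits, temperature):
    """P_i = exp(z_i / T) / sum_j exp(z_j / T)."""
    z = np.array(logits) / temperature
    z -= z.max()                          # numerical stability
    p = np.exp(z)
    return p / p.sum()

logits = [2.0, 1.0, 0.5]                  # raw model scores for three candidate tokens
print(token_probabilities(logits, 0.1).round(3))   # [1.    0.    0.   ] — nearly deterministic
print(token_probabilities(logits, 1.0).round(3))   # [0.629 0.231 0.14 ] — spread out, more "creative"
```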

Token

The basic unit of text for an LLM. It can be a word, part of a word, or a character.

  • Rule of Thumb: 1,000 tokens $\approx$ 750 words.
  • Example: The word "Hamburger" might be 3 tokens: "Ham", "bur", "ger".
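
You can inspect real token splits with the `tiktoken` library (the tokenizer family used by OpenAI models); the exact pieces depend on the tokenizer, so the "Ham/bur/ger" split above is illustrative:

```python
import tiktoken   # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")       # the encoding used by GPT-4-era models
ids = enc.encode("Hamburger")
print(len(ids))                                  # number of tokens, not characters
print([enc.decode([i]) for i in ids])            # the text fragment each token maps back to
```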

Transformer

The neural network architecture introduced by Google in 2017 ("Attention Is All You Need") that powers all modern LLMs.

  • Key Innovation: Parallel processing (unlike RNNs) and Attention mechanism.

Z

Zero-Shot Prompting

Asking the model to perform a task without giving it any examples.

  • Example: "Translate this sentence to Spanish."
  • Contrast: Few-Shot (giving examples).

Summary & Key Takeaways

  • Prompting: Zero-Shot (No examples), Few-Shot (Examples), Chain-of-Thought (Step-by-Step).
  • Training: Pre-training (Learn facts), Fine-Tuning (Learn tasks), RLHF (Learn safety).
  • Architecture: Transformer, Attention, Embeddings.
  • Usage: Temperature (Creativity), Context Window (Memory), RAG (Fact-checking).

Practice Quiz: Test Your Vocabulary

  1. Scenario: You want your chatbot to answer questions using only your company's internal PDF documents, not its general training data. Which technique do you use?

    • A) Fine-Tuning
    • B) RAG (Retrieval-Augmented Generation)
    • C) RLHF
  2. Scenario: You are building a code generator. You want the code to be precise and syntactically correct every time. What Temperature setting should you use?

    • A) High (0.9)
    • B) Medium (0.5)
    • C) Low (0.1)
  3. Scenario: You ask ChatGPT to solve a complex math riddle. It gets it wrong. You then ask it to "Think step by step," and it gets it right. What is this technique called?

    • A) Chain-of-Thought (CoT)
    • B) Zero-Shot Prompting
    • C) Hallucination

(Answers: 1-B, 2-C, 3-A)


Written by

Abstract Algorithms

@abstractalgorithms