How to Develop Apps Using LangChain and LLMs
LangChain is the glue that connects LLMs to your data. We explain Chains, Prompts, and Agents, and how to build your first app.
TLDR: LangChain is a framework that simplifies building LLM applications. It provides abstractions for Chains (linking steps), Memory (remembering chat history), and Agents (using tools). It turns raw API calls into composable building blocks.
Lego Bricks for LLM Apps
Building with the raw OpenAI API means writing the same boilerplate endlessly: formatting prompts, managing conversation history, parsing outputs, calling tools when needed.
LangChain is the Lego set: pre-assembled pieces (prompt templates, memory stores, output parsers, tool wrappers) that snap together so you can focus on logic rather than plumbing.
| Raw API | LangChain |
| --- | --- |
| Manual string formatting | ChatPromptTemplate |
| Manual history appending | ConversationBufferMemory |
| Manual tool calling logic | AgentExecutor |
| Manual output parsing | StrOutputParser, JsonOutputParser |
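To make the left column concrete, here is a toy sketch of the plumbing you end up writing against the raw API. No LangChain and no real API call is involved; `call_llm` is a stub standing in for the actual model request.

```python
# Toy sketch (no LangChain, no network): the boilerplate that LangChain's
# building blocks replace. `call_llm` is a stub for a real API call.
def call_llm(messages):
    # A real app would send `messages` to the chat completions endpoint.
    return f"(reply to: {messages[-1]['content']})"

history = []  # manual conversation memory

def ask(user_text):
    prompt = f"Translate to French: {user_text}"        # manual prompt template
    history.append({"role": "user", "content": prompt})  # manual history append
    reply = call_llm(history)
    history.append({"role": "assistant", "content": reply})
    return reply.strip()                                 # manual output parsing

print(ask("Hello"))
```

Every line of that plumbing corresponds to one LangChain component in the table above.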
The Three Core Abstractions
A. Chains: Linking Steps
A Chain connects: User Input → Prompt Template → LLM → Output Parser.
The | operator in LCEL (LangChain Expression Language) pipes the output of one step into the next:
```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

model = ChatOpenAI(model="gpt-4o")
prompt = ChatPromptTemplate.from_template("Translate to French: {text}")

chain = prompt | model | StrOutputParser()
result = chain.invoke({"text": "Hello, how are you?"})
# "Bonjour, comment allez-vous ?"
```
Chains are composable: the output of one chain can be piped into another chain.
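As a rough mental model of what the pipe does, here is the same idea in plain Python. This is not LangChain's actual Runnable machinery, and `fake_model` is hard-coded; it only illustrates the composition semantics.

```python
# Toy model of LCEL piping with plain functions: each step consumes the
# previous step's output, and a finished pipeline can itself be a step.
def pipe(*steps):
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run

prompt = lambda inputs: f"Translate to French: {inputs['text']}"
fake_model = lambda _: "Bonjour, comment allez-vous ?"  # stands in for the LLM
parser = lambda text: text.strip()

chain = pipe(prompt, fake_model, parser)   # prompt | model | parser
shouting = pipe(chain, str.upper)          # one chain piped into another

print(shouting({"text": "Hello, how are you?"}))
```

The second pipeline reuses the first as its opening step, which is exactly what "composable" buys you in LCEL.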
B. Memory: State Across Turns
LLMs are stateless: each API call starts fresh. LangChain's Memory objects inject conversation history into the next prompt automatically.
```python
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain

memory = ConversationBufferMemory()
conversation = ConversationChain(llm=model, memory=memory)

conversation.predict(input="My name is Alice.")
conversation.predict(input="What is my name?")
# "Your name is Alice."
```
| Memory Type | Keeps | Best For |
| --- | --- | --- |
| ConversationBufferMemory | Full history | Short sessions |
| ConversationSummaryMemory | LLM-generated summary | Long sessions |
| ConversationBufferWindowMemory | Last N turns | Chatbots with context limits |
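The windowed variant is easy to picture with a toy sketch (plain Python, not LangChain's real class): keep only the last k exchanges when building the next prompt, so old turns fall out of context.

```python
from collections import deque

# Toy sketch of window memory (k = max exchanges kept), mimicking the idea
# behind ConversationBufferWindowMemory without using LangChain.
class WindowMemory:
    def __init__(self, k):
        self.turns = deque(maxlen=k)  # oldest turns are evicted automatically

    def save(self, user, ai):
        self.turns.append((user, ai))

    def as_context(self):
        return "\n".join(f"Human: {u}\nAI: {a}" for u, a in self.turns)

mem = WindowMemory(k=2)
mem.save("My name is Alice.", "Nice to meet you, Alice.")
mem.save("I like tea.", "Noted.")
mem.save("What do I like?", "You like tea.")
print(mem.as_context())  # the first exchange has been dropped
```

This is the trade-off in the table above: a window caps token cost, but the bot forgets anything older than k turns (here, the user's name).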
C. Agents: LLMs That Use Tools
An Agent is an LLM that can decide which tools to call based on the user's question.
```python
from langchain import hub
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper

tools = [WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper())]
prompt = hub.pull("hwchase17/openai-tools-agent")  # agent prompt with scratchpad
agent = create_openai_tools_agent(model, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools)
executor.invoke({"input": "What is the boiling point of mercury?"})
# Agent calls Wikipedia -> reads result -> returns answer
```
The Agent loop:
```mermaid
flowchart TD
    Q["User Question"] --> LLM["LLM: Choose Action"]
    LLM -->|calls tool| Tool["Tool (Wikipedia, Calculator, DB)"]
    Tool --> Observation["Observation (result)"]
    Observation --> LLM
    LLM -->|has enough info| Answer["Final Answer"]
```
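The loop itself is just a few lines of control flow. Here is a hand-rolled toy version with a scripted "LLM" and a fake tool (everything here is made up for illustration; a real agent lets the model choose actions):

```python
# Toy agent loop (no LangChain): a scripted decision function stands in for
# the LLM, and the "wikipedia" tool returns a canned string.
tools = {"wikipedia": lambda q: "Mercury boils at 356.7 C."}

def fake_llm(question, observations):
    if not observations:  # nothing known yet: ask a tool
        return {"action": "wikipedia", "input": question}
    return {"action": "final", "answer": observations[-1]}

def run_agent(question, max_steps=5):
    observations = []
    for _ in range(max_steps):          # bounded loop, like AgentExecutor
        decision = fake_llm(question, observations)
        if decision["action"] == "final":
            return decision["answer"]
        result = tools[decision["action"]](decision["input"])
        observations.append(result)     # observation feeds the next decision
    return "Gave up."

print(run_agent("What is the boiling point of mercury?"))
```

AgentExecutor implements the same choose-act-observe cycle, with the real LLM making the action choice at each step.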
Building a RAG Pipeline with LangChain
Retrieval-Augmented Generation (RAG) is the most common real-world LangChain pattern: load documents → embed them → retrieve relevant chunks → answer with context.
```python
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA

# 1. Load and split documents
loader = TextLoader("my_docs.txt")
docs = loader.load()
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)

# 2. Embed and store
vectorstore = Chroma.from_documents(chunks, OpenAIEmbeddings())

# 3. Build the QA chain
qa = RetrievalQA.from_chain_type(
    llm=model,
    retriever=vectorstore.as_retriever(search_kwargs={"k": 4}),
)
qa.invoke({"query": "What is the refund policy?"})
```
```mermaid
flowchart LR
    Q["User Question"]
    Embed["Embed Question"]
    VDB["Vector Store\n(Chroma/FAISS)"]
    Chunks["Top-K Chunks"]
    LLM["LLM + Context"]
    A["Answer"]
    Q --> Embed --> VDB --> Chunks --> LLM --> A
```
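The retrieve step in the diagram reduces to "rank chunks by similarity to the question." A toy sketch with bag-of-words overlap instead of learned embeddings (real pipelines use dense vectors and a vector store, but the ranking logic is the same shape):

```python
# Toy retrieval (no vector DB): "embed" as a set of words and rank chunks
# by overlap with the question. Real RAG uses learned dense embeddings.
def embed(text):
    return {word.strip(".,?!:").lower() for word in text.split()}

def top_k(question, chunks, k=2):
    q = embed(question)
    scored = sorted(chunks, key=lambda c: len(q & embed(c)), reverse=True)
    return scored[:k]

chunks = [
    "The refund policy: refund requests are processed within 30 days.",
    "Shipping takes 5-7 business days.",
    "Our office is closed on weekends.",
]
context = top_k("What is the refund policy?", chunks, k=1)
print(context)  # the chunk about refunds ranks first
```

The retrieved chunks are then pasted into the prompt as context, which is the "LLM + Context" box above.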
LangSmith: Observability for LLM Chains
In production, you need to debug why a chain produced a wrong answer. LangSmith (LangChain's tracing backend) records every step:
- Which prompt was sent.
- What the LLM returned.
- Which tool was called and with what arguments.
- Total latency and token cost per step.
Enable tracing:
```python
import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your_key"
```
All chain invocations are now automatically traced.
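Optionally, traces can be grouped under a named project via the LANGCHAIN_PROJECT environment variable (the project name here is a made-up example; untagged runs land in the default project):

```python
import os

# Optional: group this app's traces under one project in the LangSmith UI.
os.environ["LANGCHAIN_PROJECT"] = "my-translator-app"
```
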
LangChain Trade-offs
| Benefit | Risk |
| --- | --- |
| Rapid prototyping with composable building blocks | Adds abstraction layers that can obscure errors |
| Built-in integrations (100+ LLMs, vector stores, tools) | Version churn: the API changes frequently |
| Memory management out of the box | Token cost grows if memory strategy is not tuned |
| Tracing via LangSmith | Production overhead if not carefully sampled |
When to skip LangChain: If your use case is a single LLM call with a fixed prompt, the raw API (OpenAI SDK) is simpler and more debuggable. LangChain pays off when you have multi-step chains, conditional tool use, or complex memory strategies.
What to Study Next
- Guide to Using RAG with LangChain and ChromaDB/FAISS
- AI Agents Explained: When LLMs Start Using Tools
- How GPT/LLMs Work
Summary
- Chains (LCEL): Compose prompt → LLM → parser pipelines with the | operator.
- Memory: Inject conversation history automatically. Choose the right memory type for session length.
- Agents: LLMs that call tools in a loop until they have enough information to answer.
- RAG: Load → chunk → embed → retrieve → answer. The most common production pattern.
- LangSmith: Trace every chain step for debugging and cost analysis.
Practice Quiz
What does the | operator do in LangChain Expression Language (LCEL)?
- A) It's a bitwise OR operation.
- B) It chains the output of one step to the input of the next in a pipeline.
- C) It runs two chains in parallel.
Answer: B
An LLM chatbot loses context after a few turns. Which LangChain component solves this?
- A) OutputParser.
- B) Memory (e.g., ConversationBufferMemory).
- C) AgentExecutor.
Answer: B
When should you prefer the raw OpenAI SDK over LangChain?
- A) Always โ LangChain is too slow.
- B) For simple single-call applications where the abstraction adds more complexity than it saves.
- C) Only when deploying to AWS.
Answer: B

Written by
Abstract Algorithms
@abstractalgorithms