Multistep AI Agents: The Power of Planning
Abstract Algorithms
TLDR: A simple ReAct agent reacts one tool call at a time. A multistep agent plans a complete task decomposition upfront, then executes each step sequentially. This lets it handle complex goals that require 5-10 interdependent actions without asking the LLM to choose the next action after every step.
Line Cook vs. Head Chef
A line cook (simple ReAct agent): receives one ticket, cooks one dish, hands it over. Then the next ticket.
A head chef (multistep agent): receives the full dinner party menu, plans the entire 5-course sequence, coordinates prep timing for all dishes, anticipates which items can be done in parallel, and manages the full execution before the first guest is seated.
The difference: planning before acting. Complex goals require plans, not just reactions.
Simple ReAct vs. Plan-and-Execute: Core Difference
| Dimension | ReAct (Single-Step Loop) | Plan-and-Execute (Multistep) |
| --- | --- | --- |
| Planning | None: the LLM decides the next action after each observation | The LLM creates a full plan upfront (JSON array of steps) |
| LLM calls | One per action (tight feedback loop) | One for planning; one per step for execution |
| Best for | Short, open-ended tasks with unknown required steps | Long tasks with a knowable step structure |
| Failure handling | Adapts after each observation | Re-plans on step failure |
| Token cost | Lower per step | Higher for the plan call; lower for execution calls |
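To make the "Planning" and "LLM calls" rows concrete, here is a minimal, framework-free sketch of both control loops. The toy tools, the "tool:argument" action format, and the scripted fake LLMs are all hypothetical stand-ins. In this simplified version every planned step maps straight to a tool call; an executor such as LangChain's would typically add one LLM call per step on top of the planning call.

```python
from typing import Callable, Dict, List

Tool = Callable[[str], str]
# Hypothetical toy tools standing in for real search / fetch APIs
TOOLS: Dict[str, Tool] = {
    "search": lambda q: f"results for {q}",
    "fetch": lambda doc: f"text of {doc}",
}


def run_action(action: str) -> str:
    """Parse a 'tool:argument' string and call the matching tool."""
    name, arg = action.split(":", 1)
    return TOOLS[name](arg)


def react(decide: Callable[[List[str]], str], max_turns: int = 10) -> List[str]:
    """ReAct-style loop: the LLM chooses ONE next action after each observation."""
    observations: List[str] = []
    for _ in range(max_turns):
        action = decide(observations)  # one LLM call per decision
        if action == "FINISH":
            break
        observations.append(run_action(action))
    return observations


def plan_and_execute(plan: Callable[[str], List[str]], goal: str) -> List[str]:
    """Plan-and-Execute: one LLM call yields every step, then the steps run in order."""
    steps = plan(goal)  # one LLM call for the whole plan
    return [run_action(step) for step in steps]


# Scripted fake "LLMs" that count how often they are consulted
calls = {"react": 0, "plan": 0}
SCRIPT = ["search:AI papers", "fetch:paper_1", "fetch:paper_2"]


def fake_decide(observations: List[str]) -> str:
    calls["react"] += 1
    return SCRIPT[len(observations)] if len(observations) < len(SCRIPT) else "FINISH"


def fake_plan(goal: str) -> List[str]:
    calls["plan"] += 1
    return list(SCRIPT)


react_obs = react(fake_decide)
plan_obs = plan_and_execute(fake_plan, "research AI papers")
print(react_obs == plan_obs, calls)  # True {'react': 4, 'plan': 1}
```

Both agents end up with the same observations, but the ReAct loop consulted its "LLM" four times (three actions plus the decision to stop), while the planner was consulted once.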
The Plan-and-Execute Architecture
Goal: "Research the top 3 AI papers from last month, summarize each, and draft a blog post."
Phase 1: Plan Call (one LLM call):
```json
[
  { "step": 1, "action": "search", "args": ["top AI papers July 2025"] },
  { "step": 2, "action": "fetch_abstract", "args": ["paper_id_1"] },
  { "step": 3, "action": "summarize", "args": ["abstract_1"] },
  { "step": 4, "action": "fetch_abstract", "args": ["paper_id_2"] },
  { "step": 5, "action": "summarize", "args": ["abstract_2"] },
  { "step": 6, "action": "fetch_abstract", "args": ["paper_id_3"] },
  { "step": 7, "action": "summarize", "args": ["abstract_3"] },
  { "step": 8, "action": "write_post", "args": ["[summary_1, summary_2, summary_3]"] }
]
```
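Because the plan is plain JSON, the executor can parse and sanity-check it before running anything. A minimal sketch, assuming the planner returns exactly the array shape above (abbreviated here to the first three steps); PlanStep and parse_plan are hypothetical names, not part of any library.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class PlanStep:
    step: int
    action: str
    args: List[str]


def parse_plan(raw: str) -> List[PlanStep]:
    """Parse the planner's JSON array into typed steps, rejecting malformed plans early."""
    data = json.loads(raw)  # raises json.JSONDecodeError if the LLM returned invalid JSON
    if not isinstance(data, list) or not data:
        raise ValueError("plan must be a non-empty JSON array")
    steps: List[PlanStep] = []
    for expected, item in enumerate(data, start=1):
        # A missing "step", "action", or "args" key raises KeyError here
        step = PlanStep(step=int(item["step"]), action=str(item["action"]), args=list(item["args"]))
        if step.step != expected:
            raise ValueError(f"expected step {expected}, got step {step.step}")
        steps.append(step)
    return steps


raw_plan = """
[
  {"step": 1, "action": "search", "args": ["top AI papers July 2025"]},
  {"step": 2, "action": "fetch_abstract", "args": ["paper_id_1"]},
  {"step": 3, "action": "summarize", "args": ["abstract_1"]}
]
"""

plan = parse_plan(raw_plan)
for s in plan:
    print(s.step, s.action, s.args)
```

Checking that steps are numbered 1..n also catches plans where the LLM skipped or duplicated a step.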
Phase 2: Execution Loop (each step runs as a tool call, or as a sub-LLM call when its output needs reasoning):
```mermaid
flowchart TD
    Goal["Complex Goal"] --> Planner["LLM Planner\n(one call → JSON plan)"]
    Planner --> Loop["Executor Loop"]
    Loop --> Step["Execute Next Step\n(tool call or sub-LLM call)"]
    Step --> Check{"Last step?"}
    Check -->|No| Loop
    Check -->|Yes| Result["Final Result"]
    Step -->|Failure| Replan["Re-plan remaining steps"]
    Replan --> Loop
```
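The failure branch in the diagram is what separates a robust executor from a brittle one. The sketch below shows one way to implement it, under stated assumptions: steps are dicts with hypothetical "action" and "arg" keys, tools are plain Python callables, and replan stands in for a second LLM planner call (here a toy function that reroutes failed searches to a backup tool). Completed results are kept; only the remaining steps are re-planned.

```python
from typing import Callable, Dict, List

Tool = Callable[[str], str]
Step = Dict[str, str]  # hypothetical shape: {"action": tool name, "arg": tool input}


def execute_with_replan(
    plan: List[Step],
    tools: Dict[str, Tool],
    replan: Callable[[List[str], List[Step]], List[Step]],
    max_replans: int = 2,
) -> List[str]:
    """Run steps in order; on failure, keep finished results and re-plan only what remains."""
    results: List[str] = []
    remaining = list(plan)
    replans = 0
    while remaining:
        step = remaining[0]
        try:
            results.append(tools[step["action"]](step["arg"]))
            remaining.pop(0)  # step succeeded; move on to the next one
        except Exception:
            if replans >= max_replans:
                raise  # give up instead of re-planning forever
            replans += 1
            # Re-plan from the failure point, passing completed results as context
            remaining = replan(results, remaining)
    return results


def flaky_search(query: str) -> str:
    raise TimeoutError("search API unavailable")


TOOLS: Dict[str, Tool] = {
    "search": flaky_search,
    "backup_search": lambda q: f"backup results for {q}",
    "summarize": lambda text: f"summary of {text}",
}


def toy_replan(done: List[str], remaining: List[Step]) -> List[Step]:
    """Toy stand-in for an LLM re-planner (ignores done): route searches to the backup tool."""
    return [
        {"action": "backup_search" if s["action"] == "search" else s["action"], "arg": s["arg"]}
        for s in remaining
    ]


plan = [
    {"action": "summarize", "arg": "intro notes"},
    {"action": "search", "arg": "AI papers"},
    {"action": "summarize", "arg": "search results"},
]
result = execute_with_replan(plan, TOOLS, toy_replan)
print(result)  # ['summary of intro notes', 'backup results for AI papers', 'summary of search results']
```

The max_replans cap is a design choice worth keeping in some form: without it, a step that keeps failing would send the agent into an endless re-plan loop.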
LangChain Plan-and-Execute Agent
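```python
# Requires: pip install langchain-experimental langchain-openai langchain-community wikipedia duckduckgo-search
# and an OPENAI_API_KEY environment variable.
from langchain_experimental.plan_and_execute import (
    PlanAndExecute,
    load_agent_executor,
    load_chat_planner,
)
from langchain_openai import ChatOpenAI
from langchain_community.tools import WikipediaQueryRun, DuckDuckGoSearchRun
from langchain_community.utilities import WikipediaAPIWrapper

# temperature=0 keeps planning and execution as deterministic as possible
llm = ChatOpenAI(model="gpt-4o", temperature=0)

# WikipediaQueryRun needs an explicit API wrapper; DuckDuckGoSearchRun builds its own
tools = [WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper()), DuckDuckGoSearchRun()]

planner = load_chat_planner(llm)  # one LLM call that returns the list of steps
executor = load_agent_executor(llm, tools, verbose=True)  # runs each step with the tools
agent = PlanAndExecute(planner=planner, executor=executor)

result = agent.invoke({
    "input": "Research the top 3 AI papers from last month, summarize each, and draft a blog post."
})
print(result["output"])
```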
The planner produces a step list; the executor runs each step with access to tools.
When to Use Multistep Agents vs Simple Agents
| Use Case | Simple ReAct | Multistep Plan-Execute |
| --- | --- | --- |
| Q&A with a single tool lookup | ✅ | Overkill |
| Writing a report with 8 research steps | ❌ | ✅ |
| Interactive conversation with user feedback | ✅ | Awkward |
| Automated pipeline with known step structure | ❌ | ✅ |
| Debugging code with back-and-forth tool calls | ✅ | ❌ |
Critical failure modes for multistep agents:
- Stale plan: If step 3 fails, steps 4-8 may be based on incorrect assumptions. Solution: re-plan from the failure point.
- Context window overflow: 10-step plans with long intermediate outputs can exceed context length. Solution: summarize intermediate results before passing to the next step.
- Hallucinated tool calls: LLM may plan to call a tool that doesn't exist. Solution: validate the plan against available tools before execution begins.
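The third fix is cheap to implement because the whole plan exists before anything runs. A minimal sketch, assuming plan steps are dicts with an "action" key as in the JSON plan earlier; find_unknown_tools, validate_plan, and PlanValidationError are hypothetical names. The registered tool names mirror the actions in that example plan, and get_stock_price is the missing tool from the practice quiz.

```python
from typing import Dict, List, Set


class PlanValidationError(ValueError):
    """Raised when a plan names a tool the agent does not have."""


def find_unknown_tools(plan: List[Dict[str, object]], registered_tools: Set[str]) -> List[str]:
    """Return the planned action names that have no registered tool, sorted."""
    return sorted({str(step["action"]) for step in plan} - registered_tools)


def validate_plan(plan: List[Dict[str, object]], registered_tools: Set[str]) -> None:
    """Reject the whole plan before any step runs if it references unknown tools."""
    unknown = find_unknown_tools(plan, registered_tools)
    if unknown:
        raise PlanValidationError(f"plan uses unregistered tools: {unknown}")


registered = {"search", "fetch_abstract", "summarize", "write_post"}
plan = [
    {"step": 1, "action": "search", "args": ["top AI papers July 2025"]},
    {"step": 2, "action": "get_stock_price", "args": ["XYZ"]},
]

try:
    validate_plan(plan, registered)
except PlanValidationError as err:
    # A real agent would send this message back to the planner and ask for a new plan
    print(err)  # plan uses unregistered tools: ['get_stock_price']
```

Failing fast here is cheaper than discovering the missing tool halfway through execution, after earlier steps have already spent tool calls and tokens.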
Summary
- Multistep agents plan the full task structure upfront, then execute it step by step instead of asking the LLM to choose each next action.
- Plan-and-Execute = one Planner LLM call → JSON step list → Executor loop using tools.
- Best for tasks with a knowable structure (reports, research pipelines, automated workflows).
- Failure handling: re-plan from failed step, not from scratch.
- LangChain's PlanAndExecute wraps this pattern in a few lines of Python.
Practice Quiz
What is the key structural difference between a ReAct agent and a Plan-and-Execute agent?
- A) ReAct uses more tools.
- B) ReAct decides the next action after each observation; Plan-and-Execute creates a full step list upfront before any tool is called.
- C) Plan-and-Execute cannot use external tools.
Answer: B
Step 4 of a 10-step multistep agent plan fails. What is the correct recovery approach?
- A) Restart the entire plan from step 1.
- B) Re-plan the remaining steps (4-10) from the current state, preserving completed results from steps 1-3.
- C) Skip step 4 and continue with step 5.
Answer: B
A multistep agent plan refers to a tool called get_stock_price(), but no such tool is registered. When is the best time to detect this?
- A) At execution time, when the executor tries to call the tool.
- B) After plan generation, by validating all planned tool names against the registered tools list before execution begins.
- C) By asking the user to confirm each step manually.
Answer: B

Written by
Abstract Algorithms
@abstractalgorithms
