The Developer's Guide: When to Use Code, ML, LLMs, or Agents
Stop trying to solve everything with ChatGPT. We provide a decision framework for modern developers.
TLDR: AI is a tool, not a religion. Use Code for deterministic logic (banking, math). Use Traditional ML for structured predictions (fraud, recommendations). Use LLMs for unstructured text (summarization, chat). Use Agents only when a task genuinely requires multi-step planning and external tool calls.
📖 One Codebase, Four Paradigms: Know Before You Reach for the LLM
The most expensive mistake in modern software is using an LLM for a problem deterministic code solves in 5 lines.
Before adding an AI component, ask two questions:
- Is the output deterministic? If yes, write code.
- Does the input have known structure? If yes, use ML.
If both answers are no and the input is natural language, then LLMs are the right tool. Agents are warranted only when the task requires multiple steps and external tool calls to complete.
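The two questions above (plus the single-generation check for agents) can be sketched as a tiny decision helper. This is a toy illustration; the function and parameter names are our own, not from any library:

```python
def choose_paradigm(deterministic_output: bool,
                    structured_input: bool,
                    single_generation: bool) -> str:
    """Encode the decision questions: code first, then ML, then LLM, then agent."""
    if deterministic_output:
        return "code"          # you can write an explicit rule
    if structured_input:
        return "ml"            # rows/columns with learnable patterns
    if single_generation:
        return "llm"           # one call in, one answer out
    return "agent"             # multi-step planning with external tools
```

For example, `choose_paradigm(False, True, True)` lands on `"ml"` — unstructured output but structured input means a classic model, not a prompt.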
🔢 Pure Code: When Determinism Is Non-Negotiable
Any operation where you can write an explicit rule belongs here.
| Use case | Code approach |
| --- | --- |
| Calculate tax = subtotal * 0.08 | 1 line of arithmetic |
| Validate email format | Regex |
| Parse a known JSON schema | json.loads() |
| Sort a list by timestamp | sorted(items, key=lambda x: x.ts) |
| Route a payment to the right processor | If-else / pattern matching |
When code beats AI: Banking transactions, data migrations, format validation, mathematical computations, protocol parsing. The rule: if a junior developer could write a test that covers every case, write code.
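The table rows above fit in a few stdlib lines — a minimal sketch (the email regex is deliberately simplified; production validation is stricter):

```python
import json
import re

def calc_tax(subtotal: float, rate: float = 0.08) -> float:
    # One line of arithmetic; no model needed.
    return subtotal * rate

# Simplified email pattern for illustration only.
EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$")

def is_valid_email(addr: str) -> bool:
    return EMAIL_RE.match(addr) is not None

# Parse a known JSON schema, then sort records by timestamp.
payload = '[{"id": 2, "ts": 20}, {"id": 1, "ts": 10}]'
items = json.loads(payload)
ordered = sorted(items, key=lambda x: x["ts"])
```

Every one of these is trivially testable across its entire input space, which is exactly the junior-developer test the rule describes.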
⚙️ Traditional ML: Patterns in Structured Tabular Data
Use ML when the rule is too complex to write by hand, but the input is structured (rows and columns with known features).
```mermaid
flowchart LR
    Features[Structured Features\nage, amount, location] --> Model[ML Model\nXGBoost / Random Forest]
    Model --> Prediction[Score or Label\nfraud probability]
```
| Use case | Features | Model |
| --- | --- | --- |
| Fraud detection | Amount, merchant, velocity | Gradient boosting (XGBoost) |
| Churn prediction | Login frequency, support tickets | Logistic regression |
| Product recommendations | Purchase history, ratings | Collaborative filtering / Matrix factorization |
| House price estimation | sq ft, location, year | Linear regression |
| Spam filter (classic) | Word frequencies (TF-IDF) | Naive Bayes / SVM |
ML requires: labeled training data, feature engineering, model evaluation, and a retraining pipeline. If you don't have those, use rules instead.
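To make the "classic spam filter" row concrete, here is a toy multinomial Naive Bayes over raw word counts, stdlib only. Real systems use TF-IDF features and a library such as scikit-learn; the four-document corpus below is made up for illustration:

```python
import math
from collections import Counter

def train_nb(docs):
    """docs: list of (text, label). Returns class priors and per-class word counts."""
    priors = Counter(label for _, label in docs)
    words = {label: Counter() for label in priors}
    for text, label in docs:
        words[label].update(text.lower().split())
    return priors, words

def predict_nb(priors, words, text, alpha=1.0):
    """Pick the label maximizing log prior + Laplace-smoothed log likelihood."""
    vocab = {w for counts in words.values() for w in counts}
    total_docs = sum(priors.values())
    best_label, best_score = None, float("-inf")
    for label in priors:
        score = math.log(priors[label] / total_docs)
        denom = sum(words[label].values()) + alpha * len(vocab)
        for w in text.lower().split():
            score += math.log((words[label][w] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

docs = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday?", "ham"),
]
priors, words = train_nb(docs)
```

Notice what even this toy needs: labeled examples, a feature choice (word counts), and an evaluation step before you could trust it — the same prerequisites the paragraph above lists.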
🧠 LLMs: When the Input Is Unstructured Text
LLMs excel at tasks where the input is free-form text and the output is also text (or a structured schema derived from text).
| Task | Why LLM | Why not code/ML |
| --- | --- | --- |
| Summarize a 20-page PDF | Understands context and importance | Rules can't, ML needs fine-tuning |
| Classify support ticket intent | Handles natural language variation | Rules miss edge cases, ML needs labeled data |
| Generate code from a description | Trained on vast code corpus | Impossible with deterministic rules |
| Extract entities from unstructured text | Flexible to schema variation | Classic NER models need annotation per domain |
| Answer questions about a document (RAG) | Combines retrieval + reasoning | Rules don't reason; classic ML doesn't generalize here |
Cost reminder: Every LLM call costs money and adds latency. Never use an LLM for tasks that code or a simple ML model can solve.
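A common pattern for the "classify support ticket intent" row is to request JSON and validate it before trusting it, since LLM output is never guaranteed well-formed. The provider call is elided below (any chat API fits); the prompt wording and field names are our own illustration:

```python
import json

PROMPT_TEMPLATE = (
    "Classify the support ticket below.\n"
    'Reply with JSON only: {{"intent": "...", "urgency": "low|medium|high"}}\n\n'
    "Ticket: {ticket}"
)

ALLOWED_URGENCY = {"low", "medium", "high"}

def parse_reply(raw: str) -> dict:
    """Validate the model's reply; reject anything off-schema."""
    data = json.loads(raw)  # raises ValueError on malformed JSON
    if not isinstance(data.get("intent"), str):
        raise ValueError("missing or non-string intent")
    if data.get("urgency") not in ALLOWED_URGENCY:
        raise ValueError("urgency outside allowed set")
    return data

prompt = PROMPT_TEMPLATE.format(ticket="My card was charged twice!")
# reply = client.chat(prompt)  # provider call elided; any LLM API works here
reply = '{"intent": "billing_dispute", "urgency": "high"}'  # example reply
parsed = parse_reply(reply)
```

The validation layer is plain code — another instance of the guide's own rule that deterministic checks should never be delegated to the model.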
🤖 Agents: For Multi-Step Goals That Require External Tools
Use agents when completing the task requires:
- Multiple actions (not just one generation)
- Calling external APIs or tools (not just text transformation)
- Adapting plans based on intermediate results
| Task | Agent needed? | Why |
| --- | --- | --- |
| "Summarize this document" | No | Single LLM call |
| "Book the cheapest flight to Paris next Tuesday" | Yes | Needs search API, calendar check, payment API |
| "Send a weekly report email" | No | Code + cron job |
| "Debug this CI failure and open a PR with the fix" | Yes | Needs GitHub API, test runner, code editor |
| "What's 2 + 2?" | No | Code |
Red flag: If you're describing your agent as "it just generates text and returns it," you needed a plain LLM call, not an agent.
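The flight-booking row reduces to a plan → act → observe loop. Here is a minimal sketch with stub tools and a scripted planner standing in for the model — every name is hypothetical, and a real agent would put an LLM behind `plan`:

```python
def search_flights(dest):           # stub for a flight-search API
    return [{"dest": dest, "price": 120}, {"dest": dest, "price": 95}]

def book_flight(flight):            # stub for a booking/payment API
    return f"booked {flight['dest']} at ${flight['price']}"

TOOLS = {"search_flights": search_flights, "book_flight": book_flight}

def plan(goal, history):
    """Scripted planner: search, then book the cheapest, then finish."""
    if not history:
        return ("search_flights", goal["dest"])
    if history[-1][0] == "search_flights":
        cheapest = min(history[-1][1], key=lambda f: f["price"])
        return ("book_flight", cheapest)
    return ("finish", history[-1][1])

def run_agent(goal, max_steps=5):
    history = []
    for _ in range(max_steps):      # guardrail: bound the iteration count
        action, arg = plan(goal, history)
        if action == "finish":
            return arg
        observation = TOOLS[action](arg)
        history.append((action, observation))
    raise RuntimeError("step budget exceeded")

result = run_agent({"dest": "Paris"})
```

Note the `max_steps` guardrail: an agent's cost and latency multiply per iteration, so an unbounded loop is a budget risk, not just a bug.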
⚖️ Decision Matrix: Picking the Right Tool
```mermaid
flowchart TD
    Start([New requirement]) --> Q1{Is the output\ndeterministic?}
    Q1 -- Yes --> Code[Write Code\nif/else, math, regex]
    Q1 -- No --> Q2{Is input\nstructured data?}
    Q2 -- Yes --> ML[Traditional ML\nXGBoost, sklearn]
    Q2 -- No --> Q3{Is a single\ngeneration enough?}
    Q3 -- Yes --> LLM[LLM Call\nOpenAI, Anthropic, Gemini]
    Q3 -- No --> Agent[AI Agent\nReAct + tools]
```
| Paradigm | Latency | Cost | Predictability | Best for |
| --- | --- | --- | --- | --- |
| Code | Microseconds | Free | 100% deterministic | Rules, math, format |
| ML | Milliseconds | Low inference cost | High with good data | Structured predictions |
| LLM | 500ms–3s | $0.001–$0.06/1K tokens | Variable (hallucination risk) | Unstructured text |
| Agent | Seconds–minutes | Multiplied by iterations | Low without guardrails | Multi-step tool tasks |
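The table's per-1K-token range makes the cost gap easy to quantify. A back-of-envelope sketch (the call volume and token count below are made-up example numbers):

```python
def llm_cost(calls: int, tokens_per_call: int, rate_per_1k: float) -> float:
    """Monthly spend for an LLM endpoint billed per 1K tokens."""
    return calls * tokens_per_call / 1000 * rate_per_1k

# Hypothetical workload: 1M classification calls/month at ~500 tokens each.
cheap = llm_cost(1_000_000, 500, 0.001)   # low end of the table's rate range
pricey = llm_cost(1_000_000, 500, 0.06)   # high end of the range
```

That is roughly $500 to $30,000 per month for a task a rules engine or a small ML model would serve at effectively zero marginal cost — which is the whole point of "use the cheapest tool that solves the problem."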
📌 Key Takeaways
- Code before anything else — if you can write a rule, write code.
- Traditional ML for structured data with learnable patterns (fraud, churn, recs).
- LLMs for unstructured text tasks: summarization, classification, generation.
- Agents only when the task is multi-step and requires external tool calls.
- Cost and latency scale: Code < ML < LLM < Agent. Use the cheapest tool that solves the problem.
🧩 Test Your Understanding
- A checkout form validates that a zip code is exactly 5 digits. Should you use an LLM, ML, or code?
- You want to predict which users will churn in the next 30 days, using login history and support ticket count. What paradigm fits?
- A user asks your chatbot "What is the status of order #12345?" — the system needs to hit an orders API. LLM or agent?
- Why is cost×latency important in the code/ML/LLM/agent decision?
Written by
Abstract Algorithms
@abstractalgorithms