
Chat vs RAG vs Agent

When should I use a simple chatbot vs RAG vs an autonomous agent?

Three progressively more complex approaches to building AI applications. Start simple and add complexity only when needed.

Level: beginner · 20 min · Sources verified Dec 22

Approaches

Simple Chat (Direct API Call)

Complexity: simple

Send user message to LLM, get response back. No retrieval, no tools, no planning.

Latency: 500ms-2s
Cost: Low - just API token costs

Pros

  • Extremely simple to implement (5-20 lines of code)
  • Lowest latency - single API call
  • Lowest cost - no additional infrastructure
  • Easy to debug and understand

Cons

  • Limited to model's training data (knowledge cutoff)
  • Can't access real-time information
  • No access to your proprietary data
  • May hallucinate when asked about specifics

Use When

  • General Q&A where the model's training data is sufficient
  • Creative writing, brainstorming, ideation
  • Code generation for common patterns
  • Summarization of user-provided text

Avoid When

  • Users need current information (news, prices, weather)
  • Answers must reference your proprietary documents
  • High accuracy is required for specific facts
  • The task requires taking actions in external systems

Code example (simple_chat.ts):
// Assumes an initialized OpenAI SDK client (const openai = new OpenAI())
// and `userMessage` holding the user's input.
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: userMessage }
  ]
});

return response.choices[0].message.content;

RAG (Retrieval Augmented Generation)

Complexity: moderate

Retrieve relevant documents from a vector database, include them in the prompt, then generate a response grounded in that context.

Latency: 1-5s
Cost: Medium - vector DB hosting + more tokens

Pros

  • Grounds responses in your actual documents
  • Reduces hallucination with source attribution
  • Can handle proprietary/private data
  • Knowledge can be updated without retraining

Cons

  • Requires vector database infrastructure
  • Quality depends on chunking and retrieval
  • Higher latency (embedding + search + generation)
  • Can retrieve irrelevant passages

Use When

  • Answering questions about your documentation
  • Customer support with knowledge bases
  • Legal/compliance research on internal documents
  • Any Q&A where accuracy and citations matter

Avoid When

  • You don't have documents to retrieve from
  • Questions are general knowledge (just use chat)
  • Sub-second latency is critical
  • The task requires taking actions, not just answering

Code example (rag_example.ts):
// `embed` and `vectorDB` are placeholders for your embedding model and
// vector store client; `search` is assumed to return chunk text strings.

// 1. Embed the query
const embedding = await embed(userQuery);

// 2. Search vector database
const relevantDocs = await vectorDB.search(embedding, { topK: 5 });

// 3. Generate with context
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'system', content: 'Answer based on the provided context.' },
    { role: 'user', content: `Context:\n${relevantDocs.join('\n')}\n\nQuestion: ${userQuery}` }
  ]
});

return response.choices[0].message.content;
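
The snippet above assumes your documents are already chunked, embedded, and stored. A minimal sketch of that indexing side, assuming the OpenAI embeddings API and a simple in-memory store standing in for a hosted vector DB (function names, chunk size, and the embedding model are illustrative):

rag_indexing_sketch.ts
import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

type Chunk = { text: string; vector: number[] };
const store: Chunk[] = []; // stand-in for a real vector database

// Naive fixed-size chunking; production systems usually split on headings,
// paragraphs, or sentences and add some overlap between chunks.
function chunkText(text: string, size = 800): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}

async function embed(text: string): Promise<number[]> {
  const res = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: text
  });
  return res.data[0].embedding;
}

// Index a document: chunk it, embed each chunk, store text and vector together.
async function indexDocument(docText: string): Promise<void> {
  for (const text of chunkText(docText)) {
    store.push({ text, vector: await embed(text) });
  }
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Query-time search: embed the query and return the top-K most similar chunks.
async function search(query: string, topK = 5): Promise<string[]> {
  const queryVector = await embed(query);
  return store
    .map(c => ({ text: c.text, score: cosine(queryVector, c.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map(c => c.text);
}

Chunk size, overlap, and the choice of embedding model are the main quality levers here; poor chunking is a common reason retrieval surfaces irrelevant passages.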

Agent (Autonomous with Tools)

Complexity: complex

An LLM that can reason, plan, use tools, and take actions in a loop until the task is complete.

Latency: 10s-minutes
Cost: High - multiple LLM calls, tool execution

Pros

  • Can accomplish multi-step tasks autonomously
  • Can use external tools and APIs
  • Handles complex reasoning and planning
  • Can take actions, not just provide information

Cons

  • Significantly more complex to build and debug
  • Higher latency and cost (multiple LLM calls)
  • Can get stuck in loops or make mistakes
  • Requires careful guardrails and supervision

Use When

  • Tasks require multiple steps or tools
  • User needs actions taken, not just answers
  • Complex research requiring multiple sources
  • Workflow automation with decision-making

Avoid When

  • A simple answer would suffice
  • Latency is critical (users won't wait)
  • The task is well-defined with known steps (just script it)
  • You can't tolerate occasional failures or loops

Code example (agent_example.ts):
// `Agent` stands in for whatever agent framework you use; the tools are
// functions you register (web search, file access, email, ticketing).
const agent = new Agent({
  model: 'gpt-4o',
  tools: [searchWeb, readFile, sendEmail, createTicket],
  maxIterations: 10
});

// Agent will plan, execute tools, and iterate
const result = await agent.run(
  'Research competitor pricing and create a summary report'
);

// Result includes: final answer, actions taken, reasoning trace
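
For a sense of what happens inside such a wrapper, here is a rough sketch of the underlying loop using OpenAI tool calling; the `search_web` tool, its stub implementation, and the iteration cap are illustrative, not a specific framework's API:

agent_loop_sketch.ts
import OpenAI from 'openai';

const openai = new OpenAI();

// One illustrative tool definition; a real agent would also register
// file access, email, ticketing, and so on.
const tools = [
  {
    type: 'function' as const,
    function: {
      name: 'search_web',
      description: 'Search the web and return result snippets',
      parameters: {
        type: 'object',
        properties: { query: { type: 'string' } },
        required: ['query']
      }
    }
  }
];

// Placeholder tool implementation; swap in a real search API.
async function searchWeb(query: string): Promise<string> {
  return `Stub results for "${query}"`;
}

async function runAgent(task: string, maxIterations = 10): Promise<string> {
  const messages: OpenAI.Chat.Completions.ChatCompletionMessageParam[] = [
    { role: 'system', content: 'You are an agent. Use tools when they help.' },
    { role: 'user', content: task }
  ];

  for (let i = 0; i < maxIterations; i++) {
    const response = await openai.chat.completions.create({ model: 'gpt-4o', messages, tools });
    const message = response.choices[0].message;
    messages.push(message);

    // No tool calls means the model considers the task complete.
    if (!message.tool_calls?.length) return message.content ?? '';

    // Execute each requested tool and feed the result back into the loop.
    for (const call of message.tool_calls) {
      if (call.type !== 'function') continue;
      const args = JSON.parse(call.function.arguments);
      const result = await searchWeb(args.query);
      messages.push({ role: 'tool', tool_call_id: call.id, content: result });
    }
  }
  return 'Stopped: hit the iteration limit before the task was finished.';
}

Every pass through the loop is another LLM call, which is where the latency and cost figures above come from; the iteration cap is the most basic guardrail against runaway loops.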

Decision Factors

Data source: Where does the answer come from?

  • Simple Chat: general knowledge covered by the model's training data
  • RAG: answers must come from your documents
  • Agent: answers require combining multiple sources or APIs

Action required: Does the user need information or action?

  • Simple Chat: information only, no external actions
  • RAG: information only, but grounded in your data
  • Agent: actions required (file operations, API calls, etc.)

Latency tolerance: How long can the user wait?

  • Simple Chat: best for <2s response requirements
  • RAG: acceptable for 2-5s response times
  • Agent: only for tasks where users expect to wait

Accuracy requirements: How critical is factual accuracy?

  • Simple Chat: acceptable for casual use and creative tasks
  • RAG: better for factual queries that need source citations
  • Agent: use when task verification is possible
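
One hypothetical way to encode these factors as a quick triage helper (the field names are illustrative, and real decisions involve more nuance than two booleans):

choose_approach.ts
type Approach = 'simple_chat' | 'rag' | 'agent';

// Hypothetical triage helper based on the factors above.
function chooseApproach(opts: { needsYourData: boolean; needsActions: boolean }): Approach {
  if (opts.needsActions) return 'agent';  // actions or multi-step tool use: only agents qualify
  if (opts.needsYourData) return 'rag';   // answers must be grounded in your documents
  return 'simple_chat';                   // default: the simplest approach that works
}

// Example: a support bot that must quote internal policy documents.
chooseApproach({ needsYourData: true, needsActions: false }); // 'rag'

Latency tolerance and accuracy requirements then act as checks on whichever option this points to.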

Real-World Scenarios

Customer asks 'What is your return policy?' on support chat

Recommended: rag

Answer must come from your actual policy documents, and you want to cite the source.

Alternatives:
simple_chat: If policy is extremely simple and you've included it in the system prompt
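
A minimal sketch of that simple_chat alternative, assuming the policy is short enough to paste into the system prompt (the policy string is a placeholder; `openai` and `userMessage` are as in the earlier snippet):

policy_in_prompt.ts
// Placeholder policy text; in practice this would be your real, current policy.
const returnPolicy = 'Items can be returned within 30 days of purchase with a receipt.';

const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'system', content: `You are a support assistant. Return policy: ${returnPolicy}` },
    { role: 'user', content: userMessage }
  ]
});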

User asks 'Write me a poem about coffee'

Recommended: simple_chat

Creative task with no external data needs. RAG or agents add latency with no benefit.

User says 'Research our competitors and create a report'

Recommended: agent

Multi-step task requiring web search, analysis, and document creation. Simple approaches can't handle this.

Alternatives:
rag: If 'research' means searching your existing competitor analysis documents

Internal tool: 'Answer questions about our codebase'

Recommended: rag

Answers must be grounded in your actual code. Embed your codebase and retrieve relevant snippets.

Common Misconceptions

Myth: Agents are always better because they're more capable
Reality: Agents are slower, more expensive, and harder to debug. Use the simplest approach that works.
Myth: RAG eliminates hallucination
Reality: RAG reduces hallucination but doesn't eliminate it. The model can still misinterpret or ignore retrieved context.
Myth: You need RAG for any production use case
Reality: Many production apps use simple chat effectively. RAG is for when you need grounding in specific documents.

Quick Decision Guide

Default choice: simple_chat

Start with the simplest approach. Only add complexity (RAG, then agents) when you have a concrete reason. Most use cases don't need agents.

