Fine-Tuning vs RAG
Should I fine-tune a model or use RAG for domain-specific knowledge?
Two approaches to adding specialized knowledge to LLMs, plus a hybrid of the two. RAG is the right choice for most use cases; fine-tuning has specific advantages for style and format.
Approaches
RAG (Retrieval Augmented Generation)
Complexity: moderate. Store your knowledge in a vector database. At query time, retrieve relevant documents and include them in the prompt context.
Pros
- Knowledge is easily updateable (just update documents)
- Provides source attribution and citations
- No training costs or wait time
- Works with any model without modification
- Can handle vast knowledge bases (millions of docs)
- Factually grounded in actual documents
Cons
- Higher per-request latency (embed + search + generate)
- More tokens per request = higher API costs
- Answer quality depends on chunking and retrieval (see the indexing sketch below)
- Requires vector database infrastructure
- Can retrieve irrelevant passages on ambiguous queries
Use When
- Knowledge changes frequently (news, documentation, inventory)
- You need source citations for compliance or trust
- You have a large knowledge base (too big to fit in context)
- You want to avoid training costs and complexity
- Factual accuracy is critical
Avoid When
- You need the model to adopt a specific writing style
- The knowledge is simple enough to fit in a system prompt
- Sub-second latency is critical
- You're optimizing for minimum cost per request
Code Example
// RAG approach for company policy questions
async function answerPolicyQuestion(question: string) {
  // 1. Find relevant policy documents
  const docs = await vectorDB.search(await embed(question), { topK: 3 });

  // 2. Generate answer with retrieved context
  const response = await llm.chat({
    messages: [
      {
        role: 'system',
        content: `Answer based on company policy documents. Cite the source document.

Policy documents:
${docs.map(d => `[${d.title}]: ${d.content}`).join('\n\n')}`
      },
      { role: 'user', content: question }
    ]
  });
  return response; // "According to [Vacation Policy], you accrue 2 days per month..."
}
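The query-time code above assumes the policy documents were already chunked, embedded, and stored. Below is a minimal indexing sketch of that step, reusing the same hypothetical embed and vectorDB helpers; the upsert call, chunk size, and overlap values are illustrative assumptions, not a specific vector database's API.

// Indexing sketch: chunk, embed, and store documents ahead of query time
function chunkText(text: string, maxChars = 2000, overlap = 200): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += maxChars - overlap) {
    chunks.push(text.slice(start, start + maxChars));
  }
  return chunks;
}

async function indexPolicyDocument(title: string, text: string) {
  for (const [i, content] of chunkText(text).entries()) {
    await vectorDB.upsert({
      id: `${title}-${i}`,
      vector: await embed(content),  // same embedding model as at query time
      metadata: { title, content },  // so search results expose d.title and d.content
    });
  }
}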
Fine-Tuning
Complexity: complex. Train a model on your specific data to internalize knowledge, style, or format. The model 'learns' your domain.
Pros
- Lower inference latency (no retrieval step)
- Fewer tokens per request (no context stuffing)
- Learns writing style, tone, and format
- Better at complex reasoning patterns if trained on examples
- Can learn domain-specific terminology and jargon
Cons
- High upfront training cost and complexity
- Knowledge is frozen at training time
- Updates require retraining (days, not seconds)
- No source attribution (model 'just knows')
- Risk of overfitting or catastrophic forgetting
- Less transparent — harder to debug wrong answers
Use When
- You need a specific writing style (legal, medical, brand voice)
- The knowledge is stable and rarely changes
- Latency is critical and you can't afford retrieval
- You have high volume and need to minimize per-request tokens
- You need the model to learn complex reasoning patterns
Avoid When
- Knowledge changes frequently
- You need source citations
- You don't have quality training data (hundreds of examples minimum)
- You need to explain why the model gave a specific answer
Code Example
# Fine-tuning approach: prepare training data
training_data = [
    {
        "messages": [
            {"role": "system", "content": "You are a legal assistant for Acme Corp."},
            {"role": "user", "content": "What's the vacation policy?"},
            {"role": "assistant", "content": "Employees accrue 2 days of PTO per month..."}
        ]
    },
    # ... hundreds more examples in your style/format
]
# Write training_data to a JSONL file and upload it to get file_id,
# then start the training job (can take hours, sometimes days)
job = client.fine_tuning.jobs.create(
    training_file=file_id,
    model="gpt-4o-mini"
)
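# Optional sketch (assumes the OpenAI Python client): poll until the job finishes
# so you know when the fine-tuned model name is available; a webhook or the
# dashboard works just as well.
import time
while True:
    job = client.fine_tuning.jobs.retrieve(job.id)
    if job.status in ("succeeded", "failed", "cancelled"):
        break
    time.sleep(60)
print(job.status, job.fine_tuned_model)  # e.g. succeeded ft:gpt-4o-mini:acme-corp:...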
# After training, use the fine-tuned model
response = client.chat.completions.create(
    model="ft:gpt-4o-mini:acme-corp:policy-bot",
    messages=[{"role": "user", "content": "What's the vacation policy?"}]
)  # responds in trained style without needing context in the prompt

Hybrid (Fine-Tuned + RAG)
Complexity: complex. Fine-tune for style/format, but still use RAG for factual grounding. Best of both worlds, but most complex.
Pros
- Consistent style AND accurate facts
- Can cite sources while maintaining brand voice
- Handles both stable and changing knowledge
Cons
- Most complex to build and maintain
- Highest total cost (training + infrastructure + inference)
- Two systems to debug when things go wrong
Use When
- Enterprise applications with strict requirements
- Need both brand consistency and factual accuracy
- Budget and team capacity for complex system
Avoid When
- Building an MVP
- Limited engineering resources
- Either pure approach would suffice
Code Example
// Hybrid: fine-tuned model + RAG
async function answer(question: string) {
  const docs = await vectorDB.search(await embed(question), { topK: 3 });

  // Use the fine-tuned model, which already knows your style,
  // but ground its answer in retrieved documents it can cite
  return await fineTunedModel.chat({
    messages: [
      {
        role: 'system',
        content: `Context from docs:\n${docs.map(d => `[${d.title}]: ${d.content}`).join('\n\n')}`
      },
      { role: 'user', content: question }
    ]
  });
}

Decision Factors
| Factor | RAG | Fine-Tuning | Hybrid |
|---|---|---|---|
| Knowledge freshness (how often does the information change?) | Frequently (daily/weekly) - just update documents | Rarely (yearly) - stable domain knowledge | Mixed - some stable, some changing |
| Source attribution (do you need to cite sources or explain answers?) | Yes - natural source attribution from retrieved docs | No - model 'just knows' without sources | Yes, with consistent style |
| Style/format requirements (does the output need a specific style or format?) | Minimal - system prompt can guide style | Critical - model learns from examples | Critical, with factual accuracy needs |
| Latency requirements (how fast must responses be?) | 1-5s acceptable | <1s required | 1-5s acceptable |
| Available training data (do you have high-quality input/output examples?) | Not required | Need 100s-1000s of examples | Need training data + document corpus |
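To put the cost factor in rough numbers, here is a back-of-envelope sketch; every value below is a placeholder assumption to replace with your own retrieval settings, traffic, and provider pricing.

// Rough estimate of RAG's extra input tokens vs a fine-tuned model (placeholder numbers)
const chunksPerQuery = 3;                     // topK retrieved chunks
const tokensPerChunk = 500;                   // depends on chunking strategy
const requestsPerMonth = 100_000;
const assumedUsdPerMillionInputTokens = 0.5;  // check your provider's current pricing

const extraTokensPerRequest = chunksPerQuery * tokensPerChunk;         // 1,500
const extraTokensPerMonth = extraTokensPerRequest * requestsPerMonth;  // 150,000,000
const extraUsdPerMonth = (extraTokensPerMonth / 1_000_000) * assumedUsdPerMillionInputTokens;
console.log(extraUsdPerMonth); // 75 - fine-tuning avoids this overhead but adds training (and often higher per-token) costs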
Real-World Scenarios
Customer support bot for a SaaS product with frequently updated documentation
RAG. Documentation changes often, and customers expect accurate answers about current features. Source citations build trust.
Legal document generator that must follow a specific firm's writing style
Fine-tuning. Legal style is stable and consistent, and documents need to match the firm's established patterns. No need for citations within the generated text.
Medical assistant that needs to cite sources AND maintain clinical tone
Hybrid. Medical accuracy requires grounding (liability), and a clinical tone requires consistent style. Both are critical.
Internal company chatbot answering HR policy questions
RAG. Policies update periodically, and employees want to see which policy document an answer came from. Simpler than fine-tuning.