Implement a RAG Pipeline
Build a Retrieval-Augmented Generation system from scratch. Index documents, embed queries, retrieve relevant chunks, and generate sourced answers.
1. Understand the Scenario
You're building a documentation Q&A bot for a product. Instead of fine-tuning or stuffing everything into context, you'll implement RAG to retrieve only the relevant documentation chunks for each question.
Learning Objectives
- Create document embeddings using OpenAI's embedding API
- Implement cosine similarity for vector search
- Build a retrieval pipeline that finds relevant chunks
- Generate answers with source citations
2. Follow the Instructions
What You'll Build
A documentation assistant that:
- Indexes your docs by embedding them into vectors
- Finds the most relevant chunks when a user asks a question
- Passes those chunks to an LLM to generate an answer
- Cites which documentation was used
Step 1: Prepare Your Documents
First, chunk your documents into pieces small enough to fit in context, but large enough to be meaningful.
// Sample documents to index
const documents = [
{
id: 'auth-1',
title: 'Authentication',
content: 'To authenticate, include an API key in the Authorization header. API keys can be created in the dashboard settings. Keys beginning with sk- are secret and should never be exposed client-side.'
},
{
id: 'rate-1',
title: 'Rate Limits',
content: 'The API has a rate limit of 100 requests per minute for free tier, 1000 for pro tier. Exceeding the limit returns a 429 status code. Implement exponential backoff for retries.'
},
{
id: 'errors-1',
title: 'Error Handling',
content: 'Errors return JSON with code and message fields. Common errors: 401 for invalid API key, 403 for insufficient permissions, 404 for resource not found, 500 for server errors.'
}
];
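These sample documents are already chunk-sized. Real documentation pages would need to be split first; here's a minimal sketch of a chunker that packs whole sentences up to a character budget (the function name and the 300-character default are illustrative, not part of the exercise):

```typescript
// Split text into sentence-aligned chunks of at most maxChars characters.
// Splitting on sentence boundaries avoids cutting ideas mid-thought.
function chunkText(text: string, maxChars = 300): string[] {
  const sentences = text.split(/(?<=[.!?])\s+/);
  const chunks: string[] = [];
  let current = '';
  for (const sentence of sentences) {
    if (current && current.length + sentence.length + 1 > maxChars) {
      chunks.push(current);      // current chunk is full, start a new one
      current = sentence;
    } else {
      current = current ? `${current} ${sentence}` : sentence;
    }
  }
  if (current) chunks.push(current);  // don't drop the trailing chunk
  return chunks;
}
```

In production you'd typically measure chunks in tokens rather than characters and add overlap between chunks, but the sentence-packing idea is the same.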
Step 2: Create Embeddings
Embed each document chunk. Store the embeddings alongside the original content.
import OpenAI from 'openai';
const openai = new OpenAI();
async function embedText(text: string): Promise<number[]> {
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: text
});
return response.data[0].embedding;
}
// Embed all documents
async function indexDocuments(docs: Document[]): Promise<IndexedDocument[]> {
return Promise.all(docs.map(async (doc) => ({
...doc,
embedding: await embedText(doc.content)
})));
}
Step 3: Implement Vector Search
When a user asks a question, embed the question and find the most similar document chunks using cosine similarity.
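Cosine similarity measures the angle between two vectors, scoring 1 for identical direction, 0 for orthogonal, and -1 for opposite. A quick sketch with toy 2-D vectors (the same formula the starter code uses):

```typescript
// Cosine similarity: dot product divided by the product of magnitudes
function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((sum, val, i) => sum + val * b[i], 0);
  const magA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
  const magB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
  return dot / (magA * magB);
}

console.log(cosineSimilarity([1, 0], [1, 0]));   // 1: identical direction
console.log(cosineSimilarity([1, 0], [0, 1]));   // 0: orthogonal
console.log(cosineSimilarity([1, 0], [-1, 0]));  // -1: opposite direction
```

Real embeddings live in a much higher-dimensional space (1536 dimensions for text-embedding-3-small), but the geometry works the same way.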
Your Task: Complete the findRelevantDocs and generateAnswer functions in the starter code below.
3. Try It Yourself
import OpenAI from 'openai';
const openai = new OpenAI();
interface Document {
id: string;
title: string;
content: string;
}
interface IndexedDocument extends Document {
embedding: number[];
}
const documents: Document[] = [
{
id: 'auth-1',
title: 'Authentication',
content: 'To authenticate, include an API key in the Authorization header. API keys can be created in the dashboard settings. Keys beginning with sk- are secret and should never be exposed client-side.'
},
{
id: 'rate-1',
title: 'Rate Limits',
content: 'The API has a rate limit of 100 requests per minute for free tier, 1000 for pro tier. Exceeding the limit returns a 429 status code. Implement exponential backoff for retries.'
},
{
id: 'errors-1',
title: 'Error Handling',
content: 'Errors return JSON with code and message fields. Common errors: 401 for invalid API key, 403 for insufficient permissions, 404 for resource not found, 500 for server errors.'
}
];
let index: IndexedDocument[] = [];
async function embedText(text: string): Promise<number[]> {
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: text
});
return response.data[0].embedding;
}
async function indexDocuments(docs: Document[]): Promise<IndexedDocument[]> {
return Promise.all(docs.map(async (doc) => ({
...doc,
embedding: await embedText(doc.content)
})));
}
function cosineSimilarity(a: number[], b: number[]): number {
const dotProduct = a.reduce((sum, val, i) => sum + val * b[i], 0);
const magnitudeA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
const magnitudeB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
return dotProduct / (magnitudeA * magnitudeB);
}
async function findRelevantDocs(
query: string,
topK: number = 3
): Promise<IndexedDocument[]> {
// TODO: Embed query, calculate similarity with each doc, sort and return top K
throw new Error('Not implemented');
}
async function generateAnswer(question: string): Promise<string> {
// TODO: Find relevant docs, build context string, create prompt with citations instruction
throw new Error('Not implemented');
}
async function main() {
console.log('Indexing documents...');
index = await indexDocuments(documents);
console.log(`Indexed ${index.length} documents\n`);
const question = 'How do I authenticate with the API?';
console.log(`Q: ${question}\n`);
const answer = await generateAnswer(question);
console.log(`A: ${answer}`);
}
main();
// Example output:
// Q: How do I authenticate with the API?
// A: To authenticate with the API, include your API key in the Authorization header.
// You can create API keys in the dashboard settings. Note that keys beginning with
// sk- are secret and should never be exposed client-side [Authentication].
This TypeScript exercise requires local setup. Copy the code to your IDE to run.
4. Get Help (If Needed)
5. Check the Solution
import OpenAI from 'openai';
const openai = new OpenAI();
interface Document {
id: string;
title: string;
content: string;
}
interface IndexedDocument extends Document {
embedding: number[];
}
const documents: Document[] = [
{
id: 'auth-1',
title: 'Authentication',
content: 'To authenticate, include an API key in the Authorization header. API keys can be created in the dashboard settings. Keys beginning with sk- are secret and should never be exposed client-side.'
},
{
id: 'rate-1',
title: 'Rate Limits',
content: 'The API has a rate limit of 100 requests per minute for free tier, 1000 for pro tier. Exceeding the limit returns a 429 status code. Implement exponential backoff for retries.'
},
{
id: 'errors-1',
title: 'Error Handling',
content: 'Errors return JSON with code and message fields. Common errors: 401 for invalid API key, 403 for insufficient permissions, 404 for resource not found, 500 for server errors.'
}
];
let index: IndexedDocument[] = [];
async function embedText(text: string): Promise<number[]> {
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: text
});
return response.data[0].embedding;
}
async function indexDocuments(docs: Document[]): Promise<IndexedDocument[]> {
return Promise.all(docs.map(async (doc) => ({
...doc,
embedding: await embedText(doc.content)
})));
}
function cosineSimilarity(a: number[], b: number[]): number {
const dotProduct = a.reduce((sum, val, i) => sum + val * b[i], 0);
const magnitudeA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
const magnitudeB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
return dotProduct / (magnitudeA * magnitudeB);
}
async function findRelevantDocs(
query: string,
topK: number = 3
): Promise<IndexedDocument[]> {
// SOLUTION_START hint="Embed query, calculate similarity with each doc, sort and return top K"
// 1. Embed the query
const queryEmbedding = await embedText(query);
// 2. Calculate similarity with each indexed document
const scored = index.map(doc => ({
doc,
similarity: cosineSimilarity(queryEmbedding, doc.embedding)
}));
// 3. Sort by similarity and return top K
return scored
.sort((a, b) => b.similarity - a.similarity)
.slice(0, topK)
.map(s => s.doc);
// SOLUTION_END
}
async function generateAnswer(question: string): Promise<string> {
// SOLUTION_START hint="Find relevant docs, build context string, create prompt with citations instruction"
// 1. Find relevant documents
const relevantDocs = await findRelevantDocs(question, 2);
// 2. Build context from retrieved documents
const context = relevantDocs
.map(doc => `[${doc.title}]: ${doc.content}`)
.join('\n\n');
// 3. Generate answer with citations
const response = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [
{
role: 'system',
content: `You are a helpful documentation assistant. Answer questions based on the provided documentation. Always cite which document(s) you used in your answer using [Document Title] format.\n\nDocumentation:\n${context}`
},
{
role: 'user',
content: question
}
]
});
return response.choices[0].message.content || '';
// SOLUTION_END
}
async function main() {
console.log('Indexing documents...');
index = await indexDocuments(documents);
console.log(`Indexed ${index.length} documents\n`);
const question = 'How do I authenticate with the API?';
console.log(`Q: ${question}\n`);
const answer = await generateAnswer(question);
console.log(`A: ${answer}`);
}
main();
// Example output:
// Q: How do I authenticate with the API?
// A: To authenticate with the API, include your API key in the Authorization header.
// You can create API keys in the dashboard settings. Note that keys beginning with
// sk- are secret and should never be exposed client-side [Authentication].
Common Mistakes
Using different embedding models for documents vs queries
Why it's wrong: Different models produce incompatible vector spaces - similarity scores become meaningless
How to fix: Always use the same embedding model for both indexing and querying
Not normalizing vectors before cosine similarity
Why it's wrong: Some vector databases expect normalized vectors - unnormalized vectors give incorrect similarity scores
How to fix: Cosine similarity handles normalization in the formula, but verify your database's requirements
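A quick way to see why the cosine formula doesn't need pre-normalized inputs: scaling a vector leaves the score unchanged, because the magnitudes are divided out. A raw dot-product index, by contrast, is scale-sensitive, which is why some vector stores require normalized vectors. A small demonstration:

```typescript
function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((sum, val, i) => sum + val * b[i], 0);
  const magA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
  const magB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
  return dot / (magA * magB);
}

const dot = (a: number[], b: number[]) => a.reduce((s, v, i) => s + v * b[i], 0);

// [2, 4] is just [1, 2] scaled by 2: same direction, so the cosine score is unchanged
console.log(cosineSimilarity([1, 2], [1, 2]));  // ≈ 1
console.log(cosineSimilarity([1, 2], [2, 4]));  // ≈ 1
// ...but the raw dot product doubles with the scale
console.log(dot([1, 2], [1, 2]));  // 5
console.log(dot([1, 2], [2, 4]));  // 10
```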
Chunks that are too large or too small
Why it's wrong: Too small loses context (sentence fragments). Too large reduces precision (irrelevant content retrieved)
How to fix: Aim for 200-500 tokens per chunk with some overlap between chunks
Test Cases
- Cosine similarity works: cosineSimilarity([1, 0], [1, 0]) returns 1 (identical vectors)
- Retrieves relevant documents: findRelevantDocs('authentication') should return the auth-1 document first
- Answer includes citation: generateAnswer('How do I authenticate?') should return a response containing the [Authentication] citation