
Implement a RAG Pipeline

Intermediate · 45 min · TypeScript

Build a Retrieval-Augmented Generation system from scratch. Index documents, embed queries, retrieve relevant chunks, and generate sourced answers.

1. Understand the Scenario

You're building a documentation Q&A bot for a product. Instead of fine-tuning or stuffing everything into context, you'll implement RAG to retrieve only the relevant documentation chunks for each question.

Learning Objectives

  • Create document embeddings using OpenAI's embedding API
  • Implement cosine similarity for vector search
  • Build a retrieval pipeline that finds relevant chunks
  • Generate answers with source citations

2. Follow the Instructions

What You'll Build

A documentation assistant that:

  1. Indexes your docs by embedding them into vectors
  2. Finds the most relevant chunks when a user asks a question
  3. Passes those chunks to an LLM to generate an answer
  4. Cites which documentation was used

Step 1: Prepare Your Documents

First, chunk your documents into pieces small enough to fit comfortably in the model's context window, but large enough to carry meaning on their own.

// Sample documents to index
const documents = [
  {
    id: 'auth-1',
    title: 'Authentication',
    content: 'To authenticate, include an API key in the Authorization header. API keys can be created in the dashboard settings. Keys beginning with sk- are secret and should never be exposed client-side.'
  },
  {
    id: 'rate-1', 
    title: 'Rate Limits',
    content: 'The API has a rate limit of 100 requests per minute for free tier, 1000 for pro tier. Exceeding the limit returns a 429 status code. Implement exponential backoff for retries.'
  },
  {
    id: 'errors-1',
    title: 'Error Handling', 
    content: 'Errors return JSON with code and message fields. Common errors: 401 for invalid API key, 403 for insufficient permissions, 404 for resource not found, 500 for server errors.'
  }
];
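The sample documents above are already bite-sized. For real documentation you would chunk longer pages first; a minimal sketch using a word window with overlap (the chunkText helper, sizes, and overlap values are illustrative, not part of the exercise):

```typescript
// Hypothetical chunker: splits text into overlapping windows of words.
// chunkSize and overlap are illustrative defaults, not prescribed values.
function chunkText(text: string, chunkSize = 50, overlap = 10): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += chunkSize - overlap) {
    chunks.push(words.slice(start, start + chunkSize).join(' '));
    if (start + chunkSize >= words.length) break; // last window reached
  }
  return chunks;
}
```

Overlap keeps a sentence that straddles a chunk boundary retrievable from both neighboring chunks.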

Step 2: Create Embeddings

Embed each document chunk. Store the embeddings alongside the original content.

import OpenAI from 'openai';

const openai = new OpenAI();

async function embedText(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: text
  });
  return response.data[0].embedding;
}

// Embed all documents
async function indexDocuments(docs: Document[]): Promise<IndexedDocument[]> {
  return Promise.all(docs.map(async (doc) => ({
    ...doc,
    embedding: await embedText(doc.content)
  })));
}
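indexDocuments leans on Promise.all returning results in input order regardless of when each embedding call settles, which keeps every vector aligned with its document. A quick standalone check of that guarantee, with no API involved:

```typescript
// Promise.all resolves in input order even when the promises settle
// out of order, so indexed embeddings stay aligned with their documents.
async function demo(): Promise<string[]> {
  const tasks = ['a', 'b', 'c'].map(
    (id, i) =>
      new Promise<string>(resolve =>
        setTimeout(() => resolve(id), (3 - i) * 10) // 'a' settles last
      )
  );
  return Promise.all(tasks);
}

demo().then(order => {
  console.log(order); // → ['a', 'b', 'c'] despite reversed completion times
});
```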

When a user asks a question, embed the question and find the most similar document chunks using cosine similarity.

Your Task: Complete the findRelevantDocs and generateAnswer functions in the starter code below.

3. Try It Yourself

starter_code.ts
import OpenAI from 'openai';

const openai = new OpenAI();

interface Document {
  id: string;
  title: string;
  content: string;
}

interface IndexedDocument extends Document {
  embedding: number[];
}

const documents: Document[] = [
  {
    id: 'auth-1',
    title: 'Authentication',
    content: 'To authenticate, include an API key in the Authorization header. API keys can be created in the dashboard settings. Keys beginning with sk- are secret and should never be exposed client-side.'
  },
  {
    id: 'rate-1',
    title: 'Rate Limits', 
    content: 'The API has a rate limit of 100 requests per minute for free tier, 1000 for pro tier. Exceeding the limit returns a 429 status code. Implement exponential backoff for retries.'
  },
  {
    id: 'errors-1',
    title: 'Error Handling',
    content: 'Errors return JSON with code and message fields. Common errors: 401 for invalid API key, 403 for insufficient permissions, 404 for resource not found, 500 for server errors.'
  }
];

let index: IndexedDocument[] = [];

async function embedText(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: text
  });
  return response.data[0].embedding;
}

async function indexDocuments(docs: Document[]): Promise<IndexedDocument[]> {
  return Promise.all(docs.map(async (doc) => ({
    ...doc,
    embedding: await embedText(doc.content)
  })));
}

function cosineSimilarity(a: number[], b: number[]): number {
  const dotProduct = a.reduce((sum, val, i) => sum + val * b[i], 0);
  const magnitudeA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
  const magnitudeB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
  return dotProduct / (magnitudeA * magnitudeB);
}

async function findRelevantDocs(
  query: string,
  topK: number = 3
): Promise<IndexedDocument[]> {
  // TODO: Embed query, calculate similarity with each doc, sort and return top K
  throw new Error('Not implemented');
}

async function generateAnswer(question: string): Promise<string> {
  // TODO: Find relevant docs, build context string, create prompt with citations instruction
  throw new Error('Not implemented');
}

async function main() {
  console.log('Indexing documents...');
  index = await indexDocuments(documents);
  console.log(`Indexed ${index.length} documents\n`);
  
  const question = 'How do I authenticate with the API?';
  console.log(`Q: ${question}\n`);
  
  const answer = await generateAnswer(question);
  console.log(`A: ${answer}`);
}

main().catch(console.error);

// Example output:
// Q: How do I authenticate with the API?
// A: To authenticate with the API, include your API key in the Authorization header.
// You can create API keys in the dashboard settings. Note that keys beginning with
// sk- are secret and should never be exposed client-side [Authentication].

This TypeScript exercise requires local setup. Copy the code into your IDE to run it.

4. Get Help (If Needed)

Reveal progressive hints
Hint 1: Cosine similarity formula: (a · b) / (||a|| × ||b||). The dot product divided by the product of magnitudes.
Hint 2: Embed the query with the same embedText helper, score every indexed document with cosineSimilarity, then sort by score. The embedding model must match between queries and documents.
Hint 3: For the generateAnswer function, include the document titles in the context so the LLM can cite them. Format like: [Title]: Content
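The sort-and-slice step from Hint 2 can be practiced without any API calls. A generic top-K helper over pre-scored items (the topK name and the sample scores are made up for illustration):

```typescript
// Generic top-K selection: sort a copy descending by score, take the
// first k. No API calls, so the retrieval logic is testable in isolation.
function topK<T>(items: T[], score: (item: T) => number, k: number): T[] {
  return [...items]
    .sort((a, b) => score(b) - score(a))
    .slice(0, k);
}

const scored = [
  { id: 'auth-1', similarity: 0.82 },
  { id: 'rate-1', similarity: 0.31 },
  { id: 'errors-1', similarity: 0.47 },
];
const best = topK(scored, s => s.similarity, 2).map(s => s.id);
// → ['auth-1', 'errors-1']
```

Spreading into a new array before sorting avoids mutating the index in place.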

5. Check the Solution

Reveal the complete solution
solution.ts
import OpenAI from 'openai';

const openai = new OpenAI();

interface Document {
  id: string;
  title: string;
  content: string;
}

interface IndexedDocument extends Document {
  embedding: number[];
}

const documents: Document[] = [
  {
    id: 'auth-1',
    title: 'Authentication',
    content: 'To authenticate, include an API key in the Authorization header. API keys can be created in the dashboard settings. Keys beginning with sk- are secret and should never be exposed client-side.'
  },
  {
    id: 'rate-1',
    title: 'Rate Limits', 
    content: 'The API has a rate limit of 100 requests per minute for free tier, 1000 for pro tier. Exceeding the limit returns a 429 status code. Implement exponential backoff for retries.'
  },
  {
    id: 'errors-1',
    title: 'Error Handling',
    content: 'Errors return JSON with code and message fields. Common errors: 401 for invalid API key, 403 for insufficient permissions, 404 for resource not found, 500 for server errors.'
  }
];

let index: IndexedDocument[] = [];

async function embedText(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: text
  });
  return response.data[0].embedding;
}

async function indexDocuments(docs: Document[]): Promise<IndexedDocument[]> {
  return Promise.all(docs.map(async (doc) => ({
    ...doc,
    embedding: await embedText(doc.content)
  })));
}

function cosineSimilarity(a: number[], b: number[]): number {
  const dotProduct = a.reduce((sum, val, i) => sum + val * b[i], 0);
  const magnitudeA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
  const magnitudeB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
  return dotProduct / (magnitudeA * magnitudeB);
}

async function findRelevantDocs(
  query: string,
  topK: number = 3
): Promise<IndexedDocument[]> {
  // 1. Embed the query
  const queryEmbedding = await embedText(query);

  // 2. Calculate similarity with each indexed document
  const scored = index.map(doc => ({
    doc,
    similarity: cosineSimilarity(queryEmbedding, doc.embedding)
  }));

  // 3. Sort by similarity and return top K
  return scored
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, topK)
    .map(s => s.doc);
}

async function generateAnswer(question: string): Promise<string> {
  // 1. Find relevant documents
  const relevantDocs = await findRelevantDocs(question, 2);

  // 2. Build context from retrieved documents
  const context = relevantDocs
    .map(doc => `[${doc.title}]: ${doc.content}`)
    .join('\n\n');

  // 3. Generate answer with citations
  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      {
        role: 'system',
        content: `You are a helpful documentation assistant. Answer questions based on the provided documentation. Always cite which document(s) you used in your answer using [Document Title] format.\n\nDocumentation:\n${context}`
      },
      {
        role: 'user',
        content: question
      }
    ]
  });

  return response.choices[0].message.content || '';
}

async function main() {
  console.log('Indexing documents...');
  index = await indexDocuments(documents);
  console.log(`Indexed ${index.length} documents\n`);
  
  const question = 'How do I authenticate with the API?';
  console.log(`Q: ${question}\n`);
  
  const answer = await generateAnswer(question);
  console.log(`A: ${answer}`);
}

main().catch(console.error);

// Example output:
// Q: How do I authenticate with the API?
// A: To authenticate with the API, include your API key in the Authorization header.
// You can create API keys in the dashboard settings. Note that keys beginning with
// sk- are secret and should never be exposed client-side [Authentication].

Common Mistakes

Using different embedding models for documents vs queries

Why it's wrong: Different models produce incompatible vector spaces - similarity scores become meaningless

How to fix: Always use the same embedding model for both indexing and querying

Not normalizing vectors before similarity search

Why it's wrong: The cosine formula already divides by the vector magnitudes, but many vector databases score with a raw dot product and assume pre-normalized vectors - unnormalized input skews their rankings

How to fix: The cosineSimilarity function in this exercise needs no extra normalization; if you move to a vector database, check whether its metric is cosine or dot product and normalize your vectors accordingly
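To see why no extra normalization step is needed here, compare the cosine formula against a normalize-then-dot-product version; both give the same score. A self-contained check with made-up vectors:

```typescript
// Cosine similarity equals the plain dot product once both vectors
// are scaled to unit length - the formula bakes the normalization in.
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, v, i) => sum + v * b[i], 0);
}

function normalize(v: number[]): number[] {
  const mag = Math.sqrt(dot(v, v));
  return v.map(x => x / mag);
}

function cosineSimilarity(a: number[], b: number[]): number {
  return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

const a = [3, 4];
const b = [4, 3];
const viaFormula = cosineSimilarity(a, b);
const viaNormalized = dot(normalize(a), normalize(b));
// Both evaluate to 24/25 = 0.96 (up to floating-point rounding)
```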

Chunks that are too large or too small

Why it's wrong: Too small loses context (sentence fragments). Too large reduces precision (irrelevant content retrieved)

How to fix: Aim for 200-500 tokens per chunk with some overlap between chunks

Test Cases

Cosine similarity works

Input: cosineSimilarity([1, 0], [1, 0])
Expected: 1 (identical vectors)

Retrieves relevant documents

Input: findRelevantDocs('authentication')
Expected: Should return auth-1 document first

Answer includes citation

Input: generateAnswer('How do I authenticate?')
Expected: Response should contain [Authentication] citation
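The first test case runs entirely offline, since cosineSimilarity has no API dependency. Copying that function into a scratch file lets you check the boundary values directly:

```typescript
// Same cosineSimilarity as the starter code, checked at its extremes:
// identical vectors → 1, orthogonal → 0, opposite → -1.
function cosineSimilarity(a: number[], b: number[]): number {
  const dotProduct = a.reduce((sum, val, i) => sum + val * b[i], 0);
  const magnitudeA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
  const magnitudeB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
  return dotProduct / (magnitudeA * magnitudeB);
}

console.assert(cosineSimilarity([1, 0], [1, 0]) === 1);  // identical
console.assert(cosineSimilarity([1, 0], [0, 1]) === 0);  // orthogonal
console.assert(cosineSimilarity([1, 0], [-1, 0]) === -1); // opposite
```

The other two test cases need an API key, since they embed the query and call the chat model.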
