Implement a RAG Pipeline
Build a Retrieval-Augmented Generation system from scratch. Index documents, embed queries, retrieve relevant chunks, and generate sourced answers.
1. Understand the Scenario
You're building a documentation Q&A bot for a product. Instead of fine-tuning or stuffing everything into context, you'll implement RAG to retrieve only the relevant documentation chunks for each question.
Learning Objectives
- Create document embeddings using OpenAI's embedding API
- Implement cosine similarity for vector search
- Build a retrieval pipeline that finds relevant chunks
- Generate answers with source citations
2. Follow the Instructions
What You'll Build
A documentation assistant that:
- Indexes your docs by embedding them into vectors
- Finds the most relevant chunks when a user asks a question
- Passes those chunks to an LLM to generate an answer
- Cites which documentation was used
Step 1: Prepare Your Documents
First, chunk your documents into pieces small enough to fit in context, but large enough to be meaningful.
// Sample documents to index
const documents = [
{
id: 'auth-1',
title: 'Authentication',
content: 'To authenticate, include an API key in the Authorization header. API keys can be created in the dashboard settings. Keys beginning with sk- are secret and should never be exposed client-side.'
},
{
id: 'rate-1',
title: 'Rate Limits',
content: 'The API has a rate limit of 100 requests per minute for free tier, 1000 for pro tier. Exceeding the limit returns a 429 status code. Implement exponential backoff for retries.'
},
{
id: 'errors-1',
title: 'Error Handling',
content: 'Errors return JSON with code and message fields. Common errors: 401 for invalid API key, 403 for insufficient permissions, 404 for resource not found, 500 for server errors.'
}
];
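These sample documents are already chunk-sized. Real documentation pages would need to be split first; here's a minimal sketch of a chunker that packs whole sentences up to a character budget (the function name and the 300-character default are illustrative, not part of the exercise):

```typescript
// Split text into sentence-aligned chunks of at most maxChars characters.
// Splitting on sentence boundaries avoids cutting ideas mid-thought.
function chunkText(text: string, maxChars = 300): string[] {
  const sentences = text.split(/(?<=[.!?])\s+/);
  const chunks: string[] = [];
  let current = '';
  for (const sentence of sentences) {
    if (current && current.length + sentence.length + 1 > maxChars) {
      chunks.push(current);      // current chunk is full, start a new one
      current = sentence;
    } else {
      current = current ? `${current} ${sentence}` : sentence;
    }
  }
  if (current) chunks.push(current);  // don't drop the trailing chunk
  return chunks;
}
```

In production you'd typically measure chunks in tokens rather than characters and add overlap between chunks, but the sentence-packing idea is the same.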
Step 2: Create Embeddings
Embed each document chunk. Store the embeddings alongside the original content.
import OpenAI from 'openai';
const openai = new OpenAI();
async function embedText(text: string): Promise<number[]> {
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: text
});
return response.data[0].embedding;
}
// Embed all documents
async function indexDocuments(docs: Document[]): Promise<IndexedDocument[]> {
return Promise.all(docs.map(async (doc) => ({
...doc,
embedding: await embedText(doc.content)
})));
}
Step 3: Implement Vector Search
When a user asks a question, embed the question and find the most similar document chunks using cosine similarity.
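Cosine similarity measures the angle between two vectors, scoring 1 for identical direction, 0 for orthogonal, and -1 for opposite. A quick sketch with toy 2-D vectors (the same formula the starter code uses):

```typescript
// Cosine similarity: dot product divided by the product of magnitudes
function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((sum, val, i) => sum + val * b[i], 0);
  const magA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
  const magB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
  return dot / (magA * magB);
}

console.log(cosineSimilarity([1, 0], [1, 0]));   // 1: identical direction
console.log(cosineSimilarity([1, 0], [0, 1]));   // 0: orthogonal
console.log(cosineSimilarity([1, 0], [-1, 0]));  // -1: opposite direction
```

Real embeddings live in a much higher-dimensional space (1536 dimensions for text-embedding-3-small), but the geometry works the same way.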
Your Task: Complete the findRelevantDocs and generateAnswer functions in the starter code below.
3. Try It Yourself
import OpenAI from 'openai';
const openai = new OpenAI();
interface Document {
id: string;
title: string;
content: string;
}
interface IndexedDocument extends Document {
embedding: number[];
}
const documents: Document[] = [
{
id: 'auth-1',
title: 'Authentication',
content: 'To authenticate, include an API key in the Authorization header. API keys can be created in the dashboard settings. Keys beginning with sk- are secret and should never be exposed client-side.'
},
{
id: 'rate-1',
title: 'Rate Limits',
content: 'The API has a rate limit of 100 requests per minute for free tier, 1000 for pro tier. Exceeding the limit returns a 429 status code. Implement exponential backoff for retries.'
},
{
id: 'errors-1',
title: 'Error Handling',
content: 'Errors return JSON with code and message fields. Common errors: 401 for invalid API key, 403 for insufficient permissions, 404 for resource not found, 500 for server errors.'
}
];
let index: IndexedDocument[] = [];
async function embedText(text: string): Promise<number[]> {
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: text
});
return response.data[0].embedding;
}
async function indexDocuments(docs: Document[]): Promise<IndexedDocument[]> {
return Promise.all(docs.map(async (doc) => ({
...doc,
embedding: await embedText(doc.content)
})));
}
function cosineSimilarity(a: number[], b: number[]): number {
const dotProduct = a.reduce((sum, val, i) => sum + val * b[i], 0);
const magnitudeA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
const magnitudeB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
return dotProduct / (magnitudeA * magnitudeB);
}
async function findRelevantDocs(
query: string,
topK: number = 3
): Promise<IndexedDocument[]> {
// TODO: Embed query, calculate similarity with each doc, sort and return top K
throw new Error('Not implemented');
}
async function generateAnswer(question: string): Promise<string> {
// TODO: Find relevant docs, build context string, create prompt with citations instruction
throw new Error('Not implemented');
}
async function main() {
console.log('Indexing documents...');
index = await indexDocuments(documents);
console.log(`Indexed ${index.length} documents\n`);
const question = 'How do I authenticate with the API?';
console.log(`Q: ${question}\n`);
const answer = await generateAnswer(question);
console.log(`A: ${answer}`);
}
main();
// Example output:
// Q: How do I authenticate with the API?
// A: To authenticate with the API, include your API key in the Authorization header.
// You can create API keys in the dashboard settings. Note that keys beginning with
// sk- are secret and should never be exposed client-side [Authentication].
This TypeScript exercise requires local setup. Copy the code to your IDE to run.
4. Get Help (If Needed)
5. Check the Solution
import OpenAI from 'openai';
const openai = new OpenAI();
interface Document {
id: string;
title: string;
content: string;
}
interface IndexedDocument extends Document {
embedding: number[];
}
const documents: Document[] = [
{
id: 'auth-1',
title: 'Authentication',
content: 'To authenticate, include an API key in the Authorization header. API keys can be created in the dashboard settings. Keys beginning with sk- are secret and should never be exposed client-side.'
},
{
id: 'rate-1',
title: 'Rate Limits',
content: 'The API has a rate limit of 100 requests per minute for free tier, 1000 for pro tier. Exceeding the limit returns a 429 status code. Implement exponential backoff for retries.'
},
{
id: 'errors-1',
title: 'Error Handling',
content: 'Errors return JSON with code and message fields. Common errors: 401 for invalid API key, 403 for insufficient permissions, 404 for resource not found, 500 for server errors.'
}
];
let index: IndexedDocument[] = [];
async function embedText(text: string): Promise<number[]> {
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: text
});
return response.data[0].embedding;
}
async function indexDocuments(docs: Document[]): Promise<IndexedDocument[]> {
return Promise.all(docs.map(async (doc) => ({
...doc,
embedding: await embedText(doc.content)
})));
}
function cosineSimilarity(a: number[], b: number[]): number {
const dotProduct = a.reduce((sum, val, i) => sum + val * b[i], 0);
const magnitudeA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
const magnitudeB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
return dotProduct / (magnitudeA * magnitudeB);
}
async function findRelevantDocs(
query: string,
topK: number = 3
): Promise<IndexedDocument[]> {
// SOLUTION_START hint="Embed query, calculate similarity with each doc, sort and return top K"
// 1. Embed the query
const queryEmbedding = await embedText(query);
// 2. Calculate similarity with each indexed document
const scored = index.map(doc => ({
doc,
similarity: cosineSimilarity(queryEmbedding, doc.embedding)
}));
// 3. Sort by similarity and return top K
return scored
.sort((a, b) => b.similarity - a.similarity)
.slice(0, topK)
.map(s => s.doc);
// SOLUTION_END
}
async function generateAnswer(question: string): Promise<string> {
// SOLUTION_START hint="Find relevant docs, build context string, create prompt with citations instruction"
// 1. Find relevant documents
const relevantDocs = await findRelevantDocs(question, 2);
// 2. Build context from retrieved documents
const context = relevantDocs
.map(doc => `[${doc.title}]: ${doc.content}`)
.join('\n\n');
// 3. Generate answer with citations
const response = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [
{
role: 'system',
content: `You are a helpful documentation assistant. Answer questions based on the provided documentation. Always cite which document(s) you used in your answer using [Document Title] format.\n\nDocumentation:\n${context}`
},
{
role: 'user',
content: question
}
]
});
return response.choices[0].message.content || '';
// SOLUTION_END
}
async function main() {
console.log('Indexing documents...');
index = await indexDocuments(documents);
console.log(`Indexed ${index.length} documents\n`);
const question = 'How do I authenticate with the API?';
console.log(`Q: ${question}\n`);
const answer = await generateAnswer(question);
console.log(`A: ${answer}`);
}
main();
// Example output:
// Q: How do I authenticate with the API?
// A: To authenticate with the API, include your API key in the Authorization header.
// You can create API keys in the dashboard settings. Note that keys beginning with
// sk- are secret and should never be exposed client-side [Authentication].
Common Mistakes
Using different embedding models for documents vs queries
Why it's wrong: Different models produce incompatible vector spaces - similarity scores become meaningless
How to fix: Always use the same embedding model for both indexing and querying
Not normalizing vectors before cosine similarity
Why it's wrong: Some vector databases expect normalized vectors - unnormalized vectors give incorrect similarity scores
How to fix: Cosine similarity handles normalization in the formula, but verify your database's requirements
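A quick way to see why the cosine formula doesn't need pre-normalized inputs: scaling a vector leaves the score unchanged, because the magnitudes are divided out. A raw dot-product index, by contrast, is scale-sensitive, which is why some vector stores require normalized vectors. A small demonstration:

```typescript
function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((sum, val, i) => sum + val * b[i], 0);
  const magA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
  const magB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
  return dot / (magA * magB);
}

const dot = (a: number[], b: number[]) => a.reduce((s, v, i) => s + v * b[i], 0);

// [2, 4] is just [1, 2] scaled by 2: same direction, so the cosine score is unchanged
console.log(cosineSimilarity([1, 2], [1, 2]));  // ≈ 1
console.log(cosineSimilarity([1, 2], [2, 4]));  // ≈ 1
// ...but the raw dot product doubles with the scale
console.log(dot([1, 2], [1, 2]));  // 5
console.log(dot([1, 2], [2, 4]));  // 10
```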
Chunks that are too large or too small
Why it's wrong: Too small loses context (sentence fragments). Too large reduces precision (irrelevant content retrieved)
How to fix: Aim for 200-500 tokens per chunk with some overlap between chunks
Test Cases
- Cosine similarity works: cosineSimilarity([1, 0], [1, 0]) returns 1 (identical vectors)
- Retrieves relevant documents: findRelevantDocs('authentication') should return the auth-1 document first
- Answer includes citation: generateAnswer('How do I authenticate?') should return a response containing the [Authentication] citation