
Pydantic for Type-Safe AI


Pydantic brings runtime validation and type safety to Python AI applications, automatically converting JSON Schema to validated Python objects with IDE autocomplete.

Pydantic is a Python library for data validation using type annotations. In AI development, it bridges the gap between LLM outputs (JSON) and your Python code (typed objects), providing automatic validation, serialization, and IDE support.

The key insight: Define your data model once as a Python class, and Pydantic handles JSON Schema generation, validation, parsing, and type safety automatically.

Why Pydantic for AI?

Without Pydantic:

# Manual parsing, no validation, no types
response = client.chat.completions.create(...)
data = json.loads(response.choices[0].message.content)
name = data['name']  # Hope this exists
email = data['email']  # Hope it's a valid email

With Pydantic:

from pydantic import BaseModel, EmailStr

class Person(BaseModel):
    name: str
    email: EmailStr  # Validates email format (requires the email-validator extra: pip install "pydantic[email]")
    age: int | None = None

# Automatic validation + IDE autocomplete
person = Person.model_validate_json(response.choices[0].message.content)
print(person.name)  # Type-safe, guaranteed to exist
print(person.email)  # Guaranteed valid email
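
When validation fails, Pydantic raises a ValidationError that pinpoints each offending field, so bad LLM output fails loudly instead of propagating. A minimal sketch using the Person model above:

from pydantic import ValidationError

try:
    person = Person.model_validate_json('{"name": "Ada", "email": "not-an-email"}')
except ValidationError as e:
    for err in e.errors():  # one entry per failing field
        print(err["loc"], err["msg"])  # ('email',) value is not a valid email address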
pydantic_openai.py
from pydantic import BaseModel, Field, EmailStr, field_validator
from openai import OpenAI
import json

# Define your output structure as a Pydantic model
class Skill(BaseModel):
    name: str
    years_experience: int = Field(ge=0, description="Years of experience")
    proficiency: str = Field(pattern="^(beginner|intermediate|expert)$")

class Resume(BaseModel):
    name: str = Field(min_length=1)
    email: EmailStr
    skills: list[Skill] = Field(min_length=1)
    summary: str | None = None
    
    @field_validator('skills')  # Pydantic v2 API; @validator is the deprecated v1 form
    @classmethod
    def validate_skills(cls, v: list[Skill]) -> list[Skill]:
        if len(v) > 20:
            raise ValueError('Maximum 20 skills allowed')
        return v

# Pydantic automatically generates JSON Schema
schema = Resume.model_json_schema()
print(json.dumps(schema, indent=2))
# {
#   "type": "object",
#   "required": ["name", "email", "skills"],
#   "properties": {
#     "name": {"type": "string", "minLength": 1},
#     "email": {"type": "string", "format": "email"},
#     "skills": {
#       "type": "array",
#       "items": {...},
#       "minItems": 1
#     }
#   }
# }

# Use the schema with OpenAI
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Extract resume info from: John Doe, john@example.com, 5 years Python..."}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "resume",
            "schema": schema,
            "strict": True
        }
    }
)

# Parse and validate in one step
resume = Resume.model_validate_json(response.choices[0].message.content)

# Type-safe access with IDE autocomplete
print(resume.name)  # str
print(resume.email)  # EmailStr (validated)
for skill in resume.skills:  # list[Skill]
    print(f"{skill.name}: {skill.proficiency}")  # IDE knows these fields exist
  • Field() adds validation rules and descriptions
  • Custom validators capture business logic
  • JSON Schema is auto-generated from the model
  • Parsing and validation happen in one call
  • Full type safety and autocomplete throughout
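
Newer versions of the OpenAI Python SDK (roughly 1.40+) can also take the Pydantic model directly through the beta parse helper, which builds a strict-compatible schema for you and hands back an already-validated object:

completion = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Extract resume info from: John Doe, john@example.com, 5 years Python..."}
    ],
    response_format=Resume,  # pass the Pydantic model itself
)
resume = completion.choices[0].message.parsed  # a validated Resume instance (or None on refusal)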

Pydantic with Instructor Library

The Instructor library makes Pydantic + LLMs even easier:

instructor_example.py
import instructor
from pydantic import BaseModel
from openai import OpenAI

# Patch OpenAI client to use Instructor
client = instructor.from_openai(OpenAI())

class Person(BaseModel):
    name: str
    age: int
    email: str

# Instructor handles schema generation + validation automatically
person = client.chat.completions.create(
    model="gpt-4o",
    response_model=Person,  # Just pass the Pydantic model
    messages=[
        {"role": "user", "content": "Extract: John is 30 years old, email john@example.com"}
    ]
)

print(person.name)  # "John"
print(person.age)   # 30
print(type(person)) # <class '__main__.Person'>
  • response_model handles schema generation and validation automatically
  • Returns a typed Pydantic object, not a JSON string

Pydantic Features for AI

1. Validation

  • Type checking: str, int, float, bool, list, dict
  • Format validation: EmailStr, HttpUrl, UUID, datetime
  • Constraints: Field(ge=0, le=100, min_length=1, pattern=r'^[A-Z]')
  • Custom validators: @field_validator decorator for business logic (see the sketch after this list)
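
A minimal sketch combining these, using standard Pydantic v2 APIs (the Article model and its fields are illustrative):

from datetime import datetime
from pydantic import BaseModel, Field, HttpUrl, field_validator

class Article(BaseModel):
    title: str = Field(min_length=1, max_length=200)
    url: HttpUrl                       # must parse as a valid URL
    score: int = Field(ge=0, le=100)   # numeric range constraint
    published: datetime                # accepts ISO 8601 strings

    @field_validator("title")
    @classmethod
    def no_trailing_bang(cls, v: str) -> str:
        # custom business-logic check beyond type constraints
        if v.endswith("!!!"):
            raise ValueError("title looks like clickbait")
        return v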

2. Nested Models

class Address(BaseModel):
    street: str
    city: str
    country: str

class Company(BaseModel):
    name: str
    address: Address  # Nested validation
    employees: list[Person]  # List of validated objects
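
Validation recurses through the whole tree, and errors report the full path to the failing field. A small sketch reusing the models above:

from pydantic import ValidationError

try:
    Company.model_validate({
        "name": "Acme",
        "address": {"street": "1 Main St", "city": "Springfield"},  # country missing
        "employees": [],
    })
except ValidationError as e:
    print(e.errors()[0]["loc"])  # ('address', 'country'): the path into the nested model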

3. Optional Fields & Defaults

class Config(BaseModel):
    required_field: str
    optional_field: str | None = None
    with_default: int = 10
    from_function: str = Field(default_factory=lambda: "generated")

4. Discriminated Unions

from typing import Annotated, Literal, Union
from pydantic import BaseModel, Field

class ToolCall(BaseModel):
    type: Literal["function"]
    function_name: str

class TextResponse(BaseModel):
    type: Literal["text"]
    content: str

# Tagging the union with a discriminator makes Pydantic dispatch on the
# 'type' field directly, with faster validation and clearer error messages
Response = Annotated[Union[ToolCall, TextResponse], Field(discriminator="type")]
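
Because Response is an annotated union rather than a BaseModel subclass, raw data is validated against it through a TypeAdapter (standard Pydantic v2):

from pydantic import TypeAdapter

adapter = TypeAdapter(Response)
msg = adapter.validate_python({"type": "text", "content": "Hello"})
print(type(msg).__name__)  # TextResponse, selected via the 'type' discriminator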

Pydantic vs Manual JSON Parsing

Aspect            | Manual JSON          | Pydantic
------------------|----------------------|---------------------------------
Type safety       | None                 | Full typing + IDE support
Validation        | Manual checks        | Automatic
Schema generation | Written by hand      | Auto-generated from the model
Error messages    | Generic              | Detailed field-level errors
Refactoring       | Find/replace strings | Type checker catches issues
Documentation     | Separate docs        | Self-documenting types

When to Use Pydantic

Use Pydantic when                     | Consider alternatives when
--------------------------------------|----------------------------------------
Building production Python apps       | Quick scripting/prototypes
You need type safety + validation     | Output is simple key-value data
Working with OpenAI/Anthropic APIs    | Using JavaScript/TypeScript (use Zod)
Data flows through multiple functions | One-off parsing
You want IDE autocomplete             | Performance is absolutely critical

Common Patterns

Streaming with Partial Models

import instructor
from openai import OpenAI
from pydantic import BaseModel

client = instructor.from_openai(OpenAI())

class Analysis(BaseModel):
    summary: str
    entities: list[str]
    sentiment: str

# Stream partial results as they arrive; create_partial handles streaming
# internally, so no stream=True is needed
for partial in client.chat.completions.create_partial(
    model="gpt-4o",
    response_model=Analysis,
    messages=[...],
):
    print(partial.summary)  # fills in as tokens arrive; may be None at first

Retry with Validation

import instructor
from tenacity import retry, stop_after_attempt

# Reuses the instructor-patched client and Person model defined earlier
@retry(stop=stop_after_attempt(3))
def extract_with_retry(text: str) -> Person:
    return client.chat.completions.create(
        model="gpt-4o",
        response_model=Person,
        messages=[{"role": "user", "content": text}]
    )

# Automatically retries if validation fails
person = extract_with_retry("Extract: invalid data...")
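
Instructor also ships its own retry loop: the patched create accepts a max_retries argument and re-sends the validation error to the model on each attempt, so the tenacity wrapper above is one option rather than a requirement:

person = client.chat.completions.create(
    model="gpt-4o",
    response_model=Person,
    max_retries=3,  # on validation failure, re-ask the model with the error message
    messages=[{"role": "user", "content": "Extract: John is 30, john@example.com"}],
)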

Key Takeaways

  • Pydantic converts type annotations into runtime validation
  • Auto-generates JSON Schema from Python classes
  • Integrates seamlessly with OpenAI structured outputs
  • Provides IDE autocomplete and type checking
  • Use Instructor library for simplified LLM integration
  • Pydantic v2 is 4-50x faster than v1, thanks to its Rust validation core (pydantic-core)

In This Platform

While this platform uses JSON Schema directly (JavaScript/Node.js), the same validation principles apply. If we were building in Python, every content type (Dimension, Question, Source, Concept) would be a Pydantic model, giving us type safety and validation at runtime.

Relevant Files:
  • schema/survey.schema.json
  • schema/concept.schema.json
platform_pydantic.py (hypothetical)
# Hypothetical: Python version of this platform
import json
from typing import Literal

from pydantic import BaseModel, Field

class SourceReference(BaseModel):
    id: str
    claim: str
    quote: str | None = None
    page: str | None = None

class QuestionOption(BaseModel):
    text: str
    score: int = Field(ge=0)
    sources: list[SourceReference] = []

class Question(BaseModel):
    id: str
    text: str
    type: Literal["single_choice", "multi_select", "likert"]
    options: list[QuestionOption] = Field(min_length=1)
    max_score: int
    sources: list[SourceReference] = []

# Type-safe loading
with open('dimensions/adoption.json') as f:
    dimension_data = json.load(f)
    questions = [Question(**q) for q in dimension_data['questions']]

# IDE knows the structure
for q in questions:
    print(f"{q.id}: {q.type}")  # Autocomplete works
    print(q.options[0].score)   # Type-checked
