# Pydantic for Type-Safe AI

Pydantic brings runtime validation and type safety to Python AI applications, automatically generating JSON Schema from your models and parsing JSON into validated Python objects with IDE autocomplete.
Pydantic is a Python library for data validation using type annotations. In AI development, it bridges the gap between LLM outputs (JSON) and your Python code (typed objects), providing automatic validation, serialization, and IDE support.
The key insight: Define your data model once as a Python class, and Pydantic handles JSON Schema generation, validation, parsing, and type safety automatically.
## Why Pydantic for AI?
**Without Pydantic:**

```python
# Manual parsing, no validation, no types
response = client.chat.completions.create(...)
data = json.loads(response.choices[0].message.content)
name = data['name']    # Hope this exists
email = data['email']  # Hope it's a valid email
```
**With Pydantic:**

```python
from pydantic import BaseModel, EmailStr  # EmailStr needs: pip install 'pydantic[email]'

class Person(BaseModel):
    name: str
    email: EmailStr  # Validates email format
    age: int | None = None

# Automatic validation + IDE autocomplete
person = Person.model_validate_json(response.choices[0].message.content)
print(person.name)   # Type-safe, guaranteed to exist
print(person.email)  # Guaranteed valid email
```
```python
from pydantic import BaseModel, Field, EmailStr, field_validator
from openai import OpenAI
import json

# Define your output structure as a Pydantic model
class Skill(BaseModel):
    name: str
    years_experience: int = Field(ge=0, description="Years of experience")
    proficiency: str = Field(pattern="^(beginner|intermediate|expert)$")

class Resume(BaseModel):
    name: str = Field(min_length=1)
    email: EmailStr
    skills: list[Skill] = Field(min_length=1)
    summary: str | None = None

    @field_validator('skills')  # Pydantic v2 validator (v1's @validator is deprecated)
    @classmethod
    def validate_skills(cls, v):
        if len(v) > 20:
            raise ValueError('Maximum 20 skills allowed')
        return v

# Pydantic automatically generates JSON Schema
schema = Resume.model_json_schema()
print(json.dumps(schema, indent=2))
# {
#   "type": "object",
#   "required": ["name", "email", "skills"],
#   "properties": {
#     "name": {"type": "string", "minLength": 1},
#     "email": {"type": "string", "format": "email"},
#     "skills": {
#       "type": "array",
#       "items": {...},
#       "minItems": 1
#     }
#   }
# }
```
```python
# Use the schema with OpenAI
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Extract resume info from: John Doe, john@example.com, 5 years Python..."}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "resume",
            "schema": schema,
            "strict": True
        }
    }
)

# Parse and validate in one step
resume = Resume.model_validate_json(response.choices[0].message.content)

# Type-safe access with IDE autocomplete
print(resume.name)   # str
print(resume.email)  # EmailStr (validated)
for skill in resume.skills:  # list[Skill]
    print(f"{skill.name}: {skill.proficiency}")  # IDE knows these fields exist
```

Note that OpenAI's strict mode imposes extra requirements on the schema (such as `additionalProperties: false`); the OpenAI SDK's `client.beta.chat.completions.parse(..., response_format=Resume)` accepts a Pydantic model directly and handles that conversion for you.

## Pydantic with Instructor Library
The Instructor library makes Pydantic + LLMs even easier:
```python
import instructor
from pydantic import BaseModel
from openai import OpenAI

# Patch the OpenAI client to use Instructor
client = instructor.from_openai(OpenAI())

class Person(BaseModel):
    name: str
    age: int
    email: str

# Instructor handles schema generation + validation automatically
person = client.chat.completions.create(
    model="gpt-4o",
    response_model=Person,  # Just pass the Pydantic model
    messages=[
        {"role": "user", "content": "Extract: John is 30 years old, email john@example.com"}
    ]
)

print(person.name)   # "John"
print(person.age)    # 30
print(type(person))  # <class 'Person'>
```

## Pydantic Features for AI
### 1. Validation

- **Type checking:** `str`, `int`, `float`, `bool`, `list`, `dict`
- **Format validation:** `EmailStr`, `HttpUrl`, `UUID`, `datetime`
- **Constraints:** `Field(ge=0, le=100, min_length=1, pattern=r'^[A-Z]')`
- **Custom validators:** the `@field_validator` decorator for business logic
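Taken together, a minimal runnable sketch of these validation layers (the `Rating` model is illustrative, not part of the examples above):

```python
from pydantic import BaseModel, Field, ValidationError, field_validator

class Rating(BaseModel):
    score: int = Field(ge=0, le=100)  # constraint: must be 0-100
    label: str = Field(min_length=1)  # constraint: non-empty

    @field_validator("label")
    @classmethod
    def label_not_numeric(cls, v: str) -> str:
        # custom business rule beyond what Field constraints can express
        if v.isdigit():
            raise ValueError("label must not be purely numeric")
        return v

print(Rating(score=85, label="good").score)  # 85

try:
    Rating(score=150, label="too high")  # violates le=100
except ValidationError as e:
    print(e.error_count())  # 1
```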
### 2. Nested Models

```python
class Address(BaseModel):
    street: str
    city: str
    country: str

class Company(BaseModel):
    name: str
    address: Address         # Nested validation
    employees: list[Person]  # List of validated objects
```
### 3. Optional Fields & Defaults

```python
from pydantic import BaseModel, Field

class Config(BaseModel):
    required_field: str
    optional_field: str | None = None
    with_default: int = 10
    from_function: str = Field(default_factory=lambda: "generated")
```
4. Discriminated Unions
from typing import Literal, Union
from pydantic import Field
class ToolCall(BaseModel):
type: Literal["function"]
function_name: str
class TextResponse(BaseModel):
type: Literal["text"]
content: str
Response = Union[ToolCall, TextResponse]
# Pydantic uses 'type' field to determine which model to use
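A standalone union is not itself a model, but Pydantic's `TypeAdapter` can validate against it directly; a minimal self-contained sketch (the JSON payloads are illustrative):

```python
from typing import Annotated, Literal, Union
from pydantic import BaseModel, Field, TypeAdapter

class ToolCall(BaseModel):
    type: Literal["function"]
    function_name: str

class TextResponse(BaseModel):
    type: Literal["text"]
    content: str

Response = Annotated[Union[ToolCall, TextResponse], Field(discriminator="type")]

# The 'type' discriminator selects which model gets constructed
adapter = TypeAdapter(Response)
msg = adapter.validate_json('{"type": "text", "content": "hello"}')
print(type(msg).__name__)  # TextResponse
```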
## Pydantic vs Manual JSON Parsing
| Aspect | Manual JSON | Pydantic |
|---|---|---|
| Type safety | None | Full typing + IDE support |
| Validation | Manual checks | Automatic |
| Schema generation | Write by hand | Auto-generated from model |
| Error messages | Generic | Detailed field-level errors |
| Refactoring | Find/replace strings | Type checker catches issues |
| Documentation | Separate docs | Self-documenting types |
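The "detailed field-level errors" row can be seen directly by catching `ValidationError` (the `Person` model here is illustrative):

```python
from pydantic import BaseModel, ValidationError

class Person(BaseModel):
    name: str
    age: int

try:
    Person.model_validate({"age": "not a number"})
except ValidationError as e:
    # Each error pinpoints the offending field and failure type,
    # e.g. ('name',) missing and ('age',) int_parsing
    for err in e.errors():
        print(err["loc"], err["type"])
```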
## When to Use Pydantic
| Use Pydantic When | Consider Alternatives When |
|---|---|
| Building production Python apps | Quick scripting/prototypes |
| You need type safety + validation | Output is simple key-value |
| Working with OpenAI/Anthropic APIs | Using JavaScript/TypeScript (use Zod) |
| Data flows through multiple functions | One-off parsing |
| You want IDE autocomplete | Performance is absolutely critical |
## Common Patterns

### Streaming with Partial Models
```python
import instructor
from openai import OpenAI
from pydantic import BaseModel

client = instructor.from_openai(OpenAI())

class Analysis(BaseModel):
    summary: str
    entities: list[str]
    sentiment: str

# Stream partial results as they arrive
# (create_partial enables streaming itself)
for partial in client.chat.completions.create_partial(
    model="gpt-4o",
    response_model=Analysis,
    messages=[...],
):
    print(partial.summary)  # Updates as tokens arrive
```
### Retry with Validation

```python
import instructor
from tenacity import retry, stop_after_attempt

# 'client' and 'Person' as defined in the Instructor example above

@retry(stop=stop_after_attempt(3))
def extract_with_retry(text: str) -> Person:
    return client.chat.completions.create(
        model="gpt-4o",
        response_model=Person,
        messages=[{"role": "user", "content": text}]
    )

# Automatically retries if validation fails
person = extract_with_retry("Extract: invalid data...")
```

Instructor also accepts a `max_retries` argument on `create`, which re-sends the validation errors to the model rather than retrying blindly.
## Key Takeaways
- Pydantic converts type annotations into runtime validation
- Auto-generates JSON Schema from Python classes
- Integrates seamlessly with OpenAI structured outputs
- Provides IDE autocomplete and type checking
- Use Instructor library for simplified LLM integration
- Pydantic v2 is 5-50x faster than v1 (Rust core)
## In This Platform
While this platform uses JSON Schema directly (JavaScript/Node.js), the same validation principles apply. If we were building in Python, every content type (Dimension, Question, Source, Concept) would be a Pydantic model, giving us type safety and validation at runtime.
- schema/survey.schema.json
- schema/concept.schema.json
```python
# Hypothetical: Python version of this platform
import json
from typing import Literal

from pydantic import BaseModel, Field

class SourceReference(BaseModel):
    id: str
    claim: str
    quote: str | None = None
    page: str | None = None

class QuestionOption(BaseModel):
    text: str
    score: int = Field(ge=0)
    sources: list[SourceReference] = []

class Question(BaseModel):
    id: str
    text: str
    type: Literal["single_choice", "multi_select", "likert"]
    options: list[QuestionOption] = Field(min_length=1)
    max_score: int
    sources: list[SourceReference] = []

# Type-safe loading
with open('dimensions/adoption.json') as f:
    dimension_data = json.load(f)

questions = [Question.model_validate(q) for q in dimension_data['questions']]

# IDE knows the structure
for q in questions:
    print(f"{q.id}: {q.type}")  # Autocomplete works
    print(q.options[0].score)   # Type-checked
```