JSON Schema for Structured Outputs
JSON Schema defines the exact structure and constraints for LLM outputs, ensuring type-safe, validated responses without post-processing guesswork.
JSON Schema is a vocabulary for annotating and validating JSON documents. When applied to LLM outputs, it transforms unpredictable text generation into structured, validated data that your code can safely consume.
The key insight: Instead of parsing free-form text and hoping the LLM followed your instructions, you define the output schema upfront and the LLM is constrained to produce only valid JSON matching that schema.
Why JSON Schema for AI?
Without structured outputs, you get:
- Unpredictable text formats that require complex parsing
- Missing fields you expected
- Type mismatches (strings when you need numbers)
- Inconsistent naming (sometimes 'email', sometimes 'emailAddress')
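To make these failure modes concrete, here is a small sketch of hand-parsing free-form text. The two sample replies and the `extractEmail` helper are invented for illustration and do not come from real model output.

```javascript
// Hypothetical free-form replies to the same prompt ("extract the contact email").
const replyA = 'Email: jane@example.com';
const replyB = 'You can reach Jane at jane@example.com.';

// A parser written against the first format only.
function extractEmail(text) {
  const match = text.match(/^Email:\s*(\S+)$/);
  return match ? match[1] : null;
}

console.log(extractEmail(replyA)); // jane@example.com
console.log(extractEmail(replyB)); // null (same information, different phrasing)
```

Every new phrasing needs another parsing rule, which is the maintenance burden a schema removes.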
With JSON Schema, and a provider that enforces it, the LLM's output is constrained to match your specification.
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"required": ["name", "email", "skills"],
"properties": {
"name": {
"type": "string",
"description": "Full name of the person"
},
"email": {
"type": "string",
"format": "email",
"description": "Valid email address"
},
"age": {
"type": "integer",
"minimum": 0,
"maximum": 150,
"description": "Age in years (optional)"
},
"skills": {
"type": "array",
"items": {
"type": "string"
},
"minItems": 1,
"description": "List of professional skills"
},
"experience_level": {
"type": "string",
"enum": ["junior", "mid", "senior", "staff"],
"description": "Career level"
}
},
"additionalProperties": false
}
OpenAI Structured Outputs Example
OpenAI's response_format parameter enforces JSON Schema compliance:
import OpenAI from 'openai';
const openai = new OpenAI();
const schema = {
type: 'object',
required: ['entities', 'sentiment', 'summary'],
properties: {
entities: {
type: 'array',
items: {
type: 'object',
required: ['text', 'type'],
properties: {
text: { type: 'string' },
type: {
type: 'string',
enum: ['person', 'organization', 'location', 'date']
}
},
additionalProperties: false
}
},
sentiment: {
type: 'string',
enum: ['positive', 'negative', 'neutral']
},
summary: { type: 'string' }
},
additionalProperties: false
};
const response = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [
{ role: 'user', content: 'Analyze this email: ...' }
],
response_format: {
type: 'json_schema',
json_schema: {
name: 'email_analysis',
schema: schema,
strict: true // Enforces schema compliance
}
}
});
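// With strict: true, a completed response conforms to the schema.
// A refusal or a truncated response (finish_reason 'length') will not, so check first.
const message = response.choices[0].message;
if (message.refusal) {
  throw new Error(`Model refused: ${message.refusal}`);
}
const result = JSON.parse(message.content); // JSON.parse returns `any`; add a type yourself in TypeScript
console.log(result.entities); // Array of { text, type } objects
console.log(result.sentiment); // 'positive', 'negative', or 'neutral'
Common Schema Patterns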
Nested Objects
{
"type": "object",
"properties": {
"address": {
"type": "object",
"required": ["city", "country"],
"properties": {
"street": { "type": "string" },
"city": { "type": "string" },
"country": { "type": "string" }
}
}
}
}
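Note that the schema above lists `city` and `country` as required inside `address`, but has no top-level `required`, so `address` itself may be missing. A minimal sketch of consumer code that respects this, where `formatAddress` is a hypothetical helper and not part of any library:

```javascript
// Builds a display string from a record that conforms to the nested schema above.
function formatAddress(record) {
  const address = record.address;
  if (address === undefined) {
    return '(no address)'; // `address` is not in a top-level `required` list
  }
  // `city` and `country` are required inside `address`; `street` may be absent.
  const parts = [address.street, address.city, address.country];
  return parts.filter((part) => part !== undefined).join(', ');
}

console.log(formatAddress({ address: { city: 'Oslo', country: 'Norway' } }));
// Oslo, Norway
console.log(formatAddress({}));
// (no address)
```

If the consumer should never see a missing address, adding `"required": ["address"]` at the top level is the simpler fix.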
Arrays of Specific Types
{
"type": "array",
"items": {
"type": "object",
"required": ["id", "name"],
"properties": {
"id": { "type": "integer" },
"name": { "type": "string" }
}
},
"minItems": 1,
"maxItems": 10
}
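To show what these keywords actually check, here is a deliberately tiny checker. It is a sketch that covers only `type` (object, array, string, integer), `required`, `properties`, `items`, `minItems`, and `maxItems`; it is not a JSON Schema implementation, and a real project would use a full validator such as Ajv. The names `typeTests`, `check`, and `listSchema` are made up for this example.

```javascript
// Type tests for the few JSON Schema types this sketch understands.
const typeTests = {
  object: (v) => typeof v === 'object' && v !== null && !Array.isArray(v),
  array: (v) => Array.isArray(v),
  string: (v) => typeof v === 'string',
  integer: (v) => Number.isInteger(v),
};
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Returns a list of human-readable problems; an empty list means "valid".
function check(schema, value, path = '$') {
  const test = typeTests[schema.type];
  if (test && !test(value)) return [`${path}: expected ${schema.type}`];

  const errors = [];
  if (Array.isArray(value)) {
    if (schema.minItems !== undefined && value.length < schema.minItems) {
      errors.push(`${path}: fewer than ${schema.minItems} items`);
    }
    if (schema.maxItems !== undefined && value.length > schema.maxItems) {
      errors.push(`${path}: more than ${schema.maxItems} items`);
    }
    if (schema.items) {
      value.forEach((item, i) => errors.push(...check(schema.items, item, `${path}[${i}]`)));
    }
  } else if (typeTests.object(value)) {
    for (const key of schema.required ?? []) {
      if (!has(value, key)) errors.push(`${path}.${key}: missing required property`);
    }
    for (const [key, sub] of Object.entries(schema.properties ?? {})) {
      if (has(value, key)) errors.push(...check(sub, value[key], `${path}.${key}`));
    }
  }
  return errors;
}

// The array schema from above, as a JavaScript object.
const listSchema = {
  type: 'array',
  items: {
    type: 'object',
    required: ['id', 'name'],
    properties: { id: { type: 'integer' }, name: { type: 'string' } },
  },
  minItems: 1,
  maxItems: 10,
};

console.log(check(listSchema, [{ id: 1, name: 'Ada' }])); // []
console.log(check(listSchema, []));                        // [ '$: fewer than 1 items' ]
console.log(check(listSchema, [{ id: '1' }]));
// [ '$[0].name: missing required property', '$[0].id: expected integer' ]
```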
Discriminated Unions (oneOf)
{
"oneOf": [
{
"type": "object",
"required": ["type", "url"],
"properties": {
"type": { "const": "link" },
"url": { "type": "string", "format": "uri" }
}
},
{
"type": "object",
"required": ["type", "path"],
"properties": {
"type": { "const": "file" },
"path": { "type": "string" }
}
}
]
}
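Because each branch pins `type` to a different constant, a value can match only one branch, and consumer code can dispatch on that field. A minimal sketch, where `describeAttachment` is a hypothetical consumer function:

```javascript
// Handles a value that matches the `oneOf` schema above by switching on `type`.
function describeAttachment(item) {
  switch (item.type) {
    case 'link':
      return `Open ${item.url}`; // the 'link' branch requires `url`
    case 'file':
      return `Read ${item.path}`; // the 'file' branch requires `path`
    default:
      // Reached only if the value was never validated against the schema.
      throw new Error(`Unknown attachment type: ${item.type}`);
  }
}

console.log(describeAttachment({ type: 'link', url: 'https://example.com' }));
// Open https://example.com
console.log(describeAttachment({ type: 'file', path: '/tmp/report.pdf' }));
// Read /tmp/report.pdf
```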
When to Use JSON Schema
| Use JSON Schema When | Consider Alternatives When |
|---|---|
| You need validated, structured data | Free-form text is acceptable |
| Output feeds directly into your code | Output is for human reading |
| Type safety matters (production apps) | Prototyping/experimenting |
| You need arrays, nested objects, enums | Simple key-value extraction |
| Compliance/audit requires validation | Speed is more critical than validation |
JSON Schema Best Practices
- Start simple: Begin with required fields only, add constraints later
- Use descriptions: They guide the LLM on what to extract
- Set `additionalProperties: false`: Prevents unexpected fields
- Use enums for categories: Better than free-form strings
- Validate business rules: Use `minimum`, `maxLength`, `pattern`, etc.
- Version your schemas: Track changes over time
- Test with edge cases: Empty arrays, optional fields, null values
Key Takeaways
- JSON Schema defines structure, types, and constraints for LLM outputs
- OpenAI's structured outputs enforce schema compliance in strict mode, but you still need to handle refusals and truncated responses
- Use `required`, `enum`, `additionalProperties: false` for strict typing
- Descriptions guide the LLM on what to extract
- JSON Schema is language-agnostic but pairs with Pydantic (Python) and Zod (TypeScript)
- Not all providers enforce schemas — check documentation for guarantees
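For providers that do not enforce the schema, one common fallback is to validate the output yourself and retry on failure. The sketch below assumes a hypothetical async `callModel` function that returns the model's raw text, and an `isValid` predicate that you supply (for example, a compiled Ajv validator). Neither name comes from a real SDK.

```javascript
// Calls the model until its output parses as JSON and passes `isValid`,
// or until `maxAttempts` is reached.
async function generateValidated(callModel, isValid, maxAttempts = 3) {
  let lastError = 'no attempts made';
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await callModel(attempt);
    let parsed;
    try {
      parsed = JSON.parse(raw);
    } catch (err) {
      lastError = `invalid JSON: ${err.message}`;
      continue;
    }
    if (isValid(parsed)) return { value: parsed, attempts: attempt };
    lastError = 'parsed JSON did not match the schema';
  }
  throw new Error(`Gave up after ${maxAttempts} attempts (${lastError})`);
}

// Stub standing in for a provider without schema enforcement.
const replies = ['Sure! Here is the JSON you asked for', '{"sentiment":"positive"}'];
const fakeModel = async (attempt) => replies[attempt - 1];
const hasSentiment = (v) => ['positive', 'negative', 'neutral'].includes(v?.sentiment);

generateValidated(fakeModel, hasSentiment).then((result) => {
  console.log(result.attempts, result.value.sentiment); // 2 positive
});
```

In a real integration you would usually feed the validation error back into the next prompt rather than simply repeating the call.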
In This Platform
This entire platform is built on JSON Schema. Every content type (dimension, question, source, concept, module) has a corresponding schema that validates structure at build time. The schema files serve as both documentation and enforcement.
- schema/survey.schema.json
- schema/concept.schema.json
- schema/learning.schema.json
- schema/exercise.schema.json
- schema/comparison.schema.json
// build.js validates all content against schemas
import fs from 'node:fs';
import Ajv from 'ajv';
import addFormats from 'ajv-formats';
const ajv = new Ajv({ allErrors: true });
addFormats(ajv);
// Load schema
const questionSchema = JSON.parse(fs.readFileSync('schema/survey.schema.json', 'utf8'));
const validate = ajv.compile(questionSchema);
// Validate each question (`questions` is the content array loaded elsewhere in the build)
for (const question of questions) {
const valid = validate(question);
if (!valid) {
console.error(`Invalid question ${question.id}:`);
console.error(validate.errors);
process.exit(1);
}
}
// Only valid content makes it to production
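The raw `validate.errors` array can be hard to scan in build logs. The sketch below flattens it into one line per problem. It assumes the Ajv v8 error shape (`instancePath`, `keyword`, `message`, `params`); `formatErrors` and the sample data are hypothetical and not part of the platform's build script.

```javascript
// Turns Ajv-style error objects into one readable line per problem.
function formatErrors(errors, label) {
  return (errors ?? []).map((err) => {
    const where = err.instancePath || '(root)'; // empty path means the top-level value
    return `${label} ${where}: ${err.message}`;
  });
}

// Hand-written sample in the Ajv v8 shape, for illustration only.
const sampleErrors = [
  {
    instancePath: '',
    keyword: 'required',
    params: { missingProperty: 'id' },
    message: "must have required property 'id'",
  },
  {
    instancePath: '/options/0',
    keyword: 'type',
    params: { type: 'string' },
    message: 'must be string',
  },
];

console.log(formatErrors(sampleErrors, 'question q-12').join('\n'));
// question q-12 (root): must have required property 'id'
// question q-12 /options/0: must be string
```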