JSON Schema for Structured Outputs

Type Systems beginner 15 min
Sources verified Dec 22

JSON Schema defines the exact structure and constraints for LLM outputs, ensuring type-safe, validated responses without post-processing guesswork.

JSON Schema is a vocabulary for annotating and validating JSON documents. When applied to LLM outputs, it transforms unpredictable text generation into structured, validated data that your code can safely consume.

The key insight: instead of parsing free-form text and hoping the LLM followed your instructions, you define the output schema upfront. Providers that support schema-constrained generation then restrict the LLM to JSON that matches it; with other providers, you validate the output against the same schema.

Why JSON Schema for AI?

Without structured outputs, you get:

  • Unpredictable text formats that require complex parsing
  • Missing fields you expected
  • Type mismatches (strings when you need numbers)
  • Inconsistent naming (sometimes 'email', sometimes 'emailAddress')

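With JSON Schema enforced by the provider, the LLM's output matches the structure you specified. Where enforcement is unavailable, the same schema lets you validate the output and reject or retry anything that does not conform.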

person_schema.json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "required": ["name", "email", "skills"],
  "properties": {
    "name": {
      "type": "string",
      "description": "Full name of the person"
    },
    "email": {
      "type": "string",
      "format": "email",
      "description": "Valid email address"
    },
    "age": {
      "type": "integer",
      "minimum": 0,
      "maximum": 150,
      "description": "Age in years (optional)"
    },
    "skills": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "minItems": 1,
      "description": "List of professional skills"
    },
    "experience_level": {
      "type": "string",
      "enum": ["junior", "mid", "senior", "staff"],
      "description": "Career level"
    }
  },
  "additionalProperties": false
}
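Notes on the schema:
  • required: listed fields must be present in the output
  • format: declares the value as an email address; validators check it only when format assertion is enabled (for example, Ajv with ajv-formats)
  • minimum/maximum: numeric constraints that encode business rules
  • enum: restricts the value to a fixed set
  • additionalProperties: false prevents the LLM from adding unexpected fields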
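
To make the keywords above concrete, here is a minimal sketch of what a validator checks for this schema, written by hand with no dependencies. The function name checkPerson and the error messages are illustrative assumptions; a real project would use a JSON Schema validator such as Ajv, and this sketch skips the email format check.

```typescript
// Hand-rolled checks mirroring a few keywords of person_schema.json.
// Illustrative only: use a real JSON Schema validator in practice.
const LEVELS = ['junior', 'mid', 'senior', 'staff'];
const ALLOWED_KEYS = ['name', 'email', 'age', 'skills', 'experience_level'];

function checkPerson(value: unknown): string[] {
  const errors: string[] = [];
  if (typeof value !== 'object' || value === null || Array.isArray(value)) {
    return ['root: expected an object'];
  }
  const obj = value as Record<string, unknown>;

  // required
  for (const key of ['name', 'email', 'skills']) {
    if (!(key in obj)) errors.push(`${key}: required field is missing`);
  }
  // additionalProperties: false
  for (const key of Object.keys(obj)) {
    if (!ALLOWED_KEYS.includes(key)) errors.push(`${key}: unexpected field`);
  }
  // type: string
  for (const key of ['name', 'email']) {
    if (key in obj && typeof obj[key] !== 'string') errors.push(`${key}: expected a string`);
  }
  // type: integer, minimum, maximum
  if ('age' in obj) {
    const age = obj['age'];
    if (typeof age !== 'number' || !Number.isInteger(age)) errors.push('age: expected an integer');
    else if (age < 0 || age > 150) errors.push('age: must be between 0 and 150');
  }
  // type: array, items, minItems
  if ('skills' in obj) {
    const skills = obj['skills'];
    if (!Array.isArray(skills)) errors.push('skills: expected an array');
    else {
      if (skills.length < 1) errors.push('skills: needs at least 1 item');
      if (!skills.every((s) => typeof s === 'string')) errors.push('skills: items must be strings');
    }
  }
  // enum
  if ('experience_level' in obj && !LEVELS.includes(obj['experience_level'] as string)) {
    errors.push('experience_level: not one of the allowed values');
  }
  return errors;
}

const good = checkPerson({ name: 'Ada', email: 'ada@example.com', skills: ['math'] });
const bad = checkPerson({ name: 'Ada', emailAddress: 'ada@example.com', skills: [], age: 200 });
console.log(good); // []
console.log(bad);  // one message per violated keyword
```

The second call trips four of the failure modes listed earlier at once: a missing required field, an inconsistently named field, an out-of-range number, and an empty array.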

OpenAI Structured Outputs Example

OpenAI's response_format parameter enforces JSON Schema compliance:

openai_structured_output.ts
import OpenAI from 'openai';

const openai = new OpenAI();

const schema = {
  type: 'object',
  required: ['entities', 'sentiment', 'summary'],
  properties: {
    entities: {
      type: 'array',
      items: {
        type: 'object',
        required: ['text', 'type'],
        properties: {
          text: { type: 'string' },
          type: { 
            type: 'string', 
            enum: ['person', 'organization', 'location', 'date']
          }
        },
        additionalProperties: false
      }
    },
    sentiment: {
      type: 'string',
      enum: ['positive', 'negative', 'neutral']
    },
    summary: { type: 'string' }
  },
  additionalProperties: false
};

const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'user', content: 'Analyze this email: ...' }
  ],
  response_format: {
    type: 'json_schema',
    json_schema: {
      name: 'email_analysis',
      schema: schema,
      strict: true  // Enforces schema compliance
    }
  }
});

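// With strict: true the content follows the schema, unless the model refused
// or the response was cut off, so check for content before parsing.
const content = response.choices[0].message.content;
if (content === null) {
  throw new Error('No content returned (possible refusal)');
}

// JSON.parse returns `any`; this type mirrors the schema so the compiler can help.
type EmailAnalysis = {
  entities: { text: string; type: 'person' | 'organization' | 'location' | 'date' }[];
  sentiment: 'positive' | 'negative' | 'neutral';
  summary: string;
};

const result = JSON.parse(content) as EmailAnalysis;
console.log(result.entities);  // typed as an array of { text, type }
console.log(result.sentiment); // typed as 'positive' | 'negative' | 'neutral'
Notes on the example:
  • strict: true enables schema enforcement
  • With enforcement on, the parsed structure matches the schema; refusals and truncated responses still need handling before parsing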
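
Even with schema enforcement, a response can carry a refusal or be cut off at the token limit. The sketch below shows one way to guard the parse step. ChoiceLike and parseStructured are hypothetical names defined here, not SDK types; the refusal and finish_reason fields are modeled on the chat completion response, and the sample inputs are made up.

```typescript
// Minimal local stand-in for the part of a chat completion choice used here.
// These type names are hypothetical, not the SDK's own types.
interface ChoiceLike {
  finish_reason: string;
  message: { content: string | null; refusal?: string | null };
}

type Parsed<T> = { ok: true; data: T } | { ok: false; reason: string };

function parseStructured<T>(choice: ChoiceLike): Parsed<T> {
  // A refusal means the model declined; there is no schema-conforming content.
  if (choice.message.refusal) {
    return { ok: false, reason: `refused: ${choice.message.refusal}` };
  }
  // 'length' means generation stopped at the token limit, so the JSON may be cut off.
  if (choice.finish_reason === 'length') {
    return { ok: false, reason: 'truncated: hit the token limit' };
  }
  const content = choice.message.content;
  if (content === null) {
    return { ok: false, reason: 'empty: no content returned' };
  }
  try {
    // The cast is unchecked; it only tells the compiler the expected shape.
    return { ok: true, data: JSON.parse(content) as T };
  } catch {
    return { ok: false, reason: 'invalid JSON in content' };
  }
}

const okResult = parseStructured<{ sentiment: string }>({
  finish_reason: 'stop',
  message: { content: '{"sentiment":"positive"}' },
});
const cutOff = parseStructured<{ sentiment: string }>({
  finish_reason: 'length',
  message: { content: '{"sent' },
});
console.log(okResult, cutOff);
```

Returning a tagged result instead of throwing keeps the retry or fallback decision with the caller.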

Common Schema Patterns

Nested Objects

{
  "type": "object",
  "properties": {
    "address": {
      "type": "object",
      "required": ["city", "country"],
      "properties": {
        "street": { "type": "string" },
        "city": { "type": "string" },
        "country": { "type": "string" }
      }
    }
  }
}
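
Nested objects matter when a provider enforces a stricter subset of JSON Schema. The sketch below assumes the strict-mode rules as OpenAI documents them, to the best of my understanding: every object needs "additionalProperties": false and every property must be listed in "required". Check the provider's current documentation before relying on this. The function findStrictIssues and the Schema type are hypothetical helpers that cover only the keywords used in this section.

```typescript
// Simplified schema shape covering only the keywords used in this section.
type Schema = {
  type?: string;
  properties?: Record<string, Schema>;
  required?: string[];
  items?: Schema;
  additionalProperties?: boolean;
};

// Walks the schema and reports places that would break the assumed strict-mode rules.
function findStrictIssues(schema: Schema, path = '$'): string[] {
  const issues: string[] = [];
  if (schema.type === 'object') {
    if (schema.additionalProperties !== false) {
      issues.push(`${path}: missing "additionalProperties": false`);
    }
    const required = schema.required ?? [];
    for (const [key, child] of Object.entries(schema.properties ?? {})) {
      if (!required.includes(key)) issues.push(`${path}.${key}: not listed in "required"`);
      issues.push(...findStrictIssues(child, `${path}.${key}`));
    }
  }
  if (schema.type === 'array' && schema.items) {
    issues.push(...findStrictIssues(schema.items, `${path}[]`));
  }
  return issues;
}

// The nested address schema from above.
const nested: Schema = {
  type: 'object',
  properties: {
    address: {
      type: 'object',
      required: ['city', 'country'],
      properties: {
        street: { type: 'string' },
        city: { type: 'string' },
        country: { type: 'string' },
      },
    },
  },
};

const issues = findStrictIssues(nested);
console.log(issues);
```

Run against the address schema, the check flags both objects for the missing additionalProperties setting, plus the address and street properties that are not listed as required.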

Arrays of Specific Types

{
  "type": "array",
  "items": {
    "type": "object",
    "required": ["id", "name"],
    "properties": {
      "id": { "type": "integer" },
      "name": { "type": "string" }
    }
  },
  "minItems": 1,
  "maxItems": 10
}

Discriminated Unions (oneOf)

{
  "oneOf": [
    {
      "type": "object",
      "required": ["type", "url"],
      "properties": {
        "type": { "const": "link" },
        "url": { "type": "string", "format": "uri" }
      }
    },
    {
      "type": "object",
      "required": ["type", "path"],
      "properties": {
        "type": { "const": "file" },
        "path": { "type": "string" }
      }
    }
  ]
}
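
On the consuming side, a oneOf with a const discriminator maps naturally onto a TypeScript discriminated union. The following is a rough sketch; the names Attachment, parseAttachment, and describeAttachment are illustrative assumptions, and the hand-written parser ignores the uri format and drops any extra fields.

```typescript
// TypeScript counterpart of the oneOf schema: `type` is the discriminator.
type Attachment =
  | { type: 'link'; url: string }
  | { type: 'file'; path: string };

// Narrows an unknown value to one branch of the union, or returns null.
function parseAttachment(value: unknown): Attachment | null {
  if (typeof value !== 'object' || value === null) return null;
  const obj = value as Record<string, unknown>;
  const url = obj['url'];
  const path = obj['path'];
  if (obj['type'] === 'link' && typeof url === 'string') {
    return { type: 'link', url };
  }
  if (obj['type'] === 'file' && typeof path === 'string') {
    return { type: 'file', path };
  }
  return null;
}

// Switching on the discriminator lets the compiler pick the right fields.
function describeAttachment(a: Attachment): string {
  switch (a.type) {
    case 'link':
      return `link to ${a.url}`;
    case 'file':
      return `file at ${a.path}`;
  }
}

const link = parseAttachment({ type: 'link', url: 'https://example.com' });
const broken = parseAttachment({ type: 'file' }); // missing `path`
console.log(link ? describeAttachment(link) : 'invalid');
console.log(broken); // null
```

Because each branch fixes type to a single constant, at most one branch can match a given value, which is what makes oneOf safe here.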

When to Use JSON Schema

| Use JSON Schema When | Consider Alternatives When |
| --- | --- |
| You need validated, structured data | Free-form text is acceptable |
| Output feeds directly into your code | Output is for human reading |
| Type safety matters (production apps) | Prototyping/experimenting |
| You need arrays, nested objects, enums | Simple key-value extraction |
| Compliance/audit requires validation | Speed is more critical than validation |

JSON Schema Best Practices

  1. Start simple: Begin with required fields only, add constraints later
  2. Use descriptions: They guide the LLM on what to extract
  3. Set additionalProperties: false: Prevents unexpected fields
  4. Use enums for categories: Better than free-form strings
  5. Validate business rules: Use minimum, maxLength, pattern, etc.
  6. Version your schemas: Track changes over time
  7. Test with edge cases: Empty arrays, optional fields, null values

Key Takeaways

  • JSON Schema defines structure, types, and constraints for LLM outputs
  • OpenAI's structured outputs enforce schema compliance in strict mode; refusals and truncated responses still need handling
  • Use `required`, `enum`, `additionalProperties: false` for strict typing
  • Descriptions guide the LLM on what to extract
  • JSON Schema is language-agnostic but pairs with Pydantic (Python) and Zod (TypeScript)
  • Not all providers enforce schemas — check documentation for guarantees

In This Platform

This entire platform is built on JSON Schema. Every content type (dimension, question, source, concept, module) has a corresponding schema that validates structure at build time. The schema files serve as both documentation and enforcement.

Relevant Files:
  • schema/survey.schema.json
  • schema/concept.schema.json
  • schema/learning.schema.json
  • schema/exercise.schema.json
  • schema/comparison.schema.json
build.js (excerpt)
// build.js validates all content against schemas
import fs from 'node:fs';
import Ajv from 'ajv';
import addFormats from 'ajv-formats';

const ajv = new Ajv({ allErrors: true });
addFormats(ajv);

// Load schema (read as UTF-8 text before parsing)
const questionSchema = JSON.parse(fs.readFileSync('schema/survey.schema.json', 'utf8'));
const validate = ajv.compile(questionSchema);

// Validate each question
for (const question of questions) {
  const valid = validate(question);
  if (!valid) {
    console.error(`Invalid question ${question.id}:`);
    console.error(validate.errors);
    process.exit(1);
  }
}

// Only valid content makes it to production
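
The excerpt above prints validate.errors as raw objects. As a sketch of how that output could be made easier to scan, the helper below turns error entries into one line each. ValidationErrorLike is a local stand-in covering a subset of the fields on Ajv's error objects; formatErrors, the question id, and the sample errors are made up for illustration.

```typescript
// Subset of the fields found on each entry of validate.errors (local stand-in type).
interface ValidationErrorLike {
  instancePath: string; // JSON Pointer to the failing value, '' for the root
  keyword: string;      // the schema keyword that failed, e.g. 'required'
  message?: string;
}

// Builds a multi-line report: a header line plus one line per error.
function formatErrors(id: string, errors: ValidationErrorLike[]): string {
  const lines = errors.map(
    (e) => `  ${e.instancePath || '(root)'}: ${e.message ?? 'failed'} [${e.keyword}]`
  );
  return [`Invalid question ${id}:`, ...lines].join('\n');
}

// Hypothetical errors, shaped like validator output.
const report = formatErrors('q-17', [
  { instancePath: '', keyword: 'required', message: "must have required property 'title'" },
  { instancePath: '/options/2', keyword: 'type', message: 'must be string' },
]);
console.log(report);
```

With allErrors: true, as in the excerpt, every violation is collected, so a report like this shows all problems in a file in one build run.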
