JSON Schema Features

dotjson uses a JSON Schema to constrain language model output. This guide explains the supported features and provides practical examples.

dotjson supports most features from JSON Schema specification version 2020-12. For an introduction to JSON Schema, visit the official documentation.

dotjson does not impose limits on the number of properties or depth of nested objects. dotjson supports a variety of rich JSON schema features such as recursive schemas, inline regular expressions, array and string length constraints, and more.

Tip

Throughout this guide, examples are shown with both the schema definition and a valid example object that conforms to the schema.

Basic schema example

A JSON schema looks like this:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "first_name": {"type": "string"},
    "age": {"type": "integer"},
    "height": {"type": "number"},
    "is_customer": {"type": "boolean"}
  },
  "required": ["first_name", "age", "height", "is_customer"]
}

dotjson disables tokens inconsistent with the structure of a JSON object, or that would not validate against the given schema. A language model constrained by dotjson might output a result like this:

{
  "first_name": "John",
  "age": 37,
  "height": 1.80,
  "is_customer": true
}

Note

The $schema keyword is not strictly required, but including it is good practice for validation and interoperability.
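As a rough sketch of how an application might consume such output, the Python snippet below parses the example object with the standard library and reports each field's Python type. The variable names are illustrative, and nothing here is part of dotjson's API.

```python
import json

# Example output copied from the object shown above.
raw_output = '{"first_name": "John", "age": 37, "height": 1.80, "is_customer": true}'

person = json.loads(raw_output)

# JSON types map onto Python types: string -> str, integer -> int,
# number -> int or float, boolean -> bool.
field_types = {key: type(value).__name__ for key, value in person.items()}
print(field_types)
```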

Core JSON data types

The type of a value in dotjson is specified using the type keyword.

For example, string values are defined as:

{"type": "string"}

This section provides an overview of valid entries for the type keyword.

String

Strings represent text data. dotjson supports the following string constraints:

minLength and maxLength, specifying the minimum and maximum number of characters in the text.
pattern, a regular expression defining the text the language model must generate.
const, specifying that the value must be exactly a pre-specified value. const is useful when you do not want the language model to generate a value in your schema, such as for identifiers, names, etc.
enum, which requires a field to be one of several pre-specified options.
format, a commonly-used format for various types of text data, such as email, ipv4 addresses, or UUIDs.

Note

dotjson does not support multiple constraints of different types. For example, minLength and maxLength cannot be used with format.

String length constraints

Strings may be constrained to have minimum and/or maximum character lengths using the minLength and maxLength keywords.

Using minLength:

minLength

{
  "type": "object",
  "properties": {
    "password": {
      "type": "string",
      "minLength": 8,
      "description": "Password must be at least 8 characters"
    }
  },
  "required": ["password"]
}

{
  "password": "secureP@ssw0rd"
}

Specifying a maximum number of characters with maxLength:

maxLength

{
  "type": "object",
  "properties": {
    "username": {
      "type": "string",
      "maxLength": 20,
      "description": "Username cannot exceed 20 characters."
    }
  },
  "required": ["username"]
}

{
  "username": "kilroy"
}

Note

String length constraints can increase compilation times, though they do not significantly impact runtime performance.
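JSON Schema measures string length in characters (Unicode code points) rather than bytes. The Python sketch below illustrates the difference with a hypothetical helper, within_length, which is not part of dotjson; Python's len on a str also counts code points.

```python
from typing import Optional


def within_length(text: str, min_length: int = 0, max_length: Optional[int] = None) -> bool:
    """Check a string against minLength/maxLength, counting code points."""
    length = len(text)  # len() on str counts Unicode code points, not bytes
    if length < min_length:
        return False
    return max_length is None or length <= max_length


print(within_length("secureP@ssw0rd", min_length=8))
print(within_length("kilroy", max_length=20))

# Character count versus UTF-8 byte count for a string with one accented letter.
accented = "caf\u00e9"
print(len(accented), len(accented.encode("utf-8")))
```

If a downstream system limits bytes rather than characters, the application would need its own check on the encoded length.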

Regular expressions with `pattern`

JSON strings can be constrained to follow a particular format, determined by a regular expression, through the pattern keyword. dotjson supports most standard regular expression features; see the unsupported features section for the exceptions.

Here is a simple example of pattern usage to match a five-digit US zip code, or the extended nine-digit zip code:

Regular expressions

{
  "type": "object",
  "properties": {
    "zipCode": {
      "type": "string",
      "pattern": "[0-9]{5}(-[0-9]{4})?",
      "description": "US ZIP code in 5-digit or 5+4 format"
    }
  },
  "required": ["zipCode"]
}

{
  "zipCode": "12345"
}

{
  "zipCode": "12345-6789"
}
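To try a pattern locally before putting it in a schema, a quick check with Python's re module can help. This is only an approximation: it assumes the pattern must match the whole string, and Python's regex dialect may differ in places from the engine dotjson uses. The helper name is_zip is illustrative.

```python
import re

# Same pattern as in the schema above.
ZIP_PATTERN = re.compile(r"[0-9]{5}(-[0-9]{4})?")


def is_zip(value: str) -> bool:
    # fullmatch requires the entire string to match the pattern.
    return ZIP_PATTERN.fullmatch(value) is not None


for candidate in ["12345", "12345-6789", "1234", "12345-67"]:
    print(candidate, is_zip(candidate))
```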

Common text formats with `format`

The format keyword allows you to specify that a string field must conform to a predefined, commonly used format. This is useful for ensuring that a model’s output text adheres to a standardized format, such as email, URI, UUID, dates, etc.

Supported format values are:

email - Email addresses (e.g., [email protected])
hostname - Domain names (e.g., example.com)
ipv4 - IPv4 addresses (e.g., 192.168.1.1)
uri - URIs (e.g., https://example.com/resource?param=value)
uri-reference - URI references (e.g., https://example.com/resource?param=value)
uuid - UUIDs (e.g., 123e4567-e89b-12d3-a456-426614174000)
date - ISO8601 dates (e.g., 2023-04-15)
time - ISO8601 times (e.g., 14:30:15Z)
date-time - ISO8601 date-times (e.g., 2023-04-15T14:30:15Z)
duration - ISO8601 durations (e.g., P1Y2M3DT4H5M6S)

Applying format constraints is useful when model output must be consumed by standard applications. Using the uri format guarantees that the model’s output is a syntactically valid URI, though as with any language model output, the semantic correctness of the URI (for example, whether it points to a real resource) cannot be guaranteed.

The following schema demonstrates the use of all format types available.

Available formats

{
  "type": "object",
  "properties": {
    "email": {
      "type": "string",
      "format": "email",
      "description": "Email address"
    },
    "hostname": {
      "type": "string",
      "format": "hostname",
      "description": "Domain name"
    },
    "ipv4": {
      "type": "string",
      "format": "ipv4",
      "description": "IPv4 address"
    },
    "uri": {
      "type": "string",
      "format": "uri",
      "description": "URI address"
    },
    "uri_reference": {
      "type": "string",
      "format": "uri-reference",
      "description": "URI reference"
    },
    "uuid": {
      "type": "string",
      "format": "uuid",
      "description": "Universally Unique Identifier"
    },
    "date": {
      "type": "string",
      "format": "date",
      "description": "ISO8601 date"
    },
    "time": {
      "type": "string",
      "format": "time",
      "description": "ISO8601 time"
    },
    "date_time": {
      "type": "string",
      "format": "date-time",
      "description": "ISO8601 date-time"
    },
    "duration": {
      "type": "string",
      "format": "duration",
      "description": "ISO8601 duration"
    }
  },
  "required": ["email", "hostname", "ipv4", "uri", "uri_reference", "uuid", "date", "time", "date_time", "duration"]
}

Available formats

{
  "email": "[email protected]",
  "hostname": "example.com",
  "ipv4": "192.168.1.1",
  "uri": "https://example.com/resource?param=value",
  "uri_reference": "https://example.com/resource?param=value",
  "uuid": "123e4567-e89b-12d3-a456-426614174000",
  "date": "2023-04-15",
  "time": "14:30:15Z",
  "date_time": "2023-04-15T14:30:15Z",
  "duration": "P1Y2M3DT4H5M6S"
}
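Because format-constrained output follows standard syntaxes, it can usually be handed to ordinary parsers. The sketch below runs a few of the example values through Python standard-library parsers. The helper parses is hypothetical, and these parsers are somewhat more lenient than the formats themselves, so this is a sanity check rather than a validator.

```python
import ipaddress
import uuid
from datetime import date


def parses(parser, value: str) -> bool:
    """Return True if the given standard-library parser accepts the value."""
    try:
        parser(value)
        return True
    except ValueError:
        return False


print(parses(uuid.UUID, "123e4567-e89b-12d3-a456-426614174000"))
print(parses(ipaddress.IPv4Address, "192.168.1.1"))
print(parses(date.fromisoformat, "2023-04-15"))
print(parses(ipaddress.IPv4Address, "999.1.1.1"))
```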

Number & integer

Numeric values use one of two types: integer, which accepts only whole numbers, and number, which accepts any numeric value, including integers.

A common use case for numeric types is extracting information from webpages, transcripts, or images.

Integer and Number

{
  "type": "object",
  "properties": {
    "product_id": {
      "type": "integer",
      "description": "Product identifier"
    },
    "price": {
      "type": "number",
      "description": "Current product price"
    },
    "name": {
      "type": "string"
    }
  },
  "required": ["product_id", "price", "name"]
}

Integer and Number

{
  "product_id": 1235,
  "price": 79.99,
  "name": "Wireless Headphones"
}
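When checking these types on the application side in Python, one detail is easy to miss: bool is a subclass of int, so a naive isinstance check would treat true as an integer. The helpers below (is_integer and is_number are illustrative names, not dotjson functions) sketch one way to mirror the integer and number distinction.

```python
import json


def is_integer(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def is_number(value) -> bool:
    # number covers integers as well as floating-point values.
    return is_integer(value) or isinstance(value, float)


product = json.loads('{"product_id": 1235, "price": 79.99, "name": "Wireless Headphones"}')

print(is_integer(product["product_id"]), is_number(product["product_id"]))
print(is_integer(product["price"]), is_number(product["price"]))
```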

Boolean

Booleans are true or false values and are useful for simple binary classification tasks.

Let’s start with a simple complaint classifier that identifies whether a customer message is a complaint:

Boolean

{
  "type": "object",
  "properties": {
    "is_complaint": {
      "type": "boolean"
    }
  },
  "required": [
    "is_complaint"
  ]
}

Boolean

{
  "is_complaint": false
}

This simple schema allows a language model to classify messages as complaints or not. In a real application, you might want to extend this with additional fields, as we’ll see in later examples.

Null

The null type explicitly defines a field that must have a null value. Setting fields to null explicitly is not a common practice on its own, but null is frequently used to define optional values in combination with other JSON schema features, such as anyOf.

Null

{
  "type": "object",
  "properties": {
    "always_null": {
      "type": "null"
    }
  },
  "required": ["always_null"]
}

Null

{
  "always_null": null
}

null becomes more useful when combined with other types to create optional fields or to represent the absence of a value.

Let’s extend our earlier complaint classifier to make it more practical. In addition to determining whether a message is a complaint, we’ll add an optional response field for customer service representatives to use:

Extended Complaint Processor

{
  "type": "object",
  "properties": {
    "is_complaint": {
      "type": "boolean",
      "description": "Whether the customer message is classified as a complaint"
    },
    "response_to_complaint": {
      "anyOf": [
        {"type": "string"},
        {"type": "null"}
      ],
      "default": null,
      "description": "Optional response text for customer service to use if this is a complaint"
    }
  },
  "required": [
    "is_complaint"
  ]
}

Extended Complaint Processor

{
  "is_complaint": true,
  "response_to_complaint": "I'm sorry to hear that you're having trouble with the Super Happy Fun Ball! Let's get you a refund right away."
}

Or for a non-complaint:

{
  "is_complaint": false,
  "response_to_complaint": null
}

Note

This example doesn’t implement conditional validation (which is not fully supported in dotjson). A language model might generate inconsistent values, for example setting is_complaint to false but still providing a response. Because response_to_complaint is not required, the model may also omit it entirely. Your application should handle these potential inconsistencies, as in this Python code:

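import json

# Parse the model output; response_to_complaint is optional, so read it with .get().
data = json.loads('{"is_complaint": false, "response_to_complaint": null}')
response = data.get("response_to_complaint")

if data["is_complaint"] and response is not None:
    print(f"Response to complaint: {response}")
elif data["is_complaint"]:
    print("This is a complaint that needs a response")
else:
    print("Not a complaint")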

Arrays

Arrays are used for collections of elements. Arrays can be specified in two ways:

The items keyword that defines a subschema shared by all elements in the array.
The prefixItems keyword, which specifies the subschema for the first N elements in the array.

Homogeneous arrays are specified using the items keyword:

{ "type": "array", "items": { "type": "number" } }

dotjson supports:

Item validation using the items keyword for homogeneous arrays.
Tuple validation with prefixItems for ordered, mixed-type arrays.
Array length enforcement using the minItems and maxItems keywords.

Specifying element type with `items`

Item validation is used to ensure that all elements in an array are of the same type, using the keyword items.

Every element of an item-validated array must conform to the same subschema. With a number subschema, for example, [1, 2, 3] is allowed, while mixed types like [1, "2", true] are disallowed.

This schema requires the model to produce only an array of numbers:

Number array

{
  "type": "object",
  "properties": {
    "some_numbers": {
      "type": "array",
      "items": {
        "type": "number"
      }
    }
  },
  "required": ["some_numbers"]
}

Number array

{"some_numbers":[1,2,3,4,5]}

Here’s a more practical example schema designed to generate hashtags for a piece of text content. Note that pattern is assigned the regular expression #[a-z]+. All elements of the hashtags array must begin with a # and use only lowercase letters.

String array

{
  "type": "object",
  "properties": {
    "hashtags": {
      "type": "array",
      "items": {
        "type": "string",
        "pattern": "#[a-z]+"
      }
    }
  },
  "required": ["hashtags"]
}

String array

{
  "hashtags":["#gardening", "#plants", "#zombies"]
}
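The same rule can be expressed as an application-side check. The sketch below reports any elements that would violate the hashtag subschema; it assumes the pattern must match each element in full, and invalid_hashtags is a hypothetical helper rather than a dotjson function.

```python
import re

HASHTAG = re.compile(r"#[a-z]+")


def invalid_hashtags(tags: list) -> list:
    """Return the elements that are not strings or do not match the pattern in full."""
    return [tag for tag in tags if not (isinstance(tag, str) and HASHTAG.fullmatch(tag))]


print(invalid_hashtags(["#gardening", "#plants", "#zombies"]))
print(invalid_hashtags(["#ok", "NoHash", "#Mixed", 7]))
```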

You can specify any valid JSON schema as the value of items. Here’s an example of an array containing objects with contact information:

Array of objects

{
  "type": "object",
  "properties": {
    "contacts": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "email": { "type": "string", "format": "email" },
          "phone": { "type": "string" }
        },
        "required": ["name", "email"]
      },
      "description": "List of contact information"
    }
  },
  "required": ["contacts"]
}

Array of objects

{
  "contacts": [
    {
      "name": "Jane Smith",
      "email": "[email protected]",
      "phone": "555-1234"
    },
    {
      "name": "John Doe",
      "email": "[email protected]"
    }
  ]
}

Array length constraints

Control the number of items in an array using minItems and maxItems:

minItems: Minimum number of elements required in the array
maxItems: Maximum number of elements allowed in the array

Array length constraints

{
  "type": "object",
  "properties": {
    "topFiveMovies": {
      "type": "array",
      "items": { "type": "string" },
      "minItems": 1,
      "maxItems": 5,
      "description": "User's top 1-5 favorite movies"
    }
  },
  "required": ["topFiveMovies"]
}

Array length constraints

{
  "topFiveMovies": ["The Matrix", "Inception", "Interstellar"]
}

Object

Required properties

JSON object properties are not required by default, but can be made required by including the property name in the required array. If a property is not included in required, the model will choose whether or not to include it in the final output.

In this example, the model must produce a name and email, but can choose whether to include a phone number:

Required properties

{
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "email": { "type": "string" },
    "phone": { "type": "string" }
  },
  "required": ["name", "email"]
}

Required properties

{
  "name": "Jane Smith",
  "email": "[email protected]"
}

{
  "name": "Jane Smith",
  "email": "[email protected]",
  "phone": "(123) 456-7890"
}
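Since the model may or may not emit optional properties, application code should not index them directly. A minimal Python sketch follows; describe_contact is an illustrative helper and the email address is a placeholder.

```python
import json


def describe_contact(contact: dict) -> str:
    # Required keys can be indexed directly; optional keys need a default.
    phone = contact.get("phone", "no phone given")
    return f"{contact['name']} <{contact['email']}> ({phone})"


without_phone = json.loads('{"name": "Jane Smith", "email": "jane.smith@example.com"}')
with_phone = json.loads(
    '{"name": "Jane Smith", "email": "jane.smith@example.com", "phone": "(123) 456-7890"}'
)

print(describe_contact(without_phone))
print(describe_contact(with_phone))
```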

Note

Property validation with additionalProperties

In dotjson, the additionalProperties keyword defaults to false, meaning language models cannot generate properties that are not explicitly defined in the schema. (The JSON Schema specification itself permits additional properties by default.)

additionalProperties: true is not currently supported, and will result in an error.

For nested objects, you can specify required properties at each level independently. The required array applies only to properties at the same level where it is defined.

This example shows a user profile schema with contact information as a nested object. In the main object, username and contact are required, while birthdate is optional. Within the nested contact object, email is required, while phone and address are optional:

Nested required properties

{
  "type": "object",
  "properties": {
    "username": {
      "type": "string",
      "description": "User's login name"
    },
    "birthdate": {
      "type": "string",
      "format": "date",
      "description": "User's date of birth"
    },
    "contact": {
      "type": "object",
      "properties": {
        "email": {
          "type": "string",
          "format": "email",
          "description": "Primary contact email"
        },
        "phone": {
          "type": "string",
          "description": "Contact phone number"
        },
        "address": {
          "type": "object",
          "properties": {
            "street": { "type": "string" },
            "city": { "type": "string" },
            "country": { "type": "string" }
          },
          "required": ["street", "city"],
          "description": "Physical address (street and city required if address is provided)"
        }
      },
      "required": ["email"],
      "description": "Contact information (email required)"
    }
  },
  "required": ["username", "contact"]
}

Nested required properties

With the optional birthdate, phone, and country omitted:

{
  "username": "jsmith2024",
  "contact": {
    "email": "[email protected]",
    "address": {
      "street": "123 Main Street",
      "city": "Springfield"
    }
  }
}

With optional fields included:

{
  "username": "jsmith2024",
  "birthdate": "1990-05-15",
  "contact": {
    "email": "[email protected]",
    "phone": "555-123-4567",
    "address": {
      "street": "123 Main Street",
      "city": "Springfield",
      "country": "United States"
    }
  }
}

In this example:

At the top level, only username and contact must be included
Within the contact object, only email is required
If an address is provided, it must include both street and city properties
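The level-by-level behavior of required can be sketched as a small recursive check. The function below is an illustration only (missing_required is a hypothetical name, and it handles just nested objects, not arrays or references); the schema is a trimmed copy of the one above.

```python
def missing_required(schema: dict, instance: dict, path: str = "") -> list:
    """List required properties absent from instance, descending into nested objects."""
    # Each "required" array is checked only against the object at its own level.
    missing = [path + name for name in schema.get("required", []) if name not in instance]
    for name, subschema in schema.get("properties", {}).items():
        value = instance.get(name)
        if subschema.get("type") == "object" and isinstance(value, dict):
            missing.extend(missing_required(subschema, value, f"{path}{name}."))
    return missing


profile_schema = {
    "type": "object",
    "properties": {
        "username": {"type": "string"},
        "birthdate": {"type": "string"},
        "contact": {
            "type": "object",
            "properties": {
                "email": {"type": "string"},
                "phone": {"type": "string"},
                "address": {
                    "type": "object",
                    "properties": {
                        "street": {"type": "string"},
                        "city": {"type": "string"},
                        "country": {"type": "string"},
                    },
                    "required": ["street", "city"],
                },
            },
            "required": ["email"],
        },
    },
    "required": ["username", "contact"],
}

partial = {"username": "jsmith2024", "contact": {"email": "e", "address": {"street": "123 Main Street"}}}
print(missing_required(profile_schema, partial))
print(missing_required(profile_schema, {"username": "jsmith2024"}))
```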

Enumerated values with `enum`

The enum keyword restricts a value to a predefined set of options. It can be used with any JSON data type – strings, numbers, booleans, or even objects and arrays. When enum is used, the model can only generate one of the specified values.

This is particularly useful for:

Forcing the model to choose from a specific set of categories
Ensuring standardized responses
Creating controlled vocabularies
Building classification or tagging systems

Here’s a simple example that forces the model to classify a product into one of three categories:

String enum

{
  "type": "object",
  "properties": {
    "product_category": {
      "type": "string",
      "enum": ["electronics", "clothing", "home_goods"],
      "description": "The category this product belongs to"
    },
    "product_name": {
      "type": "string",
      "description": "Name of the product"
    }
  },
  "required": ["product_category", "product_name"]
}

{
  "product_category": "electronics",
  "product_name": "Wireless Headphones"
}

You can also use enum with numeric values:

Numeric enum

{
  "type": "object",
  "properties": {
    "priority": {
      "type": "integer",
      "enum": [1, 2, 3, 5, 8],
      "description": "Task priority using Fibonacci sequence"
    },
    "task": {
      "type": "string",
      "description": "Description of the task"
    }
  },
  "required": ["priority", "task"]
}

{
  "priority": 3,
  "task": "Update user documentation"
}

Practical enum applications

Enums are particularly useful for building structured classification systems. This example demonstrates a more complex feedback categorization system:

Feedback categorization

{
  "type": "object",
  "properties": {
    "feedback_type": {
      "type": "string",
      "enum": ["bug_report", "feature_request", "compliment", "complaint", "question"],
      "description": "The category of user feedback"
    },
    "severity": {
      "type": "string",
      "enum": ["critical", "high", "medium", "low"],
      "description": "How severe or important the feedback is"
    },
    "product_area": {
      "type": "string",
      "enum": ["ui", "performance", "security", "documentation", "billing", "other"],
      "description": "The area of the product this feedback relates to"
    },
    "message": {
      "type": "string",
      "description": "The actual feedback message from the user"
    },
    "suggested_action": {
      "type": "string",
      "description": "Suggested next steps based on the feedback"
    }
  },
  "required": ["feedback_type", "severity", "product_area", "message", "suggested_action"]
}

{
  "feedback_type": "bug_report",
  "severity": "high",
  "product_area": "performance",
  "message": "The application becomes extremely slow after uploading more than 5 images at once.",
  "suggested_action": "Investigate image processing queue and implement batch processing with progress indicators."
}

Tip

Communicate available enum options to the model when using enums. You can inform the model directly by providing a list of options, or by describing the options in such a way that the model can infer the options.
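One way to follow this tip is to generate the option list in the prompt from the schema itself, so the two cannot drift apart. The sketch below assumes enums appear on top-level properties; enum_options and the prompt wording are illustrative, not part of dotjson.

```python
def enum_options(schema: dict) -> dict:
    """Map each top-level property that has an enum to its allowed values."""
    return {
        name: subschema["enum"]
        for name, subschema in schema.get("properties", {}).items()
        if "enum" in subschema
    }


feedback_schema = {
    "type": "object",
    "properties": {
        "feedback_type": {
            "type": "string",
            "enum": ["bug_report", "feature_request", "compliment", "complaint", "question"],
        },
        "severity": {"type": "string", "enum": ["critical", "high", "medium", "low"]},
        "message": {"type": "string"},
    },
}

# Build one prompt line per enum-constrained field.
option_lines = [
    f"- {name}: one of {', '.join(map(str, values))}"
    for name, values in enum_options(feedback_schema).items()
]
prompt = "Classify the feedback using these fields:\n" + "\n".join(option_lines)
print(prompt)
```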

Constant values with `const`

The const keyword allows you to specify an exact value that must be generated. When const is used, the model can only generate tokens consistent with the specified value. This applies to any JSON data type - strings, numbers, booleans, objects, or arrays.

const is particularly useful when:

You don’t want the model to freely generate certain values (like IDs, usernames, or other fixed data)
You want to simplify your data model by showing the model all your data and asking it to fill in only specific fields
You want to provide context to the model without including it in the prompt

String constants

Here’s an example where we use const to pin a UUID and genre to known values, and allow the model to generate a story:

String constants

{
  "type": "object",
  "properties": {
    "id": {
      "type": "string",
      "const": "e29d2712-8e93-4486-b8a9-d99e84f3dd6b"
    },
    "genre":{
      "type":"string",
      "const": "science fiction"
    },
    "story": {
      "type": "string",
      "minLength": 20,
      "description": "A short story generated by the language model."
    }
  },
  "required": ["id", "story", "genre"]
}

String constants

{
  "id": "e29d2712-8e93-4486-b8a9-d99e84f3dd6b",
  "genre": "science fiction",
  "story": "Once upon a time, there was a very depressed robot named Marvin."
}

Tip

In this example, setting "genre": "science fiction" as a constant provides thematic guidance to the model. This is a powerful technique because:

It ensures consistent metadata in your output (the ID and genre never change)
It allows you to guide the content generation without putting those instructions in your prompt
The model recognizes the genre constraint and produces content that fits that theme
You can easily change the genre constant to get different themed content without changing your prompt
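Since const values typically come from application data, the schema is often built programmatically. Here is a minimal sketch, assuming a hypothetical story_schema helper that fills in the ID and genre for each request:

```python
import json
import uuid


def story_schema(story_id: str, genre: str) -> dict:
    """Build the story schema with the ID and genre pinned via const."""
    return {
        "type": "object",
        "properties": {
            "id": {"type": "string", "const": story_id},
            "genre": {"type": "string", "const": genre},
            "story": {"type": "string", "minLength": 20},
        },
        "required": ["id", "story", "genre"],
    }


# A fresh ID per request; only the genre argument changes the theme.
schema = story_schema(str(uuid.uuid4()), "science fiction")
print(json.dumps(schema, indent=2))
```

Changing the genre argument yields a differently themed schema without touching the prompt.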

Object constants

The const keyword can also be applied to objects and arrays. This is useful when you want to fix certain structural elements while allowing the model to generate others.

Here’s an example of a product configuration where the available options are fixed, but the user preferences are generated by the model:

Object constants

{
  "type": "object",
  "properties": {
    "product_info": {
      "type": "object",
      "const": {
        "name": "Smart Speaker",
        "model": "Echo-2023",
        "available_colors": ["black", "white", "blue"],
        "available_features": ["voice_control", "music_streaming", "smart_home"]
      }
    },
    "model_selection": {
      "type": "object",
      "properties": {
        "color": {
          "type": "string",
          "enum": ["black", "white", "blue"],
          "description": "Model's color preference"
        },
        "selected_features": {
          "type": "array",
          "items": {
            "type": "string",
            "enum": ["voice_control", "music_streaming", "smart_home"]
          },
          "description": "Features the model wants to enable"
        },
        "quantity": {
          "type": "integer",
          "minimum": 1,
          "description": "Number of units to purchase"
        }
      },
      "required": ["color", "selected_features", "quantity"]
    }
  },
  "required": ["product_info", "model_selection"]
}

Object constants

{
  "product_info": {
    "name": "Smart Speaker",
    "model": "Echo-2023",
    "available_colors": ["black", "white", "blue"],
    "available_features": ["voice_control", "music_streaming", "smart_home"]
  },
  "model_selection": {
    "color": "blue",
    "selected_features": ["voice_control", "smart_home"],
    "quantity": 2
  }
}

In this example, the product_info object is entirely fixed with const, while the model generates the model_selection object based on the schema constraints. This pattern is useful for scenarios where you want to provide the model with fixed reference information while it generates related content.
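In the schema above, the color and feature lists appear twice: once inside the const object and once in the enum constraints. One possible way to keep them in sync is to derive both from a single Python value, as in this sketch (configuration_schema is a hypothetical helper):

```python
PRODUCT_INFO = {
    "name": "Smart Speaker",
    "model": "Echo-2023",
    "available_colors": ["black", "white", "blue"],
    "available_features": ["voice_control", "music_streaming", "smart_home"],
}


def configuration_schema(product_info: dict) -> dict:
    """Pin product_info with const and reuse its lists as enum options."""
    return {
        "type": "object",
        "properties": {
            "product_info": {"type": "object", "const": product_info},
            "model_selection": {
                "type": "object",
                "properties": {
                    "color": {"type": "string", "enum": product_info["available_colors"]},
                    "selected_features": {
                        "type": "array",
                        "items": {"type": "string", "enum": product_info["available_features"]},
                    },
                    "quantity": {"type": "integer", "minimum": 1},
                },
                "required": ["color", "selected_features", "quantity"],
            },
        },
        "required": ["product_info", "model_selection"],
    }


schema = configuration_schema(PRODUCT_INFO)
print(schema["properties"]["model_selection"]["properties"]["color"]["enum"])
```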

Schema combinations

References and schema reuse with `$defs`

References allow you to define a schema once and reuse it throughout your schema, which is particularly valuable for large, complex schemas. References offer:

Improved readability
Consistency across the schema
Modular design

The $defs keyword at the root level of the schema creates a library of reusable schema definitions, while the $ref keyword references those definitions in the primary schema.

dotjson only supports references to definitions within the same schema, not external references such as file:// or http://. The reference # references the root of the schema, and subschemas can be referenced using #/$defs/<name>.

Here’s a minimal example of reference syntax:

{
  "type": "object",
  "properties": {
    "user": { "$ref": "#/$defs/person" }
  },
  "required": ["user"],
  "$defs": {
    "person": {
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "age": { "type": "integer" }
      }
    }
  }
}
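To make the reference syntax concrete, the following Python sketch looks up a same-document reference by walking the schema dictionary. It is a simplification for illustration: resolve_ref is a hypothetical helper, it handles only "#" and "#/..." references, and it ignores JSON Pointer escape sequences.

```python
def resolve_ref(schema: dict, ref: str) -> dict:
    """Resolve a same-document reference such as "#" or "#/$defs/person"."""
    if not ref.startswith("#"):
        raise ValueError(f"only same-document references are handled: {ref!r}")
    node = schema
    for part in ref[1:].split("/"):
        if part:  # skip the empty segment produced by "#" alone or by the leading "/"
            node = node[part]
    return node


user_schema = {
    "type": "object",
    "properties": {"user": {"$ref": "#/$defs/person"}},
    "required": ["user"],
    "$defs": {
        "person": {
            "type": "object",
            "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
        }
    },
}

target = resolve_ref(user_schema, user_schema["properties"]["user"]["$ref"])
print(target["properties"])
```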

Here’s a more practical example that demonstrates schema reuse. We’ll define an address structure once and reference it for both billing and shipping addresses, avoiding redundant definitions:

Address references

{
  "type": "object",
  "properties": {
    "billing_address": { "$ref": "#/$defs/address" },
    "shipping_address": { "$ref": "#/$defs/address" }
  },
  "$defs": {
    "address": {
      "type": "object",
      "properties": {
        "street": { "type": "string" },
        "city": { "type": "string" },
        "state": { "type": "string" },
        "zip": { "type": "string" }
      },
      "required": ["street", "city", "state", "zip"]
    }
  },
  "required": ["billing_address"]
}

Here is an example of a schema that should use references but does not. Note the repeated definition of the same schema.

Address references

{
  "type": "object",
  "properties": {
    "billing_address": {
      "type": "object",
      "properties": {
        "street": { "type": "string" },
        "city": { "type": "string" },
        "state": { "type": "string" },
        "zip": { "type": "string" }
      },
      "required": ["street", "city", "state", "zip"]
    },
    "shipping_address": {
      "type": "object",
      "properties": {
        "street": { "type": "string" },
        "city": { "type": "string" },
        "state": { "type": "string" },
        "zip": { "type": "string" }
      },
      "required": ["street", "city", "state", "zip"]
    }
  },
  "required": ["billing_address"]
}

Address references

{
  "billing_address": {
    "street": "123 Main St",
    "city": "Anytown",
    "state": "CA",
    "zip": "12345"
  },
  "shipping_address": {
    "street": "456 Market St",
    "city": "Somewhere",
    "state": "NY",
    "zip": "67890"
  }
}

Without references, you would need to duplicate the entire address structure definition, making the schema harder to maintain. With references, if you later need to add a field like country to all addresses, you only need to update the single definition under $defs.
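The equivalence between the referenced and duplicated forms can be demonstrated by expanding references mechanically. The sketch below assumes every reference has the form #/$defs/<name>, has no sibling keywords, and is not recursive; inline_refs is a hypothetical helper, not a dotjson function.

```python
def inline_refs(node, root: dict):
    """Return a copy of node with each "#/$defs/<name>" reference replaced by its definition."""
    if isinstance(node, dict):
        if "$ref" in node:
            name = node["$ref"].rsplit("/", 1)[-1]
            return inline_refs(root["$defs"][name], root)
        # Drop the $defs library itself; its contents are copied to each use site.
        return {key: inline_refs(value, root) for key, value in node.items() if key != "$defs"}
    if isinstance(node, list):
        return [inline_refs(item, root) for item in node]
    return node


with_refs = {
    "type": "object",
    "properties": {
        "billing_address": {"$ref": "#/$defs/address"},
        "shipping_address": {"$ref": "#/$defs/address"},
    },
    "$defs": {
        "address": {
            "type": "object",
            "properties": {
                "street": {"type": "string"},
                "city": {"type": "string"},
                "state": {"type": "string"},
                "zip": {"type": "string"},
            },
            "required": ["street", "city", "state", "zip"],
        }
    },
    "required": ["billing_address"],
}

expanded = inline_refs(with_refs, with_refs)
print(expanded["properties"]["billing_address"] == expanded["properties"]["shipping_address"])
```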

References can also be nested to create more complex hierarchical structures. Here, we define both a person schema and an address schema, with person referencing address:

Nested references

{
  "type": "object",
  "properties": {
    "people": {
      "type": "array",
      "items": { "$ref": "#/$defs/person" }
    }
  },
  "required": ["people"],
  "$defs": {
    "person": {
      "type": "object",
      "properties": {
        "firstName": { "type": "string" },
        "lastName": { "type": "string" },
        "address": { "$ref": "#/$defs/address" }
      },
      "required": ["firstName", "lastName", "address"]
    },
    "address": {
      "type": "object",
      "properties": {
        "street": { "type": "string" },
        "city": { "type": "string" },
        "country": { "type": "string" }
      },
      "required": ["street", "city", "country"]
    }
  }
}

Nested references

{
  "people": [
    {
      "firstName": "Elizabeth",
      "lastName": "Hodges",
      "address": {
        "street": "1479 Harbor Oaks Drive",
        "city": "Los Angeles",
        "country": "United States"
      }
    },
    {
      "firstName": "Henry",
      "lastName": "Wicket",
      "address": {
        "street": "4628 Summerfield Place",
        "city": "Fargo",
        "country": "United States"
      }
    }
  ]
}

Recursive schemas

dotjson supports recursive, self-referencing schemas of unlimited depth. See the JSON Schema guide for more information on recursion.

A simple example is an object with a string name field and a children field containing an array of objects, also with name and children fields. We denote the recursion with "items": { "$ref": "#" }, which means that all items in the children array must follow the same schema as the root schema.

Recursive schema

{
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "children": {
      "type": "array",
      "items": { "$ref": "#" }
    }
  },
  "required": ["name"]
}

Recursive schema

{
  "name": "Root",
  "children": [
    {
      "name": "Child 1",
      "children": [
        {
          "name": "Grandchild 1",
          "children": []
        }
      ]
    },
    {
      "name": "Child 2"
    }
  ]
}

Recursive schemas are useful for using language models to generate hierarchical structures. This can include modeling/extracting organization charts, product taxonomies, navigation menus, knowledge graphs, etc.
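Output that follows a recursive schema is naturally processed with recursive functions. As a small sketch, the helpers below (count_nodes and depth are illustrative names) walk a tree shaped like the name/children example above; children is optional, so it is read with a default.

```python
def count_nodes(node: dict) -> int:
    """Count this node plus all of its descendants."""
    return 1 + sum(count_nodes(child) for child in node.get("children", []))


def depth(node: dict) -> int:
    """Number of levels in the tree rooted at this node."""
    return 1 + max((depth(child) for child in node.get("children", [])), default=0)


tree = {
    "name": "Root",
    "children": [
        {"name": "Child 1", "children": [{"name": "Grandchild 1", "children": []}]},
        {"name": "Child 2"},
    ],
}

print(count_nodes(tree), depth(tree))
```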

The following example demonstrates a recursive schema for modeling knowledge graphs of arbitrary size and depth. Each node in the graph represents a concept with a name, definition, and list of related concepts. The recursive nature of the schema means that each related concept follows the same structure, allowing for a rich network of interconnected concepts and definitions.

Simple knowledge graph schema

{
  "type": "object",
  "properties": {
    "concept": {
      "type": "string",
      "description": "The main concept or topic"
    },
    "definition": {
      "type": "string",
      "description": "Brief definition of the concept"
    },
    "related_concepts": {
      "type": "array",
      "items": {
        "$ref": "#"
      },
      "description": "Related sub-concepts that help explain the main concept"
    }
  },
  "required": ["concept", "definition"]
}

Knowledge graph example

{
  "concept": "Artificial Intelligence (AI)",
  "definition": "AI is the simulation of human intelligence in machines that are programmed to think and learn like humans do, and to perform tasks that typically require human intelligence to complete.",
  "related_concepts": [
    {
      "concept": "Machine Learning (ML)",
      "definition": "A subset of AI that involves the development of algorithms and statistical models that enable computers to perform a specific task without explicit instructions.",
      "related_concepts": [
        {
          "concept": "Supervised Learning",
          "definition": "A type of machine learning where a model is trained on labeled data to learn and predict outcomes."
        },
        {
          "concept": "Unsupervised Learning",
          "definition": "A type of machine learning where a model is trained on data without labels to identify patterns and groupings within the data."
        }
      ]
    },
    {
      "concept": "Knowledge Representation",
      "definition": "The encoding of knowledge in a form that can be easily understood and processed by AI.",
      "related_concepts": [
        {
          "concept": "Semantic Networks",
          "definition": "Graphs used to represent knowledge using nodes and edges to show the relationships between concepts."
        },
        {
          "concept": "Ontologies",
          "definition": "Formal representations of knowledge describing a set of concepts and the relationships between them."
        }
      ]
    }
  ]
}

Note

Complex recursive schemas can be slow to compile. Consider caching the compiled indices so that the compilation cost is amortized across requests.

Choosing a subschema with `anyOf`

anyOf lets the model choose one of the available subschemas. It is useful when you want to define several “paths” the model can take when responding.

Here is an example of a schema that requires a language model to respond as if it were a web server. The response carries a status code (200, 400, 404, etc.). Schemas like this are useful when the model can manage an application's internals and send messages directly back to the caller.

{
"type": "object",
"properties": {
  "response": {
    "anyOf": [
      {
        "type": "object",
        "properties": {
          "success": { "type": "boolean", "const": true },
          "status": { "type": "integer", "enum": [200] },
          "data": {
            "type": "object",
            "properties": {
              "id": { "type": "string" },
              "name": { "type": "string" },
              "timestamp": { "type": "string" }
            },
            "required": ["id", "name", "timestamp"]
          }
        },
        "required": ["success", "status", "data"]
      },
      {
        "type": "object",
        "properties": {
          "success": { "type": "boolean", "const": false },
          "status": { "type": "integer", "enum": [400, 401, 403, 404, 500] },
          "error": {
            "type": "object",
            "properties": {
              "code": { "type": "string" },
              "message": { "type": "string" },
              "details": { "type": "string" }
            },
            "required": ["code", "message"]
          }
        },
        "required": ["success", "error"]
      },
      {
        "type": "object",
        "properties": {
          "success": { "type": "boolean", "const": false },
          "status": { "type": "integer", "enum": [301, 302] },
          "redirect": {
            "type": "object",
            "properties": {
              "url": { "type": "string" },
              "temporary": { "type": "boolean" }
            },
            "required": ["url", "temporary"]
          }
        },
        "required": ["success", "redirect"]
      }
    ]
  }
},
"required": ["response"]
}

Successful response:

{
  "response": {
    "success": true,
    "status": 200,
    "data": {
      "id": "user_12345",
      "name": "John Doe",
      "timestamp": "2025-03-20T14:32:10Z"
    }
  }
}

Code 400 response:

{
  "response": {
    "success": false,
    "status": 400,
    "error": {
      "code": "INVALID_INPUT",
      "message": "The provided email address is invalid",
      "details": "Email must be in the format name@example.com"
    }
  }
}

Redirect response:

{
  "response": {
    "success": false,
    "status": 302,
    "redirect": {
      "url": "https://api.example.com/v2/resources",
      "temporary": true
    }
  }
}
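
On the consuming side, the branch the model chose can be recovered from the payload key that is present (data, error, or redirect). The Python sketch below shows one possible way to dispatch on it; the `describe_response` helper and its message formats are assumptions made for illustration.

```python
import json


def describe_response(raw: str) -> str:
    """Summarize a model response that follows the anyOf schema above."""
    response = json.loads(raw)["response"]
    # Each anyOf branch carries a different payload key, so the key that is
    # present tells us which subschema the model chose.
    if "data" in response:
        return f"ok: {response['data']['name']}"
    if "error" in response:
        error = response["error"]
        return f"error {error['code']}: {error['message']}"
    if "redirect" in response:
        kind = "temporary" if response["redirect"]["temporary"] else "permanent"
        return f"{kind} redirect to {response['redirect']['url']}"
    raise ValueError("response matches none of the expected branches")


redirect_output = """
{
  "response": {
    "success": false,
    "status": 302,
    "redirect": {"url": "https://api.example.com/v2/resources", "temporary": true}
  }
}
"""
print(describe_response(redirect_output))
```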

Real-world examples

Sentiment analysis

{
  "type": "object",
  "properties": {
    "sentiment": {
      "type": "string",
      "enum": ["positive", "neutral", "negative"],
      "description": "The classified sentiment category"
    },
    "confidence": {
      "type": "number",
      "minimum": 0,
      "maximum": 1,
      "description": "Confidence score of the sentiment classification"
    },
    "analysis": {
      "type": "string",
      "description": "Detailed analysis explaining the sentiment classification"
    }
  },
  "required": ["sentiment", "confidence", "analysis"]
}

Sentiment analysis example

{
  "sentiment": "positive",
  "confidence": 0.87,
  "analysis": "The text contains multiple positive expressions and enthusiastic language, with no significant negative elements."
}

Tip

For language model applications, consider using a combination of enum for categorizable outputs and free-form strings for explanations and analysis.
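
Because the sentiment field is restricted to a fixed set of values, application code can map it directly onto an enumeration and branch on it, while the free-form analysis string is kept for display. The Python sketch below is illustrative: the `Sentiment`, `SentimentResult`, and `parse_sentiment` names and the 0.8 routing threshold are assumptions, not part of dotjson.

```python
import json
from dataclasses import dataclass
from enum import Enum


class Sentiment(Enum):
    POSITIVE = "positive"
    NEUTRAL = "neutral"
    NEGATIVE = "negative"


@dataclass(frozen=True)
class SentimentResult:
    sentiment: Sentiment
    confidence: float
    analysis: str


def parse_sentiment(raw: str) -> SentimentResult:
    """Convert schema-conforming JSON text into a typed result."""
    payload = json.loads(raw)
    return SentimentResult(
        # Enum lookup by value raises ValueError for labels outside the enum.
        sentiment=Sentiment(payload["sentiment"]),
        confidence=float(payload["confidence"]),
        analysis=payload["analysis"],
    )


result = parse_sentiment(
    '{"sentiment": "positive", "confidence": 0.87, '
    '"analysis": "The text contains multiple positive expressions."}'
)

# Branch on the categorical field; keep the free-form text for humans.
if result.sentiment is Sentiment.POSITIVE and result.confidence >= 0.8:
    print("high-confidence positive:", result.analysis)
```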

Unsupported features

The following JSON Schema features are not fully supported in dotjson:

Conditional validation (if-then-else), dependent properties and schemas
patternProperties
Numeric constraints (e.g. multipleOf, maximum, minimum, exclusiveMaximum, exclusiveMinimum)
Unique items in arrays
multipleOf schema
oneOf, allOf and not
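
Before handing a schema to dotjson, it can help to scan it for the keywords listed above. The Python sketch below is a rough pre-flight check rather than a dotjson feature. The keyword set is an interpretation of the list (for example, mapping "dependent properties and schemas" to dependentRequired and dependentSchemas). The walker only descends through properties, $defs, items, and anyOf, so other keywords that hold subschemas are not searched.

```python
# Keyword names inferred from the list above (an assumption of this sketch).
UNSUPPORTED_KEYWORDS = {
    "if", "then", "else", "dependentRequired", "dependentSchemas",
    "patternProperties", "multipleOf", "maximum", "minimum",
    "exclusiveMaximum", "exclusiveMinimum", "uniqueItems",
    "oneOf", "allOf", "not",
}


def find_unsupported(schema, path="#"):
    """Return JSON-pointer-like paths of unsupported keywords in a schema."""
    found = []
    if not isinstance(schema, dict):
        return found  # boolean schemas carry no keywords
    for key, value in schema.items():
        here = f"{path}/{key}"
        if key in UNSUPPORTED_KEYWORDS:
            found.append(here)
        elif key in ("properties", "$defs") and isinstance(value, dict):
            # Keys here are user-chosen names, not keywords, so a property
            # called "not" or "minimum" is not flagged.
            for name, sub in value.items():
                found.extend(find_unsupported(sub, f"{here}/{name}"))
        elif key == "anyOf" and isinstance(value, list):
            for i, sub in enumerate(value):
                found.extend(find_unsupported(sub, f"{here}/{i}"))
        elif key == "items":
            found.extend(find_unsupported(value, here))
    return found


schema = {
    "type": "object",
    "properties": {
        "not": {"type": "string"},
        "score": {"type": "number", "multipleOf": 0.5},
        "tags": {"type": "array", "items": {"type": "string"}, "uniqueItems": True},
    },
}
print(find_unsupported(schema))
# ['#/properties/score/multipleOf', '#/properties/tags/uniqueItems']
```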

Certain regular expression patterns are not supported:

\b for word boundaries
Backwards references, e.g. \1 or (?P=open)
Conditional matches, e.g. (?(1)a|b)
Lookbacks, e.g. (?=bar)
Lookaheads, e.g. foo(?=bar)
Lookbehinds, e.g. (?<=foo)bar
Atomic groups, e.g. (?>pattern)
Recursion, e.g. (?R) or (?1)
Non-capturing groups, e.g. (?:pattern)
Named captures, e.g. (?P<name>pattern)
Inline modifiers, e.g. (?i)case-insensitive
Subroutines, e.g. \g<1>
Branch resets, e.g. (?|pattern1|pattern2)
Inline comments, e.g. (?#comment)
Code callouts, e.g. (*MARK:name)
Version checks, e.g. (*VERSION)
Whitespace-insensitive patterns, e.g. (?x)pattern # comment

Warning

Using these unsupported features may result in unexpected behavior or validation errors.

Next steps

Usage

How to use the dotjson C++ API

Troubleshooting

Common issues and their solutions

API Reference

Detailed API documentation
