Example

This page provides a complete working example of how to use dotjson to constrain an LLM to generate valid JSON.

Depending on your installation method, you may need to adjust the include paths. The example below shows both system-wide and project-local include options, assuming you’ve followed the installation guide.

This example demonstrates:

  1. Creating a Vocabulary for gpt2
  2. Building an Index for a simple JSON schema
  3. Using the LogitsProcessor to mask out invalid tokens during generation
  4. Simulating an LLM’s token generation with random logits
  5. Performing weighted sampling from the valid tokens
  6. Continuing generation until either the EOS token is generated or a maximum token limit is reached

This example is a standalone implementation without an actual LLM backend. It uses random values to simulate token logits. In a real implementation, these would come from your language model’s forward pass.

Example code

Save the following code as example.cpp.

example.cpp
// For system-wide installation
#include <dottxt/dotjson/dotjson.hpp>
#include <dottxt/dotjson/cxx.h>

// // For project-local installation
// #include "dottxt/dotjson/include/dotjson.hpp"
// #include "dottxt/dotjson/include/cxx.h"

#include <cstdint>
#include <iostream>
#include <random>
#include <span>
#include <vector>

int main() {
  // The vocabulary for gpt2 has 50257 tokens
  std::string model = "gpt2";
  uint32_t eos_token_id = 50256; // EOS token id, used to detect when sampling is complete

  // The schema is a simple object with one integer property
  std::string schema =
      "{\"type\":\"object\",\"properties\":{\"x\":{\"type\":\"integer\"}}}";

  // Create the vocabulary and index
  dotjson::Vocabulary vocabulary(model);
  dotjson::Index index(schema, vocabulary);

  // The mask value is 0, batch size is 1
  uint16_t mask_value = 0;
  uint32_t batch_size = 1;
  uint32_t vocab_size = vocabulary.max_token_id() + 1;

  // Set up random number generation, used to simulate the language model's
  // logit calculation.
  std::random_device rd;
  std::mt19937 gen(rd());

  // Since we don't have an LLM in this example, we'll be using randomly
  // assigned logit scores between 1-100. Note that 0 is not in this range,
  // as our mask_value is 0.
  std::uniform_int_distribution<uint16_t> logits_distrib(1, 100);

  // Create the logits and context, initialized with random numbers
  std::vector<uint16_t> logits_base(vocab_size);
  for (uint16_t& logit : logits_base) {
    logit = logits_distrib(gen);
  }
  std::span<uint16_t> logits_span(logits_base.data(), logits_base.size());
  std::vector<std::span<uint16_t>> logits = {logits_span};

  // Construct a guide
  dotjson::Guide guide(index, batch_size);

  // Create the set of initial permitted tokens
  dotjson::BatchedTokenSet allowed_tokens = guide.get_start_tokensets();
  
  // Construct the processor, a function that masks out invalid tokens
  dotjson::LogitsProcessor processor(guide, mask_value);

  // Initialize the sequence of prior generated tokens
  std::vector<std::vector<uint32_t>> context(batch_size);

  //
  // The following loop will:
  //
  // 1. Randomly assign logits to all tokens (to simulate an LLM forward pass)
  // 2. Mask out tokens inconsistent with the schema by setting their values
  //    to mask_value (0 in this case)
  // 3. Select a token randomly, proportional to its logit
  // 4. Add that token to the context
  // 5. Check if the new token is the EOS token -- if so, stop sampling.
  //

  // Set a maximum number of tokens to sample
  const int max_tokens_to_sample = 100;

  // Begin sampling loop
  for (int i = 0; i < max_tokens_to_sample; ++i) {
    // Reset logits_base with new random values before processing
    for (uint16_t& logit : logits_base) {
      logit = logits_distrib(gen);
    }

    // Mask out any tokens that are inconsistent with the schema
    processor(logits, allowed_tokens);

    // Sample proportionally from the logits vector using std::discrete_distribution.
    std::discrete_distribution<> distrib(logits_base.begin(), logits_base.end());
    uint32_t sampled_token_id = distrib(gen);

    // Add the sampled token to the context for the current batch entry (0)
    context[0].push_back(sampled_token_id);

    // Update the set of allowed tokens for the next iteration
    allowed_tokens = guide.get_next_tokensets({sampled_token_id});

    // Check if the sampled token is the EOS token; if so, stop sampling.
    if (sampled_token_id == eos_token_id) {
      std::cout << "EOS token (" << eos_token_id << ") sampled. Stopping generation." << std::endl;
      break;
    }
  }

  std::cout << "Final tokens: ";
  for(const auto& token : context[0]) {
    std::cout << token << " ";
  }
  std::cout << std::endl;
}
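
For reference, the schema above describes an object whose `x` property, if present, must be an integer (the property is not listed as required, so an empty object also conforms). A conforming generation therefore looks like, for example:

```json
{ "x": 42 }
```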

Compilation

To compile the example, run one of the following commands, depending on whether you installed dotjson system-wide or project-locally:

# System-wide installation
g++ -o example example.cpp -ldotjson -ldotjsoncpp -Wall -Wextra -std=c++20 -O3

# Project-local installation
g++ -o example example.cpp -I. -L./dottxt/dotjson/lib -ldotjson -ldotjsoncpp -Wall -Wextra -std=c++20 -O3 -Wl,-rpath,\$ORIGIN/dottxt/dotjson/lib

Execution

To execute the example, run the following command:

./example

You should see output similar to the following:

EOS token (50256) sampled. Stopping generation.
Final tokens: 90 197 628 198 92 628 197 198 201 197 201 628 628 197 198 197 198 50256