dotjson v0.0.9

dotjson version 0.0.9 introduces breaking interface changes relative to 0.0.5. The primary changes are:

  1. The introduction of the Guide class, which separates the logic of generating sets of allowed tokens from the logic of processing logits.
  2. The LogitsProcessor constructor now takes a Guide object directly rather than separate parameters.

Previously, the DotJsonProcessor class (now called LogitsProcessor) generated sets of allowed tokens and processed logits at runtime. The Guide interface disentangles these responsibilities, allowing for more flexibility in how sets of allowed tokens are generated.

For example, allowed tokens can be calculated in parallel while the language model is performing a forward pass. Once the forward pass is complete, the sets of allowed tokens can be processed in a separate step.
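As a rough sketch of this overlap, the token-set computation can be launched on a separate thread while the forward pass runs, then joined before masking. The function names below (`forward_pass`, `compute_allowed_tokens`, `generate_step`) are illustrative stand-ins, not part of the dotjson API:

```cpp
#include <cstdint>
#include <future>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the real model call and for
// Guide::get_next_tokensets(); the names are illustrative only.
std::vector<float> forward_pass() { return {0.1f, 0.9f, 0.2f}; }
std::vector<uint32_t> compute_allowed_tokens() { return {1, 2}; }

// Overlap the token-set computation with the forward pass, then
// join both results before the masking/sampling step.
std::pair<std::vector<float>, std::vector<uint32_t>> generate_step() {
    auto allowed = std::async(std::launch::async, compute_allowed_tokens);
    std::vector<float> logits = forward_pass();  // runs concurrently
    return {logits, allowed.get()};              // join before masking
}
```

In real code, `compute_allowed_tokens` would wrap the Guide call and `forward_pass` the model; the key point is that neither step blocks the other.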

v0.0.9 also introduces a few convenience methods.

Change log

Overview of changes:

  • DotJsonProcessor is now LogitsProcessor
  • A new Guide class is used to generate sets of allowed tokens. Guides are created with Guide(index, batch_size)
  • The LogitsProcessor constructor now accepts the arguments:
    • guide: The Guide object used to produce the token sets
    • mask_value: The value assigned to disallowed logits, equivalent to -std::numeric_limits<float>::infinity()
  • Guide has two methods, both returning a BatchedTokenSet:
    • get_start_tokensets() returns the set of allowed tokens for the start of each sequence in the batch
    • get_next_tokensets(last_token_ids) returns the set of allowed tokens for the next step, given the last token generated for each sequence in the batch
  • The new BatchedTokenSet class provides utility methods to inspect token sets:
    • contains(token_id) returns a vector of booleans indicating whether the token is allowed for each batch entry; this can be used to stop generation early once the EOS token becomes allowed
    • num_allowed() returns a vector with the number of allowed tokens in each batch
  • The Vocabulary class now supports two convenience methods:
    • vocabulary.max_token_id() returns the maximum token id – the size of the logits array to be allocated/processed is max_token_id() + 1
    • vocabulary.eos_token_id() returns the end-of-sequence token id, which can be used to determine when to stop generating a sequence
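The contains() check lends itself to early stopping. A minimal sketch of its semantics, modeling each batch entry's allowed tokens as a std::set (a stand-in for the real BatchedTokenSet, which dotjson implements internally):

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Stand-in for BatchedTokenSet::contains(token_id): given one
// allowed-token set per batch entry, return a per-batch boolean
// vector indicating whether token_id is allowed.
std::vector<bool> contains(const std::vector<std::set<uint32_t>>& batched_sets,
                           uint32_t token_id) {
    std::vector<bool> result;
    result.reserve(batched_sets.size());
    for (const auto& s : batched_sets)
        result.push_back(s.count(token_id) > 0);
    return result;
}
```

A batch entry for which contains(vocabulary.eos_token_id()) is true and num_allowed() is 1 can only emit EOS, so generation for that entry can stop.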

Detailed migration notes

Generating sets of allowed tokens

dotjson v0.0.5 bundled the logic of generating sets of allowed tokens with the logic of processing logits. The DotJsonProcessor class (now called LogitsProcessor) was used to both generate sets of allowed tokens and process logits.

In v0.0.9, the Guide class is used to generate sets of allowed tokens. The Guide class is created with Guide(index, batch_size).

Guide objects have two methods:

  • get_start_tokensets() returns the set of allowed tokens for the start of each sequence in the batch
  • get_next_tokensets(last_token_ids) returns the set of allowed tokens for the next step, given the last token generated for each sequence in the batch

Here’s how to create a Guide object from an Index object and a batch size:

// Create the guide
dotjson::Guide guide(index, batch_size);

To generate the initial set of allowed tokens, call get_start_tokensets():

// Generate the initial set of allowed tokens
dotjson::BatchedTokenSet allowed_tokens = guide.get_start_tokensets();

To generate the set of allowed tokens for every step after the first, call get_next_tokensets(last_token_ids), where last_token_ids is a vector of the last batch_size tokens generated by the language model:

// Generate the set of allowed tokens for the next token
dotjson::BatchedTokenSet allowed_tokens = guide.get_next_tokensets(last_token_ids);

Using the new LogitsProcessor

Here is the deprecated constructor from v0.0.5:

dotjson::DotJsonProcessor processor(index, batch_size, mask_value);

Here is the new constructor from v0.0.9:

// Create the guide
dotjson::Guide guide(index, batch_size);

// Create the processor using the guide
dotjson::LogitsProcessor processor(guide, mask_value);

In v0.0.5, the processor was invoked with the history of generated tokens as the second argument:

processor(logits, context);

In v0.0.9, the second argument is a BatchedTokenSet object rather than the history of generated tokens:

processor(logits, token_set);

where token_set is a BatchedTokenSet object generated by the Guide object using the Guide::get_next_tokensets() or Guide::get_start_tokensets() methods.
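Conceptually, the processor sets every logit whose token is not in the set to mask_value and leaves allowed logits untouched. A minimal sketch of that step for a single batch entry (apply_mask is an illustrative helper, not part of the dotjson API):

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Illustrative helper: assign mask_value to every logit whose token
// id is not allowed, leaving allowed logits unchanged. The default
// mirrors the -infinity mask described above.
void apply_mask(std::vector<float>& logits,
                const std::vector<bool>& allowed,
                float mask_value = -std::numeric_limits<float>::infinity()) {
    for (std::size_t i = 0; i < logits.size(); ++i)
        if (!allowed[i]) logits[i] = mask_value;
}
```

After masking, sampling from the logits can only ever pick an allowed token, which is what enforces the schema.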

Complete Example

// Create vocabulary and index
std::string model = "gpt2";
std::string schema = "{\"type\":\"object\",\"properties\":{\"x\":{\"type\":\"integer\"}}}";
dotjson::Vocabulary vocabulary(model);
dotjson::Index index(schema, vocabulary);

// Create guide and processor
std::size_t batch_size = 1;
uint16_t mask_value = 0; // Appropriate mask value

// Create the guide and processor
dotjson::Guide guide(index, batch_size);
dotjson::LogitsProcessor processor(guide, mask_value);

// Get initial token set
dotjson::BatchedTokenSet token_set = guide.get_start_tokensets();

// Process logits and generate tokens
std::vector<std::span<uint16_t>> logits;
// Populate logits...
processor(logits, token_set);
std::vector<uint32_t> sampled_tokens = sample_tokens(logits); // sample_tokens is your own sampling routine

// Get the next token set and continue
token_set = guide.get_next_tokensets(sampled_tokens);

Allocating logits using vocabulary.max_token_id()

If you are not tracking the vocabulary size, you can use vocabulary.max_token_id() to size the logits array. Token IDs are 0-indexed, so the required size is vocabulary.max_token_id() + 1.

Allocate logits using vocabulary.max_token_id() + 1:

// Create the vocabulary and index
dotjson::Vocabulary vocabulary(model);
dotjson::Index index(schema, vocabulary);

// The mask value is 0, batch size is 1
uint16_t mask_value = 0;
uint32_t batch_size = 1;
uint32_t vocab_size = vocabulary.max_token_id() + 1;

// Create the logits vector
std::vector<uint16_t> logits(vocab_size);