# dotjson v0.0.9

Version 0.0.9 introduced breaking interface changes from 0.0.5. The primary changes are:
- The introduction of the `Guide` class, which separates the logic of generating sets of allowed tokens from the logic of processing logits.
- The `LogitsProcessor` constructor now takes a `Guide` object directly rather than separate parameters.
Previously, the `DotJsonProcessor` class (now called `LogitsProcessor`) generated sets of allowed tokens and processed logits at runtime. The `Guide` interface disentangles these responsibilities, allowing for more flexibility in how sets of allowed tokens are generated.
For example, allowed tokens can be calculated in parallel while the language model is performing a forward pass. Once the forward pass is complete, the sets of allowed tokens can be processed in a separate step.
v0.0.9 also introduces a few convenience methods.
## Change log

Overview of changes:
- `DotJsonProcessor` is now `LogitsProcessor`
- A new `Guide` class is used to generate sets of allowed tokens. Guides are created with `Guide(index, batch_size)`
- The `LogitsProcessor` constructor now accepts the arguments:
  - `guide`: The `Guide` object used to produce the token sets
  - `mask_value`: The equivalent of `-std::numeric_limits<float>::infinity()`
- `Guide` has two methods, both returning a `BatchedTokenSet`:
  - `get_start_tokensets()` returns a set of allowed tokens for the start of a sequence
  - `get_next_tokensets(last_token_ids)` returns a set of allowed tokens for the next token, given the last token
- The new `BatchedTokenSet` class provides utility methods to inspect token sets:
  - `contains(token_id)` returns a vector of booleans indicating whether the token is allowed in each batch, which can be used to stop early if the EOS token is available in the set of allowed tokens
  - `num_allowed()` returns a vector with the number of allowed tokens in each batch
- The `Vocabulary` class now supports two convenience methods:
  - `vocabulary.max_token_id()` returns the maximum token id; the size of the logits array to be allocated/processed is `max_token_id() + 1`
  - `vocabulary.eos_token_id()` returns the end-of-sequence token id, which can be used to determine when to stop generating a sequence
## Detailed migration notes

### Generating sets of allowed tokens
`dotjson` v0.0.5 bundled the logic of generating sets of allowed tokens with the logic of processing logits. The `DotJsonProcessor` class (now called `LogitsProcessor`) was used to both generate sets of allowed tokens and process logits.

In v0.0.9, the `Guide` class is used to generate sets of allowed tokens. A `Guide` is created with `Guide(index, batch_size)`.
`Guide` objects have two methods:

- `get_start_tokensets()` returns a set of allowed tokens for the start of a sequence
- `get_next_tokensets(last_token_ids)` returns a set of allowed tokens for the next token, given the last token
Here’s how to create a `Guide` object from an `Index` object and a batch size:

```cpp
// Create the guide
dotjson::Guide guide(index, batch_size);
```
To generate the initial set of allowed tokens, call `get_start_tokensets()`:

```cpp
// Generate the initial set of allowed tokens
dotjson::BatchedTokenSet allowed_tokens = guide.get_start_tokensets();
```
To generate the set of allowed tokens for every step after the first, call `get_next_tokensets(last_token_ids)`, where `last_token_ids` is a vector of the last `batch_size` tokens generated by the language model.

```cpp
// Generate the set of allowed tokens for the next token
dotjson::BatchedTokenSet allowed_tokens = guide.get_next_tokensets(last_token_ids);
```
### Using the new `LogitsProcessor`
Here is the deprecated constructor from v0.0.5:

```cpp
dotjson::DotJsonProcessor processor(index, batch_size, mask_value);
```
Here is the new constructor from v0.0.9:

```cpp
// Create the guide
dotjson::Guide guide(index, batch_size);

// Create the processor using the guide
dotjson::LogitsProcessor processor(guide, mask_value);
```
In v0.0.5, the processor was invoked as follows:

```cpp
processor(logits, context);
```
In v0.0.9, the second argument is now a `BatchedTokenSet` object rather than the history of generated tokens:

```cpp
processor(logits, token_set);
```

where `token_set` is a `BatchedTokenSet` object produced by the `Guide` object's `get_start_tokensets()` or `get_next_tokensets()` methods.
### Complete Example
```cpp
// Create vocabulary and index
std::string model = "gpt2";
std::string schema = "{\"type\":\"object\",\"properties\":{\"x\":{\"type\":\"integer\"}}}";
dotjson::Vocabulary vocabulary(model);
dotjson::Index index(schema, vocabulary);

// Create the guide and processor
std::size_t batch_size = 1;
uint16_t mask_value = 0; // Appropriate mask value
dotjson::Guide guide(index, batch_size);
dotjson::LogitsProcessor processor(guide, mask_value);

// Get the initial token set
dotjson::BatchedTokenSet token_set = guide.get_start_tokensets();

// Process logits and generate tokens
std::vector<std::span<uint16_t>> logits;
// Populate logits...
processor(logits, token_set);
std::vector<uint32_t> sampled_tokens = sample_tokens(logits); // sample_tokens: user-provided sampler

// Get the next token set and continue
token_set = guide.get_next_tokensets(sampled_tokens);
```
### Allocating logits using `vocabulary.max_token_id()`

If you are not tracking the vocabulary size, you can use `vocabulary.max_token_id()` to allocate a logits array. Token IDs are 0-indexed, so the size of the logits array required is `vocabulary.max_token_id() + 1`.

Allocate logits using `vocabulary.max_token_id() + 1`:
```cpp
// Create the vocabulary and index
dotjson::Vocabulary vocabulary(model);
dotjson::Index index(schema, vocabulary);

// The mask value is 0, batch size is 1
uint16_t mask_value = 0;
uint32_t batch_size = 1;
uint32_t vocab_size = vocabulary.max_token_id() + 1;

// Create the logits vector
std::vector<uint16_t> logits(vocab_size);
```