Trait tokenizers::tokenizer::PostProcessor
pub trait PostProcessor {
    // Required methods
    fn added_tokens(&self, is_pair: bool) -> usize;
    fn process_encodings(
        &self,
        encodings: Vec<Encoding>,
        add_special_tokens: bool
    ) -> Result<Vec<Encoding>>;

    // Provided method
    fn process(
        &self,
        encoding: Encoding,
        pair_encoding: Option<Encoding>,
        add_special_tokens: bool
    ) -> Result<Encoding> { ... }
}
A PostProcessor is responsible for post-processing the encoded output of the Tokenizer. It adds any special tokens that a language model would require.
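To make the shape of the trait concrete, here is a minimal, self-contained sketch. It does not use the real crate: `Encoding` and `Result` below are simplified stand-ins (the real `Encoding` also carries ids, offsets, and more), `BertLike` is a hypothetical implementor, and the body of the provided `process` method is an assumption about how a default could delegate to `process_encodings`, not the crate's actual code.

```rust
// Simplified stand-ins for illustration only; NOT the real `tokenizers` types.
type Result<T> = std::result::Result<T, String>;

#[derive(Debug, Clone, PartialEq)]
struct Encoding {
    tokens: Vec<String>,
}

impl Encoding {
    fn new(tokens: &[&str]) -> Self {
        Encoding { tokens: tokens.iter().map(|t| t.to_string()).collect() }
    }
}

trait PostProcessor {
    // Required: how many special tokens this processor adds.
    fn added_tokens(&self, is_pair: bool) -> usize;

    // Required: post-process one or more encodings.
    fn process_encodings(
        &self,
        encodings: Vec<Encoding>,
        add_special_tokens: bool,
    ) -> Result<Vec<Encoding>>;

    // Provided (assumed behaviour): gather the single/pair inputs,
    // delegate to `process_encodings`, then merge into one Encoding.
    fn process(
        &self,
        encoding: Encoding,
        pair_encoding: Option<Encoding>,
        add_special_tokens: bool,
    ) -> Result<Encoding> {
        let mut encodings = vec![encoding];
        encodings.extend(pair_encoding);
        let processed = self.process_encodings(encodings, add_special_tokens)?;
        let tokens = processed.into_iter().flat_map(|e| e.tokens).collect();
        Ok(Encoding { tokens })
    }
}

// Hypothetical implementor producing a BERT-style layout:
// [CLS] A [SEP] for a single sequence, [CLS] A [SEP] B [SEP] for a pair.
struct BertLike;

impl PostProcessor for BertLike {
    fn added_tokens(&self, is_pair: bool) -> usize {
        if is_pair { 3 } else { 2 }
    }

    fn process_encodings(
        &self,
        encodings: Vec<Encoding>,
        add_special_tokens: bool,
    ) -> Result<Vec<Encoding>> {
        if !add_special_tokens {
            return Ok(encodings);
        }
        Ok(encodings
            .into_iter()
            .enumerate()
            .map(|(i, mut e)| {
                if i == 0 {
                    e.tokens.insert(0, "[CLS]".to_string());
                }
                e.tokens.push("[SEP]".to_string());
                e
            })
            .collect())
    }
}
```

With this sketch, `BertLike.process(Encoding::new(&["hello", "world"]), None, true)` wraps the sequence in `[CLS]` and `[SEP]`, while passing `add_special_tokens = false` returns the tokens unchanged.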
Required Methods§
fn added_tokens(&self, is_pair: bool) -> usize
Returns the number of tokens that will be added during the processing step.
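One plausible use of this count is reserving room for special tokens before truncating to a maximum length. The sketch below is self-contained and illustrative only: `added_tokens` here is a free function standing in for the trait method (assuming a layout that adds 2 tokens for a single sequence and 3 for a pair), and `content_budget` and `truncate_to_budget` are hypothetical helpers, not part of the crate.

```rust
// Stand-in for `PostProcessor::added_tokens`, assuming a layout that adds
// 2 special tokens to a single sequence and 3 to a pair.
fn added_tokens(is_pair: bool) -> usize {
    if is_pair { 3 } else { 2 }
}

// Hypothetical helper: how many content tokens fit once the special
// tokens are reserved. Saturates at zero instead of underflowing.
fn content_budget(max_length: usize, is_pair: bool) -> usize {
    max_length.saturating_sub(added_tokens(is_pair))
}

// Hypothetical helper: truncate a single sequence so that the final,
// post-processed length does not exceed `max_length`.
fn truncate_to_budget(mut tokens: Vec<String>, max_length: usize) -> Vec<String> {
    tokens.truncate(content_budget(max_length, false));
    tokens
}
```

For example, with `max_length = 5` a single sequence keeps at most 3 content tokens, so the processed output still fits in 5.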
Provided Methods§
Implementations§
Implementors§
impl PostProcessor for PostProcessorWrapper
impl PostProcessor for ByteLevel
As a PostProcessor, ByteLevel is in charge of trimming the offsets if necessary.
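As a rough illustration of what trimming offsets means: byte-level tokens often carry their leading space, so a token's span can cover `" world"` rather than `"world"`. The sketch below shrinks a span so it excludes surrounding whitespace. It is an assumption-laden stand-in, not the crate's implementation: `trim_offsets` is a hypothetical free function, and it works on byte offsets into a `&str` that must fall on character boundaries.

```rust
// Hypothetical helper: shrink a (start, end) byte span of `text` so that it
// no longer covers leading or trailing whitespace.
// Assumes `start <= end <= text.len()` and that both lie on char boundaries.
fn trim_offsets(text: &str, (start, end): (usize, usize)) -> (usize, usize) {
    let span = &text[start..end];
    let leading = span.len() - span.trim_start().len();
    let trailing = span.len() - span.trim_end().len();
    let trimmed_start = start + leading;
    // For an all-whitespace span the two ends would cross; clamp so the
    // result is an empty span rather than an inverted one.
    let trimmed_end = (end - trailing).max(trimmed_start);
    (trimmed_start, trimmed_end)
}
```

For the text `"Hello world"`, the span `(5, 11)` covering `" world"` becomes `(6, 11)`, while `(0, 5)` is returned unchanged.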