Module tokenizers::tokenizer::pre_tokenizer
Structs
- The PreTokenizedString is in charge of splitting an underlying string while keeping each split aligned with the original string, and of providing ways to normalize and tokenize these splits. Once everything has been normalized and tokenized, the PreTokenizedString can build an Encoding with all the relevant offsets and word ids, relative to the original string.
- Wrapper for a subpart of a NormalizedString.
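The split, normalize, tokenize, build-an-encoding flow described above can be illustrated with a small std-only sketch. Every name in it (`Piece`, `PreTokenized`, `split_whitespace`, `normalize`, `to_encoding`) is hypothetical and is not the crate's API. Each piece records its byte range in the original string, so the final (token, offsets, word id) triples still point back into the original input. To avoid alignment tracking, the sketch assumes a length-preserving normalization (ASCII lowercasing). A normalization that changes lengths would need per-byte alignments to keep the offsets correct.

```rust
/// Hypothetical stand-in for one split: its (possibly normalized) text and
/// the byte range it covers in the original string.
#[derive(Debug, Clone)]
struct Piece {
    normalized: String,
    offsets: (usize, usize),
}

/// Hypothetical stand-in for a pre-tokenized string: a list of pieces.
#[derive(Debug)]
struct PreTokenized {
    pieces: Vec<Piece>,
}

impl PreTokenized {
    /// Split the original string on whitespace, recording each piece's byte
    /// offsets relative to the original input.
    fn split_whitespace(original: &str) -> Self {
        let mut pieces = Vec::new();
        let mut start: Option<usize> = None;
        for (i, c) in original.char_indices() {
            if c.is_whitespace() {
                // A whitespace char closes the current word, if any.
                if let Some(s) = start.take() {
                    pieces.push(Piece { normalized: original[s..i].to_string(), offsets: (s, i) });
                }
            } else if start.is_none() {
                start = Some(i);
            }
        }
        // Flush a trailing word that runs to the end of the input.
        if let Some(s) = start {
            pieces.push(Piece { normalized: original[s..].to_string(), offsets: (s, original.len()) });
        }
        PreTokenized { pieces }
    }

    /// Normalize each piece in place. ASCII lowercasing keeps every byte at
    /// the same position, so the recorded offsets stay valid.
    fn normalize(&mut self) {
        for p in &mut self.pieces {
            p.normalized.make_ascii_lowercase();
        }
    }

    /// "Tokenize" each piece as a single token and build an encoding-like
    /// list of (token, original byte offsets, word id).
    fn to_encoding(&self) -> Vec<(String, (usize, usize), u32)> {
        self.pieces
            .iter()
            .enumerate()
            .map(|(word_id, p)| (p.normalized.clone(), p.offsets, word_id as u32))
            .collect()
    }
}

fn main() {
    let input = "Hello  Big World";
    let mut pre = PreTokenized::split_whitespace(input);
    pre.normalize();
    for (token, (start, end), word) in pre.to_encoding() {
        // The offsets slice the original string, not the normalized text.
        println!("{word}: {token:?} -> {:?}", &input[start..end]);
    }
}
```

Because offsets are recorded at split time against the original string, the double space in the input does not shift later words: "Big" maps back to bytes 7..10 of the original.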
Enums
- Various possible types of offsets.
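Offsets into the same string can differ depending on what they count. This std-only sketch uses a hypothetical `OffsetKind` enum, which is not necessarily the same as the variants of the crate's own offset enum. It converts a byte range into a char range to show that non-ASCII text makes the two diverge.

```rust
/// Hypothetical offset kinds, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum OffsetKind {
    Byte,
    Char,
}

/// Express a byte range of `s` in the requested kind of offsets.
/// Panics if the range does not fall on char boundaries.
fn convert(s: &str, (start, end): (usize, usize), kind: OffsetKind) -> (usize, usize) {
    match kind {
        OffsetKind::Byte => (start, end),
        OffsetKind::Char => {
            // Number of chars that precede a given byte position.
            let to_char = |b: usize| s[..b].chars().count();
            (to_char(start), to_char(end))
        }
    }
}

fn main() {
    let s = "héllo wörld";
    // 'é' and 'ö' each take 2 bytes in UTF-8, so byte and char offsets differ.
    let start = s.find("wörld").unwrap();
    println!("{:?}", convert(s, (start, s.len()), OffsetKind::Byte));
    println!("{:?}", convert(s, (start, s.len()), OffsetKind::Char));
}
```

Here "wörld" spans bytes 7..13 but chars 6..11, so a consumer must know which kind of offset it receives before slicing the original string.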