Module tokenizers::tokenizer::pre_tokenizer
Structs
- The PreTokenizedString is in charge of splitting an underlying string, making sure everything is fine while doing so, and providing ways to normalize and tokenize these splits. Once everything has been normalized and tokenized, the PreTokenizedString is able to build an Encoding with all the relevant offsets and word ids, relative to the original string.
- Wrapper for a subpart of a NormalizedString.
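
To illustrate what "offsets and word ids, relative to the original string" means, here is a minimal standard-library sketch. It does not use the tokenizers crate: `SplitSketch` and `split_on_whitespace` are hypothetical names invented for this example, and whitespace splitting is only an assumed stand-in for whatever splitting logic a real pre-tokenizer applies.

```rust
/// Hypothetical stand-in for one split: a slice of the original text
/// together with its byte offsets in that original text.
#[derive(Debug, PartialEq)]
struct SplitSketch<'a> {
    text: &'a str,
    offsets: (usize, usize),
}

/// Splits `original` on whitespace while recording, for every piece,
/// where it starts and ends (in bytes) in the original string.
fn split_on_whitespace(original: &str) -> Vec<SplitSketch<'_>> {
    let mut splits = Vec::new();
    let mut start: Option<usize> = None;

    for (i, c) in original.char_indices() {
        if c.is_whitespace() {
            // Close the piece in progress, if there is one.
            if let Some(s) = start.take() {
                splits.push(SplitSketch { text: &original[s..i], offsets: (s, i) });
            }
        } else if start.is_none() {
            // First non-whitespace character of a new piece.
            start = Some(i);
        }
    }
    // Flush a trailing piece that runs to the end of the string.
    if let Some(s) = start {
        splits.push(SplitSketch { text: &original[s..], offsets: (s, original.len()) });
    }
    splits
}

fn main() {
    let original = "Hello  world!";
    // The position of each split in the vector plays the role of a word id.
    for (word_id, split) in split_on_whitespace(original).iter().enumerate() {
        println!("word {} {:?} at bytes {:?}", word_id, split.text, split.offsets);
    }
}
```

Because each piece carries its own offsets, slicing the original string with them returns exactly the piece's text, even when separators such as the double space above have been dropped.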
Enums
- Various possible types of offsets
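
The listing above does not say which offset types exist. Assuming the distinction is between byte-based and character-based offsets (an assumption, not something stated here), the following standard-library sketch shows why the two differ for non-ASCII text. `byte_to_char_offsets` is a hypothetical helper written for this example, not part of the crate.

```rust
/// Hypothetical helper: converts a byte range in `text` into the
/// equivalent character range. Panics if either bound does not fall
/// on a UTF-8 character boundary.
fn byte_to_char_offsets(text: &str, bytes: (usize, usize)) -> (usize, usize) {
    // Characters before the range give the starting character index.
    let start = text[..bytes.0].chars().count();
    // Characters inside the range give its length in characters.
    let end = start + text[bytes.0..bytes.1].chars().count();
    (start, end)
}

fn main() {
    let text = "héllo wörld";
    // 'é' and 'ö' each take two bytes in UTF-8, so byte offsets run
    // ahead of character offsets after the first accented letter.
    let byte_offsets = (7, 13);
    let char_offsets = byte_to_char_offsets(text, byte_offsets);
    println!("{:?} -> bytes {:?}, chars {:?}",
             &text[byte_offsets.0..byte_offsets.1], byte_offsets, char_offsets);
}
```

For pure ASCII input the two kinds of offsets coincide, which is why the difference tends to surface only with accented or non-Latin text.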