Module tokenizers::tokenizer::pre_tokenizer

Structs§

  • PreTokenizedString: Splits an underlying string, making sure the splitting stays consistent, and provides ways to normalize and tokenize the resulting splits. Once everything has been normalized and tokenized, the PreTokenizedString can build an Encoding with all the relevant offsets and word IDs, relative to the original string.
  • Split: Wrapper for a subpart of a NormalizedString.
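
The idea behind these two structs, splitting a string while keeping every piece tied to its position in the original, can be illustrated with a small standalone sketch. It uses only the standard library and does not call the tokenizers crate: the names `ToySplit`, `ToyPreTokenized`, `split_on_whitespace`, and `original_slice` are hypothetical and exist only for this illustration, and whitespace splitting is an assumed stand-in for whatever splitting rule a real pre-tokenizer applies.

```rust
// Illustrative only: `ToySplit` and `ToyPreTokenized` are hypothetical types,
// not the `Split` and `PreTokenizedString` of the tokenizers crate.

/// A piece of the original string plus its byte offsets in that string.
#[derive(Debug, Clone, PartialEq)]
struct ToySplit {
    text: String,
    /// Byte range [start, end) relative to the original string.
    offsets: (usize, usize),
}

/// Holds the original string and the current list of splits.
struct ToyPreTokenized {
    original: String,
    splits: Vec<ToySplit>,
}

impl ToyPreTokenized {
    /// Start with a single split that covers the whole string.
    fn new(s: &str) -> Self {
        ToyPreTokenized {
            original: s.to_string(),
            splits: vec![ToySplit {
                text: s.to_string(),
                offsets: (0, s.len()),
            }],
        }
    }

    /// Re-split every current split on whitespace (an assumed rule),
    /// shifting offsets so they stay relative to the original string.
    fn split_on_whitespace(&mut self) {
        let mut out = Vec::new();
        for split in &self.splits {
            let base = split.offsets.0;
            let mut start: Option<usize> = None;
            for (i, c) in split.text.char_indices() {
                if c.is_whitespace() {
                    // Close the word in progress, if any.
                    if let Some(s) = start.take() {
                        out.push(ToySplit {
                            text: split.text[s..i].to_string(),
                            offsets: (base + s, base + i),
                        });
                    }
                } else if start.is_none() {
                    start = Some(i);
                }
            }
            // Flush a trailing word that runs to the end of the split.
            if let Some(s) = start {
                out.push(ToySplit {
                    text: split.text[s..].to_string(),
                    offsets: (base + s, base + split.text.len()),
                });
            }
        }
        self.splits = out;
    }

    /// Look a split up in the original string through its offsets.
    fn original_slice(&self, split: &ToySplit) -> &str {
        &self.original[split.offsets.0..split.offsets.1]
    }
}
```

Because each split carries offsets into the original string, any later stage can map a piece back to the exact bytes it came from, which is the property the Encoding offsets described above rely on.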

Enums§