Module tokenizers::tokenizer::normalizer


Structs§

  • A NormalizedString takes care of processing an “original” string to modify it and obtain a “normalized” string. It keeps both versions of the string and the alignment information between them, and provides an interface to retrieve ranges of each string using offsets from either of them.
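
To make the alignment idea concrete, here is a minimal sketch using only the standard library. `AlignedString`, `lowercase`, and `original_slice` are hypothetical names invented for this illustration; they are not the crate's API, and the real NormalizedString may store its alignments differently.

```rust
use std::ops::Range;

/// Hypothetical miniature of the idea; NOT the crate's NormalizedString.
struct AlignedString {
    original: String,
    normalized: String,
    /// For each char of `normalized`, the byte range in `original` it came from.
    alignments: Vec<Range<usize>>,
}

impl AlignedString {
    /// Lowercase the input, recording where each output char came from.
    fn lowercase(original: &str) -> Self {
        let mut normalized = String::new();
        let mut alignments = Vec::new();
        for (start, ch) in original.char_indices() {
            let source = start..start + ch.len_utf8();
            // One original char may lowercase to several chars;
            // all of them point back at the same source range.
            for lower in ch.to_lowercase() {
                normalized.push(lower);
                alignments.push(source.clone());
            }
        }
        AlignedString {
            original: original.to_string(),
            normalized,
            alignments,
        }
    }

    /// Map a char range of the normalized string back to the original text.
    /// Returns None for empty or out-of-bounds ranges.
    fn original_slice(&self, normalized_chars: Range<usize>) -> Option<&str> {
        let covered = self.alignments.get(normalized_chars)?;
        let first = covered.first()?;
        let last = covered.last()?;
        self.original.get(first.start..last.end)
    }
}
```

With this sketch, `AlignedString::lowercase("HÉllo")` keeps both `"HÉllo"` and `"héllo"`, and a char range over the normalized text can be resolved back to the matching slice of the original.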

Enums§

  • The possible referentials for offsets.
  • Represents a Range usable by the NormalizedString to index its content. A Range can use indices relative to either the Original or the Normalized string.
  • Defines the expected behavior for the delimiter of a Split Pattern. When splitting on '-', for example, with input the-final--countdown:
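
The per-behavior results are not listed here. As a rough illustration of the kind of choice a delimiter behavior encodes, the sketch below splits the same input two ways with the standard library alone. Both function names are hypothetical and are not the crate's variants or methods; the crate's actual behaviors may differ, for instance in how empty pieces are handled.

```rust
/// Hypothetical helper: split on `delim` and drop the delimiter.
/// Consecutive delimiters produce an empty piece, as `str::split` does.
fn split_dropping_delimiter(s: &str, delim: char) -> Vec<&str> {
    s.split(delim).collect()
}

/// Hypothetical helper: split on `delim` and keep each delimiter
/// attached to the end of the piece that precedes it.
fn split_keeping_delimiter(s: &str, delim: char) -> Vec<&str> {
    s.split_inclusive(delim).collect()
}
```

For the input `the-final--countdown`, the first helper yields `["the", "final", "", "countdown"]` and the second yields `["the-", "final-", "-", "countdown"]`.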

Functions§

  • Convert the given range from bytes to char
  • Convert the given range from char to bytes
  • Returns a range of the given string slice, by indexing chars instead of bytes
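
The distinction between byte and char indices matters because a Rust `str` is UTF-8, so one char can occupy several bytes. The following sketch shows one way such conversions could be written with the standard library. The function names and signatures are assumptions made for this example and do not reproduce the crate's own functions.

```rust
use std::ops::Range;

/// Hypothetical helper: convert a char-indexed range into a byte-indexed one.
/// Returns None if the range is reversed or reaches past the end of `s`.
fn char_range_to_bytes(s: &str, range: Range<usize>) -> Option<Range<usize>> {
    if range.start > range.end {
        return None;
    }
    // Byte offset of every char boundary, including the end of the string.
    let boundaries: Vec<usize> = s
        .char_indices()
        .map(|(byte, _)| byte)
        .chain(std::iter::once(s.len()))
        .collect();
    let start = *boundaries.get(range.start)?;
    let end = *boundaries.get(range.end)?;
    Some(start..end)
}

/// Hypothetical helper: convert a byte-indexed range into a char-indexed one.
/// Returns None if either end does not fall on a char boundary.
fn byte_range_to_chars(s: &str, range: Range<usize>) -> Option<Range<usize>> {
    if range.start > range.end
        || !s.is_char_boundary(range.start)
        || !s.is_char_boundary(range.end)
    {
        return None;
    }
    let start = s[..range.start].chars().count();
    let end = start + s[range.start..range.end].chars().count();
    Some(start..end)
}

/// Hypothetical helper: slice `s` by char indices instead of byte indices.
fn slice_by_chars(s: &str, range: Range<usize>) -> Option<&str> {
    char_range_to_bytes(s, range).map(|bytes| &s[bytes])
}
```

In `"héllo"`, the `é` takes two bytes, so chars `1..3` correspond to bytes `1..4`, and a byte range starting at offset 2 is rejected because it falls in the middle of a char.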