Struct tokenizers::tokenizer::normalizer::NormalizedString   
pub struct NormalizedString { /* private fields */ }
A NormalizedString takes care of processing an “original” string to modify
it and obtain a “normalized” string. It keeps both versions of the string and
the alignment information between them, and provides an interface to retrieve
ranges of each string, using offsets from either of them.
It is possible to retrieve a part of the original string by indexing it with offsets from the normalized one, and vice versa. It is also possible to convert offsets from one referential to the other.
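The idea of keeping alignments between the two strings can be sketched in plain Rust. This is only an illustration under stated assumptions, not the crate's implementation: `AlignedString`, `from_filtered` and `to_original` are hypothetical names, offsets are char indices, and the only supported normalization is dropping characters.

```rust
use std::ops::Range;

/// Hypothetical, simplified stand-in for the idea behind `NormalizedString`.
struct AlignedString {
    original: String,
    normalized: String,
    /// For each char of `normalized`, the char index in `original` it came from.
    alignments: Vec<usize>,
}

impl AlignedString {
    /// Build the normalized string by dropping every char rejected by `keep`.
    fn from_filtered(original: &str, keep: impl Fn(char) -> bool) -> Self {
        let mut normalized = String::new();
        let mut alignments = Vec::new();
        for (i, c) in original.chars().enumerate() {
            if keep(c) {
                normalized.push(c);
                alignments.push(i);
            }
        }
        Self { original: original.to_string(), normalized, alignments }
    }

    /// Map a char range of the normalized string to a char range of the original.
    /// Returns None for an empty range or one that is out of bounds.
    fn to_original(&self, range: Range<usize>) -> Option<Range<usize>> {
        if range.start >= range.end || range.end > self.alignments.len() {
            return None;
        }
        let start = self.alignments[range.start];
        let end = self.alignments[range.end - 1] + 1;
        Some(start..end)
    }
}
```

With the input `"a b c"` and spaces filtered out, the normalized string is `"abc"`, and the normalized range `1..3` (`"bc"`) maps back to the original range `2..5` (`"b c"`).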
Implementations
impl NormalizedString
pub fn get_original(&self) -> &str
Return the original string
pub fn offsets_original(&self) -> Offsets
Return the original offsets
pub fn convert_offsets<T>(&self, range: Range<T>) -> Option<Range<usize>>
Convert the given offsets range from one referential to the other one:
Original => Normalized or Normalized => Original
Returns None when the given range targets something that is out of range.
pub fn get_range<T>(&self, range: Range<T>) -> Option<&str>
Return a range of the normalized string
pub fn get_range_original<T>(&self, range: Range<T>) -> Option<&str>
Return a range of the original string
pub fn slice<T>(&self, range: Range<T>) -> Option<NormalizedString>
Return a slice of the current NormalizedString. If the range is not on char boundaries, return None.
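The "None when not on char boundaries" behavior has a close analogue in the standard library. The sketch below uses `str::get` on byte ranges; `checked_slice` is a hypothetical helper, and whether the crate's own ranges are byte-based or char-based is an assumption this page does not settle.

```rust
use std::ops::Range;

/// Hypothetical helper: byte-range slicing that refuses to cut a char in half.
fn checked_slice(s: &str, range: Range<usize>) -> Option<&str> {
    // `str::get` returns None when the range is out of bounds or does not
    // fall on UTF-8 char boundaries, instead of panicking like `&s[range]`.
    s.get(range)
}
```

In `"héllo"`, the char `é` occupies bytes `1..3`, so `1..3` yields `Some("é")` while `1..2` falls inside the char and yields `None`.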
pub fn transform_range<T, I>( &mut self, range: Range<T>, dest: I, initial_offset: usize )
Applies transformations to the current normalized version of the string,
while updating the alignments.
This method expects an Iterator yielding each char of the new normalized string
with a change isize equal to:
- 1 if this is a new char
- -N if the char is right before N removed chars
- 0 if the char is replacing the existing one
Since it is possible that the normalized string doesn’t include some of the characters at the beginning of the original one, we need an initial_offset which represents the number of removed chars at the very beginning.
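The change values described above can be illustrated with a small standalone sketch. It assumes a simplified alignment model (one original char index per normalized char) and a hypothetical helper `apply_changes`; it is not the crate's code, only one way the `1` / `-N` / `0` convention and `initial_offset` could drive an alignment update.

```rust
/// Hypothetical helper: rebuild the normalized string and its alignments.
/// `old[i]` is the original char index aligned with the i-th current char.
fn apply_changes(
    old: &[usize],
    dest: &[(char, isize)],
    initial_offset: usize,
) -> (String, Vec<usize>) {
    // Chars removed at the very beginning shift every lookup to the right.
    let mut offset = -(initial_offset as isize);
    let mut normalized = String::new();
    let mut alignments = Vec::new();
    for (index, &(c, change)) in dest.iter().enumerate() {
        // Position of this char in the previous normalized string.
        let idx = (index as isize - offset) as usize;
        offset += change;
        let align = if change > 0 {
            // A newly inserted char reuses the alignment of the char before it.
            if idx < 1 { 0 } else { old[idx - 1] }
        } else {
            old[idx]
        };
        normalized.push(c);
        alignments.push(align);
    }
    (normalized, alignments)
}
```

Starting from `"abcd"` with alignments `[0, 1, 2, 3]`, removing `b` is expressed as `[('a', -1), ('c', 0), ('d', 0)]`, since `a` sits right before one removed char. Stripping a leading space from `" ab"` is expressed as `[('a', 0), ('b', 0)]` with an `initial_offset` of 1.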
pub fn transform<I>(&mut self, dest: I, initial_offset: usize)
Applies transformations to the current normalized version of the string,
while updating the alignments.
This method expects an Iterator yielding each char of the new normalized string
with a change isize equal to:
- 1 if this is a new char
- -N if the char is right before N removed chars
- 0 if the char is replacing the existing one
Since it is possible that the normalized string doesn’t include some of the characters at the beginning of the original one, we need an initial_offset which represents the number of removed chars at the very beginning.
pub fn filter<F: Fn(char) -> bool>(&mut self, keep: F) -> &mut Self
Applies filtering over our characters
pub fn for_each<F: FnMut(char)>(&self, foreach: F) -> &Self
Calls the given function for each character
pub fn replace<P: Pattern>(&mut self, pattern: P, content: &str) -> Result<()>
Replace anything that matches the pattern with the given content.
pub fn split<P: Pattern>( &self, pattern: P, behavior: SplitDelimiterBehavior ) -> Result<Vec<NormalizedString>>
Split the current string in many subparts. Specify what to do with the delimiter.
Splitting Behavior for the delimiter
The behavior can be one of the following.
When splitting on '-' for example, with input the-final--countdown:
- Removed => [ "the", "", "final", "", "", "countdown" ]
- Isolated => [ "the", "-", "final", "-", "-", "countdown" ]
- MergedWithPrevious => [ "the-", "final-", "-", "countdown" ]
- MergedWithNext => [ "the", "-final", "-", "-countdown" ]
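The four behaviors listed above can be reproduced on a plain `&str` with a short sketch. The `Behavior` enum and `split_on` function are hypothetical stand-ins that operate on a single `char` delimiter; they mirror the listing above and make no claim about how the crate implements `split` or handles removed delimiters internally.

```rust
/// Hypothetical mirror of the four documented delimiter behaviors.
enum Behavior {
    Removed,
    Isolated,
    MergedWithPrevious,
    MergedWithNext,
}

fn split_on(input: &str, delim: char, behavior: Behavior) -> Vec<String> {
    // First cut the input into pieces, flagging each one as delimiter or not.
    let mut pieces: Vec<(String, bool)> = Vec::new();
    let mut current = String::new();
    for c in input.chars() {
        if c == delim {
            if !current.is_empty() {
                pieces.push((std::mem::take(&mut current), false));
            }
            pieces.push((c.to_string(), true));
        } else {
            current.push(c);
        }
    }
    if !current.is_empty() {
        pieces.push((current, false));
    }

    match behavior {
        // Each delimiter is replaced by an empty piece.
        Behavior::Removed => pieces
            .into_iter()
            .map(|(s, is_delim)| if is_delim { String::new() } else { s })
            .collect(),
        // Each delimiter is kept as its own piece.
        Behavior::Isolated => pieces.into_iter().map(|(s, _)| s).collect(),
        // A delimiter is appended to the non-delimiter piece before it, if any.
        Behavior::MergedWithPrevious => {
            let mut out: Vec<String> = Vec::new();
            let mut prev_was_delim = true;
            for (s, is_delim) in pieces {
                match out.last_mut() {
                    Some(last) if is_delim && !prev_was_delim => last.push_str(&s),
                    _ => out.push(s),
                }
                prev_was_delim = is_delim;
            }
            out
        }
        // A delimiter is prepended to the non-delimiter piece after it, if any.
        Behavior::MergedWithNext => {
            let mut out: Vec<String> = Vec::new();
            let mut next_was_delim = true;
            for (s, is_delim) in pieces.into_iter().rev() {
                match out.last_mut() {
                    Some(last) if is_delim && !next_was_delim => last.insert_str(0, &s),
                    _ => out.push(s),
                }
                next_was_delim = is_delim;
            }
            out.reverse();
            out
        }
    }
}
```

Calling `split_on("the-final--countdown", '-', Behavior::MergedWithNext)` walks the pieces from the right, so in a run of two delimiters only the one adjacent to `countdown` is merged and the other stays on its own.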
pub fn strip(&mut self) -> &mut Self
Remove any leading and trailing space(s) of the normalized string
pub fn len(&self) -> usize
Returns the length of the normalized string (counting chars not bytes)
pub fn len_original(&self) -> usize
Returns the length of the original string (counting chars not bytes)
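The distinction between counting chars and counting bytes matters as soon as the text contains non-ASCII characters. A minimal standard-library sketch, where `char_len` is a hypothetical helper and not part of the crate:

```rust
/// Hypothetical helper: length in chars, as opposed to `str::len`,
/// which returns the length in bytes.
fn char_len(s: &str) -> usize {
    s.chars().count()
}
```

For `"héllo"`, `char_len` gives 5 while `str::len` gives 6, because `é` is encoded on two bytes in UTF-8.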
Trait Implementations
impl Clone for NormalizedString
fn clone(&self) -> NormalizedString
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for NormalizedString
impl Default for NormalizedString
fn default() -> NormalizedString
impl From<&str> for NormalizedString
impl From<NormalizedString> for PreTokenizedString
fn from(s: NormalizedString) -> Self
impl From<NormalizedString> for Split
fn from(n: NormalizedString) -> Self
impl From<String> for NormalizedString
impl PartialEq for NormalizedString
fn eq(&self, other: &NormalizedString) -> bool
This method tests for self and other values to be equal, and is used by ==.