Struct tokenizers::tokenizer::normalizer::NormalizedString
pub struct NormalizedString { /* private fields */ }
A NormalizedString takes care of processing an “original” string to modify it and obtain a “normalized” string. It keeps both versions of the string and the alignment information between them, and provides an interface to retrieve ranges of each string, using offsets from either of them.
It is possible to retrieve a part of the original string by indexing it with offsets from the normalized one, and the other way around. It is also possible to convert offsets from one referential to the other.
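The idea of keeping alignments between the two strings can be sketched with the standard library alone. The following is a minimal illustration, not the crate's implementation: AlignedString, normalized_to_original and original_for are hypothetical names, and the "normalization" (dropping whitespace, lowercasing ASCII) is chosen only to keep the example small.

```rust
use std::ops::Range;

/// Hypothetical stand-in used for illustration only; this is NOT the
/// crate's NormalizedString.
struct AlignedString {
    original: String,
    normalized: String,
    /// For each byte of `normalized`, the byte range of the original
    /// char it was produced from.
    alignments: Vec<(usize, usize)>,
}

impl AlignedString {
    /// Toy normalization: drop whitespace and lowercase ASCII letters.
    fn new(original: &str) -> Self {
        let mut normalized = String::new();
        let mut alignments = Vec::new();
        for (start, c) in original.char_indices() {
            if c.is_whitespace() {
                continue; // removed chars get no entry in the alignments
            }
            let end = start + c.len_utf8();
            let lower = c.to_ascii_lowercase();
            normalized.push(lower);
            // One alignment entry per byte of the normalized char.
            alignments.extend(std::iter::repeat((start, end)).take(lower.len_utf8()));
        }
        AlignedString {
            original: original.to_string(),
            normalized,
            alignments,
        }
    }

    /// Convert a byte range of the normalized string into the matching
    /// byte range of the original string, or None if it is out of range.
    fn normalized_to_original(&self, range: Range<usize>) -> Option<Range<usize>> {
        if range.start >= range.end || range.end > self.alignments.len() {
            return None;
        }
        let start = self.alignments[range.start].0;
        let end = self.alignments[range.end - 1].1;
        Some(start..end)
    }

    /// Retrieve the part of the original string that corresponds to a
    /// byte range of the normalized string.
    fn original_for(&self, range: Range<usize>) -> Option<&str> {
        let converted = self.normalized_to_original(range)?;
        self.original.get(converted)
    }
}
```

With the input "  Hello World ", the toy normalized string is "helloworld", and the normalized range 5..10 maps back to the original bytes 8..13, which is "World".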
Implementations
impl NormalizedString
pub fn get_original(&self) -> &str
Return the original string
pub fn offsets_original(&self) -> Offsets
Return the original offsets
pub fn convert_offsets<T>(&self, range: Range<T>) -> Option<Range<usize>>
Convert the given offsets range from one referential to the other one: Original => Normalized or Normalized => Original.
Returns None when targeting something that is outside range.
pub fn get_range<T>(&self, range: Range<T>) -> Option<&str>
Return a range of the normalized string
pub fn get_range_original<T>(&self, range: Range<T>) -> Option<&str>
Return a range of the original string
pub fn slice<T>(&self, range: Range<T>) -> Option<NormalizedString>
Return a slice of the current NormalizedString. If the range is not on char boundaries, return None.
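The "char boundaries" condition mirrors how the standard library treats string slices. The sketch below shows only the std behavior, under the assumption that the range is expressed in byte offsets (this page does not state the unit explicitly); slice_bytes is a hypothetical helper, not a method of NormalizedString.

```rust
use std::ops::Range;

/// Hypothetical helper: slice a &str by byte offsets, returning None
/// when the range is out of bounds or cuts through a multi-byte char.
fn slice_bytes(s: &str, range: Range<usize>) -> Option<&str> {
    // str::get performs the bounds check and the char-boundary check.
    s.get(range)
}
```

In "héllo" the char é occupies bytes 1 and 2, so a range ending at byte 2 falls inside it and yields None, while a range ending at byte 3 is valid.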
pub fn transform_range<T, I>( &mut self, range: Range<T>, dest: I, initial_offset: usize )
Applies transformations to the current normalized version of the string, while updating the alignments.
This method expects an Iterator yielding each char of the new normalized string with a change isize equal to:
- 1 if this is a new char
- -N if the char is right before N removed chars
- 0 if the char is replacing the existing one
Since it is possible that the normalized string doesn’t include some of the characters at the beginning of the original one, we need an initial_offset which represents the number of removed chars at the very beginning.
pub fn transform<I>(&mut self, dest: I, initial_offset: usize)
Applies transformations to the current normalized version of the string, while updating the alignments.
This method expects an Iterator yielding each char of the new normalized string with a change isize equal to:
- 1 if this is a new char
- -N if the char is right before N removed chars
- 0 if the char is replacing the existing one
Since it is possible that the normalized string doesn’t include some of the characters at the beginning of the original one, we need an initial_offset which represents the number of removed chars at the very beginning.
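One way to read the change values is to look at what a filtering pass would have to produce. The sketch below is a standalone illustration using only the standard library: it builds the (char, change) pairs and the initial_offset described above for a given keep predicate. filter_changes is a hypothetical helper and is not claimed to be the crate's implementation.

```rust
/// Hypothetical helper: compute the (char, change) pairs and the
/// initial offset that describe removing every char rejected by `keep`.
///
/// - a kept char followed by N removed chars gets the change -N
/// - a kept char followed directly by another kept char gets 0
/// - chars removed before the first kept char are counted in the
///   returned initial offset instead
fn filter_changes<F: Fn(char) -> bool>(text: &str, keep: F) -> (Vec<(char, isize)>, usize) {
    let mut changes = Vec::new();
    let mut initial_offset = 0usize;
    let mut removed: isize = 0; // removed chars seen since the last kept char
    let mut last_kept: Option<char> = None;

    for c in text.chars() {
        if keep(c) {
            match last_kept {
                // The previous kept char is "right before" `removed` removed chars.
                Some(prev) => changes.push((prev, -removed)),
                // Nothing kept yet: these removals happened at the very beginning.
                None => initial_offset = removed as usize,
            }
            last_kept = Some(c);
            removed = 0;
        } else {
            removed += 1;
        }
    }
    // Flush the last kept char together with any trailing removals.
    if let Some(prev) = last_kept {
        changes.push((prev, -removed));
    }
    (changes, initial_offset)
}
```

For the input "  a--b-" with a predicate that rejects spaces and dashes, this yields the pairs ('a', -2) and ('b', -1) with an initial offset of 2: two chars were dropped before a, two after a, and one after b.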
pub fn filter<F: Fn(char) -> bool>(&mut self, keep: F) -> &mut Self
Applies filtering over the characters of the normalized string, keeping only those for which keep returns true
pub fn for_each<F: FnMut(char)>(&self, foreach: F) -> &Self
Calls the given function for each character
pub fn replace<P: Pattern>(&mut self, pattern: P, content: &str) -> Result<()>
Replace anything that matches the pattern with the given content.
pub fn split<P: Pattern>( &self, pattern: P, behavior: SplitDelimiterBehavior ) -> Result<Vec<NormalizedString>>
Split the current string into subparts, specifying what to do with the delimiter.
Splitting Behavior for the delimiter
The behavior can be one of the following. When splitting on '-', for example, with input the-final--countdown:
- Removed => [ "the", "", "final", "", "", "countdown" ]
- Isolated => [ "the", "-", "final", "-", "-", "countdown" ]
- MergedWithPrevious => [ "the-", "final-", "-", "countdown" ]
- MergedWithNext => [ "the", "-final", "-", "-countdown" ]
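The four behaviors listed above can be reproduced on a plain &str with a short standalone sketch. It uses only the standard library and a single char delimiter; DelimiterBehavior and split_on are hypothetical names that mirror the variants in the list, not the crate's SplitDelimiterBehavior or split, and the real method also tracks alignments, which this sketch ignores.

```rust
/// Hypothetical enum mirroring the behaviors listed in the documentation.
#[derive(Clone, Copy, Debug, PartialEq)]
enum DelimiterBehavior {
    Removed,
    Isolated,
    MergedWithPrevious,
    MergedWithNext,
}

/// Hypothetical helper: split `text` on a single char delimiter,
/// handling the delimiter according to `behavior`.
fn split_on(text: &str, delimiter: char, behavior: DelimiterBehavior) -> Vec<String> {
    let mut parts = Vec::new();
    let mut current = String::new();

    for c in text.chars() {
        if c != delimiter {
            current.push(c);
            continue;
        }
        match behavior {
            DelimiterBehavior::Removed | DelimiterBehavior::Isolated => {
                // Close the pending part, then emit the delimiter as its own
                // part (Isolated) or as an empty part (Removed).
                if !current.is_empty() {
                    parts.push(std::mem::take(&mut current));
                }
                if behavior == DelimiterBehavior::Isolated {
                    parts.push(delimiter.to_string());
                } else {
                    parts.push(String::new());
                }
            }
            DelimiterBehavior::MergedWithPrevious => {
                // The delimiter ends the pending part.
                current.push(c);
                parts.push(std::mem::take(&mut current));
            }
            DelimiterBehavior::MergedWithNext => {
                // The delimiter starts the next part.
                if !current.is_empty() {
                    parts.push(std::mem::take(&mut current));
                }
                current.push(c);
            }
        }
    }
    if !current.is_empty() {
        parts.push(current);
    }
    parts
}
```

Calling split_on("the-final--countdown", '-', behavior) with each variant gives the four lists shown above, including the empty strings in the Removed case.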
pub fn strip(&mut self) -> &mut Self
Remove any leading and trailing space(s) of the normalized string
pub fn len(&self) -> usize
Returns the length of the normalized string (counting chars not bytes)
pub fn len_original(&self) -> usize
Returns the length of the original string (counting chars not bytes)
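The difference between counting chars and counting bytes matters as soon as the string contains non-ASCII text. The following standalone sketch shows the distinction with the standard library only; char_len is a hypothetical helper and says nothing about how len or len_original are implemented.

```rust
/// Hypothetical helper: length of a string in chars (Unicode scalar
/// values), as opposed to str::len, which counts UTF-8 bytes.
fn char_len(s: &str) -> usize {
    s.chars().count()
}
```

For "héllo", str::len reports 6 bytes because é takes two bytes in UTF-8, while char_len reports 5.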
Trait Implementations
impl Clone for NormalizedString
fn clone(&self) -> NormalizedString
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for NormalizedString
impl Default for NormalizedString
fn default() -> NormalizedString
impl From<&str> for NormalizedString
impl From<NormalizedString> for PreTokenizedString
fn from(s: NormalizedString) -> Self
impl From<NormalizedString> for Split
fn from(n: NormalizedString) -> Self
impl From<String> for NormalizedString
impl PartialEq for NormalizedString
fn eq(&self, other: &NormalizedString) -> bool
This method tests for self and other values to be equal, and is used by ==.