Struct tokenizers::normalizers::Precompiled
pub struct Precompiled { /* private fields */ }
This struct exists specifically for compatibility with SentencePiece.
SentencePiece models embed their normalizer in a precompiled_charsmap,
a binary blob that encodes both a trie and the rewrite rules it points to.
To be 100% compliant, we need to interpret that binary format too.
The format is [u32 (length of trie), trie: [u32], normalized: String].
The trie has u8 as entries and u32 as values; those u32 values
are offsets within the normalized string that point to the actual replacement value.
The normalized string contains ‘\0’ characters, each marking the end of an entry.
Hence, if normalized is “abc\0”, a trie value of 0 means the replacement is “abc”, and a trie value of 1 means the replacement is “bc”.
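The layout and the offset rule described above can be sketched in a few lines of standalone Rust. This is only an illustration, not the crate's parser: split_charsmap and resolve are hypothetical names, and the sketch assumes that integers are little-endian and that the leading u32 counts the trie length in bytes. It does not walk the trie itself; it only shows how a trie value selects a replacement inside the normalized string.

```rust
/// Hypothetical helper (not part of the crate): split a blob laid out as
/// [u32 trie length, trie units, normalized string] into its two parts.
/// ASSUMPTIONS: little-endian integers, trie length counted in bytes.
fn split_charsmap(blob: &[u8]) -> Option<(Vec<u32>, &str)> {
    let header = blob.get(..4)?;
    let trie_len = u32::from_le_bytes([header[0], header[1], header[2], header[3]]) as usize;
    let rest = &blob[4..];
    // The trie must fit in the blob and be a whole number of u32 units.
    let trie_bytes = rest.get(..trie_len)?;
    if trie_len % 4 != 0 {
        return None;
    }
    let trie: Vec<u32> = trie_bytes
        .chunks_exact(4)
        .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect();
    // Everything after the trie is the normalized string.
    let normalized = std::str::from_utf8(&rest[trie_len..]).ok()?;
    Some((trie, normalized))
}

/// Hypothetical helper: resolve a trie value by reading from `offset`
/// up to (but not including) the next '\0' terminator.
fn resolve(normalized: &str, offset: usize) -> Option<&str> {
    let tail = normalized.get(offset..)?;
    let end = tail.find('\0')?;
    Some(&tail[..end])
}

fn main() {
    // Toy blob: a 4-byte trie holding a single dummy unit, then "abc\0".
    let mut blob = Vec::new();
    blob.extend_from_slice(&4u32.to_le_bytes());
    blob.extend_from_slice(&0u32.to_le_bytes());
    blob.extend_from_slice(b"abc\0");

    let (trie, normalized) = split_charsmap(&blob).expect("well-formed toy blob");
    assert_eq!(trie.len(), 1);
    // Offset 0 yields the whole entry, offset 1 yields its suffix.
    assert_eq!(resolve(normalized, 0), Some("abc"));
    assert_eq!(resolve(normalized, 1), Some("bc"));
}
```

Because entries are addressed by offset and terminated by ‘\0’, one stored entry can serve several replacements that share a suffix, which is what the “abc” and “bc” example relies on.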
Implementations
impl Precompiled
pub fn from( precompiled_charsmap: &[u8] ) -> Result<Precompiled, PrecompiledError>
pub fn transform(&self, chunk: &str) -> Option<&str>
pub fn normalize_string(&self, original: &str) -> String
Trait Implementations
impl Clone for Precompiled
fn clone(&self) -> Precompiled
Returns a copy of the value.
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for Precompiled
impl Default for Precompiled
fn default() -> Precompiled
Returns the “default value” for a type.
impl<'de> Deserialize<'de> for Precompiled
fn deserialize<__D>(
__deserializer: __D
) -> Result<Precompiled, <__D as Deserializer<'de>>::Error>
where
__D: Deserializer<'de>,
Deserialize this value from the given Serde deserializer.
impl From<Precompiled> for NormalizerWrapper
fn from(from: Precompiled) -> Self
Converts to this type from the input type.
impl Normalizer for Precompiled
impl PartialEq for Precompiled
fn eq(&self, other: &Precompiled) -> bool
This method tests for self and other values to be equal, and is used by ==.
impl Serialize for Precompiled
fn serialize<__S>(
&self,
__serializer: __S
) -> Result<<__S as Serializer>::Ok, <__S as Serializer>::Error>
where
__S: Serializer,
Serialize this value into the given Serde serializer.
impl StructuralPartialEq for Precompiled
Auto Trait Implementations
impl Freeze for Precompiled
impl RefUnwindSafe for Precompiled
impl Send for Precompiled
impl Sync for Precompiled
impl Unpin for Precompiled
impl UnwindSafe for Precompiled
Blanket Implementations
impl<T> BorrowMut<T> for T
where
T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.