Struct tokenizers::normalizers::Precompiled

pub struct Precompiled { /* private fields */ }

This struct exists specifically for compatibility with SentencePiece. SentencePiece models embed their normalizer in a precompiled_charsmap, a binary blob that holds both a trie and the embedded rewrite rules. To be 100% compliant, we need to interpret that binary format too.

The format is [u32 (length of trie), trie: [u32], normalized: String]. The trie has u8 as entries and u32 as values; those u32 values are offsets within the normalized string that point to the actual replacement value. The normalized string contains ‘\0’ characters, each marking the end of an entry.

Hence, normalized could be “abc\0”: one entry in the trie could have the value 0, meaning the replacement is “abc”, and another could have the value 1, meaning the replacement is “bc”.
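The layout described above can be illustrated with a minimal, std-only sketch. It is not this crate's parser: `split_charsmap` and `entry_at` are hypothetical helper names, and the sketch assumes the leading u32 is little-endian and counts the trie size in bytes, neither of which is stated in the description. It does not walk the trie; it only splits the blob and resolves offsets into the normalized string.

```rust
/// Hypothetical helper: split a charsmap blob into (trie units, normalized bytes).
/// Assumption: the leading u32 is little-endian and gives the trie size in bytes.
fn split_charsmap(blob: &[u8]) -> Option<(Vec<u32>, &[u8])> {
    let header = blob.get(..4)?;
    let trie_len =
        u32::from_le_bytes([header[0], header[1], header[2], header[3]]) as usize;
    // The trie is a sequence of u32 units, so its byte length must be a multiple of 4.
    if trie_len % 4 != 0 {
        return None;
    }
    let end = 4usize.checked_add(trie_len)?;
    let trie_bytes = blob.get(4..end)?;
    let trie = trie_bytes
        .chunks_exact(4)
        .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect();
    // Everything after the trie is the '\0'-separated normalized string.
    Some((trie, &blob[end..]))
}

/// Hypothetical helper: read the replacement that starts at `offset`
/// and runs up to (not including) the next '\0'.
fn entry_at(normalized: &[u8], offset: usize) -> Option<&str> {
    let tail = normalized.get(offset..)?;
    let end = tail.iter().position(|&b| b == 0)?;
    std::str::from_utf8(&tail[..end]).ok()
}

fn main() {
    // Build a toy blob: 4 bytes of trie (one dummy unit), then "abc\0".
    let mut blob = Vec::new();
    blob.extend_from_slice(&4u32.to_le_bytes());
    blob.extend_from_slice(&0u32.to_le_bytes());
    blob.extend_from_slice(b"abc\0");

    let (trie, normalized) = split_charsmap(&blob).expect("well-formed blob");
    assert_eq!(trie.len(), 1);
    // Offsets 0 and 1 share the same terminator, as in the "abc\0" example.
    assert_eq!(entry_at(normalized, 0), Some("abc"));
    assert_eq!(entry_at(normalized, 1), Some("bc"));
}
```

Because entries are only delimited by their terminating ‘\0’, two trie values can point into the same run of bytes, which is how “abc” and “bc” share storage in the example.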

Implementations

impl Precompiled

pub fn from( precompiled_charsmap: &[u8] ) -> Result<Precompiled, PrecompiledError>

pub fn transform(&self, chunk: &str) -> Option<&str>

pub fn normalize_string(&self, original: &str) -> String

Trait Implementations

impl Clone for Precompiled

fn clone(&self) -> Precompiled

Returns a copy of the value.
1.0.0

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for Precompiled

fn fmt(&self, f: &mut Formatter<'_>) -> Result<(), Error>

Formats the value using the given formatter.

impl Default for Precompiled

fn default() -> Precompiled

Returns the “default value” for a type.

impl<'de> Deserialize<'de> for Precompiled

fn deserialize<__D>( __deserializer: __D ) -> Result<Precompiled, <__D as Deserializer<'de>>::Error>
where __D: Deserializer<'de>,

Deserialize this value from the given Serde deserializer.

impl From<Precompiled> for NormalizerWrapper

fn from(from: Precompiled) -> Self

Converts to this type from the input type.

impl Normalizer for Precompiled

fn normalize(&self, normalized: &mut NormalizedString) -> Result<()>

impl PartialEq for Precompiled

fn eq(&self, other: &Precompiled) -> bool

This method tests for self and other values to be equal, and is used by ==.
1.0.0

fn ne(&self, other: &Rhs) -> bool

This method tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

impl Serialize for Precompiled

fn serialize<__S>( &self, __serializer: __S ) -> Result<<__S as Serializer>::Ok, <__S as Serializer>::Error>
where __S: Serializer,

Serialize this value into the given Serde serializer.

impl StructuralPartialEq for Precompiled

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> Pointable for T

const ALIGN: usize = _

The alignment of pointer.

type Init = T

The type for initializers.

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V

impl<T> DeserializeOwned for T
where T: for<'de> Deserialize<'de>,