Struct tokenizers::decoders::ctc::CTC
#[non_exhaustive]
pub struct CTC {
pub pad_token: String,
pub word_delimiter_token: String,
pub cleanup: bool,
}
The CTC (Connectionist Temporal Classification) decoder sanitizes a list of input tokens. Because of alignment issues, the output of some models can contain duplicated tokens.
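As a rough illustration of what this sanitizing involves, the sketch below collapses consecutive duplicate tokens, drops the pad token, and turns the word delimiter into a space. It is a standalone approximation, not the crate's implementation: the helper name ctc_collapse and the sample values "<pad>" and "|" are assumptions chosen for the example.

```rust
/// Hypothetical helper (not part of the `tokenizers` crate) that sketches
/// CTC-style post-processing on a sequence of predicted tokens.
fn ctc_collapse(tokens: &[&str], pad_token: &str, word_delimiter_token: &str) -> String {
    let mut out = String::new();
    let mut previous: Option<&str> = None;

    for &token in tokens {
        // A token that repeats its immediate predecessor is an alignment
        // duplicate, so it is skipped.
        if previous == Some(token) {
            continue;
        }
        previous = Some(token);

        if token == pad_token {
            // The pad token only separates tokens; it produces no output.
            continue;
        }
        if token == word_delimiter_token {
            // The word delimiter becomes a plain space.
            out.push(' ');
        } else {
            out.push_str(token);
        }
    }
    out
}

fn main() {
    // Assumed sample values: "<pad>" as the pad token, "|" as the word delimiter.
    let tokens = [
        "<pad>", "h", "h", "e", "l", "l", "<pad>", "l", "o", "o", "|", "|", "w", "o", "r", "l", "d",
    ];
    println!("{}", ctc_collapse(&tokens, "<pad>", "|"));
}
```

Note how the pad token between the two runs of "l" is what lets a genuinely doubled letter survive the duplicate removal.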
Fields (Non-exhaustive)
This struct is marked as non-exhaustive
Non-exhaustive structs could have additional fields added in future. Therefore, non-exhaustive structs cannot be constructed in external crates using the traditional Struct { .. } syntax; cannot be matched against without a wildcard ..; and struct update syntax will not work.
pad_token: String
The pad token used by CTC to delimit a new token.
word_delimiter_token: String
The word delimiter token. It will be replaced by a <space>.
cleanup: bool
Whether to clean up some tokenization artifacts, mainly spaces before punctuation and some abbreviated English forms.
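To make the kind of artifact targeted by cleanup more concrete, here is a minimal sketch that removes the space before common punctuation marks and before a few English contractions. The function name cleanup_sketch and the exact replacement list are assumptions made for illustration; the crate's own rules may differ.

```rust
/// Hypothetical illustration of a cleanup pass (not the crate's own code).
fn cleanup_sketch(text: &str) -> String {
    // Assumed replacement rules: (artifact, cleaned form).
    const REPLACEMENTS: [(&str, &str); 8] = [
        (" .", "."),
        (" ?", "?"),
        (" !", "!"),
        (" ,", ","),
        (" n't", "n't"),
        (" 'm", "'m"),
        (" 's", "'s"),
        (" 're", "'re"),
    ];

    let mut cleaned = text.to_string();
    for &(from, to) in REPLACEMENTS.iter() {
        // Apply each rule to the whole string, in order.
        cleaned = cleaned.replace(from, to);
    }
    cleaned
}

fn main() {
    println!("{}", cleanup_sketch("I 'm here , are n't you ?"));
}
```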
Implementations
Trait Implementations
impl<'de> Deserialize<'de> for CTC
fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where
    __D: Deserializer<'de>,
Deserialize this value from the given Serde deserializer.
impl From<CTC> for DecoderWrapper
Auto Trait Implementations
impl Freeze for CTC
impl RefUnwindSafe for CTC
impl Send for CTC
impl Sync for CTC
impl Unpin for CTC
impl UnwindSafe for CTC
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.