Thutmose Tagger: Single-pass neural model for Inverse Text Normalization
- URL: http://arxiv.org/abs/2208.00064v1
- Date: Fri, 29 Jul 2022 20:39:02 GMT
- Title: Thutmose Tagger: Single-pass neural model for Inverse Text Normalization
- Authors: Alexandra Antonova, Evelina Bakhturina, Boris Ginsburg
- Abstract summary: Inverse text normalization (ITN) is an essential post-processing step in automatic speech recognition.
We present a dataset preparation method based on the granular alignment of ITN examples.
One-to-one correspondence between tags and input words improves the interpretability of the model's predictions.
- Score: 76.87664008338317
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Inverse text normalization (ITN) is an essential post-processing step in
automatic speech recognition (ASR). It converts numbers, dates, abbreviations,
and other semiotic classes from the spoken form generated by ASR to their
written forms. One can consider ITN as a Machine Translation task and use
neural sequence-to-sequence models to solve it. Unfortunately, such neural
models are prone to hallucinations that could lead to unacceptable errors. To
mitigate this issue, we propose a single-pass token classifier model that
regards ITN as a tagging task. The model assigns a replacement fragment to
every input token or marks it for deletion or copying without changes. We
present a dataset preparation method based on the granular alignment of ITN
examples. The proposed model is less prone to hallucination errors. The model
is trained on the Google Text Normalization dataset and achieves
state-of-the-art sentence accuracy on both English and Russian test sets.
One-to-one correspondence between tags and input words improves the
interpretability of the model's predictions, simplifies debugging, and allows
for post-processing corrections. The model is simpler than sequence-to-sequence
models and easier to optimize in production settings. The model and the code to
prepare the dataset are published as part of the NeMo project.
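To make the tagging formulation concrete, the sketch below shows how per-token tags could be applied to a spoken-form sentence: each input token either keeps itself, is deleted, or is replaced by a written-form fragment. The tag names, the apply_itn_tags helper, and the example prediction are illustrative assumptions, not code from the NeMo implementation.

```python
from typing import List

# Hypothetical tag values; the actual Thutmose Tagger tag vocabulary may differ.
SELF = "<SELF>"
DELETE = "<DELETE>"


def apply_itn_tags(tokens: List[str], tags: List[str]) -> str:
    """Apply one predicted tag per spoken-form token to build the written form."""
    assert len(tokens) == len(tags), "tags must correspond one-to-one to input tokens"
    written: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == SELF:
            written.append(token)   # copy the token unchanged
        elif tag == DELETE:
            continue                # drop the token entirely
        else:
            written.append(tag)     # substitute the written-form replacement fragment
    return " ".join(written)


if __name__ == "__main__":
    # Spoken form as it might come out of ASR.
    tokens = ["on", "may", "third", "twenty", "twenty", "two"]
    # Tags a trained token classifier might predict (illustrative only).
    tags = [SELF, "may 3,", DELETE, "2022", DELETE, DELETE]
    print(apply_itn_tags(tokens, tags))  # -> on may 3, 2022
```

Because every output fragment is tied to exactly one input token, a wrong prediction can be localized to a specific word and corrected in post-processing, which is the interpretability benefit the abstract highlights.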
Related papers
- Understanding and Mitigating Tokenization Bias in Language Models [6.418593476658017]
State-of-the-art language models are autoregressive and operate on subword units known as tokens.
We show that popular encoding schemes induce a sampling bias that cannot be mitigated with more training or data.
We propose a novel algorithm to obtain unbiased estimates from any language model trained on tokenized data.
arXiv Detail & Related papers (2024-06-24T17:38:02Z) - TokenUnify: Scalable Autoregressive Visual Pre-training with Mixture Token Prediction [61.295716741720284]
TokenUnify is a novel pretraining method that integrates random token prediction, next-token prediction, and next-all token prediction.
In conjunction with TokenUnify, we have assembled a large-scale electron microscopy (EM) image dataset with ultra-high resolution.
This dataset includes over 120 million annotated voxels, making it the largest neuron segmentation dataset to date.
arXiv Detail & Related papers (2024-05-27T05:45:51Z) - Zero-Shot Text Classification via Self-Supervised Tuning [46.9902502503747]
We propose a new paradigm based on self-supervised learning to solve zero-shot text classification tasks by tuning the language models with unlabeled data, an approach called self-supervised tuning.
Our model outperforms the state-of-the-art baselines on 7 out of 10 tasks.
arXiv Detail & Related papers (2023-05-19T05:47:33Z) - Unsupervised Pre-Training For Data-Efficient Text-to-Speech On Low Resource Languages [15.32264927462068]
We propose an unsupervised pre-training method for a sequence-to-sequence TTS model by leveraging large untranscribed speech data.
The main idea is to pre-train the model to reconstruct de-warped mel-spectrograms from warped ones.
We empirically demonstrate the effectiveness of our proposed method in low-resource language scenarios.
arXiv Detail & Related papers (2023-03-28T01:26:00Z) - DiffusER: Discrete Diffusion via Edit-based Reconstruction [88.62707047517914]
DiffusER is an edit-based generative model for text based on denoising diffusion models.
It can rival autoregressive models on several tasks spanning machine translation, summarization, and style transfer.
It can also perform other varieties of generation that standard autoregressive models are not well-suited for.
arXiv Detail & Related papers (2022-10-30T16:55:23Z) - Step-unrolled Denoising Autoencoders for Text Generation [17.015573262373742]
We propose a new generative model of text, the Step-unrolled Denoising Autoencoder (SUNDAE).
SUNDAE is repeatedly applied on a sequence of tokens, starting from random inputs and improving them each time until convergence.
We present a simple new improvement operator that converges in fewer iterations than diffusion methods.
arXiv Detail & Related papers (2021-12-13T16:00:33Z) - Exploring Unsupervised Pretraining Objectives for Machine Translation [99.5441395624651]
Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT).
Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence architectures, by masking parts of the input and reconstructing them in the decoder.
We compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context.
arXiv Detail & Related papers (2021-06-10T10:18:23Z) - DiscreTalk: Text-to-Speech as a Machine Translation Problem [52.33785857500754]
This paper proposes a new end-to-end text-to-speech (E2E-TTS) model based on neural machine translation (NMT).
The proposed model consists of two components: a non-autoregressive vector quantized variational autoencoder (VQ-VAE) model and an autoregressive Transformer-NMT model.
arXiv Detail & Related papers (2020-05-12T02:45:09Z) - Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.