LT-LM: a novel non-autoregressive language model for single-shot lattice
rescoring
- URL: http://arxiv.org/abs/2104.02526v1
- Date: Tue, 6 Apr 2021 14:06:07 GMT
- Title: LT-LM: a novel non-autoregressive language model for single-shot lattice
rescoring
- Authors: Anton Mitrofanov, Mariya Korenevskaya, Ivan Podluzhny, Yuri Khokhlov,
Aleksandr Laptev, Andrei Andrusenko, Aleksei Ilin, Maxim Korenevsky, Ivan
Medennikov, Aleksei Romanenko
- Abstract summary: We propose a novel rescoring approach, which processes the entire lattice in a single call to the model.
The key feature of our rescoring policy is a novel non-autoregressive Lattice Transformer Language Model (LT-LM).
- Score: 55.16665077221941
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural network-based language models are commonly used in rescoring
approaches to improve the quality of modern automatic speech recognition (ASR)
systems. Most of the existing methods are computationally expensive since they
use autoregressive language models. We propose a novel rescoring approach,
which processes the entire lattice in a single call to the model. The key
feature of our rescoring policy is a novel non-autoregressive Lattice
Transformer Language Model (LT-LM). This model takes the whole lattice as an
input and predicts a new language score for each arc. Additionally, we propose
the artificial lattices generation approach to incorporate a large amount of
text data in the LT-LM training process. Our single-shot rescoring performs
orders of magnitude faster than other rescoring methods in our experiments. It
is more than 300 times faster than pruned RNNLM lattice rescoring and N-best
rescoring while slightly inferior in terms of WER.
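To make the single-shot idea concrete, here is a minimal Python/PyTorch sketch in which a small Transformer encoder scores every arc of a toy lattice in one forward pass, and the new scores are interpolated with the acoustic scores. The `ToyLatticeScorer` class, the arc tuple layout, and the interpolation weight are illustrative assumptions rather than the paper's actual LT-LM architecture, lattice encoding, or training procedure.

```python
# Minimal sketch of single-shot lattice rescoring (an assumed interface, not
# the authors' implementation): one forward pass assigns a new LM score to
# every arc of the lattice.
import torch
import torch.nn as nn

class ToyLatticeScorer(nn.Module):
    """Hypothetical non-autoregressive scorer over all lattice arcs at once."""
    def __init__(self, vocab_size, d_model=64, nhead=4, num_layers=2):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.arc_head = nn.Linear(d_model, 1)  # one LM score per arc

    def forward(self, arc_word_ids):            # (batch, num_arcs)
        h = self.encoder(self.word_emb(arc_word_ids))
        return self.arc_head(h).squeeze(-1)     # (batch, num_arcs)

# A lattice as a list of arcs: (src_state, dst_state, word_id, acoustic_score).
arcs = [(0, 1, 5, -1.2), (0, 1, 7, -1.5), (1, 2, 3, -0.7)]
word_ids = torch.tensor([[a[2] for a in arcs]])

model = ToyLatticeScorer(vocab_size=100)
with torch.no_grad():
    lm_scores = model(word_ids)[0]              # a single call scores every arc

# Rescoring: interpolate acoustic and new LM scores per arc; a real system
# would then run a shortest-path search over the rescored lattice.
lm_weight = 0.5
rescored = [ac + lm_weight * s.item() for (_, _, _, ac), s in zip(arcs, lm_scores)]
print(rescored)
```

A real lattice scorer would also need to encode the lattice topology (e.g. state connectivity or arc timing), which this toy encoder omits for brevity.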
Related papers
- ProSG: Using Prompt Synthetic Gradients to Alleviate Prompt Forgetting
of RNN-like Language Models [0.0]
We propose an architecture that teaches the model to memorize the prompt during generation via synthetic gradients.
We construct a dataset for experiments, and the results demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2023-11-03T15:34:02Z) - Extrapolating Multilingual Understanding Models as Multilingual
Generators [82.1355802012414]
This paper explores methods to endow multilingual understanding models with generation abilities, yielding a unified model.
We propose a Semantic-Guided Alignment-then-Denoising (SGA) approach to adapt an encoder into a multilingual generator with a small number of new parameters.
arXiv Detail & Related papers (2023-05-22T15:33:21Z) - Replacing Language Model for Style Transfer [6.364517234783756]
We introduce the replacing language model (RLM), a sequence-to-sequence language modeling framework for text style transfer (TST).
Our method autoregressively replaces each token of the source sentence with a text span that has a similar meaning but in the target style.
The new span is generated via a non-autoregressive masked language model, which can better preserve the local-contextual meaning of the replaced token.
arXiv Detail & Related papers (2022-11-14T13:35:55Z) - ELMER: A Non-Autoregressive Pre-trained Language Model for Efficient and
Effective Text Generation [97.64625999380425]
We study the text generation task with pre-trained language models (PLMs).
By leveraging the early exit technique, ELMER enables token generation at different layers according to prediction confidence.
Experiments on three text generation tasks show that ELMER significantly outperforms NAR models.
arXiv Detail & Related papers (2022-10-24T14:46:47Z) - Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z) - Straight to the Gradient: Learning to Use Novel Tokens for Neural Text
Generation [4.866431869728018]
We introduce ScaleGrad, a modification straight to the gradient of the loss function, to remedy the degeneration issue of the standard MLE objective.
Empirical results show the effectiveness of our method not only in open-ended generation, but also in directed generation tasks.
arXiv Detail & Related papers (2021-06-14T07:46:30Z) - Revisiting Simple Neural Probabilistic Language Models [27.957834093475686]
This paper revisits the neural probabilistic language model (NPLM) of Bengio et al. (2003).
When scaled up to modern hardware, this model performs much better than expected on word-level language model benchmarks.
Inspired by this result, we modify the Transformer by replacing its first self-attention layer with the NPLM's local concatenation layer.
arXiv Detail & Related papers (2021-04-08T02:18:47Z) - POINTER: Constrained Progressive Text Generation via Insertion-based
Generative Pre-training [93.79766670391618]
We present POINTER, a novel insertion-based approach for hard-constrained text generation.
The proposed method operates by progressively inserting new tokens between existing tokens in a parallel manner.
The resulting coarse-to-fine hierarchy makes the generation process intuitive and interpretable.
arXiv Detail & Related papers (2020-05-01T18:11:54Z) - PALM: Pre-training an Autoencoding&Autoregressive Language Model for
Context-conditioned Generation [92.7366819044397]
Self-supervised pre-training has emerged as a powerful technique for natural language understanding and generation.
This work presents PALM with a novel scheme that jointly pre-trains an autoencoding and autoregressive language model on a large unlabeled corpus.
An extensive set of experiments show that PALM achieves new state-of-the-art results on a variety of language generation benchmarks.
arXiv Detail & Related papers (2020-04-14T06:25:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.