Investigating Methods to Improve Language Model Integration for
Attention-based Encoder-Decoder ASR Models
- URL: http://arxiv.org/abs/2104.05544v1
- Date: Mon, 12 Apr 2021 15:16:03 GMT
- Title: Investigating Methods to Improve Language Model Integration for
Attention-based Encoder-Decoder ASR Models
- Authors: Mohammad Zeineldeen, Aleksandr Glushko, Wilfried Michel, Albert Zeyer,
Ralf Schlüter, Hermann Ney
- Abstract summary: Attention-based encoder-decoder (AED) models learn an implicit internal language model (ILM) from the training transcriptions.
We propose several novel methods to estimate the ILM directly from the AED model.
- Score: 107.86965028729517
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Attention-based encoder-decoder (AED) models learn an implicit internal
language model (ILM) from the training transcriptions. The integration with an
external LM trained on much more unpaired text usually leads to better
performance. A Bayesian interpretation as in the hybrid autoregressive
transducer (HAT) suggests dividing by the prior of the discriminative acoustic
model, which corresponds to this implicit LM, analogous to the hybrid hidden
Markov model approach. The implicit LM cannot be calculated efficiently in
general, and it is still unclear which methods estimate it best. In this
work, we compare different approaches from the literature and propose several
novel methods to estimate the ILM directly from the AED model. Our proposed
methods outperform all previous approaches. We also investigate other methods
to suppress the ILM mainly by decreasing the capacity of the AED model,
limiting the label context, and also by training the AED model together with a
pre-existing LM.
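
As a minimal sketch of what such ILM correction looks like at decoding time (assuming per-label log-probabilities from the AED decoder, the external LM, and an ILM estimate are already available; the scale names are illustrative, not the paper's notation):

```python
def ilm_corrected_score(log_p_aed, log_p_ext_lm, log_p_ilm,
                        lm_scale=0.5, ilm_scale=0.4):
    """Per-label score for one beam-search step.

    log_p_aed    : log p_AED(y | x, y_prev)  from the encoder-decoder
    log_p_ext_lm : log p_LM(y | y_prev)      from the external LM
    log_p_ilm    : log p_ILM(y | y_prev)     estimated internal LM

    Following the HAT-style Bayesian interpretation, the estimated ILM acts
    as a prior that is divided out, i.e. subtracted in log space; plain
    shallow fusion corresponds to ilm_scale = 0.
    """
    return log_p_aed + lm_scale * log_p_ext_lm - ilm_scale * log_p_ilm
```

The two scales are hyperparameters tuned on a development set; the paper's contribution concerns how log_p_ilm itself is estimated from the AED model.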
Related papers
- Scaling Diffusion Language Models via Adaptation from Autoregressive Models [105.70889434492143]
Diffusion Language Models (DLMs) have emerged as a promising new paradigm for text generative modeling.
We show that we can convert AR models ranging from 127M to 7B parameters into diffusion models DiffuGPT and DiffuLLaMA, using less than 200B tokens for training.
Our experimental results reveal that these models outperform earlier DLMs and are competitive with their AR counterparts.
arXiv Detail & Related papers (2024-10-23T14:04:22Z)
- DALD: Improving Logits-based Detector without Logits from Black-box LLMs [56.234109491884126]
Large Language Models (LLMs) have revolutionized text generation, producing outputs that closely mimic human writing.
We present Distribution-Aligned LLMs Detection (DALD), an innovative framework that redefines the state-of-the-art performance in black-box text detection.
DALD is designed to align the surrogate model's distribution with that of unknown target LLMs, ensuring enhanced detection capability and resilience against rapid model iterations.
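
One rough reading of this summary is that a small surrogate LM is adapted on text produced by the black-box target model so that its token-level distribution tracks the target's; the sketch below illustrates only that general idea, and every name and detail here (surrogate, target_samples, the training loop, the likelihood-based score) is our assumption rather than the paper's recipe or API.

```python
import torch
import torch.nn.functional as F

def align_surrogate(surrogate, target_samples, epochs=1, lr=1e-5):
    """Fine-tune a small surrogate LM on token sequences sampled from the
    black-box target model, nudging the surrogate's next-token distribution
    toward the target's (illustrative 'distribution alignment' only)."""
    opt = torch.optim.AdamW(surrogate.parameters(), lr=lr)
    for _ in range(epochs):
        for input_ids in target_samples:        # 1-D tensors of token ids
            logits = surrogate(input_ids[:-1])  # [T-1, vocab] next-token logits
            loss = F.cross_entropy(logits, input_ids[1:])
            opt.zero_grad()
            loss.backward()
            opt.step()
    return surrogate

def detection_score(surrogate, input_ids):
    """Average token log-likelihood under the aligned surrogate; a generic
    logits-based detection criterion, not necessarily the paper's."""
    with torch.no_grad():
        logits = surrogate(input_ids[:-1])
        logp = F.log_softmax(logits, dim=-1)
        return logp.gather(-1, input_ids[1:, None]).mean().item()
```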
arXiv Detail & Related papers (2024-06-07T19:38:05Z)
- Effective internal language model training and fusion for factorized transducer model [26.371223360905557]
Internal language model (ILM) of the neural transducer has been widely studied.
We propose a novel ILM training and decoding strategy for factorized transducer models.
arXiv Detail & Related papers (2024-04-02T08:01:05Z)
- It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition [70.77292069313154]
Large language models (LLMs) can be successfully used for generative error correction (GER) on top of the automatic speech recognition (ASR) output.
However, GER operates only on the decoded text hypotheses and therefore discards the acoustic information. In this work, we aim to overcome this limitation by infusing acoustic information before generating the predicted transcription through a novel late fusion solution termed Uncertainty-Aware Dynamic Fusion (UADF).
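
As a rough illustration of uncertainty-aware late fusion (the exact UADF mechanism is not spelled out in this summary), one can weight the ASR acoustic distribution and the LLM distribution per token, leaning on the acoustic side when the LLM is uncertain; the entropy-based gating below is an assumption chosen purely for illustration.

```python
import torch

def uncertainty_aware_fusion(llm_log_probs, asr_log_probs):
    """Token-level late fusion: when the LLM's distribution has high entropy
    (is uncertain), give more weight to the acoustic (ASR) distribution.
    Both inputs are [vocab] log-probabilities for the current decoding step;
    the gating rule is an illustrative choice, not the paper's exact rule."""
    probs = llm_log_probs.exp()
    entropy = -(probs * llm_log_probs).sum()
    max_entropy = torch.log(torch.tensor(float(llm_log_probs.numel())))
    alpha = 1.0 - entropy / max_entropy        # high entropy -> small alpha
    fused = alpha * llm_log_probs + (1.0 - alpha) * asr_log_probs
    return torch.log_softmax(fused, dim=-1)    # renormalize in log space
```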
arXiv Detail & Related papers (2024-02-08T07:21:45Z)
- Internal language model estimation through explicit context vector learning for attention-based encoder-decoder ASR [19.233720469733797]
We propose two novel approaches to estimate the biased ILM based on Listen-Attend-Spell (LAS) models.
Experiments show that the ILMs estimated by the proposed methods achieve the lowest perplexity.
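
One way to read "explicit context vector learning" (and the related context-replacement estimators compared in the main paper above) is: keep the trained LAS/AED decoder frozen, feed it a fixed learned, zero, or averaged vector in place of the attention context, and score label sequences with the result, which no longer sees the encoder and thus approximates the ILM. The decoder_step interface below is an assumption for illustration.

```python
import torch.nn.functional as F

def ilm_log_prob(decoder_step, embed, labels, context):
    """Score a label sequence with the frozen AED decoder, feeding a learned
    (or zero / averaged) vector in place of the attention context, so that
    the acoustic input is removed and only the internal LM remains.
    decoder_step(prev_emb, state, context) -> (logits, state) is an assumed
    interface; labels is a 1-D tensor of label ids starting with BOS."""
    state, total = None, 0.0
    for t in range(1, len(labels)):
        logits, state = decoder_step(embed(labels[t - 1]), state, context)
        total = total + F.log_softmax(logits, dim=-1)[labels[t]]
    # log p_ILM(labels); the context vector itself can be trained on the
    # training transcriptions with this quantity as the objective
    return total
```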
arXiv Detail & Related papers (2022-01-26T07:47:27Z)
- On Language Model Integration for RNN Transducer based Speech Recognition [49.84285563767935]
We study various ILM correction-based LM integration methods formulated in a common RNN-T framework.
We provide a decoding interpretation on two major reasons for performance improvement with ILM correction.
We also propose an exact-ILM training framework by extending the proof given in the hybrid autoregressive transducer.
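
For the RNN-T case, a common way to obtain the ILM score used in such correction is to drop the encoder contribution from the joint network so that only the prediction network drives the label distribution; the snippet below is a generic sketch under that assumption, not necessarily this paper's exact recipe, and the joint interface is assumed.

```python
import torch
import torch.nn.functional as F

def rnnt_ilm_step(joint, pred_out, enc_dim):
    """Internal-LM label distribution for one RNN-T prediction-network state:
    feed a zero vector in place of the encoder output to the joint network.
    joint(enc_vec, pred_vec) -> logits over labels is an assumed interface;
    in practice the blank label is usually excluded or renormalized away."""
    zero_enc = torch.zeros(enc_dim)
    logits = joint(zero_enc, pred_out)
    return F.log_softmax(logits, dim=-1)   # log p_ILM(y | label history)
```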
arXiv Detail & Related papers (2021-10-13T16:30:46Z)
- Joint Stochastic Approximation and Its Application to Learning Discrete Latent Variable Models [19.07718284287928]
We show that the difficulty of obtaining reliable gradients for the inference model and the drawback of indirectly optimizing the target log-likelihood can be gracefully addressed.
We propose to directly maximize the target log-likelihood and simultaneously minimize the inclusive divergence between the posterior and the inference model.
The resulting learning algorithm is called joint SA (JSA).
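
Spelled out as a formula (our transcription of the sentence above, assuming a latent-variable model p_theta(x, h) with inference model q_phi(h | x)):

```latex
\min_{\theta,\phi}\;
  \mathbb{E}_{x \sim \text{data}}
  \Big[\, -\log p_\theta(x)
        \;+\; \mathrm{KL}\!\big(\, p_\theta(h \mid x) \,\|\, q_\phi(h \mid x) \,\big) \,\Big]
```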
arXiv Detail & Related papers (2020-05-28T13:50:08Z)
- Early Stage LM Integration Using Local and Global Log-Linear Combination [46.91755970827846]
Sequence-to-sequence models with an implicit alignment mechanism (e.g. attention) are closing the performance gap towards traditional hybrid hidden Markov models (HMM).
One important factor to improve word error rate in both cases is the use of an external language model (LM) trained on large text-only corpora.
We present a novel method for language model integration into implicit-alignment based sequence-to-sequence models.
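
"Local" combination can be illustrated as a per-output-position log-linear interpolation that is renormalized at every step, while the "global" variant combines complete-sequence scores as in shallow-fusion rescoring; the following sketch is our illustration under that reading, not the paper's code.

```python
import torch

def local_log_linear(am_log_probs, lm_log_probs, lam=0.3):
    """Per-position log-linear combination with renormalization:
    p(y_t | ...) proportional to p_AM(y_t | x, y_<t)^(1-lam) * p_LM(y_t | y_<t)^lam.
    Inputs are [vocab] log-probabilities for the current output position."""
    combined = (1.0 - lam) * am_log_probs + lam * lm_log_probs
    return torch.log_softmax(combined, dim=-1)

def global_log_linear(am_seq_score, lm_seq_score, lam=0.3):
    """Global variant: combine complete-sequence log scores without
    per-step renormalization."""
    return (1.0 - lam) * am_seq_score + lam * lm_seq_score
```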
arXiv Detail & Related papers (2020-05-20T13:49:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.