Investigating Methods to Improve Language Model Integration for
Attention-based Encoder-Decoder ASR Models
- URL: http://arxiv.org/abs/2104.05544v1
- Date: Mon, 12 Apr 2021 15:16:03 GMT
- Title: Investigating Methods to Improve Language Model Integration for
Attention-based Encoder-Decoder ASR Models
- Authors: Mohammad Zeineldeen, Aleksandr Glushko, Wilfried Michel, Albert Zeyer,
Ralf Schlüter, Hermann Ney
- Abstract summary: Attention-based encoder-decoder (AED) models learn an implicit internal language model (ILM) from the training transcriptions.
We propose several novel methods to estimate the ILM directly from the AED model.
- Score: 107.86965028729517
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Attention-based encoder-decoder (AED) models learn an implicit internal
language model (ILM) from the training transcriptions. The integration with an
external LM trained on much more unpaired text usually leads to better
performance. A Bayesian interpretation as in the hybrid autoregressive
transducer (HAT) suggests dividing by the prior of the discriminative acoustic
model, which corresponds to this implicit LM, analogous to the hybrid hidden
Markov model approach. The implicit LM cannot be calculated efficiently in
general, and it is still unclear which methods estimate it best. In this
work, we compare different approaches from the literature and propose several
novel methods to estimate the ILM directly from the AED model. Our proposed
methods outperform all previous approaches. We also investigate other methods
to suppress the ILM mainly by decreasing the capacity of the AED model,
limiting the label context, and also by training the AED model together with a
pre-existing LM.
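
As a minimal sketch of what such ILM correction looks like at decoding time (assuming per-label log-probabilities from the AED decoder, the external LM, and an ILM estimate are already available; the scale names are illustrative, not the paper's notation):

```python
def ilm_corrected_score(log_p_aed, log_p_ext_lm, log_p_ilm,
                        lm_scale=0.5, ilm_scale=0.4):
    """Per-label score for one beam-search step.

    log_p_aed    : log p_AED(y | x, y_prev)  from the encoder-decoder
    log_p_ext_lm : log p_LM(y | y_prev)      from the external LM
    log_p_ilm    : log p_ILM(y | y_prev)     estimated internal LM

    Following the HAT-style Bayesian interpretation, the estimated ILM acts
    as a prior that is divided out, i.e. subtracted in log space; plain
    shallow fusion corresponds to ilm_scale = 0.
    """
    return log_p_aed + lm_scale * log_p_ext_lm - ilm_scale * log_p_ilm
```

The two scales are hyperparameters tuned on a development set; the paper's contribution concerns how log_p_ilm itself is estimated from the AED model.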
Related papers
- Scaling Diffusion Language Models via Adaptation from Autoregressive Models [105.70889434492143]
Diffusion Language Models (DLMs) have emerged as a promising new paradigm for text generative modeling.
We show that we can convert AR models ranging from 127M to 7B parameters into diffusion models DiffuGPT and DiffuLLaMA, using less than 200B tokens for training.
Our experimental results reveal that these models outperform earlier DLMs and are competitive with their AR counterparts.
arXiv Detail & Related papers (2024-10-23T14:04:22Z)
- DALD: Improving Logits-based Detector without Logits from Black-box LLMs [56.234109491884126]
Large Language Models (LLMs) have revolutionized text generation, producing outputs that closely mimic human writing.
We present Distribution-Aligned LLMs Detection (DALD), an innovative framework that redefines the state-of-the-art performance in black-box text detection.
DALD is designed to align the surrogate model's distribution with that of unknown target LLMs, ensuring enhanced detection capability and resilience against rapid model iterations.
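
One rough reading of this summary is that a small surrogate LM is adapted on text produced by the black-box target model so that its token-level distribution tracks the target's; the sketch below illustrates only that general idea, and every name and detail here (surrogate, target_samples, the training loop, the likelihood-based score) is our assumption rather than the paper's recipe or API.

```python
import torch
import torch.nn.functional as F

def align_surrogate(surrogate, target_samples, epochs=1, lr=1e-5):
    """Fine-tune a small surrogate LM on token sequences sampled from the
    black-box target model, nudging the surrogate's next-token distribution
    toward the target's (illustrative 'distribution alignment' only)."""
    opt = torch.optim.AdamW(surrogate.parameters(), lr=lr)
    for _ in range(epochs):
        for input_ids in target_samples:        # 1-D tensors of token ids
            logits = surrogate(input_ids[:-1])  # [T-1, vocab] next-token logits
            loss = F.cross_entropy(logits, input_ids[1:])
            opt.zero_grad()
            loss.backward()
            opt.step()
    return surrogate

def detection_score(surrogate, input_ids):
    """Average token log-likelihood under the aligned surrogate; a generic
    logits-based detection criterion, not necessarily the paper's."""
    with torch.no_grad():
        logits = surrogate(input_ids[:-1])
        logp = F.log_softmax(logits, dim=-1)
        return logp.gather(-1, input_ids[1:, None]).mean().item()
```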
arXiv Detail & Related papers (2024-06-07T19:38:05Z)
- Effective internal language model training and fusion for factorized transducer model [26.371223360905557]
Internal language model (ILM) of the neural transducer has been widely studied.
We propose a novel ILM training and decoding strategy for factorized transducer models.
arXiv Detail & Related papers (2024-04-02T08:01:05Z)
- It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition [70.77292069313154]
Large language models (LLMs) can be successfully used for generative error correction (GER) on top of the automatic speech recognition (ASR) output.
However, GER operates only on the decoded text hypotheses and therefore discards the acoustic information. In this work, we aim to overcome this limitation by infusing acoustic information before generating the predicted transcription through a novel late fusion solution termed Uncertainty-Aware Dynamic Fusion (UADF).
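
As a rough illustration of uncertainty-aware late fusion (the exact UADF mechanism is not spelled out in this summary), one can weight the ASR acoustic distribution and the LLM distribution per token, leaning on the acoustic side when the LLM is uncertain; the entropy-based gating below is an assumption chosen purely for illustration.

```python
import torch

def uncertainty_aware_fusion(llm_log_probs, asr_log_probs):
    """Token-level late fusion: when the LLM's distribution has high entropy
    (is uncertain), give more weight to the acoustic (ASR) distribution.
    Both inputs are [vocab] log-probabilities for the current decoding step;
    the gating rule is an illustrative choice, not the paper's exact rule."""
    probs = llm_log_probs.exp()
    entropy = -(probs * llm_log_probs).sum()
    max_entropy = torch.log(torch.tensor(float(llm_log_probs.numel())))
    alpha = 1.0 - entropy / max_entropy        # high entropy -> small alpha
    fused = alpha * llm_log_probs + (1.0 - alpha) * asr_log_probs
    return torch.log_softmax(fused, dim=-1)    # renormalize in log space
```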
arXiv Detail & Related papers (2024-02-08T07:21:45Z)
- Internal language model estimation through explicit context vector learning for attention-based encoder-decoder ASR [19.233720469733797]
We propose two novel approaches to estimate the biased ILM based on Listen-Attend-Spell (LAS) models.
Experiments show that the ILMs estimated by the proposed methods achieve the lowest perplexity.
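
One way to read "explicit context vector learning" (and the related context-replacement estimators compared in the main paper above) is: keep the trained LAS/AED decoder frozen, feed it a fixed learned, zero, or averaged vector in place of the attention context, and score label sequences with the result, which no longer sees the encoder and thus approximates the ILM. The decoder_step interface below is an assumption for illustration.

```python
import torch.nn.functional as F

def ilm_log_prob(decoder_step, embed, labels, context):
    """Score a label sequence with the frozen AED decoder, feeding a learned
    (or zero / averaged) vector in place of the attention context, so that
    the acoustic input is removed and only the internal LM remains.
    decoder_step(prev_emb, state, context) -> (logits, state) is an assumed
    interface; labels is a 1-D tensor of label ids starting with BOS."""
    state, total = None, 0.0
    for t in range(1, len(labels)):
        logits, state = decoder_step(embed(labels[t - 1]), state, context)
        total = total + F.log_softmax(logits, dim=-1)[labels[t]]
    # log p_ILM(labels); the context vector itself can be trained on the
    # training transcriptions with this quantity as the objective
    return total
```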
arXiv Detail & Related papers (2022-01-26T07:47:27Z)
- On Language Model Integration for RNN Transducer based Speech Recognition [49.84285563767935]
We study various ILM correction-based LM integration methods formulated in a common RNN-T framework.
We provide a decoding interpretation on two major reasons for performance improvement with ILM correction.
We also propose an exact-ILM training framework by extending the proof given in the hybrid autoregressive transducer.
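
For the RNN-T case, a common way to obtain the ILM score used in such correction is to drop the encoder contribution from the joint network so that only the prediction network drives the label distribution; the snippet below is a generic sketch under that assumption, not necessarily this paper's exact recipe, and the joint interface is assumed.

```python
import torch
import torch.nn.functional as F

def rnnt_ilm_step(joint, pred_out, enc_dim):
    """Internal-LM label distribution for one RNN-T prediction-network state:
    feed a zero vector in place of the encoder output to the joint network.
    joint(enc_vec, pred_vec) -> logits over labels is an assumed interface;
    in practice the blank label is usually excluded or renormalized away."""
    zero_enc = torch.zeros(enc_dim)
    logits = joint(zero_enc, pred_out)
    return F.log_softmax(logits, dim=-1)   # log p_ILM(y | label history)
```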
arXiv Detail & Related papers (2021-10-13T16:30:46Z)
- Joint Stochastic Approximation and Its Application to Learning Discrete Latent Variable Models [19.07718284287928]
We show that the difficulty of obtaining reliable gradients for the inference model and the drawback of indirectly optimizing the target log-likelihood can be gracefully addressed.
We propose to directly maximize the target log-likelihood and simultaneously minimize the inclusive divergence between the posterior and the inference model.
The resulting learning algorithm is called joint SA (JSA).
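
Spelled out as a formula (our transcription of the sentence above, assuming a latent-variable model p_theta(x, h) with inference model q_phi(h | x)):

```latex
\min_{\theta,\phi}\;
  \mathbb{E}_{x \sim \text{data}}
  \Big[\, -\log p_\theta(x)
        \;+\; \mathrm{KL}\!\big(\, p_\theta(h \mid x) \,\|\, q_\phi(h \mid x) \,\big) \,\Big]
```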
arXiv Detail & Related papers (2020-05-28T13:50:08Z)
- Early Stage LM Integration Using Local and Global Log-Linear Combination [46.91755970827846]
Sequence-to-sequence models with an implicit alignment mechanism (e.g. attention) are closing the performance gap towards traditional hybrid hidden Markov models (HMM).
One important factor to improve word error rate in both cases is the use of an external language model (LM) trained on large text-only corpora.
We present a novel method for language model integration into implicit-alignment based sequence-to-sequence models.
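
"Local" combination can be illustrated as a per-output-position log-linear interpolation that is renormalized at every step, while the "global" variant combines complete-sequence scores as in shallow-fusion rescoring; the following sketch is our illustration under that reading, not the paper's code.

```python
import torch

def local_log_linear(am_log_probs, lm_log_probs, lam=0.3):
    """Per-position log-linear combination with renormalization:
    p(y_t | ...) proportional to p_AM(y_t | x, y_<t)^(1-lam) * p_LM(y_t | y_<t)^lam.
    Inputs are [vocab] log-probabilities for the current output position."""
    combined = (1.0 - lam) * am_log_probs + lam * lm_log_probs
    return torch.log_softmax(combined, dim=-1)

def global_log_linear(am_seq_score, lm_seq_score, lam=0.3):
    """Global variant: combine complete-sequence log scores without
    per-step renormalization."""
    return (1.0 - lam) * am_seq_score + lam * lm_seq_score
```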
arXiv Detail & Related papers (2020-05-20T13:49:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.