Transformer Language Models with LSTM-based Cross-utterance Information Representation
- URL: http://arxiv.org/abs/2102.06474v1
- Date: Fri, 12 Feb 2021 12:12:29 GMT
- Title: Transformer Language Models with LSTM-based Cross-utterance Information Representation
- Authors: G. Sun, C. Zhang, P. C. Woodland
- Abstract summary: This paper proposes the R-TLM which uses hidden states in a long short-term memory (LSTM) LM.
To encode the cross-utterance information, the R-TLM incorporates an LSTM module together with a segment-wise recurrence in some of the Transformer blocks.
The proposed system was evaluated on the AMI meeting corpus, the Eval2000 and the RT03 telephone conversation evaluation sets.
- Score: 3.976291254896486
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The effective incorporation of cross-utterance information has the potential
to improve language models (LMs) for automatic speech recognition (ASR). To
extract more powerful and robust cross-utterance representations for the
Transformer LM (TLM), this paper proposes the R-TLM which uses hidden states in
a long short-term memory (LSTM) LM. To encode the cross-utterance information,
the R-TLM incorporates an LSTM module together with a segment-wise recurrence
in some of the Transformer blocks. In addition to the LSTM module output, a
shortcut connection using a fusion layer that bypasses the LSTM module is also
investigated. The proposed system was evaluated on the AMI meeting corpus, the
Eval2000 and the RT03 telephone conversation evaluation sets. The best R-TLM
achieved 0.9%, 0.6%, and 0.8% absolute WER reductions over the single-utterance
TLM baseline, and 0.5%, 0.3%, and 0.2% absolute WER reductions over a strong
cross-utterance TLM baseline on the AMI evaluation set, Eval2000 and RT03
respectively. Improvements on Eval2000 and RT03 were further supported by
significance tests. R-TLMs were found to have better LM scores on words where
recognition errors are more likely to occur. The R-TLM WER can be further
reduced by interpolation with an LSTM-LM.
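
For readers who want to see how such a block might fit together, the following is a minimal, hypothetical PyTorch-style sketch, assuming one LSTM module placed inside a Transformer block, segment-wise recurrence realised by carrying the LSTM state across utterances, and a fusion layer that merges the LSTM output with a shortcut bypassing it. The class name, dimensions and exact fusion wiring are illustrative assumptions, not the paper's reference implementation.

```python
# Hypothetical sketch of an R-TLM-style block (illustrative only, not the
# authors' code): a Transformer block whose input first passes through an
# LSTM that carries its state across utterance segments, with a fusion
# layer combining the LSTM output and a shortcut that bypasses the LSTM.
import torch
import torch.nn as nn


class RTLMBlock(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8, d_ff: int = 2048):
        super().__init__()
        self.lstm = nn.LSTM(d_model, d_model, batch_first=True)
        # Fusion layer: merges the LSTM output with the bypass shortcut.
        self.fusion = nn.Linear(2 * d_model, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x, lstm_state=None, attn_mask=None):
        # Segment-wise recurrence: lstm_state is the (h, c) pair carried over
        # from the previous utterance, so the LSTM encodes cross-utterance
        # information while self-attention only sees the current segment.
        lstm_out, new_state = self.lstm(x, lstm_state)
        # Shortcut connection bypassing the LSTM, merged by the fusion layer.
        h = self.fusion(torch.cat([lstm_out, x], dim=-1))
        attn_out, _ = self.attn(h, h, h, attn_mask=attn_mask, need_weights=False)
        h = self.norm1(h + attn_out)
        h = self.norm2(h + self.ff(h))
        return h, new_state


if __name__ == "__main__":
    block = RTLMBlock()
    state = None
    seq_len = 20
    # Causal mask so each position only attends to earlier positions.
    causal = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    for _ in range(3):  # three consecutive "utterances" of dummy embeddings
        x = torch.randn(2, seq_len, 512)  # (batch, tokens, d_model)
        out, state = block(x, state, attn_mask=causal)
        # Stop gradients at utterance boundaries; keep the values as context.
        state = tuple(s.detach() for s in state)
```

In this sketch the detached LSTM state stands in for the segment-wise recurrence, and the fusion layer is what lets the block fall back on the plain Transformer path when the recurrent context is unhelpful. The interpolation with an LSTM-LM mentioned at the end of the abstract would, in standard practice, be a weighted combination of the two models' word probabilities with a weight tuned on held-out data.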
Related papers
- Blending LLMs into Cascaded Speech Translation: KIT's Offline Speech Translation System for IWSLT 2024 [61.189875635090225]
Large Language Models (LLMs) are currently under exploration for various tasks, including Automatic Speech Recognition (ASR), Machine Translation (MT), and even End-to-End Speech Translation (ST)
arXiv Detail & Related papers (2024-06-24T16:38:17Z)
- xLSTM: Extended Long Short-Term Memory [26.607656211983155]
In the 1990s, constant error carousel and gating were introduced as the central ideas of the Long Short-Term Memory (LSTM)
We introduce exponential gating with appropriate normalization and stabilization techniques.
We modify the LSTM memory structure, obtaining: (i) sLSTM with a scalar memory, a scalar update, and new memory mixing, (ii) mLSTM that is fully parallelizable with a matrix memory and a covariance update rule.
arXiv Detail & Related papers (2024-05-07T17:50:21Z)
- Modular Hybrid Autoregressive Transducer [51.29870462504761]
Text-only adaptation of a transducer model remains challenging for end-to-end speech recognition.
We propose a modular hybrid autoregressive transducer that has structurally separated label and blank decoders.
On Google's large-scale production data, a multi-domain MHAT adapted with 100B sentences achieves relative WER reductions of up to 12.4% without LM fusion.
arXiv Detail & Related papers (2022-10-31T03:56:37Z)
- Bayesian Neural Network Language Modeling for Speech Recognition [59.681758762712754]
State-of-the-art neural network language models (NNLMs) represented by long short-term memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming highly complex.
In this paper, an overarching full Bayesian learning framework is proposed to account for the underlying uncertainty in LSTM-RNN and Transformer LMs.
arXiv Detail & Related papers (2022-08-28T17:50:19Z)
- Learning Bounded Context-Free-Grammar via LSTM and the Transformer: Difference and Explanations [51.77000472945441]
Long Short-Term Memory (LSTM) and Transformers are two popular neural architectures used for natural language processing tasks.
In practice, it is often observed that Transformer models have better representation power than LSTM models.
We study such practical differences between LSTM and Transformer and propose an explanation based on their latent space decomposition patterns.
arXiv Detail & Related papers (2021-12-16T19:56:44Z)
- On Language Model Integration for RNN Transducer based Speech Recognition [49.84285563767935]
We study various internal language model (ILM) correction-based LM integration methods formulated in a common RNN-T framework.
We provide a decoding interpretation on two major reasons for performance improvement with ILM correction.
We also propose an exact-ILM training framework by extending the proof given in the hybrid autoregressive transducer.
arXiv Detail & Related papers (2021-10-13T16:30:46Z)
- Bayesian Transformer Language Models for Speech Recognition [59.235405107295655]
State-of-the-art neural language models (LMs) represented by Transformers are highly complex.
This paper proposes a full Bayesian learning framework for Transformer LM estimation.
arXiv Detail & Related papers (2021-02-09T10:55:27Z)
- LSTM-LM with Long-Term History for First-Pass Decoding in Conversational Speech Recognition [27.639919625398]
LSTM language models (LSTM-LMs) have been proven to be powerful and have yielded significant performance improvements over count-based n-gram LMs in modern speech recognition systems.
Recent work shows that it is feasible and computationally affordable to adopt the LSTM-LMs in the first-pass decoding within a dynamic (or tree based) decoder framework.
arXiv Detail & Related papers (2020-10-21T23:40:26Z)
- Cross-Utterance Language Models with Acoustic Error Sampling [1.376408511310322]
A cross-utterance LM (CULM) is proposed to augment the input to a standard long short-term memory (LSTM) LM.
An acoustic error sampling technique is proposed to reduce the mismatch between training and test time.
Experiments performed on both the AMI and Switchboard datasets show that CULMs outperform the LSTM LM baseline in terms of WER.
arXiv Detail & Related papers (2020-08-19T17:40:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.