Transformer Language Models with LSTM-based Cross-utterance Information Representation
- URL: http://arxiv.org/abs/2102.06474v1
- Date: Fri, 12 Feb 2021 12:12:29 GMT
- Title: Transformer Language Models with LSTM-based Cross-utterance Information Representation
- Authors: G. Sun, C. Zhang, P. C. Woodland
- Abstract summary: This paper proposes the R-TLM which uses hidden states in a long short-term memory (LSTM) LM.
To encode the cross-utterance information, the R-TLM incorporates an LSTM module together with a segment-wise recurrence in some of the Transformer blocks.
The proposed system was evaluated on the AMI meeting corpus, the Eval2000 and the RT03 telephone conversation evaluation sets.
- Score: 3.976291254896486
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The effective incorporation of cross-utterance information has the potential
to improve language models (LMs) for automatic speech recognition (ASR). To
extract more powerful and robust cross-utterance representations for the
Transformer LM (TLM), this paper proposes the R-TLM which uses hidden states in
a long short-term memory (LSTM) LM. To encode the cross-utterance information,
the R-TLM incorporates an LSTM module together with a segment-wise recurrence
in some of the Transformer blocks. In addition to the LSTM module output, a
shortcut connection using a fusion layer that bypasses the LSTM module is also
investigated. The proposed system was evaluated on the AMI meeting corpus, the
Eval2000 and the RT03 telephone conversation evaluation sets. The best R-TLM
achieved 0.9%, 0.6%, and 0.8% absolute WER reductions over the single-utterance
TLM baseline, and 0.5%, 0.3%, and 0.2% absolute WER reductions over a strong
cross-utterance TLM baseline on the AMI evaluation set, Eval2000 and RT03
respectively. Improvements on Eval2000 and RT03 were further supported by
significance tests. R-TLMs were found to have better LM scores on words where
recognition errors are more likely to occur. The R-TLM WER can be further
reduced by interpolation with an LSTM-LM.
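
For readers who want to see how such a block might fit together, the following is a minimal, hypothetical PyTorch-style sketch, assuming one LSTM module placed inside a Transformer block, segment-wise recurrence realised by carrying the LSTM state across utterances, and a fusion layer that merges the LSTM output with a shortcut bypassing it. The class name, dimensions and exact fusion wiring are illustrative assumptions, not the paper's reference implementation.

```python
# Hypothetical sketch of an R-TLM-style block (illustrative only, not the
# authors' code): a Transformer block whose input first passes through an
# LSTM that carries its state across utterance segments, with a fusion
# layer combining the LSTM output and a shortcut that bypasses the LSTM.
import torch
import torch.nn as nn


class RTLMBlock(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8, d_ff: int = 2048):
        super().__init__()
        self.lstm = nn.LSTM(d_model, d_model, batch_first=True)
        # Fusion layer: merges the LSTM output with the bypass shortcut.
        self.fusion = nn.Linear(2 * d_model, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x, lstm_state=None, attn_mask=None):
        # Segment-wise recurrence: lstm_state is the (h, c) pair carried over
        # from the previous utterance, so the LSTM encodes cross-utterance
        # information while self-attention only sees the current segment.
        lstm_out, new_state = self.lstm(x, lstm_state)
        # Shortcut connection bypassing the LSTM, merged by the fusion layer.
        h = self.fusion(torch.cat([lstm_out, x], dim=-1))
        attn_out, _ = self.attn(h, h, h, attn_mask=attn_mask, need_weights=False)
        h = self.norm1(h + attn_out)
        h = self.norm2(h + self.ff(h))
        return h, new_state


if __name__ == "__main__":
    block = RTLMBlock()
    state = None
    seq_len = 20
    # Causal mask so each position only attends to earlier positions.
    causal = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    for _ in range(3):  # three consecutive "utterances" of dummy embeddings
        x = torch.randn(2, seq_len, 512)  # (batch, tokens, d_model)
        out, state = block(x, state, attn_mask=causal)
        # Stop gradients at utterance boundaries; keep the values as context.
        state = tuple(s.detach() for s in state)
```

In this sketch the detached LSTM state stands in for the segment-wise recurrence, and the fusion layer is what lets the block fall back on the plain Transformer path when the recurrent context is unhelpful. The interpolation with an LSTM-LM mentioned at the end of the abstract would, in standard practice, be a weighted combination of the two models' word probabilities with a weight tuned on held-out data.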
Related papers
- Blending LLMs into Cascaded Speech Translation: KIT's Offline Speech Translation System for IWSLT 2024 [61.189875635090225]
Large Language Models (LLMs) are currently under exploration for various tasks, including Automatic Speech Recognition (ASR), Machine Translation (MT), and even End-to-End Speech Translation (ST)
arXiv Detail & Related papers (2024-06-24T16:38:17Z)
- xLSTM: Extended Long Short-Term Memory [26.607656211983155]
In the 1990s, constant error carousel and gating were introduced as the central ideas of the Long Short-Term Memory (LSTM)
We introduce exponential gating with appropriate normalization and stabilization techniques.
We modify the LSTM memory structure, obtaining: (i) sLSTM with a scalar memory, a scalar update, and new memory mixing, (ii) mLSTM that is fully parallelizable with a matrix memory and a covariance update rule.
arXiv Detail & Related papers (2024-05-07T17:50:21Z)
- Modular Hybrid Autoregressive Transducer [51.29870462504761]
Text-only adaptation of a transducer model remains challenging for end-to-end speech recognition.
We propose a modular hybrid autoregressive transducer that has structurally separated label and blank decoders.
On Google's large-scale production data, a multi-domain MHAT adapted with 100B sentences achieves relative WER reductions of up to 12.4% without LM fusion.
arXiv Detail & Related papers (2022-10-31T03:56:37Z)
- Bayesian Neural Network Language Modeling for Speech Recognition [59.681758762712754]
State-of-the-art neural network language models (NNLMs) represented by long short-term memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming highly complex.
In this paper, an overarching full Bayesian learning framework is proposed to account for the underlying uncertainty in LSTM-RNN and Transformer LMs.
arXiv Detail & Related papers (2022-08-28T17:50:19Z)
- Learning Bounded Context-Free-Grammar via LSTM and the Transformer: Difference and Explanations [51.77000472945441]
Long Short-Term Memory (LSTM) and Transformers are two popular neural architectures used for natural language processing tasks.
In practice, it is often observed that Transformer models have better representation power than LSTM models.
We study such practical differences between LSTM and Transformer and propose an explanation based on their latent space decomposition patterns.
arXiv Detail & Related papers (2021-12-16T19:56:44Z)
- On Language Model Integration for RNN Transducer based Speech Recognition [49.84285563767935]
We study various internal language model (ILM) correction-based LM integration methods formulated in a common RNN-T framework.
We provide a decoding interpretation on two major reasons for performance improvement with ILM correction.
We also propose an exact-ILM training framework by extending the proof given in the hybrid autoregressive transducer.
arXiv Detail & Related papers (2021-10-13T16:30:46Z)
- Bayesian Transformer Language Models for Speech Recognition [59.235405107295655]
State-of-the-art neural language models (LMs) represented by Transformers are highly complex.
This paper proposes a full Bayesian learning framework for Transformer LM estimation.
arXiv Detail & Related papers (2021-02-09T10:55:27Z)
- LSTM-LM with Long-Term History for First-Pass Decoding in Conversational Speech Recognition [27.639919625398]
LSTM language models (LSTM-LMs) have been proven to be powerful and have yielded significant performance improvements over count-based n-gram LMs in modern speech recognition systems.
Recent work shows that it is feasible and computationally affordable to adopt the LSTM-LMs in the first-pass decoding within a dynamic (or tree based) decoder framework.
arXiv Detail & Related papers (2020-10-21T23:40:26Z)
- Cross-Utterance Language Models with Acoustic Error Sampling [1.376408511310322]
A cross-utterance LM (CULM) is proposed to augment the input to a standard long short-term memory (LSTM) LM.
An acoustic error sampling technique is proposed to reduce the mismatch between training and test time.
Experiments performed on both the AMI and Switchboard datasets show that CULMs outperform the LSTM LM baseline in terms of WER.
arXiv Detail & Related papers (2020-08-19T17:40:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.