Cross-Utterance Language Models with Acoustic Error Sampling
- URL: http://arxiv.org/abs/2009.01008v1
- Date: Wed, 19 Aug 2020 17:40:11 GMT
- Title: Cross-Utterance Language Models with Acoustic Error Sampling
- Authors: G. Sun, C. Zhang and P. C. Woodland
- Abstract summary: A cross-utterance LM (CULM) is proposed to augment the input to a standard long short-term memory (LSTM) LM.
An acoustic error sampling technique is proposed to reduce the mismatch between training and test time.
Experiments performed on both AMI and Switchboard datasets show that CULMs outperform the LSTM LM baseline in terms of WER.
- Score: 1.376408511310322
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The effective exploitation of richer contextual information in language
models (LMs) is a long-standing research problem for automatic speech
recognition (ASR). A cross-utterance LM (CULM) is proposed in this paper, which
augments the input to a standard long short-term memory (LSTM) LM with a
context vector derived from past and future utterances using an extraction
network. The extraction network uses another LSTM to encode surrounding
utterances into vectors which are integrated into a context vector using either
a projection of LSTM final hidden states, or a multi-head self-attentive layer.
In addition, an acoustic error sampling technique is proposed to reduce the
mismatch between training and test time. This is achieved by incorporating
possible ASR errors into the model training procedure, which can therefore
improve the word error rate (WER). Experiments performed on both AMI and
Switchboard datasets show that CULMs outperform the LSTM LM baseline in terms of WER. In
particular, the CULM with a self-attentive layer-based extraction network and
acoustic error sampling achieves 0.6% absolute WER reduction on AMI, 0.3% WER
reduction on the Switchboard part and 0.9% WER reduction on the Callhome part
of Eval2000 test set over the respective baselines.
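As a rough illustration of the architecture described above, the PyTorch sketch below encodes surrounding utterances with an LSTM, pools the per-utterance vectors into a single context vector with a multi-head self-attentive layer, and concatenates that vector with the word embedding at every input step of the LSTM LM. Module names, dimensions, and the exact fusion point are assumptions for illustration, not the authors' implementation.
```python
# Minimal CULM sketch (illustrative assumptions, not the paper's released code).
import torch
import torch.nn as nn

class ContextExtractor(nn.Module):
    """Encode past/future utterances with an LSTM, then pool the per-utterance
    vectors into one fixed-size context vector via multi-head self-attention."""
    def __init__(self, vocab_size, emb_dim=256, hid_dim=512, ctx_dim=256, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.utt_lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.attn = nn.MultiheadAttention(hid_dim, n_heads, batch_first=True)
        self.proj = nn.Linear(hid_dim, ctx_dim)

    def forward(self, context_utts):
        # context_utts: (batch, n_utts, utt_len) word ids of surrounding utterances
        b, n, t = context_utts.shape
        emb = self.embed(context_utts.reshape(b * n, t))
        _, (h, _) = self.utt_lstm(emb)                  # final hidden state per utterance
        utt_vecs = h[-1].reshape(b, n, -1)              # (batch, n_utts, hid_dim)
        pooled, _ = self.attn(utt_vecs, utt_vecs, utt_vecs)
        return torch.tanh(self.proj(pooled.mean(dim=1)))   # (batch, ctx_dim)

class CULM(nn.Module):
    """Standard LSTM LM whose per-step input is augmented with the context vector."""
    def __init__(self, vocab_size, emb_dim=256, hid_dim=1024, ctx_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.extractor = ContextExtractor(vocab_size, emb_dim, ctx_dim=ctx_dim)
        self.lstm = nn.LSTM(emb_dim + ctx_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, words, context_utts):
        ctx = self.extractor(context_utts)                    # (batch, ctx_dim)
        ctx = ctx.unsqueeze(1).expand(-1, words.size(1), -1)  # repeat per time step
        x = torch.cat([self.embed(words), ctx], dim=-1)
        h, _ = self.lstm(x)
        return self.out(h)                                    # next-word logits
```
The paper's alternative extraction network replaces the self-attentive pooling with a simple projection of the LSTM final hidden states; in either case the LM receives a fixed-size context vector at every time step.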
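The acoustic error sampling idea can likewise be sketched as a training-time corruption step: some words in the LM training input are replaced with plausible ASR confusions so that training histories resemble errorful test-time hypotheses. The confusion table, function name, and sampling rate below are hypothetical placeholders; in practice such confusions would be estimated from real decoding output rather than hand-written.
```python
import random

# Hypothetical confusion table: reference word -> plausible ASR mis-recognitions.
# In practice this would be estimated from actual decoding hypotheses.
CONFUSIONS = {
    "there": ["their", "they're"],
    "to": ["two", "too"],
    "right": ["write"],
}

def sample_acoustic_errors(words, error_rate=0.1):
    """Randomly swap words for plausible ASR confusions (illustrative only)."""
    noisy = []
    for w in words:
        if w in CONFUSIONS and random.random() < error_rate:
            noisy.append(random.choice(CONFUSIONS[w]))
        else:
            noisy.append(w)
    return noisy

# Corrupt a training sentence before it is fed to the LM, so the model is
# trained on histories that look like errorful recognition output.
print(sample_acoustic_errors("i want to go there right now".split(), error_rate=0.5))
```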
Related papers
- R-SFLLM: Jamming Resilient Framework for Split Federated Learning with Large Language Models [83.77114091471822]
Split federated learning (SFL) is a compute-efficient paradigm in distributed machine learning (ML).
A challenge in SFL, particularly when deployed over wireless channels, is the susceptibility of transmitted model parameters to adversarial jamming.
This is particularly pronounced for word embedding parameters in large language models (LLMs), which are crucial for language understanding.
A physical layer framework is developed for resilient SFL with LLMs (R-SFLLM) over wireless networks.
arXiv Detail & Related papers (2024-07-16T12:21:29Z)
- Blending LLMs into Cascaded Speech Translation: KIT's Offline Speech Translation System for IWSLT 2024 [61.189875635090225]
Large Language Models (LLMs) are currently under exploration for various tasks, including Automatic Speech Recognition (ASR), Machine Translation (MT), and even End-to-End Speech Translation (ST).
arXiv Detail & Related papers (2024-06-24T16:38:17Z)
- It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition [70.77292069313154]
Large language models (LLMs) can be successfully used for generative error correction (GER) on top of the automatic speech recognition (ASR) output.
In this work, we aim to overcome the limitation that GER relies on text alone by infusing acoustic information before generating the predicted transcription, via a novel late fusion solution termed Uncertainty-Aware Dynamic Fusion (UADF).
arXiv Detail & Related papers (2024-02-08T07:21:45Z)
- On the Relation between Internal Language Model and Sequence Discriminative Training for Neural Transducers [52.88268942796418]
Internal language model (ILM) subtraction has been widely applied to improve the performance of the RNN-Transducer.
We show that sequence discriminative training has a strong correlation with ILM subtraction from both theoretical and empirical points of view.
arXiv Detail & Related papers (2023-09-25T13:35:28Z)
- Connecting Speech Encoder and Large Language Model for ASR [25.660343393359565]
The impressive capability and versatility of large language models (LLMs) have attracted increasing attention in automatic speech recognition (ASR).
This paper presents a comparative study of three commonly used structures as connectors, including fully connected layers, multi-head cross-attention, and Q-Former.
Experiments were performed on the commonly used LibriSpeech, Common Voice, and GigaSpeech datasets.
arXiv Detail & Related papers (2023-09-25T08:57:07Z)
- Leveraging Cross-Utterance Context For ASR Decoding [6.033324057680156]
Cross-utterance information has been shown to be beneficial during second-pass re-scoring.
We investigate the incorporation of long-context transformer LMs for cross-utterance decoding of acoustic models via beam search.
arXiv Detail & Related papers (2023-06-29T12:48:25Z)
- Bayesian Neural Network Language Modeling for Speech Recognition [59.681758762712754]
State-of-the-art neural network language models (NNLMs), represented by long short-term memory recurrent neural networks (LSTM-RNNs) and Transformers, are becoming highly complex.
In this paper, an overarching full Bayesian learning framework is proposed to account for the underlying uncertainty in LSTM-RNN and Transformer LMs.
arXiv Detail & Related papers (2022-08-28T17:50:19Z)
- Transformer Language Models with LSTM-based Cross-utterance Information Representation [3.976291254896486]
This paper proposes the R-TLM, which uses hidden states in a long short-term memory (LSTM) LM.
To encode the cross-utterance information, the R-TLM incorporates an LSTM module together with a segment-wise recurrence in some of the Transformer blocks.
The proposed system was evaluated on the AMI meeting corpus, the Eval2000 and the RT03 telephone conversation evaluation sets.
arXiv Detail & Related papers (2021-02-12T12:12:29Z)
- Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU).
We show that the error rates of off-the-shelf ASR and subsequent LU systems can be reduced significantly, by 14% relative, with joint models trained using small amounts of in-domain data.
arXiv Detail & Related papers (2020-01-28T22:09:25Z)