Iterative Shallow Fusion of Backward Language Model for End-to-End
Speech Recognition
- URL: http://arxiv.org/abs/2310.11010v1
- Date: Tue, 17 Oct 2023 05:44:10 GMT
- Title: Iterative Shallow Fusion of Backward Language Model for End-to-End
Speech Recognition
- Authors: Atsunori Ogawa, Takafumi Moriya, Naoyuki Kamo, Naohiro Tawara, Marc
Delcroix
- Abstract summary: We propose a new shallow fusion (SF) method to exploit an external backward language model (BLM) for end-to-end automatic speech recognition (ASR).
We iteratively apply the BLM to partial ASR hypotheses in the backward direction (i.e., from the possible next token to the start symbol) during decoding, substituting the newly calculated BLM scores for the scores calculated at the last iteration.
In experiments using an attention-based encoder-decoder ASR system, we confirmed that this iterative SF (ISF) achieves performance comparable to SF using the FLM.
- Score: 48.328702724611496
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a new shallow fusion (SF) method to exploit an external backward
language model (BLM) for end-to-end automatic speech recognition (ASR). The BLM
has characteristics complementary to those of a forward language model (FLM),
and the effectiveness of combining the two has been confirmed by rescoring ASR
hypotheses as post-processing. In the proposed SF, we iteratively apply the BLM
to partial ASR hypotheses in the backward direction (i.e., from the possible
next token to the start symbol) during decoding, substituting the newly
calculated BLM scores for those calculated at the previous iteration. To
enhance the effectiveness of this iterative SF (ISF), we train a partial
sentence-aware BLM (PBLM) using reversed text data that includes partial
sentences, matching the conditions under which the BLM is applied in ISF. In
experiments using an attention-based encoder-decoder ASR system, we confirmed
that ISF using the PBLM achieves performance comparable to SF using the FLM.
Because ISF is applied during decoding, it prevents prospective hypotheses from
being pruned early, yielding a performance improvement over applying the PBLM
only as post-processing. Finally, we confirmed that combining SF and ISF brings
a further improvement thanks to the complementarity of the FLM and PBLM.
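To make the scoring concrete, below is a minimal beam-search sketch of iterative shallow fusion in Python. The callables asr_step_log_probs and blm_log_prob, the fusion weight lam, and all defaults are hypothetical placeholders rather than the paper's implementation; the fused score is log P_ASR(y|x) + lam * log P_BLM(y), with the BLM term recomputed over the whole extended partial hypothesis (and the previous iteration's term discarded) every time a hypothesis grows.

```python
def isf_beam_search(asr_step_log_probs, blm_log_prob, vocab,
                    beam_size=8, lam=0.3, max_len=50, eos="</s>"):
    """Beam search with iterative shallow fusion of a backward LM (sketch).

    Assumed interfaces (illustrative, not the paper's API):
      asr_step_log_probs(prefix) -> {token: log P_ASR(token | prefix, x)}
      blm_log_prob(tokens)       -> log prob of `tokens` scored backward,
                                    from the last token toward the start symbol
    """
    beams = [([], 0.0, 0.0)]   # (tokens, fused score, current BLM score)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score, old_blm in beams:
            step = asr_step_log_probs(tokens)
            for tok in vocab:
                new_tokens = tokens + [tok]
                new_blm = blm_log_prob(new_tokens)
                # Substitute the previous iteration's BLM score with the
                # freshly computed backward score of the extended hypothesis.
                new_score = score - lam * old_blm + step[tok] + lam * new_blm
                candidates.append((new_tokens, new_score, new_blm))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score, blm in candidates[:beam_size]:
            (finished if tokens[-1] == eos else beams).append((tokens, score, blm))
        if not beams:
            break
    best = max(finished or beams, key=lambda h: h[1])
    return best[0]
```

In a full system, blm_log_prob would be the PBLM applied to the reversed partial hypothesis, and an FLM shallow-fusion term could be added to new_score in the same way to combine SF and ISF.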
Related papers
- TRANSPOSE: Transitional Approaches for Spatially-Aware LFI Resilient FSM Encoding [2.236957801565796]
Finite state machines (FSMs) regulate sequential circuits, including access to sensitive information and privileged CPU states.
Laser-based fault injection (LFI) is becoming increasingly precise, allowing an adversary to thwart chip security by altering individual flip-flop (FF) values.
arXiv Detail & Related papers (2024-11-05T04:18:47Z)
- R-SFLLM: Jamming Resilient Framework for Split Federated Learning with Large Language Models [83.77114091471822]
Split federated learning (SFL) is a compute-efficient paradigm in distributed machine learning (ML).
A challenge in SFL, particularly when deployed over wireless channels, is the susceptibility of transmitted model parameters to adversarial jamming.
This is particularly pronounced for word embedding parameters in large language models (LLMs), which are crucial for language understanding.
A physical layer framework is developed for resilient SFL with LLMs (R-SFLLM) over wireless networks.
arXiv Detail & Related papers (2024-07-16T12:21:29Z)
- Speech Prefix-Tuning with RNNT Loss for Improving LLM Predictions [28.211967723403987]
We find that optimizing speech prefixes leads to better ASR performance and propose applying RNNT loss to perform speech prefix-tuning.
Our recognition results, averaged over 10 Indic languages, show that the proposed prefix-tuning with RNNT loss yields a 12% relative improvement in WER over a baseline with a fine-tuned LLM.
arXiv Detail & Related papers (2024-06-20T19:50:49Z)
- FFN-SkipLLM: A Hidden Gem for Autoregressive Decoding with Adaptive Feed Forward Skipping [49.66872823080736]
Autoregressive large language models (e.g., LLaMa, GPTs) are ubiquitous and achieve remarkable success in language understanding and generation.
To mitigate the overhead incurred during generation, several early-exit and layer-dropping strategies have been proposed.
We propose FFN-SkipLLM, an input-adaptive feed-forward skipping strategy.
arXiv Detail & Related papers (2024-04-05T02:35:43Z)
- It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition [70.77292069313154]
Large language models (LLMs) can be successfully used for generative error correction (GER) on top of automatic speech recognition (ASR) output, but such correction relies only on the text hypotheses and ignores the acoustic signal.
In this work, we aim to overcome this limitation by infusing acoustic information before generating the predicted transcription, through a novel late fusion solution termed Uncertainty-Aware Dynamic Fusion (UADF).
arXiv Detail & Related papers (2024-02-08T07:21:45Z)
- Towards ASR Robust Spoken Language Understanding Through In-Context Learning With Word Confusion Networks [68.79880423713597]
We introduce a method that utilizes the ASR system's lattice output instead of relying solely on the top hypothesis.
Our in-context learning experiments, covering spoken question answering and intent classification, underline the LLM's resilience to noisy speech transcripts.
arXiv Detail & Related papers (2024-01-05T17:58:10Z)
- Lightweight and Flexible Deep Equilibrium Learning for CSI Feedback in FDD Massive MIMO [13.856867175477042]
In frequency-division duplexing (FDD) massive multiple-input multiple-output (MIMO) systems, downlink channel state information (CSI) needs to be sent back to the base station (BS) by the users.
We propose a lightweight and flexible deep learning-based CSI feedback approach by capitalizing on deep equilibrium models.
arXiv Detail & Related papers (2022-11-28T05:53:09Z)
- On Language Model Integration for RNN Transducer based Speech Recognition [49.84285563767935]
We study various ILM correction-based LM integration methods formulated in a common RNN-T framework.
We provide a decoding interpretation of two major reasons for the performance improvement with ILM correction.
We also propose an exact-ILM training framework by extending the proof given in the hybrid autoregressive transducer.
arXiv Detail & Related papers (2021-10-13T16:30:46Z)
- "What's The Context?": Long Context NLM Adaptation for ASR Rescoring in Conversational Agents [13.586996848831543]
We investigate various techniques to incorporate turn-based context history into both recurrent (LSTM) and Transformer-XL-based NLMs.
For recurrent NLMs, we explore a context carry-over mechanism and feature-based augmentation.
We adapt our contextual NLM towards user-provided, on-the-fly speech patterns by leveraging encodings from a large pre-trained masked language model.
arXiv Detail & Related papers (2021-04-21T00:15:21Z)