Iterative Shallow Fusion of Backward Language Model for End-to-End
Speech Recognition
- URL: http://arxiv.org/abs/2310.11010v1
- Date: Tue, 17 Oct 2023 05:44:10 GMT
- Title: Iterative Shallow Fusion of Backward Language Model for End-to-End
Speech Recognition
- Authors: Atsunori Ogawa, Takafumi Moriya, Naoyuki Kamo, Naohiro Tawara, Marc
Delcroix
- Abstract summary: We propose a new shallow fusion (SF) method to exploit an external backward language model (BLM) for end-to-end automatic speech recognition (ASR).
We iteratively apply the BLM to partial ASR hypotheses in the backward direction (i.e., from the possible next token to the start symbol) during decoding, substituting the newly calculated BLM scores for those calculated in the previous iteration.
In experiments using an attention-based encoder-decoder ASR system, we confirmed that this iterative SF (ISF) achieves performance comparable to SF using a forward language model (FLM).
- Score: 48.328702724611496
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a new shallow fusion (SF) method to exploit an external backward
language model (BLM) for end-to-end automatic speech recognition (ASR). The BLM
has characteristics complementary to those of a forward language model (FLM), and the
effectiveness of combining the two has been confirmed by rescoring ASR
hypotheses as post-processing. In the proposed SF, we iteratively apply the BLM
to partial ASR hypotheses in the backward direction (i.e., from the possible
next token to the start symbol) during decoding, substituting the newly
calculated BLM scores for those calculated in the previous iteration. To
enhance the effectiveness of this iterative SF (ISF), we train a partial
sentence-aware BLM (PBLM) on reversed text data that includes partial sentences,
matching the framework of ISF. In experiments using an attention-based
encoder-decoder ASR system, we confirmed that ISF using the PBLM performs
comparably to SF using the FLM. By performing ISF, early pruning of
prospective hypotheses can be prevented during decoding, and we obtain a
performance improvement over applying the PBLM as post-processing.
Finally, we confirmed that combining SF and ISF yields a further performance
improvement thanks to the complementarity of the FLM and PBLM.
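
A minimal sketch of the scoring idea described above, assuming hypothetical `asr.log_prob` and `blm.log_prob` interfaces and a toy beam search; it illustrates iterative backward-LM rescoring of partial hypotheses and is not the authors' implementation.

```python
# Minimal sketch of one beam-search expansion step with iterative shallow
# fusion (ISF).  The `asr` and `blm` objects and their log_prob interfaces
# are hypothetical placeholders, not the authors' implementation.

def isf_score(prefix, candidate, asr_logp, blm, lam):
    """Fused score of the partial hypothesis `prefix + [candidate]`.

    asr_logp : accumulated ASR (decoder) log-probability of the extended prefix
    blm      : backward LM exposing log_prob(token_sequence)
    lam      : shallow-fusion weight for the BLM
    """
    extended = prefix + [candidate]
    # Re-score the whole partial hypothesis in the backward direction, from
    # the possible next token down to the start symbol.  This score replaces
    # (rather than adds to) the BLM score computed at the previous iteration.
    blm_logp = blm.log_prob(list(reversed(extended)))
    return asr_logp + lam * blm_logp


def expand_beam(beams, vocab, asr, blm, lam, beam_size):
    """One decoding step: expand each hypothesis and prune with fused scores."""
    candidates = []
    for prefix, asr_logp in beams:
        for tok in vocab:
            new_asr_logp = asr_logp + asr.log_prob(prefix, tok)
            fused = isf_score(prefix, tok, new_asr_logp, blm, lam)
            candidates.append((prefix + [tok], new_asr_logp, fused))
    # Pruning on the fused score keeps hypotheses the BLM considers promising.
    candidates.sort(key=lambda c: c[2], reverse=True)
    return [(p, a) for p, a, _ in candidates[:beam_size]]
```

Because the BLM term is recomputed over the whole partial hypothesis at every expansion rather than accumulated token by token, pruning always uses a score that reflects the hypothesis read backward, which is what prevents prospective hypotheses from being discarded early.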
Related papers
- Delayed Fusion: Integrating Large Language Models into First-Pass Decoding in End-to-end Speech Recognition [17.376550014426623]
This paper presents an efficient decoding approach for end-to-end automatic speech recognition (E2E-ASR) with large language models (LLMs).
We propose "delayed fusion," which applies LLM scores to ASR hypotheses with a delay during decoding.
We demonstrate that delayed fusion provides improved decoding speed and accuracy compared to shallow fusion and N-best rescoring.
arXiv Detail & Related papers (2025-01-16T03:01:50Z)
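
The delayed-fusion idea summarized above can be sketched roughly as follows; the decoding loop, the `asr`/`llm` interfaces, and the choice of refreshing LLM scores every few steps are illustrative assumptions, not the paper's exact procedure.

```python
# Rough sketch of delayed fusion: LLM scores are applied to surviving
# hypotheses only every `delay` decoding steps, so the expensive LLM is
# queried far less often than in per-step shallow fusion.

def decode_with_delayed_fusion(asr, llm, vocab, beam_size=8, delay=4,
                               lam=0.3, max_len=50):
    beams = [([], 0.0, 0.0)]  # (tokens, asr_logp, last_llm_logp)
    for step in range(1, max_len + 1):
        expanded = []
        for tokens, asr_logp, llm_logp in beams:
            for tok in vocab:
                expanded.append((tokens + [tok],
                                 asr_logp + asr.log_prob(tokens, tok),
                                 llm_logp))          # stale LLM score for now
        if step % delay == 0:
            # Apply (refresh) the LLM scores with a delay, only for the
            # hypotheses produced by the cheap ASR-only expansion.
            expanded = [(t, a, llm.log_prob(t)) for t, a, _ in expanded]
        expanded.sort(key=lambda h: h[1] + lam * h[2], reverse=True)
        beams = expanded[:beam_size]
    return beams[0][0]
```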
- Uncertainty-Aware Hybrid Inference with On-Device Small and Remote Large Language Models [49.48313161005423]
A hybrid language model (HLM) architecture integrates a small language model (SLM) operating on a mobile device with a large language model (LLM) hosted at the base station (BS) of a wireless network.
The HLM token generation process follows the speculative inference principle: the SLM's vocabulary distribution is uploaded to the LLM, which either accepts or rejects it, with rejected tokens being resampled by the LLM.
We propose a novel HLM structure coined Uncertainty-aware opportunistic HLM (U-HLM), wherein the SLM locally measures its output uncertainty and skips both uplink transmission and LLM computation when the uncertainty is sufficiently low.
arXiv Detail & Related papers (2024-12-17T09:08:18Z)
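
A rough sketch of the token-generation loop described above, assuming an entropy-based uncertainty measure, a shared SLM/LLM vocabulary, and simplified resampling; the actual U-HLM design may differ in all of these respects.

```python
import math
import random

# Hedged sketch of uncertainty-aware opportunistic hybrid inference: the
# on-device SLM drafts a token, and the remote LLM is consulted
# (speculative accept/reject) only when the SLM's uncertainty is high.

def generate_token(slm, llm, context, theta=1.0):
    p_slm = slm.distribution(context)            # dict: token -> probability
    token = max(p_slm, key=p_slm.get)            # SLM's draft token
    uncertainty = -sum(p * math.log(p) for p in p_slm.values() if p > 0)
    if uncertainty <= theta:
        return token                             # skip uplink and LLM entirely
    # Otherwise fall back to speculative verification by the LLM
    # (both models are assumed to share the same vocabulary).
    p_llm = llm.distribution(context)
    if random.random() < min(1.0, p_llm[token] / p_slm[token]):
        return token                             # LLM accepts the draft
    # Rejected: resample the token from the LLM (simplified here to
    # sampling directly from the LLM's distribution).
    tokens, probs = zip(*p_llm.items())
    return random.choices(tokens, weights=probs)[0]
```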
- R-SFLLM: Jamming Resilient Framework for Split Federated Learning with Large Language Models [83.77114091471822]
Split federated learning (SFL) is a compute-efficient paradigm in distributed machine learning (ML).
A challenge in SFL, particularly when deployed over wireless channels, is the susceptibility of transmitted model parameters to adversarial jamming.
This is particularly pronounced for word embedding parameters in large language models (LLMs), which are crucial for language understanding.
A physical layer framework is developed for resilient SFL with LLMs (R-SFLLM) over wireless networks.
arXiv Detail & Related papers (2024-07-16T12:21:29Z)
- Beyond Numeric Awards: In-Context Dueling Bandits with LLM Agents [25.825941077332182]
This paper is the first to investigate Large Language Models (LLMs) as in-context decision-makers under the problem of Dueling Bandits (DB).
We compare GPT-3.5 Turbo, GPT-4, GPT-4 Turbo, Llama 3.1, and o1-Preview against nine well-established DB algorithms.
We show that our top-performing LLM, GPT-4 Turbo, has the zero-shot relative decision-making ability to achieve surprisingly low weak regret.
arXiv Detail & Related papers (2024-07-02T02:18:14Z)
- Speech Prefix-Tuning with RNNT Loss for Improving LLM Predictions [28.211967723403987]
We find that optimizing speech prefixes leads to better ASR performance and propose applying RNNT loss to perform speech prefix-tuning.
Our recognition results, averaged over 10 Indic languages, show that the proposed prefix-tuning with RNNT loss yields a 12% relative improvement in WER over the baseline with a fine-tuned LLM.
arXiv Detail & Related papers (2024-06-20T19:50:49Z)
- FFN-SkipLLM: A Hidden Gem for Autoregressive Decoding with Adaptive Feed Forward Skipping [49.66872823080736]
Autoregressive Large Language Models (e.g., LLaMa, GPTs) are omnipresent, achieving remarkable success in language understanding and generation.
To mitigate the computational overhead incurred during generation, several early-exit and layer-dropping strategies have been proposed.
We propose FFN-SkipLLM, an input-adaptive feed-forward skipping strategy.
arXiv Detail & Related papers (2024-04-05T02:35:43Z)
- Towards ASR Robust Spoken Language Understanding Through In-Context Learning With Word Confusion Networks [68.79880423713597]
We introduce a method that utilizes the ASR system's lattice output instead of relying solely on the top hypothesis.
Our in-context learning experiments, covering spoken question answering and intent classification, underline the LLM's resilience to noisy speech transcripts.
arXiv Detail & Related papers (2024-01-05T17:58:10Z)
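
As a hedged illustration of feeding richer ASR output than the 1-best hypothesis to an LLM, the snippet below serializes a word confusion network into a prompt; the per-slot WCN encoding and the prompt wording are assumptions for illustration, not the paper's exact format.

```python
# Turn a word confusion network (WCN) into an in-context prompt so the LLM
# can see per-word alternatives and confidences instead of only the 1-best
# transcript.  Encoding and wording are illustrative assumptions.

def wcn_to_prompt(wcn, question):
    """wcn: list of slots, each a list of (word, probability) alternatives."""
    slots = []
    for alternatives in wcn:
        ranked = sorted(alternatives, key=lambda wp: wp[1], reverse=True)
        slots.append("/".join(f"{w}({p:.2f})" for w, p in ranked))
    transcript = " ".join(slots)
    return (
        "The ASR output is given as per-word alternatives with confidences:\n"
        f"{transcript}\n"
        f"Question: {question}\nAnswer:"
    )

# Example:
# wcn = [[("book", 0.6), ("look", 0.4)], [("a", 0.9)],
#        [("flight", 0.8), ("fight", 0.2)]]
# print(wcn_to_prompt(wcn, "What does the user want to do?"))
```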
- On Language Model Integration for RNN Transducer based Speech Recognition [49.84285563767935]
We study various ILM correction-based LM integration methods formulated in a common RNN-T framework.
We provide a decoding interpretation of two major reasons for the performance improvement with ILM correction.
We also propose an exact-ILM training framework by extending the proof given in the hybrid autoregressive transducer.
arXiv Detail & Related papers (2021-10-13T16:30:46Z)
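
The log-linear form of ILM correction commonly used with RNN-T shallow fusion can be sketched in a few lines; the weights below are placeholders, and the exact formulation and ILM estimation in the paper may differ.

```python
# Hedged sketch of internal LM (ILM) correction for shallow fusion with an
# RNN-T model, in the commonly used log-linear form.

def ilm_corrected_score(logp_rnnt, logp_ext_lm, logp_ilm,
                        lam_ext=0.5, lam_ilm=0.3):
    """Combine scores for one hypothesis y given acoustics x.

    logp_rnnt   : log P_RNNT(y | x)
    logp_ext_lm : log P_ELM(y), external LM
    logp_ilm    : log P_ILM(y), estimate of the RNN-T's internal LM
    """
    # The external LM is added and the internal LM is subtracted so that
    # the label prior learned by the RNN-T is not counted twice.
    return logp_rnnt + lam_ext * logp_ext_lm - lam_ilm * logp_ilm
```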
- "What's The Context?" : Long Context NLM Adaptation for ASR Rescoring in Conversational Agents [13.586996848831543]
We investigate various techniques to incorporate turn-based context history into both recurrent (LSTM) and Transformer-XL based NLMs.
For recurrent NLMs, we explore a context carry-over mechanism and feature-based augmentation.
We adapt our contextual NLM towards user provided on-the-fly speech patterns by leveraging encodings from a large pre-trained masked language model.
arXiv Detail & Related papers (2021-04-21T00:15:21Z)