On Language Model Integration for RNN Transducer based Speech
Recognition
- URL: http://arxiv.org/abs/2110.06841v1
- Date: Wed, 13 Oct 2021 16:30:46 GMT
- Title: On Language Model Integration for RNN Transducer based Speech
Recognition
- Authors: Wei Zhou, Zuoyun Zheng, Ralf Schlüter, Hermann Ney
- Abstract summary: We study various ILM correction-based LM integration methods formulated in a common RNN-T framework.
We provide a decoding interpretation of two major reasons for the performance improvement with ILM correction.
We also propose an exact-ILM training framework by extending the proof given for the hybrid autoregressive transducer.
- Score: 49.84285563767935
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The mismatch between an external language model (LM) and the
implicitly learned internal LM (ILM) of the RNN-Transducer (RNN-T) can limit
the performance of LM integration methods such as simple shallow fusion. A
Bayesian interpretation suggests removing this sequence prior, which is
referred to as ILM correction. In this work, we study various ILM
correction-based LM integration methods formulated in a common RNN-T
framework. We provide a decoding interpretation of two major reasons for the
performance improvement with ILM correction, which is further verified
experimentally with detailed analysis. We also propose an exact-ILM training
framework by extending the proof given for the hybrid autoregressive
transducer, which enables a theoretical justification for other ILM
approaches. A systematic comparison is conducted for both in-domain and
cross-domain evaluation, on the Librispeech and TED-LIUM Release 2 corpora,
respectively. Our proposed exact-ILM training can further improve the best
ILM method.
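For concreteness, the two score combinations contrasted in the abstract can be written as a minimal sketch (assuming per-hypothesis log-probabilities are already available; the function names and scale values are illustrative, not the paper's tuned settings):

```python
def shallow_fusion_score(log_p_rnnt: float, log_p_elm: float,
                         lm_scale: float = 0.6) -> float:
    # Simple shallow fusion: RNN-T hypothesis score plus a scaled
    # external LM (ELM) score.
    return log_p_rnnt + lm_scale * log_p_elm


def ilm_corrected_score(log_p_rnnt: float, log_p_elm: float,
                        log_p_ilm: float, lm_scale: float = 0.6,
                        ilm_scale: float = 0.4) -> float:
    # ILM correction: additionally subtract a scaled estimate of the
    # internal LM, i.e. the sequence prior the RNN-T learned implicitly,
    # following the Bayesian interpretation described in the abstract.
    return log_p_rnnt + lm_scale * log_p_elm - ilm_scale * log_p_ilm
```

In beam search, such a combined score would replace the plain RNN-T hypothesis score; the methods studied in the paper differ mainly in how the ILM term is estimated within this common framework.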
Related papers
- Blending LLMs into Cascaded Speech Translation: KIT's Offline Speech Translation System for IWSLT 2024 [61.189875635090225]
Large Language Models (LLMs) are currently under exploration for various tasks, including Automatic Speech Recognition (ASR), Machine Translation (MT), and even End-to-End Speech Translation (ST).
arXiv Detail & Related papers (2024-06-24T16:38:17Z)
- Building Accurate Translation-Tailored LLMs with Language Aware Instruction Tuning [57.323716555996114]
Off-target translation remains an unsolved problem, especially for low-resource languages.
Recent works have either designed advanced prompting strategies to highlight the functionality of translation instructions or exploited the in-context learning ability of LLMs.
In this work, we design a two-stage fine-tuning algorithm to improve the instruction-following ability (especially the translation direction) of LLMs.
arXiv Detail & Related papers (2024-03-21T13:47:40Z)
- TEaR: Improving LLM-based Machine Translation with Systematic Self-Refinement [26.26493253161022]
Large Language Models (LLMs) have achieved impressive results in Machine Translation (MT).
We introduce a systematic LLM-based self-refinement translation framework, named TEaR.
arXiv Detail & Related papers (2024-02-26T07:58:12Z)
- LLMRefine: Pinpointing and Refining Large Language Models via Fine-Grained Actionable Feedback [65.84061725174269]
Recent large language models (LLMs) leverage human feedback to improve their generation quality.
We propose LLMRefine, an inference time optimization method to refine LLM's output.
We conduct experiments on three text generation tasks, including machine translation, long-form question answering (QA), and topical summarization.
LLMRefine consistently outperforms all baseline approaches, achieving improvements of up to 1.7 MetricX points on translation tasks, 8.1 ROUGE-L on ASQA, and 2.2 ROUGE-L on topical summarization.
arXiv Detail & Related papers (2023-11-15T19:52:11Z)
- On the Relation between Internal Language Model and Sequence Discriminative Training for Neural Transducers [52.88268942796418]
Internal language model (ILM) subtraction has been widely applied to improve the performance of the RNN-Transducer.
We show that sequence discriminative training has a strong correlation with ILM subtraction from both theoretical and empirical points of view.
arXiv Detail & Related papers (2023-09-25T13:35:28Z)
- Internal Language Model Estimation based Adaptive Language Model Fusion for Domain Adaptation [12.239557608053156]
We propose an adaptive LM fusion approach called internal language model estimation based adaptive domain adaptation (ILME-ADA).
We demonstrate the efficacy of the proposed ILME-ADA method with both RNN-T and LAS modeling frameworks, employing neural network and n-gram LMs as ELMs, respectively, on two domain-specific (target) test sets.
arXiv Detail & Related papers (2022-11-02T09:15:20Z)
- An Empirical Study of Language Model Integration for Transducer based Speech Recognition [23.759084092602517]
Methods such as density ratio (DR) and ILM estimation (ILME) have been developed, outperforming the classic shallow fusion (SF) method.
We propose a low-order density ratio method (LODR) by training a low-order weak ILM for DR.
arXiv Detail & Related papers (2022-03-31T03:33:50Z)
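The density-ratio combination summarized in the entry above can be sketched as follows (a hedged illustration; names and scale values are placeholders, and per the summary, LODR keeps the same combination but trains a low-order weak LM as the subtracted term):

```python
def density_ratio_score(log_p_rnnt: float, log_p_target_lm: float,
                        log_p_source_lm: float,
                        target_scale: float = 0.6,
                        source_scale: float = 0.4) -> float:
    # Density ratio (DR): approximate the internal LM with a separate LM
    # trained on the source-domain (training) transcripts and subtract
    # its score from the shallow-fusion combination.
    # LODR, per the summary above, uses the same formula but substitutes
    # a low-order weak LM for the full source-domain LM.
    return (log_p_rnnt + target_scale * log_p_target_lm
            - source_scale * log_p_source_lm)
```

ILME differs in that the subtracted term is estimated from the transducer itself rather than from a separately trained LM.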
- Investigating Methods to Improve Language Model Integration for Attention-based Encoder-Decoder ASR Models [107.86965028729517]
Attention-based encoder-decoder (AED) models learn an implicit internal language model (ILM) from the training transcriptions.
We propose several novel methods to estimate the ILM directly from the AED model.
arXiv Detail & Related papers (2021-04-12T15:16:03Z)
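As a rough illustration of one such estimate, the zero-out approach is often used as a baseline in this line of work: run the decoder with the acoustic context suppressed so its output reflects only the internal LM. This sketch assumes a PyTorch-style setup; `decoder_fn` and its signature are hypothetical, not the paper's API:

```python
import torch

def ilm_log_probs(decoder_fn, label_history: torch.Tensor, enc_dim: int):
    # Zero-out ILM estimate: feed a zero vector in place of the encoder
    # (acoustic) context so the predicted label distribution depends only
    # on the label history, i.e. on the internal LM.
    zero_context = torch.zeros(1, enc_dim)          # suppressed acoustics
    logits = decoder_fn(label_history, zero_context)  # hypothetical callable
    return torch.log_softmax(logits, dim=-1)
```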