Effective internal language model training and fusion for factorized transducer model
- URL: http://arxiv.org/abs/2404.01716v1
- Date: Tue, 2 Apr 2024 08:01:05 GMT
- Title: Effective internal language model training and fusion for factorized transducer model
- Authors: Jinxi Guo, Niko Moritz, Yingyi Ma, Frank Seide, Chunyang Wu, Jay Mahadeokar, Ozlem Kalinli, Christian Fuegen, Mike Seltzer,
- Abstract summary: Internal language model (ILM) of the neural transducer has been widely studied.
We propose a novel ILM training and decoding strategy for factorized transducer models.
- Score: 26.371223360905557
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The internal language model (ILM) of the neural transducer has been widely studied. In most prior work, it is mainly used for estimating the ILM score and is subsequently subtracted during inference to facilitate improved integration with external language models. Recently, various of factorized transducer models have been proposed, which explicitly embrace a standalone internal language model for non-blank token prediction. However, even with the adoption of factorized transducer models, limited improvement has been observed compared to shallow fusion. In this paper, we propose a novel ILM training and decoding strategy for factorized transducer models, which effectively combines the blank, acoustic and ILM scores. Our experiments show a 17% relative improvement over the standard decoding method when utilizing a well-trained ILM and the proposed decoding strategy on LibriSpeech datasets. Furthermore, when compared to a strong RNN-T baseline enhanced with external LM fusion, the proposed model yields a 5.5% relative improvement on general-sets and an 8.9% WER reduction for rare words. The proposed model can achieve superior performance without relying on external language models, rendering it highly efficient for production use-cases. To further improve the performance, we propose a novel and memory-efficient ILM-fusion-aware minimum word error rate (MWER) training method which improves ILM integration significantly.
Related papers
- Scalable Language Models with Posterior Inference of Latent Thought Vectors [52.63299874322121]
Latent-Thought Language Models (LTMs) incorporate explicit latent thought vectors that follow an explicit prior model in latent space.
LTMs possess additional scaling dimensions beyond traditional LLMs, yielding a structured design space.
LTMs significantly outperform conventional autoregressive models and discrete diffusion models in validation perplexity and zero-shot language modeling.
arXiv Detail & Related papers (2025-02-03T17:50:34Z) - Reward-Guided Speculative Decoding for Efficient LLM Reasoning [80.55186052123196]
We introduce Reward-Guided Speculative Decoding (RSD), a novel framework aimed at improving the efficiency of inference in large language models (LLMs)
RSD incorporates a controlled bias to prioritize high-reward outputs, in contrast to existing speculative decoding methods that enforce strict unbiasedness.
RSD delivers significant efficiency gains against decoding with the target model only, while achieving significant better accuracy than parallel decoding method on average.
arXiv Detail & Related papers (2025-01-31T17:19:57Z) - Transducer-Llama: Integrating LLMs into Streamable Transducer-based Speech Recognition [26.79555533538622]
This paper proposes a novel model architecture, Transducer-Llama, that integrates large language models (LLMs) into a Factorized Transducer (FT) model.
The proposed streaming Transducer-Llama approach gave a 17% relative WER reduction (WERR) over a strong FT baseline and a 32% WERR over an RNN-T baseline.
arXiv Detail & Related papers (2024-12-21T03:35:49Z) - Efficient Self-Improvement in Multimodal Large Language Models: A Model-Level Judge-Free Approach [31.654345704242512]
This paper introduces a novel, model-level judge-free self-improvement framework.
Our approach employs a controlled feedback mechanism while eliminating the need for MLLMs in the verification loop.
We achieve superior precision and recall with significantly lower computational demands.
arXiv Detail & Related papers (2024-11-26T00:44:37Z) - FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models [50.331708897857574]
We introduce FactorLLM, a novel approach that decomposes well-trained dense FFNs into sparse sub-networks without requiring any further modifications.
FactorLLM achieves comparable performance to the source model securing up to 85% model performance while obtaining over a 30% increase in inference speed.
arXiv Detail & Related papers (2024-08-15T16:45:16Z) - ALMol: Aligned Language-Molecule Translation LLMs through Offline Preference Contrastive Optimisation [2.296475290901356]
We focus on machine language-molecule translation and deploy a novel training approach called contrastive preference optimisation.
Our results demonstrate that our models achieve up to a 32% improvement compared to counterpart models.
arXiv Detail & Related papers (2024-05-14T13:59:24Z) - EMS: Efficient and Effective Massively Multilingual Sentence Embedding Learning [38.928786416891924]
We introduce efficient and effective massively multilingual sentence embedding (EMS) using cross-lingual token-level reconstruction (XTR) and sentence-level contrastive learning as training objectives.
Compared with related studies, the proposed model can be efficiently trained using significantly fewer parallel sentences and GPU computation resources.
We release the codes for model training and the EMS pre-trained sentence embedding model, which supports 62 languages.
arXiv Detail & Related papers (2022-05-31T12:29:25Z) - Factorized Neural Transducer for Efficient Language Model Adaptation [51.81097243306204]
We propose a novel model, factorized neural Transducer, by factorizing the blank and vocabulary prediction.
It is expected that this factorization can transfer the improvement of the standalone language model to the Transducer for speech recognition.
We demonstrate that the proposed factorized neural Transducer yields 15% to 20% WER improvements when out-of-domain text data is used for language model adaptation.
arXiv Detail & Related papers (2021-09-27T15:04:00Z) - Distributionally Robust Multilingual Machine Translation [94.51866646879337]
We propose a new learning objective for Multilingual neural machine translation (MNMT) based on distributionally robust optimization.
We show how to practically optimize this objective for large translation corpora using an iterated best response scheme.
Our method consistently outperforms strong baseline methods in terms of average and per-language performance under both many-to-one and one-to-many translation settings.
arXiv Detail & Related papers (2021-09-09T03:48:35Z) - Investigating Methods to Improve Language Model Integration for
Attention-based Encoder-Decoder ASR Models [107.86965028729517]
Attention-based encoder-decoder (AED) models learn an implicit internal language model (ILM) from the training transcriptions.
We propose several novel methods to estimate the ILM directly from the AED model.
arXiv Detail & Related papers (2021-04-12T15:16:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.