Improved Factorized Neural Transducer Model For text-only Domain Adaptation
- URL: http://arxiv.org/abs/2309.09524v2
- Date: Thu, 6 Jun 2024 09:28:49 GMT
- Title: Improved Factorized Neural Transducer Model For text-only Domain Adaptation
- Authors: Junzhe Liu, Jianwei Yu, Xie Chen
- Abstract summary: Adapting End-to-End ASR models to out-of-domain datasets with text data is challenging.
Factorized neural Transducer (FNT) aims to address this issue by introducing a separate vocabulary decoder to predict the vocabulary.
We present the improved factorized neural Transducer (IFNT) model structure designed to comprehensively integrate acoustic and language information.
- Score: 14.65352101664147
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adapting End-to-End ASR models to out-of-domain datasets with text data is challenging. Factorized neural Transducer (FNT) aims to address this issue by introducing a separate vocabulary decoder to predict the vocabulary. Nonetheless, this approach has limitations in fusing acoustic and language information seamlessly. Moreover, a degradation in word error rate (WER) on the general test sets was also observed, leading to doubts about its overall performance. In response to this challenge, we present the improved factorized neural Transducer (IFNT) model structure designed to comprehensively integrate acoustic and language information while enabling effective text adaptation. We assess the performance of our proposed method on English and Mandarin datasets. The results indicate that IFNT not only surpasses the neural Transducer and FNT in baseline performance in both scenarios but also exhibits superior adaptation ability compared to FNT. On source domains, IFNT demonstrated statistically significant accuracy improvements, achieving a relative enhancement of 1.2% to 2.8% in baseline accuracy compared to the neural Transducer. On out-of-domain datasets, IFNT shows relative WER(CER) improvements of up to 30.2% over the standard neural Transducer with shallow fusion, and relative WER(CER) reductions ranging from 1.1% to 2.8% on test sets compared to the FNT model.
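To make the factorization concrete: in an FNT-style model, the prediction network is split into a blank branch, scored by a conventional joint network, and a vocabulary branch that behaves like a standalone language model, so text-only adaptation can fine-tune the LM branch alone. Below is a minimal PyTorch sketch of such an output layer; the module and parameter names (FactorizedJoint, enc_proj, lm_head) and all dimensions are illustrative assumptions, not the authors' implementation, and IFNT's tighter acoustic-language integration is not reproduced here.

```python
import torch
import torch.nn as nn

class FactorizedJoint(nn.Module):
    """Sketch of a factorized-transducer-style output layer.

    The blank branch uses a conventional joint network; the vocabulary
    branch adds the log-probabilities of an LM-like predictor to acoustic
    logits, so text-only adaptation can fine-tune `lm_head` alone.
    """
    def __init__(self, enc_dim=256, pred_dim=256, vocab_size=500):
        super().__init__()
        self.joint_blank = nn.Sequential(
            nn.Linear(enc_dim + pred_dim, 256), nn.Tanh(), nn.Linear(256, 1))
        self.enc_proj = nn.Linear(enc_dim, vocab_size)  # acoustic -> vocab logits
        self.lm_head = nn.Linear(pred_dim, vocab_size)  # LM branch -> vocab logits

    def forward(self, enc, pred_blank, pred_vocab):
        # enc: (B, T, enc_dim); pred_blank, pred_vocab: (B, U, pred_dim)
        B, T, _ = enc.shape
        U = pred_blank.shape[1]
        enc_e = enc.unsqueeze(2).expand(B, T, U, enc.shape[-1])
        blk_e = pred_blank.unsqueeze(1).expand(B, T, U, pred_blank.shape[-1])
        blank_logit = self.joint_blank(torch.cat([enc_e, blk_e], dim=-1))  # (B,T,U,1)
        lm_logprob = torch.log_softmax(self.lm_head(pred_vocab), dim=-1)   # (B,U,V)
        vocab_logit = self.enc_proj(enc).unsqueeze(2) + lm_logprob.unsqueeze(1)
        return torch.cat([blank_logit, vocab_logit], dim=-1)               # (B,T,U,V+1)

# Example shapes: 2 utterances, 50 frames, 10 label positions.
joint = FactorizedJoint()
out = joint(torch.randn(2, 50, 256), torch.randn(2, 10, 256), torch.randn(2, 10, 256))
assert out.shape == (2, 50, 10, 501)
```

A minimal sketch of the shallow-fusion baseline that the abstract compares against appears at the end of this page.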
Related papers
- Advancing Text-to-GLOSS Neural Translation Using a Novel Hyper-parameter Optimization Technique [0.0]
The study aims to improve the accuracy and fluency of Neural Machine Translation-generated GLOSS.
The experiments conducted on the PHOENIX14T dataset reveal that the optimal transformer architecture outperforms previous work on the same dataset.
arXiv Detail & Related papers (2023-09-05T11:59:31Z)
- External Language Model Integration for Factorized Neural Transducers [7.5969913968845155]
We propose an adaptation method for factorized neural transducers (FNT) with external language models.
We show average gains of 18% WERR with lexical adaptation across various scenarios and additive gains of up to 60% WERR in one entity-rich scenario.
arXiv Detail & Related papers (2023-05-26T23:30:21Z)
- Fast and accurate factorized neural transducer for text adaption of end-to-end speech recognition models [23.21666928497697]
The improved adaptation ability of the factorized neural transducer (FNT) on text-only adaptation data comes at the cost of lower accuracy compared to the standard neural transducer model.
A combination of these approaches results in a relative word-error-rate reduction of 9.48% from the standard FNT model.
arXiv Detail & Related papers (2022-12-05T02:52:21Z)
- Bayesian Neural Network Language Modeling for Speech Recognition [59.681758762712754]
State-of-the-art neural network language models (NNLMs), represented by long short-term memory recurrent neural networks (LSTM-RNNs) and Transformers, are becoming highly complex.
In this paper, an overarching full Bayesian learning framework is proposed to account for the underlying uncertainty in LSTM-RNN and Transformer LMs.
arXiv Detail & Related papers (2022-08-28T17:50:19Z)
- Non-Parametric Domain Adaptation for End-to-End Speech Translation [72.37869362559212]
End-to-End Speech Translation (E2E-ST) has received increasing attention due to its potential for less error propagation, lower latency, and fewer parameters.
We propose a novel non-parametric method that leverages domain-specific text translation corpus to achieve domain adaptation for the E2E-ST system.
arXiv Detail & Related papers (2022-05-23T11:41:02Z)
- Factorized Neural Transducer for Efficient Language Model Adaptation [51.81097243306204]
We propose a novel model, factorized neural Transducer, by factorizing the blank and vocabulary prediction.
It is expected that this factorization can transfer the improvement of the standalone language model to the Transducer for speech recognition.
We demonstrate that the proposed factorized neural Transducer yields 15% to 20% WER improvements when out-of-domain text data is used for language model adaptation.
arXiv Detail & Related papers (2021-09-27T15:04:00Z)
- Bayesian Transformer Language Models for Speech Recognition [59.235405107295655]
State-of-the-art neural language models (LMs) represented by Transformers are highly complex.
This paper proposes a full Bayesian learning framework for Transformer LM estimation.
arXiv Detail & Related papers (2021-02-09T10:55:27Z)
- TextGNN: Improving Text Encoder via Graph Neural Network in Sponsored Search [11.203006652211075]
We propose a TextGNN model that naturally extends the strong twin tower structured encoders with the complementary graph information from user historical behaviors.
In offline experiments, the model achieves a 0.14% overall increase in ROC-AUC with a 1% increased accuracy for long-tail low-frequency Ads.
In online A/B testing, the model shows a 2.03% increase in Revenue Per Mille with a 2.32% decrease in Ad defect rate.
arXiv Detail & Related papers (2021-01-15T23:12:47Z)
- Understanding and Improving Lexical Choice in Non-Autoregressive Translation [98.11249019844281]
We propose to expose the raw data to NAT models to restore the useful information of low-frequency words.
Our approach pushes the SOTA NAT performance on the WMT14 English-German and WMT16 Romanian-English datasets up to 27.8 and 33.8 BLEU points, respectively.
arXiv Detail & Related papers (2020-12-29T03:18:50Z)
- On the Inference Calibration of Neural Machine Translation [54.48932804996506]
We study the correlation between calibration and translation performance, as well as the linguistic properties of miscalibration.
We propose a new graduated label smoothing method that can improve both inference calibration and translation performance.
arXiv Detail & Related papers (2020-05-03T02:03:56Z)
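As a concrete reading of the last entry's "graduated label smoothing": a plausible implementation makes the smoothing weight depend on the model's confidence, smoothing over-confident predictions more aggressively. The band thresholds (0.3, 0.7) and the use of the target token's probability as the confidence measure below are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn.functional as F

def graduated_label_smoothing_loss(logits, targets):
    """Cross-entropy whose smoothing weight grows with model confidence.

    Illustrative bands: confidence > 0.7 -> eps = 0.3, confidence in
    (0.3, 0.7] -> eps = 0.1, otherwise eps = 0.0 (assumed thresholds).
    """
    with torch.no_grad():
        probs = F.softmax(logits, dim=-1)
        # Confidence measured as the probability assigned to the target
        # token (an assumption; max probability would also be plausible).
        conf = probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
        eps = torch.where(conf > 0.7, torch.full_like(conf, 0.3),
              torch.where(conf > 0.3, torch.full_like(conf, 0.1),
                          torch.zeros_like(conf)))
    logp = F.log_softmax(logits, dim=-1)
    nll = -logp.gather(-1, targets.unsqueeze(-1)).squeeze(-1)  # usual CE term
    uniform = -logp.mean(dim=-1)                               # smoothing toward uniform
    return ((1.0 - eps) * nll + eps * uniform).mean()

# Example: a batch of 16 decisions over a 500-token vocabulary.
loss = graduated_label_smoothing_loss(torch.randn(16, 500),
                                      torch.randint(0, 500, (16,)))
```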
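For contrast with the factorized approaches above: the main abstract benchmarks IFNT against a standard neural transducer with shallow fusion, which simply interpolates ASR and external LM scores at decoding time. A minimal sketch follows; the function name and the lm_weight value are illustrative assumptions.

```python
import torch

def shallow_fusion_step(asr_logprobs, lm_logprobs, lm_weight=0.3):
    """Log-linear interpolation of ASR and external-LM token scores for
    one decoding step; lm_weight is a tuned hyperparameter (0.3 assumed)."""
    return asr_logprobs + lm_weight * lm_logprobs

# In beam search, hypothesis extensions are ranked by the fused scores
# instead of the ASR scores alone:
asr = torch.randn(8, 500).log_softmax(-1)  # 8 hypotheses, 500-token vocab
lm = torch.randn(8, 500).log_softmax(-1)
scores, tokens = shallow_fusion_step(asr, lm).topk(4, dim=-1)
```

Unlike the FNT/IFNT factorization, shallow fusion leaves the transducer unchanged and injects language-model knowledge only at decoding time, which is why text-only adaptation of a factorized internal LM can outperform it.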