Internal Language Model Estimation for Domain-Adaptive End-to-End Speech
Recognition
- URL: http://arxiv.org/abs/2011.01991v1
- Date: Tue, 3 Nov 2020 20:11:04 GMT
- Title: Internal Language Model Estimation for Domain-Adaptive End-to-End Speech
Recognition
- Authors: Zhong Meng, Sarangarajan Parthasarathy, Eric Sun, Yashesh Gaur,
Naoyuki Kanda, Liang Lu, Xie Chen, Rui Zhao, Jinyu Li, Yifan Gong
- Abstract summary: External language model (LM) integration is a challenging task for end-to-end (E2E) automatic speech recognition.
We propose an internal LM estimation (ILME) method to facilitate a more effective integration of the external LM with all pre-existing E2E models.
ILME can alleviate the domain mismatch between training and testing, or improve the multi-domain E2E ASR.
- Score: 56.27081731553829
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: External language model (LM) integration remains a challenging task for
end-to-end (E2E) automatic speech recognition (ASR), which has no clear division
between acoustic and language models. In this work, we propose an internal LM
estimation (ILME) method to facilitate a more effective integration of the
external LM with all pre-existing E2E models with no additional model training,
including the most popular recurrent neural network transducer (RNN-T) and
attention-based encoder-decoder (AED) models. Trained with audio-transcript
pairs, an E2E model implicitly learns an internal LM that characterizes the
training data in the source domain. With ILME, the internal LM scores of an E2E
model are estimated and subtracted from the log-linear interpolation between
the scores of the E2E model and the external LM. The internal LM scores are
approximated as the output of an E2E model when eliminating its acoustic
components. ILME can alleviate the domain mismatch between training and
testing, or improve multi-domain E2E ASR. In experiments with RNN-T and AED
models trained on 30K hours of data, ILME achieves up to 15.5% and 6.8% relative word
error rate reductions from Shallow Fusion on out-of-domain LibriSpeech and
in-domain Microsoft production test sets, respectively.
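In equation form, the decoding criterion described in the abstract, Shallow Fusion with the estimated internal LM score subtracted from the log-linear interpolation, can be sketched as follows; the symbols and interpolation weights are illustrative notation rather than the paper's exact equations.

```latex
% ILME-based fusion at inference time (sketch; notation is illustrative).
% The estimated internal LM score is subtracted from the Shallow Fusion criterion.
\hat{Y} = \arg\max_{Y} \Big[ \log P(Y \mid X; \theta_{\mathrm{E2E}})
        + \lambda_{\mathrm{ext}} \log P(Y; \theta_{\mathrm{ext}})
        - \lambda_{\mathrm{ILM}} \log P(Y; \theta_{\mathrm{ILM}}) \Big]
```

The two steps, approximating the internal LM as the E2E model's output with its acoustic components eliminated and then subtracting that score during fusion, can also be sketched in code. This is a toy illustration, not the authors' implementation: the RNN-T-style module names (encoder, predictor, joint), the zeroed-encoder approximation, and the weights lam_ext / lam_ilm are assumptions made for the example.

```python
# Toy sketch of ILME-style scoring for an RNN-T-like model (illustrative only):
# the internal LM score is approximated by zeroing the acoustic (encoder)
# contribution before the joint network, and the fused decoding score subtracts
# it from the Shallow Fusion interpolation.
import torch
import torch.nn as nn

VOCAB = 100  # toy vocabulary size


class ToyRNNT(nn.Module):
    """Stand-in for an RNN-T: encoder + prediction network + joint network."""

    def __init__(self, feat_dim=40, hidden=64):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hidden, batch_first=True)
        self.predictor = nn.Embedding(VOCAB, hidden)
        self.joint = nn.Linear(hidden, VOCAB)

    def joint_log_probs(self, enc_t, pred_u):
        # The joint network combines one acoustic frame and one label embedding.
        return torch.log_softmax(self.joint(enc_t + pred_u), dim=-1)


def ilme_fused_score(model, enc_t, pred_u, label, ext_lm_logprob,
                     lam_ext=0.6, lam_ilm=0.3):
    """Per-token decoding score with ILME-based fusion (sketch)."""
    # E2E score: normal joint output using the acoustic frame.
    e2e = model.joint_log_probs(enc_t, pred_u)[label]
    # Internal LM score: the same computation with the acoustic input zeroed,
    # i.e. the E2E model's output when its acoustic components are eliminated.
    ilm = model.joint_log_probs(torch.zeros_like(enc_t), pred_u)[label]
    # Log-linear fusion: add the external LM, subtract the internal LM estimate.
    return e2e + lam_ext * ext_lm_logprob - lam_ilm * ilm


if __name__ == "__main__":
    torch.manual_seed(0)
    model = ToyRNNT()
    feats = torch.randn(1, 5, 40)                 # (batch, time, feat_dim)
    enc_out, _ = model.encoder(feats)
    enc_t = enc_out[0, 0]                         # one acoustic frame
    pred_u = model.predictor(torch.tensor(3))     # embedding of the previous label
    score = ilme_fused_score(model, enc_t, pred_u, label=7,
                             ext_lm_logprob=torch.tensor(-2.1))
    print(f"ILME-fused per-token score: {score.item():.3f}")
```

In practice the same per-token combination would be applied inside beam search, with the fusion weights tuned on a development set.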
Related papers
- Acoustic Model Fusion for End-to-end Speech Recognition [7.431401982826315]
End-to-end speech recognition systems implicitly model all conventional ASR components, such as the acoustic model (AM) and the language model (LM).
We propose the integration of an external AM into the E2E system to better address the domain mismatch.
We have achieved a significant reduction in the word error rate, with an impressive drop of up to 14.3% across varied test sets.
arXiv Detail & Related papers (2023-10-10T23:00:17Z)
- Decoupled Structure for Improved Adaptability of End-to-End Models [16.195423291103975]
This paper proposes decoupled structures for attention-based encoder-decoder (Decoupled-AED) and neural transducer (Decoupled-Transducer) models.
The acoustic and linguistic parts of the E2E model decoder (or prediction network) are decoupled, making the linguistic component replaceable.
Experiments with E2E ASR models trained on the Libri-100h corpus showed that the proposed decoupled structure gave 15.1% and 17.2% relative word error rate reductions.
arXiv Detail & Related papers (2023-08-25T12:31:12Z)
- Mask The Bias: Improving Domain-Adaptive Generalization of CTC-based ASR with Internal Language Model Estimation [14.840612036671734]
Internal language model estimation (ILME) has been proposed to mitigate the internal LM's bias toward the training data in autoregressive models.
We propose a novel ILME technique for CTC-based ASR models.
Our method iteratively masks the audio timesteps to estimate a pseudo log-likelihood of the internal LM.
arXiv Detail & Related papers (2023-05-05T20:35:42Z)
- JEIT: Joint End-to-End Model and Internal Language Model Training for Speech Recognition [63.38229762589485]
We propose a joint end-to-end (E2E) model and internal language model (ILM) training method to inject large-scale unpaired text into the ILM.
With 100B unpaired sentences, JEIT/CJJT improves rare-word recognition accuracy by up to 16.4% over a model trained without unpaired text.
arXiv Detail & Related papers (2023-02-16T21:07:38Z)
- Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition [80.32546870220979]
We propose an internal LM adaptation (ILMA) of the E2E model using text-only data.
ILMA enables a fast text-only adaptation of the E2E model without increasing the run-time computational cost.
In experiments with transformer transducer models trained on 30K hours of data, ILMA achieves up to 34.9% relative word error rate reduction.
arXiv Detail & Related papers (2021-10-06T23:03:29Z)
- Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition [82.60133751942854]
Internal language model estimation (ILME)-based LM fusion has shown significant word error rate (WER) reduction from Shallow Fusion.
We propose a novel MWER training with ILME (MWER-ILME) where the ILME-based fusion is conducted to generate N-best hypotheses and their posteriors.
MWER-ILME achieves on average 8.8% and 5.8% relative WER reductions from MWER and MWER-SF training, respectively, on 6 different test sets.
arXiv Detail & Related papers (2021-06-04T07:24:49Z)
- Investigating Methods to Improve Language Model Integration for Attention-based Encoder-Decoder ASR Models [107.86965028729517]
Attention-based encoder-decoder (AED) models learn an implicit internal language model (ILM) from the training transcriptions.
We propose several novel methods to estimate the ILM directly from the AED model.
arXiv Detail & Related papers (2021-04-12T15:16:03Z)
- Internal Language Model Training for Domain-Adaptive End-to-End Speech Recognition [83.739317674302]
The internal language model estimation (ILME) method can be used to improve the integration between external language models and automatic speech recognition systems.
We propose an internal LM training (ILMT) method to minimize an additional internal LM loss; a sketch of the resulting joint objective appears after this list.
ILMT encourages the E2E model to form a standalone LM inside its existing components, without sacrificing ASR accuracy.
arXiv Detail & Related papers (2021-02-02T08:15:02Z)
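For the ILMT entry above, the "additional internal LM loss" is commonly combined with the standard E2E training loss as a weighted sum. The formulation below is a sketch under that assumption; the weight beta is an assumed hyperparameter, not a value taken from the paper.

```latex
% Sketch of a joint objective for internal LM training (ILMT);
% \beta is an assumed weight, and L_ILM is the negative log-likelihood that the
% E2E model's language-model-like components assign to the training transcripts Y.
\mathcal{L}_{\mathrm{ILMT}}(\theta) =
    \mathcal{L}_{\mathrm{E2E}}(X, Y; \theta)
    + \beta \, \mathcal{L}_{\mathrm{ILM}}(Y; \theta)
```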
This list is automatically generated from the titles and abstracts of the papers on this site.