Consistent Training and Decoding For End-to-end Speech Recognition Using
Lattice-free MMI
- URL: http://arxiv.org/abs/2112.02498v1
- Date: Sun, 5 Dec 2021 07:30:17 GMT
- Authors: Jinchuan Tian, Jianwei Yu, Chao Weng, Shi-Xiong Zhang, Dan Su, Dong
Yu, Yuexian Zou
- Abstract summary: We propose a novel approach to integrate LF-MMI criterion into E2E ASR frameworks in both training and decoding stages.
Experiments suggest that the introduction of the LF-MMI criterion consistently leads to significant performance improvements.
- Score: 67.13999010060057
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recently, End-to-End (E2E) frameworks have achieved remarkable results on
various Automatic Speech Recognition (ASR) tasks. However, Lattice-Free Maximum
Mutual Information (LF-MMI), as one of the discriminative training criteria
that show superior performance in hybrid ASR systems, is rarely adopted in E2E
ASR frameworks. In this work, we propose a novel approach to integrate LF-MMI
criterion into E2E ASR frameworks in both training and decoding stages. The
proposed approach shows its effectiveness on two of the most widely used E2E
frameworks including Attention-Based Encoder-Decoders (AEDs) and Neural
Transducers (NTs). Experiments suggest that the introduction of the LF-MMI
criterion consistently leads to significant performance improvements on various
datasets and different E2E ASR frameworks. The best of our models achieves a
competitive CER of 4.1% / 4.4% on the Aishell-1 dev/test sets; we also achieve
significant error reductions on the Aishell-2 and Librispeech datasets over
strong baselines.
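The paper's exact integration scheme is not reproduced in this summary, but the standard way to combine an LF-MMI (or any sequence-discriminative) score with an E2E model at decoding time is log-linear interpolation over hypothesis scores. A minimal sketch, where the function names, score values, and the interpolation weight are all hypothetical and would be tuned on a dev set:

```python
def joint_score(e2e_logp: float, lfmmi_logp: float, lam: float = 0.3) -> float:
    """Log-linear interpolation of E2E and LF-MMI hypothesis scores.

    e2e_logp   : log-probability of the hypothesis under the E2E model
    lfmmi_logp : LF-MMI score of the same hypothesis (hypothetical input)
    lam        : interpolation weight, tuned on held-out data
    """
    return (1.0 - lam) * e2e_logp + lam * lfmmi_logp


def rerank(hypotheses):
    """Pick the best hypothesis from (text, e2e_logp, lfmmi_logp) triples."""
    return max(hypotheses, key=lambda h: joint_score(h[1], h[2]))


# Toy example: three hypotheses with made-up scores.
nbest = [("hyp a", -12.0, -9.0), ("hyp b", -11.5, -14.0), ("hyp c", -13.0, -8.0)]
best = rerank(nbest)  # "hyp a": 0.7*(-12.0) + 0.3*(-9.0) = -11.1 is the largest
```

The key design choice is that both scores live in log space, so interpolation amounts to a weighted geometric mean of the two models' hypothesis probabilities.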
Related papers
- Acoustic Model Fusion for End-to-end Speech Recognition [7.431401982826315]
Speech recognition systems implicitly model all conventional ASR components, such as the acoustic model (AM) and the language model (LM).
We propose the integration of an external AM into the E2E system to better address the domain mismatch.
We achieve a significant reduction in word error rate, with a drop of up to 14.3% across varied test sets.
arXiv Detail & Related papers (2023-10-10T23:00:17Z) - End-to-End Speech Recognition: A Survey [68.35707678386949]
The goal of this survey is to provide a taxonomy of E2E ASR models and corresponding improvements.
All relevant aspects of E2E ASR are covered in this work, accompanied by discussions of performance and deployment opportunities.
arXiv Detail & Related papers (2023-03-03T01:46:41Z) - Contextual Density Ratio for Language Model Biasing of Sequence to
Sequence ASR Systems [2.4909170697740963]
We propose a contextual density ratio approach for both training a contextual aware E2E model and adapting the language model to named entities.
Our proposed technique achieves a relative improvement of up to 46.5% on the names over an E2E baseline without degrading the overall recognition accuracy of the whole test set.
arXiv Detail & Related papers (2022-06-29T13:12:46Z) - Effect and Analysis of Large-scale Language Model Rescoring on
Competitive ASR Systems [30.873546090458678]
Large-scale language models (LLMs) have been successfully applied to ASR N-best rescoring.
In this study, we incorporate LLM rescoring into one of the most competitive ASR baselines: the Conformer-Transducer model.
arXiv Detail & Related papers (2022-04-01T05:20:55Z) - Integrate Lattice-Free MMI into End-to-End Speech Recognition [87.01137882072322]
In automatic speech recognition (ASR) research, discriminative criteria have achieved superior performance in DNN-HMM systems.
With this motivation, the adoption of discriminative criteria is promising to boost the performance of end-to-end (E2E) ASR systems.
Previous works have introduced the minimum Bayesian risk (MBR, one of the discriminative criteria) into E2E ASR systems.
In this work, novel algorithms are proposed to integrate another widely used discriminative criterion, lattice-free maximum mutual information (LF-MMI), into E2E ASR systems.
arXiv Detail & Related papers (2022-03-29T14:32:46Z) - Internal Language Model Adaptation with Text-Only Data for End-to-End
Speech Recognition [80.32546870220979]
We propose an internal LM adaptation (ILMA) of the E2E model using text-only data.
ILMA enables a fast text-only adaptation of the E2E model without increasing the run-time computational cost.
Experimented with 30K-hour trained transformer transducer models, ILMA achieves up to 34.9% relative word error rate reduction.
arXiv Detail & Related papers (2021-10-06T23:03:29Z) - Internal Language Model Estimation for Domain-Adaptive End-to-End Speech
Recognition [56.27081731553829]
Internal language models (LM) integration is a challenging task for end-to-end (E2E) automatic speech recognition.
We propose an internal LM estimation (ILME) method to facilitate a more effective integration of the external LM with all pre-existing E2E models.
ILME can alleviate the domain mismatch between training and testing, or improve the multi-domain E2E ASR.
arXiv Detail & Related papers (2020-11-03T20:11:04Z) - An Effective End-to-End Modeling Approach for Mispronunciation Detection [12.113290059233977]
We present a novel use of the CTC-Attention approach for the mispronunciation detection (MD) task.
We also perform input augmentation with text prompt information to make the resulting E2E model more tailored for the MD task.
A series of Mandarin MD experiments demonstrate that our approach brings about systematic and substantial performance improvements.
arXiv Detail & Related papers (2020-05-18T03:37:21Z) - Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU).
We show that the error rates of off-the-shelf ASR and downstream LU systems can be reduced significantly, by 14% relative, with joint models trained using small amounts of in-domain data.
arXiv Detail & Related papers (2020-01-28T22:09:25Z)
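Several entries above (LLM rescoring of N-best lists, internal LM adaptation and estimation) share one mechanism: hypothesis scores from different models are combined log-linearly, with the E2E model's estimated internal LM score subtracted so the external LM is not double-counted. A minimal sketch of that fusion over an N-best list; all scores, weights, and function names here are hypothetical illustrations, not any paper's exact recipe:

```python
def ilme_style_score(e2e_logp: float, ext_lm_logp: float, int_lm_logp: float,
                     lam_ext: float = 0.5, lam_int: float = 0.3) -> float:
    """ILME-style fusion: add the external LM score, subtract the estimated
    internal LM score, each with its own (dev-set tuned) weight."""
    return e2e_logp + lam_ext * ext_lm_logp - lam_int * int_lm_logp


def rescore_nbest(nbest, lam_ext: float = 0.5, lam_int: float = 0.3):
    """nbest: list of (text, e2e_logp, ext_lm_logp, int_lm_logp) tuples;
    returns the tuple with the highest fused score."""
    return max(nbest,
               key=lambda h: ilme_style_score(h[1], h[2], h[3], lam_ext, lam_int))


# Toy N-best list with made-up log-probabilities.
nbest = [
    ("call mom", -10.0, -4.0, -6.0),   # fused: -10.0 - 2.0 + 1.8 = -10.2
    ("call tom", -9.5, -8.0, -6.5),    # fused: -9.5 - 4.0 + 1.95 = -11.55
]
winner = rescore_nbest(nbest)
```

Note how the external LM can overturn the E2E model's first choice: "call tom" has the better E2E score, but the fused score prefers "call mom" once the external LM weighs in.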
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.