Integrate Lattice-Free MMI into End-to-End Speech Recognition
- URL: http://arxiv.org/abs/2203.15614v1
- Date: Tue, 29 Mar 2022 14:32:46 GMT
- Title: Integrate Lattice-Free MMI into End-to-End Speech Recognition
- Authors: Jinchuan Tian, Jianwei Yu, Chao Weng, Yuexian Zou and Dong Yu
- Abstract summary: In automatic speech recognition (ASR) research, discriminative criteria have achieved superior performance in DNN-HMM systems.
With this motivation, the adoption of discriminative criteria is promising to boost the performance of end-to-end (E2E) ASR systems.
Previous works have introduced the minimum Bayesian risk (MBR, one of the discriminative criteria) into E2E ASR systems.
In this work, novel algorithms are proposed to integrate another widely used discriminative criterion, lattice-free maximum mutual information (LF-MMI), into E2E ASR systems.
- Score: 87.01137882072322
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In automatic speech recognition (ASR) research, discriminative criteria have
achieved superior performance in DNN-HMM systems. Given this success, the
adoption of discriminative criteria is promising to boost the performance of
end-to-end (E2E) ASR systems. With this motivation, previous works have
introduced the minimum Bayesian risk (MBR, one of the discriminative criteria)
into E2E ASR systems. However, the effectiveness and efficiency of the
MBR-based methods are compromised: the MBR criterion is only used in system
training, which creates a mismatch between training and decoding; the
on-the-fly decoding process in MBR-based methods results in the need for
pre-trained models and slow training speeds. To this end, novel algorithms are
proposed in this work to integrate another widely used discriminative
criterion, lattice-free maximum mutual information (LF-MMI), into E2E ASR
systems not only in the training stage but also in the decoding process. The
proposed LF-MMI training and decoding methods show their effectiveness on two
widely used E2E frameworks: Attention-Based Encoder-Decoders (AEDs) and Neural
Transducers (NTs). Compared with MBR-based methods, the proposed LF-MMI method:
maintains the consistency between training and decoding; eschews the on-the-fly
decoding process; trains from randomly initialized models with superior
training efficiency. Experiments suggest that the LF-MMI method outperforms its
MBR counterparts and consistently leads to statistically significant
performance improvements on various frameworks and datasets from 30 hours to
14.3k hours. The proposed method achieves state-of-the-art (SOTA) results on
Aishell-1 (CER 4.10%) and Aishell-2 (CER 5.02%) datasets. Code is released.
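At its core, the MMI criterion described above contrasts a numerator score (accumulated over paths consistent with the reference transcript) against a denominator score (accumulated over all competing hypotheses). The following is a minimal, self-contained sketch of that objective only; the function names and toy per-path scores are hypothetical stand-ins for the forward scores that a real LF-MMI implementation computes over numerator and denominator graphs:

```python
import math

def logsumexp(xs):
    # Numerically stable log-sum-exp: combines path scores in the log semiring.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def lf_mmi_loss(num_logprobs, den_logprobs):
    """Toy MMI loss: negative log-ratio of numerator to denominator mass.

    num_logprobs: log-scores of paths consistent with the reference.
    den_logprobs: log-scores of all competing paths (a superset of the
    numerator paths), so the loss is always non-negative.
    """
    return logsumexp(den_logprobs) - logsumexp(num_logprobs)

# Toy example: one reference-consistent path among three hypotheses.
loss = lf_mmi_loss([-1.0], [-1.0, -2.0, -3.0])
print(loss)  # positive; minimizing it pushes mass toward the reference
```

In the paper's setting, the denominator graph is compiled from an n-gram language model and the same criterion is reused as a score at decoding time, which is what keeps training and decoding consistent; this sketch only illustrates the loss arithmetic, not the graph-based forward computation.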
Related papers
- Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design [59.00758127310582]
We propose a novel framework Read-ME that transforms pre-trained dense LLMs into smaller MoE models.
Our approach employs activation sparsity to extract experts.
Read-ME outperforms other popular open-source dense models of similar scales.
arXiv Detail & Related papers (2024-10-24T19:48:51Z)
- EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference [49.94169109038806]
This paper introduces EPS-MoE, a novel expert pipeline scheduler for MoE.
Our results demonstrate an average 21% improvement in prefill throughput over existing parallel inference methods.
arXiv Detail & Related papers (2024-10-16T05:17:49Z)
- A Novel Approach for Machine Learning-based Load Balancing in High-speed Train System using Nested Cross Validation [0.6138671548064356]
Fifth-generation (5G) mobile communication networks have recently emerged in various fields, including high-speed trains.
We model the system performance of a high-speed train system with a novel machine learning (ML) approach based on a nested cross-validation scheme.
arXiv Detail & Related papers (2023-10-02T09:24:10Z)
- MBR and QE Finetuning: Training-time Distillation of the Best and Most Expensive Decoding Methods [13.56549575939123]
We propose MBR finetuning and QE finetuning to mitigate the model-perplexity-vs-quality mismatch.
We show that even with self-training, these finetuning methods significantly outperform the base model.
These findings suggest new ways to leverage monolingual data to achieve improvements in model quality that are on par with, or even exceed, improvements from human-curated data.
arXiv Detail & Related papers (2023-09-19T23:39:07Z)
- Model-based Deep Learning Receiver Design for Rate-Splitting Multiple Access [65.21117658030235]
This work proposes a novel design for a practical RSMA receiver based on model-based deep learning (MBDL) methods.
The MBDL receiver is evaluated in terms of uncoded Symbol Error Rate (SER), throughput performance through Link-Level Simulations (LLS) and average training overhead.
Results reveal that the MBDL outperforms by a significant margin the SIC receiver with imperfect CSIR.
arXiv Detail & Related papers (2022-05-02T12:23:55Z)
- Bit-Metric Decoding Rate in Multi-User MIMO Systems: Applications [13.848471206858617]
Part I focuses on link-adaptation (LA) and physical layer (PHY) abstraction for MU-MIMO systems with non-linear receivers.
Part II develops novel algorithms for LA, dynamic detector selection from a list of available detectors, and PHY abstraction in MU-MIMO systems with arbitrary receivers.
arXiv Detail & Related papers (2022-03-11T22:51:26Z)
- Consistent Training and Decoding For End-to-end Speech Recognition Using Lattice-free MMI [67.13999010060057]
We propose a novel approach to integrate LF-MMI criterion into E2E ASR frameworks in both training and decoding stages.
Experiments suggest that the introduction of the LF-MMI criterion consistently leads to significant performance improvements.
arXiv Detail & Related papers (2021-12-05T07:30:17Z)
- MEST: Accurate and Fast Memory-Economic Sparse Training Framework on the Edge [72.16021611888165]
This paper proposes a novel Memory-Economic Sparse Training (MEST) framework targeting accurate and fast execution on edge devices.
The proposed MEST framework consists of enhancements by Elastic Mutation (EM) and Soft Memory Bound (&S).
Our results suggest that unforgettable examples can be identified in-situ even during the dynamic exploration of sparsity masks.
arXiv Detail & Related papers (2021-10-26T21:15:17Z)
- Machine Learning for MU-MIMO Receive Processing in OFDM Systems [14.118477167150143]
We propose an ML-enhanced MU-MIMO receiver that builds on top of a conventional linear minimum mean squared error (LMMSE) architecture.
CNNs are used to compute an approximation of the second-order statistics of the channel estimation error.
A CNN-based demapper jointly processes a large number of frequency-division multiplexing symbols and subcarriers.
arXiv Detail & Related papers (2020-12-15T09:55:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.