On Minimum Word Error Rate Training of the Hybrid Autoregressive
Transducer
- URL: http://arxiv.org/abs/2010.12673v3
- Date: Fri, 26 Mar 2021 17:35:00 GMT
- Title: On Minimum Word Error Rate Training of the Hybrid Autoregressive
Transducer
- Authors: Liang Lu, Zhong Meng, Naoyuki Kanda, Jinyu Li, and Yifan Gong
- Abstract summary: We study the minimum word error rate (MWER) training of the Hybrid Autoregressive Transducer (HAT).
From experiments with around 30,000 hours of training data, we show that MWER training can improve the accuracy of HAT models.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hybrid Autoregressive Transducer (HAT) is a recently proposed end-to-end
acoustic model that extends the standard Recurrent Neural Network Transducer
(RNN-T) for the purpose of the external language model (LM) fusion. In HAT, the
blank probability and the label probability are estimated using two separate
probability distributions, which provides a more accurate solution for internal
LM score estimation, and thus works better when combining with an external LM.
Previous work mainly focuses on HAT model training with the negative
log-likelihood loss, while in this paper, we study the minimum word error rate
(MWER) training of HAT -- a criterion that is closer to the evaluation metric
for speech recognition, and has been successfully applied to other types of
end-to-end models such as sequence-to-sequence (S2S) and RNN-T models. From
experiments with around 30,000 hours of training data, we show that MWER
training can improve the accuracy of HAT models, while at the same time,
improving the robustness of the model against the decoding hyper-parameters
such as length normalization and decoding beam during inference.
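
The two ideas named in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `hat_distribution` assumes the blank probability comes from a sigmoid over a single logit and the label probabilities from a softmax scaled by the remaining mass, and `mwer_loss` uses a common N-best formulation of MWER (hypothesis probabilities renormalized over the N-best list, word errors centered by their mean). All function names and inputs are hypothetical.

```python
import math


def hat_distribution(blank_logit, label_logits):
    """Separate blank and label distributions (assumed HAT-style factorization)."""
    # Blank probability from a sigmoid, i.e. a Bernoulli over "emit blank or not".
    p_blank = 1.0 / (1.0 + math.exp(-blank_logit))
    # Label probabilities from a softmax, scaled by the non-blank mass.
    m = max(label_logits)
    exps = [math.exp(x - m) for x in label_logits]
    z = sum(exps)
    p_labels = [(1.0 - p_blank) * e / z for e in exps]
    return p_blank, p_labels


def word_errors(ref, hyp):
    """Word-level Levenshtein distance between two whitespace-tokenized strings."""
    r, h = ref.split(), hyp.split()
    prev = list(range(len(h) + 1))
    for i, rw in enumerate(r, 1):
        cur = [i]
        for j, hw in enumerate(h, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (rw != hw)))  # substitution or match
        prev = cur
    return prev[-1]


def mwer_loss(ref, nbest):
    """Expected centered word errors over an N-best list.

    nbest is a list of (hypothesis_text, log_prob) pairs; the log-probs are
    renormalized over the list, which is an assumption of this sketch.
    """
    m = max(lp for _, lp in nbest)
    weights = [math.exp(lp - m) for _, lp in nbest]
    z = sum(weights)
    errors = [word_errors(ref, hyp) for hyp, _ in nbest]
    mean_err = sum(errors) / len(errors)
    return sum(w / z * (e - mean_err) for w, e in zip(weights, errors))


if __name__ == "__main__":
    print(hat_distribution(0.0, [0.0, 0.0]))
    print(mwer_loss("a b c", [("a b c", 0.0), ("a x c", -10.0)]))
```

In this formulation the loss is negative when the model already puts most of its probability on the hypotheses with fewer word errors, and minimizing it shifts further mass toward them.
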
Related papers
- Embedded Nonlocal Operator Regression (ENOR): Quantifying model error in learning nonlocal operators [8.585650361148558]
We propose a new framework to learn a nonlocal homogenized surrogate model and its structural model error.
This framework provides discrepancy-adaptive uncertainty quantification for homogenized material response predictions in long-term simulations.
arXiv Detail & Related papers (2024-10-27T04:17:27Z)
- Diffusion-Model-Assisted Supervised Learning of Generative Models for Density Estimation [10.793646707711442]
We present a framework for training generative models for density estimation.
We use the score-based diffusion model to generate labeled data.
Once the labeled data are generated, we can train a simple fully connected neural network to learn the generative model in the supervised manner.
arXiv Detail & Related papers (2023-10-22T23:56:19Z)
- Stabilizing Machine Learning Prediction of Dynamics: Noise and Noise-inspired Regularization [58.720142291102135]
Recent work has shown that machine learning (ML) models can be trained to accurately forecast the dynamics of chaotic dynamical systems.
In the absence of mitigating techniques, this technique can result in artificially rapid error growth, leading to inaccurate predictions and/or climate instability.
We introduce Linearized Multi-Noise Training (LMNT), a regularization technique that deterministically approximates the effect of many small, independent noise realizations added to the model input during training.
arXiv Detail & Related papers (2022-11-09T23:40:52Z)
- Bayesian Neural Network Language Modeling for Speech Recognition [59.681758762712754]
State-of-the-art neural network language models (NNLMs) represented by long short term memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming highly complex.
In this paper, an overarching full Bayesian learning framework is proposed to account for the underlying uncertainty in LSTM-RNN and Transformer LMs.
arXiv Detail & Related papers (2022-08-28T17:50:19Z)
- Improving Rare Word Recognition with LM-aware MWER Training [50.241159623691885]
We introduce LMs in the learning of hybrid autoregressive transducer (HAT) models in the discriminative training framework.
For the shallow fusion setup, we use LMs during both hypotheses generation and loss computation, and the LM-aware MWER-trained model achieves 10% relative improvement.
For the rescoring setup, we learn a small neural module to generate per-token fusion weights in a data-dependent manner.
arXiv Detail & Related papers (2022-04-15T17:19:41Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- ES-dRNN: A Hybrid Exponential Smoothing and Dilated Recurrent Neural Network Model for Short-Term Load Forecasting [1.4502611532302039]
Short-term load forecasting (STLF) is challenging due to complex time series (TS).
This paper proposes a novel hybrid hierarchical deep learning model that deals with multiple seasonality.
It combines exponential smoothing (ES) and a recurrent neural network (RNN).
arXiv Detail & Related papers (2021-12-05T19:38:42Z)
- Information Theoretic Structured Generative Modeling [13.117829542251188]
A novel generative model framework called the structured generative model (SGM) is proposed that makes straightforward optimization possible.
The implementation employs a single neural network driven by an orthonormal input to a single white noise source adapted to learn an infinite Gaussian mixture model.
Preliminary results show that SGM significantly improves MINE estimation in terms of data efficiency and variance, conventional and variational Gaussian mixture models, as well as for training adversarial networks.
arXiv Detail & Related papers (2021-10-12T07:44:18Z)
- Joint Energy-based Model Training for Better Calibrated Natural Language Understanding Models [61.768082640087]
We explore joint energy-based model (EBM) training during the finetuning of pretrained text encoders for natural language understanding tasks.
Experiments show that EBM training can help the model reach a better calibration that is competitive to strong baselines.
arXiv Detail & Related papers (2021-01-18T01:41:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.