Exploring Gaussian mixture model framework for speaker adaptation of
deep neural network acoustic models
- URL: http://arxiv.org/abs/2003.06894v1
- Date: Sun, 15 Mar 2020 18:56:19 GMT
- Title: Exploring Gaussian mixture model framework for speaker adaptation of
deep neural network acoustic models
- Authors: Natalia Tomashenko, Yuri Khokhlov, Yannick Estève
- Abstract summary: We investigate the GMM-derived (GMMD) features for adaptation of deep neural network (DNN) acoustic models.
We explore fusion of the adapted GMMD features with conventional features, such as bottleneck and MFCC features, in two different neural network architectures.
- Score: 3.867363075280544
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we investigate the GMM-derived (GMMD) features for adaptation
of deep neural network (DNN) acoustic models. The adaptation of the DNN trained
on GMMD features is done through the maximum a posteriori (MAP) adaptation of
the auxiliary GMM model used for GMMD feature extraction. We explore fusion of
the adapted GMMD features with conventional features, such as bottleneck and
MFCC features, in two different neural network architectures: DNN and
time-delay neural network (TDNN). We analyze and compare different types of
adaptation techniques such as i-vectors and feature-space adaptation techniques
based on maximum likelihood linear regression (fMLLR) with the proposed
adaptation approach, and explore their complementarity using various types of
fusion such as feature level, posterior level, lattice level and others in
order to discover the best possible way of combination. Experimental results on
the TED-LIUM corpus show that the proposed adaptation technique can be
effectively integrated into DNN and TDNN setups at different levels and provide
additional gains in recognition performance: up to 6% relative word error
rate reduction (WERR) over a strong speaker-adapted DNN baseline using
fMLLR-based feature-space adaptation, and up to 18% relative WERR compared
with a speaker-independent (SI) DNN baseline trained on conventional
features. For TDNN models the proposed approach achieves up to 26% relative
WERR compared with an SI baseline, and up to 13% compared with a model
adapted using i-vectors.
The analysis of the adapted GMMD features from various points of view
demonstrates their effectiveness at different levels.
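The core pipeline described in the abstract can be sketched in a few lines. The following is an illustrative numpy sketch, not the paper's implementation: a diagonal-covariance auxiliary GMM scores each acoustic frame, the per-component log-likelihoods serve as GMMD features, and a standard relevance-MAP update of the means performs speaker adaptation. All function names, dimensions, and the relevance factor `tau` are assumptions for illustration.

```python
import numpy as np

def gmm_log_likelihoods(frames, means, variances, weights):
    """frames: (T, D); means/variances: (K, D); weights: (K,).
    Returns (T, K) per-component log-likelihoods, used as GMMD features."""
    T, D = frames.shape
    diff = frames[:, None, :] - means[None, :, :]              # (T, K, D)
    log_det = np.sum(np.log(variances), axis=1)                # (K,)
    mahalanobis = np.sum(diff ** 2 / variances[None], axis=2)  # (T, K)
    log_norm = -0.5 * (D * np.log(2 * np.pi) + log_det)        # (K,)
    return np.log(weights)[None, :] + log_norm[None, :] - 0.5 * mahalanobis

def map_adapt_means(means, frames, responsibilities, tau=10.0):
    """Relevance-MAP update of the GMM means from speaker data
    (standard MAP mean formula; tau is the relevance factor)."""
    occ = responsibilities.sum(axis=0)                         # (K,)
    first_order = responsibilities.T @ frames                  # (K, D)
    alpha = occ / (occ + tau)                                  # adaptation weight
    ml_means = first_order / np.maximum(occ, 1e-8)[:, None]    # ML estimate
    return alpha[:, None] * ml_means + (1 - alpha)[:, None] * means
```

With large `tau` the adapted means stay close to the speaker-independent prior; with abundant speaker data they move toward the maximum-likelihood estimate, which is the usual MAP trade-off.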
Related papers
- AD-NEv++ : The multi-architecture neuroevolution-based multivariate anomaly detection framework [0.794682109939797]
Anomaly detection tools and methods enable key analytical capabilities in modern cyberphysical and sensor-based systems.
We propose AD-NEv++, a three-stage neuroevolution-based method that synergically combines subspace evolution, model evolution, and fine-tuning.
We show that AD-NEv++ can improve on and outperform state-of-the-art Graph Neural Network (GNN) model architectures in all anomaly detection benchmarks.
arXiv Detail & Related papers (2024-03-25T08:40:58Z)
- Satellite Anomaly Detection Using Variance Based Genetic Ensemble of Neural Networks [7.848121055546167]
We use an efficient ensemble of the predictions from multiple Recurrent Neural Networks (RNNs).
For prediction, each RNN is guided by a Genetic Algorithm (GA) which constructs the optimal structure for each RNN model.
This paper uses Monte Carlo (MC) dropout as an approximate version of Bayesian Neural Networks (BNNs).
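As a rough illustration of how MC dropout approximates Bayesian inference, the sketch below keeps dropout active at prediction time and averages several stochastic forward passes; the spread across passes serves as an uncertainty estimate. The two-layer network and all parameter names are hypothetical, not taken from the paper.

```python
import numpy as np

def stochastic_forward(x, w1, w2, drop_p, rng):
    """One stochastic forward pass: dropout stays ON at test time."""
    h = np.maximum(x @ w1, 0.0)                  # ReLU hidden layer
    mask = rng.random(h.shape) >= drop_p         # Bernoulli dropout mask
    h = h * mask / (1.0 - drop_p)                # inverted-dropout scaling
    return h @ w2

def mc_dropout_predict(x, w1, w2, drop_p=0.5, n_samples=50, seed=0):
    """Monte Carlo dropout: average many stochastic passes; the standard
    deviation across passes approximates predictive uncertainty."""
    rng = np.random.default_rng(seed)
    samples = np.stack([stochastic_forward(x, w1, w2, drop_p, rng)
                        for _ in range(n_samples)])
    return samples.mean(axis=0), samples.std(axis=0)
```

The key design choice is that the same dropout distribution used during training is sampled again at inference, so each pass is a draw from an approximate posterior over networks.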
arXiv Detail & Related papers (2023-02-10T22:09:00Z)
- Bayesian Neural Network Language Modeling for Speech Recognition [59.681758762712754]
State-of-the-art neural network language models (NNLMs) represented by long short term memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming highly complex.
In this paper, an overarching full Bayesian learning framework is proposed to account for the underlying uncertainty in LSTM-RNN and Transformer LMs.
arXiv Detail & Related papers (2022-08-28T17:50:19Z)
- DEMAND: Deep Matrix Approximately Nonlinear Decomposition to Identify Meta, Canonical, and Sub-Spatial Pattern of functional Magnetic Resonance Imaging in the Human Brain [8.93274096260726]
We propose a novel deep nonlinear matrix factorization named Deep Matrix Approximately Nonlinear Decomposition (DEMAND) in this work to take advantage of both shallow linear models, e.g., Sparse Dictionary Learning (SDL), and Deep Neural Networks (DNNs).
DEMAND can reveal the reproducible meta, canonical, and sub-spatial features of the human brain more efficiently than other peer methodologies.
arXiv Detail & Related papers (2022-05-20T15:55:01Z)
- Parameter estimation for WMTI-Watson model of white matter using encoder-decoder recurrent neural network [0.0]
In this study, we evaluate the performance of NLLS, the RNN-based method and a multilayer perceptron (MLP) on rat and human brain datasets.
We showed that the proposed RNN-based fitting approach had the advantage of highly reduced computation time over NLLS.
arXiv Detail & Related papers (2022-03-01T16:33:15Z)
- DS-UI: Dual-Supervised Mixture of Gaussian Mixture Models for Uncertainty Inference [52.899219617256655]
We propose a dual-supervised uncertainty inference (DS-UI) framework for improving Bayesian estimation-based uncertainty inference (UI) in deep neural network (DNN)-based image recognition.
In the DS-UI, we combine the last fully-connected (FC) layer with a mixture of Gaussian mixture models (MoGMM) to obtain an MoGMM-FC layer.
Experimental results show the DS-UI outperforms the state-of-the-art UI methods in misclassification detection.
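A minimal sketch of scoring features with one small Gaussian mixture per class, the idea behind an MoGMM layer, is shown below. The paper's MoGMM-FC layer is trained jointly with the DNN; here only the scoring step is illustrated, with spherical Gaussians, and all shapes and names are assumptions.

```python
import numpy as np

def class_gmm_log_scores(x, means, log_weights, var=1.0):
    """x: (N, D); means: (C, K, D), one K-component GMM per class;
    log_weights: (C, K). Returns (N, C) class log-likelihoods."""
    D = x.shape[1]
    diff = x[:, None, None, :] - means[None, :, :, :]        # (N, C, K, D)
    log_comp = (-0.5 * np.sum(diff ** 2, axis=3) / var
                - 0.5 * D * np.log(2 * np.pi * var))         # (N, C, K)
    # log-sum-exp over the mixture components of each class
    m = log_comp + log_weights[None]
    mmax = m.max(axis=2, keepdims=True)
    return mmax[..., 0] + np.log(np.exp(m - mmax).sum(axis=2))

# Predicted class = argmax over class scores; a low maximum score can
# flag likely misclassifications (the uncertainty-inference aspect).
```

An out-of-distribution input scores poorly under every class mixture, which is what makes this kind of layer useful for misclassification detection.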
arXiv Detail & Related papers (2020-11-17T12:35:02Z)
- Neural Architecture Search For LF-MMI Trained Time Delay Neural Networks [61.76338096980383]
A range of neural architecture search (NAS) techniques are used to automatically learn two types of hyperparameters of state-of-the-art factored time delay neural networks (TDNNs).
These include the DARTS method integrating architecture selection with lattice-free MMI (LF-MMI) TDNN training.
Experiments conducted on a 300-hour Switchboard corpus suggest the auto-configured systems consistently outperform the baseline LF-MMI TDNN systems.
arXiv Detail & Related papers (2020-07-17T08:32:11Z)
- Multi-Tones' Phase Coding (MTPC) of Interaural Time Difference by Spiking Neural Network [68.43026108936029]
We propose a pure spiking neural network (SNN) based computational model for precise sound localization in the noisy real-world environment.
We implement this algorithm in a real-time robotic system with a microphone array.
The experimental results show a mean azimuth error of 13 degrees, which surpasses the accuracy of other biologically plausible neuromorphic approaches for sound source localization.
arXiv Detail & Related papers (2020-07-07T08:22:56Z)
- Bayesian Graph Neural Networks with Adaptive Connection Sampling [62.51689735630133]
We propose a unified framework for adaptive connection sampling in graph neural networks (GNNs).
The proposed framework not only alleviates over-smoothing and over-fitting tendencies of deep GNNs, but also enables learning with uncertainty in graph analytic tasks with GNNs.
arXiv Detail & Related papers (2020-06-07T07:06:35Z)
- Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.