Novel Hybrid DNN Approaches for Speaker Verification in Emotional and
Stressful Talking Environments
- URL: http://arxiv.org/abs/2112.13353v1
- Date: Sun, 26 Dec 2021 10:47:14 GMT
- Title: Novel Hybrid DNN Approaches for Speaker Verification in Emotional and
Stressful Talking Environments
- Authors: Ismail Shahin, Ali Bou Nassif, Nawel Nemmour, Ashraf Elnagar, Adi
Alhudhaif, Kemal Polat
- Abstract summary: This work combined deep models with shallow architectures, resulting in novel hybrid classifiers.
Four distinct hybrid models were utilized: deep neural network-hidden Markov model (DNN-HMM), deep neural network-Gaussian mixture model (DNN-GMM), Gaussian mixture model-deep neural network (GMM-DNN), and hidden Markov model-deep neural network (HMM-DNN).
Results showed that HMM-DNN outperformed all other hybrid models in terms of the equal error rate (EER) and area under the curve (AUC) evaluation metrics.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we conducted an empirical comparative study of the performance
of text-independent speaker verification in emotional and stressful
environments. This work combined deep models with shallow architectures,
resulting in novel hybrid classifiers. Four distinct hybrid models were
utilized: deep neural network-hidden Markov model (DNN-HMM), deep neural
network-Gaussian mixture model (DNN-GMM), Gaussian mixture model-deep neural
network (GMM-DNN), and hidden Markov model-deep neural network (HMM-DNN). All
models were based on a newly implemented architecture. The comparative study used
three distinct speech datasets: a private Arabic dataset and two public English
databases, namely, Speech Under Simulated and Actual Stress (SUSAS) and Ryerson
Audio-Visual Database of Emotional Speech and Song (RAVDESS). The test results
of the aforementioned hybrid models demonstrated that the proposed HMM-DNN
improved verification performance in emotional and stressful environments.
Results also showed that HMM-DNN outperformed all other hybrid models in terms
of equal error rate (EER) and area under the curve (AUC) evaluation metrics.
The average resulting verification system based on the three datasets yielded
EERs of 7.19%, 16.85%, 11.51%, and 11.90% based on HMM-DNN, DNN-HMM, DNN-GMM,
and GMM-DNN, respectively. Furthermore, we found that the DNN-GMM model
demonstrated the least computational complexity compared to all other hybrid
models in both talking environments. Conversely, the HMM-DNN model required the
greatest amount of training time. Findings also demonstrated that EER and AUC
values depended on the database when comparing average emotional and stressful
performances.
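The two reported metrics can be sketched in a few lines. This is an illustrative example on synthetic verification scores, not the paper's evaluation code; the Gaussian score distributions are an assumption made purely for demonstration.

```python
import numpy as np

# Synthetic verification scores (higher = "same speaker"); illustrative only.
rng = np.random.default_rng(0)
genuine = rng.normal(2.0, 1.0, 2000)    # scores for target-speaker trials
impostor = rng.normal(0.0, 1.0, 2000)   # scores for impostor trials

def eer(genuine, impostor):
    """Equal error rate: the operating point where the false-acceptance
    rate (FAR) equals the false-rejection rate (FRR)."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    far = (impostor[None, :] >= thresholds[:, None]).mean(axis=1)
    frr = (genuine[None, :] < thresholds[:, None]).mean(axis=1)
    i = np.argmin(np.abs(far - frr))    # closest FAR/FRR crossing
    return (far[i] + frr[i]) / 2.0

def auc(genuine, impostor):
    """Area under the ROC curve, via its probabilistic interpretation:
    P(genuine score > impostor score)."""
    return (genuine[:, None] > impostor[None, :]).mean()

print(f"EER = {eer(genuine, impostor):.3f}")
print(f"AUC = {auc(genuine, impostor):.3f}")
```

A lower EER and a higher AUC both indicate better verification, which is why the paper reports the two together.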
Related papers
- How to Learn More? Exploring Kolmogorov-Arnold Networks for Hyperspectral Image Classification [26.37105279142761]
Kolmogorov-Arnold Networks (KANs) were proposed as viable alternatives to vision transformers (ViTs).
In this study, we assess the effectiveness of KANs for complex hyperspectral image (HSI) data classification.
To enhance the HSI classification accuracy obtained by the KANs, we develop and propose a hybrid architecture utilizing 1D, 2D, and 3D KANs.
arXiv Detail & Related papers (2024-06-22T03:31:02Z) - Bayesian Neural Network Language Modeling for Speech Recognition [59.681758762712754]
State-of-the-art neural network language models (NNLMs), represented by long short-term memory recurrent neural networks (LSTM-RNNs) and Transformers, are becoming highly complex.
In this paper, an overarching full Bayesian learning framework is proposed to account for the underlying uncertainty in LSTM-RNN and Transformer LMs.
arXiv Detail & Related papers (2022-08-28T17:50:19Z) - 2D Self-Organized ONN Model For Handwritten Text Recognition [4.66970207245168]
This study proposes the 2D Self-organized ONNs (Self-ONNs) in the core of a novel network model.
Deformable convolutions, which have recently been demonstrated to tackle variations in the writing styles better, are utilized in this study.
Results show that the proposed model with the operational layers of Self-ONNs significantly improves the Character Error Rate (CER) and Word Error Rate (WER).
arXiv Detail & Related papers (2022-07-17T11:18:20Z) - Batch-Ensemble Stochastic Neural Networks for Out-of-Distribution
Detection [55.028065567756066]
Out-of-distribution (OOD) detection has recently received much attention from the machine learning community due to its importance in deploying machine learning models in real-world applications.
In this paper we propose an uncertainty quantification approach by modelling the distribution of features.
We incorporate an efficient ensemble mechanism, namely batch-ensemble, to construct the batch-ensemble neural networks (BE-SNNs) and overcome the feature collapse problem.
We show that BE-SNNs yield superior performance on several OOD benchmarks, such as the Two-Moons dataset and the FashionMNIST vs MNIST dataset.
arXiv Detail & Related papers (2022-06-26T16:00:22Z) - ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked
Models [56.21470608621633]
We propose a time estimation framework to decouple the architectural search from the target hardware.
The proposed methodology extracts a set of models from micro-kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation.
We compare estimation accuracy and fidelity of the generated mixed models, statistical models with the roofline model, and a refined roofline model for evaluation.
arXiv Detail & Related papers (2021-05-07T11:39:05Z) - Auditory Attention Decoding from EEG using Convolutional Recurrent
Neural Network [20.37214453938965]
The auditory attention decoding (AAD) approach was proposed to determine the identity of the attended talker in a multi-talker scenario.
Recent models based on deep neural networks (DNN) have been proposed to solve this problem.
In this paper, we proposed novel convolutional recurrent neural network (CRNN) based regression model and classification model.
arXiv Detail & Related papers (2021-03-03T05:09:40Z) - Exploring Deep Hybrid Tensor-to-Vector Network Architectures for
Regression Based Speech Enhancement [53.47564132861866]
We find that a hybrid architecture, namely CNN-TT, is capable of maintaining a good quality performance with a reduced model parameter size.
CNN-TT is composed of several convolutional layers at the bottom for feature extraction to improve speech quality.
arXiv Detail & Related papers (2020-07-25T22:21:05Z) - Neural Architecture Search For LF-MMI Trained Time Delay Neural Networks [61.76338096980383]
A range of neural architecture search (NAS) techniques are used to automatically learn two types of hyper-parameters of state-of-the-art factored time delay neural networks (TDNNs).
These include the DARTS method integrating architecture selection with lattice-free MMI (LF-MMI) TDNN training.
Experiments conducted on a 300-hour Switchboard corpus suggest the auto-configured systems consistently outperform the baseline LF-MMI TDNN systems.
arXiv Detail & Related papers (2020-07-17T08:32:11Z) - Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner
Party Transcription [73.66530509749305]
In this paper, we argue that, even in difficult cases, some end-to-end approaches show performance close to the hybrid baseline.
We experimentally compare and analyze CTC-Attention versus RNN-Transducer approaches along with RNN versus Transformer architectures.
Our best end-to-end model, based on RNN-Transducer with an improved beam search, is only 3.8% absolute WER worse than the LF-MMI TDNN-F CHiME-6 Challenge baseline.
arXiv Detail & Related papers (2020-04-22T19:08:33Z) - Exploring Gaussian mixture model framework for speaker adaptation of
deep neural network acoustic models [3.867363075280544]
We investigate the GMM-derived (GMMD) features for adaptation of deep neural network (DNN) acoustic models.
We explore fusion of the adapted GMMD features with conventional features, such as bottleneck and MFCC features, in two different neural network architectures.
arXiv Detail & Related papers (2020-03-15T18:56:19Z)
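One common form of GMM-derived (GMMD) features, as described in the blurb above, is the vector of per-frame log-likelihoods under each mixture component, concatenated with conventional features such as MFCCs before the DNN. A minimal sketch under illustrative assumptions (the feature dimensions, the random GMM parameters, and the fusion-by-concatenation scheme are not taken from the paper):

```python
import numpy as np

def gmmd_features(x, means, variances, weights):
    """Per-frame, per-component Gaussian log-likelihoods (one common
    form of GMM-derived features) for a diagonal-covariance GMM.
    x: (T, D) frames; means/variances: (K, D); weights: (K,)."""
    diff = x[:, None, :] - means[None, :, :]                       # (T, K, D)
    ll = -0.5 * np.sum(diff**2 / variances
                       + np.log(2 * np.pi * variances), axis=2)    # (T, K)
    return ll + np.log(weights)                                    # (T, K)

rng = np.random.default_rng(1)
T, D, K = 50, 13, 8                       # frames, MFCC dim, GMM components
mfcc = rng.normal(size=(T, D))            # stand-in for real MFCC frames
means = rng.normal(size=(K, D))           # illustrative GMM parameters
variances = np.ones((K, D))
weights = np.full(K, 1.0 / K)

gmmd = gmmd_features(mfcc, means, variances, weights)              # (T, K)
fused = np.concatenate([mfcc, gmmd], axis=1)                       # (T, D + K)
```

The fused `(T, D + K)` matrix would then be the per-frame input to the DNN acoustic model; in practice the GMM would be trained (and speaker-adapted) rather than random.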
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.