An Extended Variational Mode Decomposition Algorithm Developed Speech
Emotion Recognition Performance
- URL: http://arxiv.org/abs/2312.10937v1
- Date: Mon, 18 Dec 2023 05:24:03 GMT
- Title: An Extended Variational Mode Decomposition Algorithm Developed Speech
Emotion Recognition Performance
- Authors: David Hason Rudd, Huan Huo, Guandong Xu
- Abstract summary: This study proposes VGG-optiVMD, an empowered variational mode decomposition algorithm, to distinguish meaningful speech features.
Various feature vectors were employed to train the VGG16 network on different databases and to assess the reproducibility and reliability of VGG-optiVMD.
Results confirmed a synergistic relationship between the fine-tuning of the signal sample rate and decomposition parameters with classification accuracy.
- Score: 15.919990281329085
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Emotion recognition (ER) from speech signals is a robust approach since speech
cannot be imitated as easily as facial expressions or text-based sentiment.
The valuable information underlying the emotions is significant for human-computer
interaction, enabling intelligent machines to interact with sensitivity in the
real world. Previous ER studies through speech signal processing have focused
exclusively on associations between different signal mode decomposition methods
and hidden informative features. However, improper decomposition parameter
selections lead to informative signal component losses due to mode duplicating
and mixing. In contrast, the current study proposes VGG-optiVMD, an empowered
variational mode decomposition algorithm, to distinguish meaningful speech
features and automatically select the number of decomposed modes and optimum
balancing parameter for the data fidelity constraint by assessing their effects
on the VGG16 flattening output layer. Various feature vectors were employed to
train the VGG16 network on different databases and assess VGG-optiVMD
reproducibility and reliability. One, two, and three-dimensional feature
vectors were constructed by concatenating Mel-frequency cepstral coefficients,
Chromagram, Mel spectrograms, Tonnetz diagrams, and spectral centroids. Results
confirmed a synergistic relationship between the fine-tuning of the signal
sample rate and decomposition parameters with classification accuracy,
achieving state-of-the-art 96.09% accuracy in predicting seven emotions on the
Berlin EMO-DB database.
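As a rough illustration of the pipeline described in the abstract, the hedged sketch below decomposes a speech signal with VMD (via the third-party vmdpy package, using manually chosen K and alpha rather than the automatic selection that VGG-optiVMD performs), concatenates the listed spectral features with librosa, and feeds the resulting feature image to a VGG16 classifier. File names, parameter values, and helper functions are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a VMD + spectral-feature + VGG16 pipeline.
# K, alpha, and the sample rate are illustrative; VGG-optiVMD selects the
# decomposition parameters automatically, which is not reproduced here.
import numpy as np
import librosa
from vmdpy import VMD                                  # third-party VMD implementation
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

def vmd_reconstruct(signal, K=5, alpha=2000):
    """Decompose the signal into K modes and sum them back (VMD can be slow,
    so a short utterance is assumed)."""
    u, _, _ = VMD(signal, alpha=alpha, tau=0.0, K=K, DC=0, init=1, tol=1e-7)
    return u.sum(axis=0)

def build_feature_image(y, sr):
    """Concatenate MFCCs, chromagram, Mel spectrogram, Tonnetz features and
    spectral centroid along the frequency axis into a 2-D feature matrix."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)
    mel = librosa.power_to_db(librosa.feature.melspectrogram(y=y, sr=sr))
    tonnetz = librosa.feature.tonnetz(y=y, sr=sr)
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)
    # Frame counts can differ slightly; crop to the shortest before stacking.
    t = min(m.shape[1] for m in (mfcc, chroma, mel, tonnetz, centroid))
    return np.vstack([m[:, :t] for m in (mfcc, chroma, mel, tonnetz, centroid)])

def build_classifier(input_shape, n_classes=7):
    """VGG16 backbone with a small dense head for 7-way emotion prediction."""
    base = VGG16(include_top=False, weights=None, input_shape=input_shape)
    return models.Sequential([base, layers.Flatten(),
                              layers.Dense(256, activation="relu"),
                              layers.Dense(n_classes, activation="softmax")])

y, sr = librosa.load("speech.wav", sr=16000)            # placeholder file name
feat = build_feature_image(vmd_reconstruct(y), sr)
x = np.repeat(feat[np.newaxis, :, :, np.newaxis], 3, axis=-1)  # fake RGB channels
model = build_classifier(x.shape[1:])
print(model.predict(x).shape)                            # (1, 7) class probabilities
```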
Related papers
- Multi-Source Domain Adaptation with Transformer-based Feature Generation for Subject-Independent EEG-based Emotion Recognition [0.5439020425819]
We propose a multi-source domain adaptation approach with a transformer-based feature generator (MSDA-TF) designed to leverage information from multiple sources.
During the adaptation process, we group the source subjects based on correlation values and aim to align the moments of the target subject with each source as well as within the sources.
MSDA-TF is validated on the SEED dataset and is shown to yield promising results.
arXiv Detail & Related papers (2024-01-04T16:38:47Z)
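To illustrate the moment-alignment idea in the summary above, here is a minimal NumPy sketch of a discrepancy between the first and second moments of source and target feature matrices. The actual MSDA-TF method uses a transformer-based feature generator and groups sources by correlation, which is not reproduced here.

```python
import numpy as np

def moment_alignment_loss(source_feats, target_feats):
    """Toy discrepancy between the means and variances of source and target
    feature matrices of shape (n_samples, n_features)."""
    mean_gap = np.sum((source_feats.mean(axis=0) - target_feats.mean(axis=0)) ** 2)
    var_gap = np.sum((source_feats.var(axis=0) - target_feats.var(axis=0)) ** 2)
    return mean_gap + var_gap

# Random stand-in features for one source subject and the target subject.
rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(200, 64))
tgt = rng.normal(0.3, 1.2, size=(150, 64))
print(moment_alignment_loss(src, tgt))
```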
- Specific Emitter Identification Based on Joint Variational Mode Decomposition [7.959137957880584]
Specific emitter identification (SEI) technology is significant in device administration scenarios, such as self-organized networking and spectrum management.
For nonlinear and non-stationary electromagnetic signals, SEI often employs variational modal decomposition (VMD) to decompose the signal in order to effectively characterize the distinct device fingerprint.
In this paper, we propose a joint variational modal decomposition (JVMD) algorithm, which is an improved version of VMD by simultaneously implementing modal decomposition on multi-frame signals.
arXiv Detail & Related papers (2024-01-03T02:19:32Z)
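The JVMD formulation itself is not given in the summary above; as a baseline for comparison, the hedged sketch below simply applies ordinary single-frame VMD (via the third-party vmdpy package) to each frame with shared K and alpha, which is what JVMD improves upon by coupling the decomposition across frames.

```python
import numpy as np
from vmdpy import VMD   # ordinary single-frame VMD; the joint JVMD step is not reproduced

def per_frame_vmd(frames, K=4, alpha=2000):
    """Baseline: decompose each signal frame independently with shared K/alpha.
    JVMD couples the decomposition across frames for more consistent modes."""
    modes = []
    for frame in frames:
        u, _, _ = VMD(frame, alpha=alpha, tau=0.0, K=K, DC=0, init=1, tol=1e-7)
        modes.append(u)
    return np.stack(modes)          # shape: (n_frames, K, frame_length)

# Stand-in multi-frame signal: three noisy two-tone frames of 1000 samples each.
t = np.linspace(0, 1, 1000, endpoint=False)
frames = [np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)
          + 0.05 * np.random.randn(t.size) for _ in range(3)]
print(per_frame_vmd(frames).shape)
```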
- EmoDiarize: Speaker Diarization and Emotion Identification from Speech Signals using Convolutional Neural Networks [0.0]
This research explores the integration of deep learning techniques in speech emotion recognition.
It introduces a framework that combines a pre-existing speaker diarization pipeline and an emotion identification model built on a Convolutional Neural Network (CNN).
The proposed model yields an unweighted accuracy of 63%, demonstrating remarkable efficiency in accurately identifying emotional states within speech signals.
arXiv Detail & Related papers (2023-10-19T16:02:53Z)
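A minimal Keras sketch of a CNN emotion classifier over MFCC patches, in the spirit of the entry above; the layer sizes and input shape are assumptions, not the EmoDiarize architecture.

```python
import numpy as np
from tensorflow.keras import layers, models

def build_emotion_cnn(input_shape=(40, 100, 1), n_classes=7):
    """Small illustrative CNN over MFCC 'images' (40 coefficients x 100 frames)."""
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu"), layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"), layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])

model = build_emotion_cnn()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
dummy = np.random.rand(8, 40, 100, 1).astype("float32")   # stand-in MFCC batch
print(model.predict(dummy).shape)                          # (8, 7)
```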
- Audio-visual End-to-end Multi-channel Speech Separation, Dereverberation and Recognition [52.11964238935099]
An audio-visual multi-channel speech separation, dereverberation and recognition approach is proposed in this paper.
Video input is consistently incorporated into the mask-based MVDR speech separation and the DNN-WPE or spectral mapping (SpecM) based speech dereverberation front-ends.
Experiments were conducted on the mixture overlapped and reverberant speech data constructed using simulation or replay of the Oxford LRS2 dataset.
arXiv Detail & Related papers (2023-07-06T10:50:46Z)
- Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models [53.31917090073727]
We propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities.
We evaluate the effectiveness of our proposed multimodal approach on the interactive emotional dyadic motion capture dataset.
arXiv Detail & Related papers (2022-02-16T00:23:42Z)
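A toy sketch of the late-fusion idea in the entry above: a fixed weighted average of class probabilities from a speech model and a text model. The paper fine-tunes transfer-learned models and learns its fusion; the fixed weight here is purely illustrative.

```python
import numpy as np

def late_fusion(speech_probs, text_probs, speech_weight=0.5):
    """Weighted average of per-modality class-probability vectors, renormalised."""
    fused = speech_weight * speech_probs + (1.0 - speech_weight) * text_probs
    return fused / fused.sum(axis=-1, keepdims=True)

speech_probs = np.array([[0.6, 0.2, 0.1, 0.1]])   # stand-in 4-class outputs
text_probs = np.array([[0.3, 0.4, 0.2, 0.1]])
print(late_fusion(speech_probs, text_probs))
```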
- Discretization and Re-synthesis: an alternative method to solve the Cocktail Party Problem [65.25725367771075]
This study demonstrates, for the first time, that the synthesis-based approach can also perform well on this problem.
Specifically, we propose a novel speech separation/enhancement model based on the recognition of discrete symbols.
By utilizing the synthesis model with the input of discrete symbols, after the prediction of discrete symbol sequence, each target speech could be re-synthesized.
arXiv Detail & Related papers (2021-12-17T08:35:40Z)
- EEGminer: Discovering Interpretable Features of Brain Activity with Learnable Filters [72.19032452642728]
We propose a novel differentiable EEG decoding pipeline consisting of learnable filters and a pre-determined feature extraction module.
We demonstrate the utility of our model towards emotion recognition from EEG signals on the SEED dataset and on a new EEG dataset of unprecedented size.
The discovered features align with previous neuroscience studies and offer new insights, such as marked differences in the functional connectivity profile between left and right temporal areas during music listening.
arXiv Detail & Related papers (2021-10-19T14:22:04Z)
- Hybrid Data Augmentation and Deep Attention-based Dilated Convolutional-Recurrent Neural Networks for Speech Emotion Recognition [1.1086440815804228]
We investigate hybrid data augmentation (HDA) methods to generate and balance data based on traditional and generative adversarial networks (GAN) methods.
To evaluate the effectiveness of HDA methods, a deep learning framework, namely ADCRNN, is designed by integrating deep dilated convolutional-recurrent neural networks with an attention mechanism.
For validating our proposed methods, we use the EmoDB dataset that consists of several emotions with imbalanced samples.
arXiv Detail & Related papers (2021-09-18T23:13:44Z)
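A hedged Keras sketch loosely following the ADCRNN description above: dilated 1-D convolutions, a recurrent layer, and attention-based pooling. Layer sizes, dilation rates, and the attention variant are assumptions, not the authors' configuration.

```python
import numpy as np
from tensorflow.keras import layers, models

def build_dilated_conv_recurrent(input_shape=(300, 40), n_classes=7):
    """Illustrative dilated convolutional-recurrent network with self-attention."""
    inp = layers.Input(shape=input_shape)               # (frames, features)
    x = inp
    for rate in (1, 2, 4):                              # increasing dilation
        x = layers.Conv1D(64, 3, dilation_rate=rate, padding="causal",
                          activation="relu")(x)
    x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)
    x = layers.Attention()([x, x])                      # simple self-attention
    x = layers.GlobalAveragePooling1D()(x)
    out = layers.Dense(n_classes, activation="softmax")(x)
    return models.Model(inp, out)

model = build_dilated_conv_recurrent()
print(model.predict(np.random.rand(2, 300, 40).astype("float32")).shape)  # (2, 7)
```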
- Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation [56.264157127549446]
Speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction.
One of the main challenges in SER is data scarcity.
We propose a transfer learning strategy combined with spectrogram augmentation.
arXiv Detail & Related papers (2021-08-05T10:39:39Z)
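Spectrogram augmentation is commonly realised with SpecAugment-style masking; the NumPy sketch below zeroes a random frequency band and time span. Whether the paper above uses exactly this scheme is not stated in the summary, so the mask sizes are arbitrary illustrative choices.

```python
import numpy as np

def mask_spectrogram(spec, max_freq_mask=8, max_time_mask=20, rng=None):
    """SpecAugment-style masking: zero one random frequency band and one
    random time span of a (n_freq, n_time) spectrogram."""
    if rng is None:
        rng = np.random.default_rng()
    spec = spec.copy()
    n_freq, n_time = spec.shape
    f = rng.integers(1, max_freq_mask + 1)
    f0 = rng.integers(0, n_freq - f)
    spec[f0:f0 + f, :] = 0.0
    t = rng.integers(1, max_time_mask + 1)
    t0 = rng.integers(0, n_time - t)
    spec[:, t0:t0 + t] = 0.0
    return spec

augmented = mask_spectrogram(np.random.rand(128, 200))
print(augmented.shape)
```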
- Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-Spoofing [72.4445825335561]
We propose a simple method to derive a 2D representation from detection scores produced by an arbitrary set of binary classifiers.
Based upon rank correlations, our method facilitates a visual comparison of classifiers with arbitrary scores.
While the approach is fully versatile and can be applied to any detection task, we demonstrate the method using scores produced by automatic speaker verification and voice anti-spoofing systems.
arXiv Detail & Related papers (2021-06-11T13:03:33Z)
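A hedged sketch of the classifier-adjacency idea in the last entry: compute pairwise rank correlations between detection scores and embed the resulting dissimilarities in 2-D. The specific choices (Kendall tau, 1 - tau as distance, metric MDS) are illustrative assumptions and not necessarily those of the paper.

```python
import numpy as np
from scipy.stats import kendalltau
from sklearn.manifold import MDS

def classifier_map(score_matrix):
    """score_matrix: (n_classifiers, n_trials) detection scores.
    Build a rank-correlation distance matrix and embed it in 2-D with MDS."""
    n = score_matrix.shape[0]
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            tau, _ = kendalltau(score_matrix[i], score_matrix[j])
            dist[i, j] = dist[j, i] = 1.0 - tau
    return MDS(n_components=2, dissimilarity="precomputed",
               random_state=0).fit_transform(dist)

scores = np.random.rand(5, 300)           # stand-in scores from 5 systems
print(classifier_map(scores))              # one 2-D point per classifier
```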