An Extended Variational Mode Decomposition Algorithm Developed Speech
Emotion Recognition Performance
- URL: http://arxiv.org/abs/2312.10937v1
- Date: Mon, 18 Dec 2023 05:24:03 GMT
- Title: An Extended Variational Mode Decomposition Algorithm Developed Speech
Emotion Recognition Performance
- Authors: David Hason Rudd, Huan Huo, Guandong Xu
- Abstract summary: This study proposes VGG-optiVMD, an empowered variational mode decomposition algorithm, to distinguish meaningful speech features.
Various feature vectors were employed to train the VGG16 network on different databases and assess VGG-optiVMD reproducibility and reliability.
Results confirmed a synergistic relationship between the fine-tuning of the signal sample rate and decomposition parameters with classification accuracy.
- Score: 15.919990281329085
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Emotion recognition (ER) from speech signals is a robust approach, since
emotion cues in speech cannot be imitated as easily as facial expressions or
text-based sentiment. The valuable information underlying the emotions is
significant for human-computer interaction, enabling intelligent machines to
interact with sensitivity in the real world. Previous ER studies based on
speech signal processing have focused
exclusively on associations between different signal mode decomposition methods
and hidden informative features. However, improper selection of decomposition
parameters leads to the loss of informative signal components through mode
duplication and mixing. In contrast, the current study proposes VGG-optiVMD, an empowered
variational mode decomposition algorithm, to distinguish meaningful speech
features and automatically select the number of decomposed modes and optimum
balancing parameter for the data fidelity constraint by assessing their effects
on the VGG16 flattening output layer. Various feature vectors were employed to
train the VGG16 network on different databases and assess VGG-optiVMD
reproducibility and reliability. One, two, and three-dimensional feature
vectors were constructed by concatenating Mel-frequency cepstral coefficients,
Chromagram, Mel spectrograms, Tonnetz diagrams, and spectral centroids. Results
confirmed a synergistic relationship between the fine-tuning of the signal
sample rate and decomposition parameters with classification accuracy,
achieving state-of-the-art 96.09% accuracy in predicting seven emotions on the
Berlin EMO-DB database.
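The abstract's core building block is variational mode decomposition, which splits a signal into K band-limited modes whose bandwidths are controlled by the balancing parameter alpha, exactly the two parameters VGG-optiVMD selects automatically. As a rough, hedged illustration only (this is the standard ADMM scheme of classic VMD, not the authors' VGG-optiVMD implementation, and all parameter values here are arbitrary assumptions), a minimal NumPy sketch of the update loop might look like:

```python
import numpy as np

def vmd(signal, K=2, alpha=2000.0, tau=0.0, n_iter=200, tol=1e-7):
    """Minimal variational mode decomposition sketch (classic VMD).

    Decomposes `signal` into K band-limited modes via alternating
    Wiener-filter updates in the Fourier domain. Returns (modes,
    center_freqs); frequencies are normalized to cycles/sample.
    """
    N = len(signal)
    f_hat = np.fft.fft(signal)
    freqs = np.fft.fftfreq(N)              # normalized frequency grid
    u_hat = np.zeros((K, N), dtype=complex)
    omega = np.linspace(0.05, 0.45, K)     # spread initial center frequencies
    lam = np.zeros(N, dtype=complex)       # Lagrange multiplier (off when tau=0)
    pos = freqs > 0                        # update omega from positive freqs only

    for _ in range(n_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            # Wiener filter of the residual, centered at mode k's frequency;
            # alpha is the data-fidelity balancing parameter from the abstract
            residual = f_hat - (u_hat.sum(axis=0) - u_hat[k]) + lam / 2
            u_hat[k] = residual / (1.0 + 2.0 * alpha * (freqs - omega[k]) ** 2)
            # re-center omega_k at the power centroid of its spectrum
            power = np.abs(u_hat[k, pos]) ** 2
            if power.sum() > 0:
                omega[k] = (freqs[pos] * power).sum() / power.sum()
        lam = lam + tau * (f_hat - u_hat.sum(axis=0))
        if np.sum(np.abs(u_hat - u_prev) ** 2) / N < tol:
            break

    modes = np.real(np.fft.ifft(u_hat, axis=1))
    order = np.argsort(omega)
    return modes[order], omega[order]
```

With fixed K and alpha, a two-tone signal separates into two modes whose recovered center frequencies sit at the tone frequencies; mode duplication and mixing, the failure case the abstract describes, appears when K or alpha is chosen poorly.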
Related papers
- A Hybrid Framework for Statistical Feature Selection and Image-Based Noise-Defect Detection [55.2480439325792]
This paper presents a hybrid framework that integrates both statistical feature selection and classification techniques to improve defect detection accuracy.
We present around 55 distinguishing features extracted from industrial images, which are then analyzed using statistical methods.
By integrating these methods with flexible machine learning applications, the proposed framework improves detection accuracy and reduces false positives and misclassifications.
arXiv Detail & Related papers (2024-12-11T22:12:21Z)
- Feature Selection via Dynamic Graph-based Attention Block in MI-based EEG Signals [0.0]
Brain-computer interface (BCI) technology enables direct interaction between humans and computers by analyzing brain signals.
EEG signals are often affected by a low signal-to-noise ratio, physiological artifacts, and individual variability, representing challenges in extracting distinct features.
Also, motor imagery (MI)-based EEG signals could contain features with low correlation to MI characteristics, which might cause the weights of the deep model to become biased towards those features.
arXiv Detail & Related papers (2024-10-31T00:53:29Z)
- Specific Emitter Identification Based on Joint Variational Mode Decomposition [7.959137957880584]
Specific emitter identification (SEI) technology is significant in device administration scenarios, such as self-organized networking and spectrum management.
For nonlinear and non-stationary electromagnetic signals, SEI often employs variational mode decomposition (VMD) to decompose the signal in order to effectively characterize the distinct device fingerprint.
In this paper, we propose a joint variational mode decomposition (JVMD) algorithm, an improved version of VMD that simultaneously performs modal decomposition on multi-frame signals.
arXiv Detail & Related papers (2024-01-03T02:19:32Z)
- EmoDiarize: Speaker Diarization and Emotion Identification from Speech Signals using Convolutional Neural Networks [0.0]
This research explores the integration of deep learning techniques in speech emotion recognition.
It introduces a framework that combines a pre-existing speaker diarization pipeline and an emotion identification model built on a Convolutional Neural Network (CNN).
The proposed model yields an unweighted accuracy of 63%, demonstrating remarkable efficiency in accurately identifying emotional states within speech signals.
arXiv Detail & Related papers (2023-10-19T16:02:53Z)
- Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models [53.31917090073727]
We propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities.
We evaluate the effectiveness of our proposed multimodal approach on the interactive emotional dyadic motion capture dataset.
arXiv Detail & Related papers (2022-02-16T00:23:42Z)
- Discretization and Re-synthesis: an alternative method to solve the Cocktail Party Problem [65.25725367771075]
This study demonstrates, for the first time, that the synthesis-based approach can also perform well on this problem.
Specifically, we propose a novel speech separation/enhancement model based on the recognition of discrete symbols.
By utilizing the synthesis model with the input of discrete symbols, after the prediction of discrete symbol sequence, each target speech could be re-synthesized.
arXiv Detail & Related papers (2021-12-17T08:35:40Z)
- EEGminer: Discovering Interpretable Features of Brain Activity with Learnable Filters [72.19032452642728]
We propose a novel differentiable EEG decoding pipeline consisting of learnable filters and a pre-determined feature extraction module.
We demonstrate the utility of our model towards emotion recognition from EEG signals on the SEED dataset and on a new EEG dataset of unprecedented size.
The discovered features align with previous neuroscience studies and offer new insights, such as marked differences in the functional connectivity profile between left and right temporal areas during music listening.
arXiv Detail & Related papers (2021-10-19T14:22:04Z)
- Hybrid Data Augmentation and Deep Attention-based Dilated Convolutional-Recurrent Neural Networks for Speech Emotion Recognition [1.1086440815804228]
We investigate hybrid data augmentation (HDA) methods to generate and balance data based on traditional and generative adversarial networks (GAN) methods.
To evaluate the effectiveness of HDA methods, a deep learning framework namely (ADCRNN) is designed by integrating deep dilated convolutional-recurrent neural networks with an attention mechanism.
For validating our proposed methods, we use the EmoDB dataset that consists of several emotions with imbalanced samples.
arXiv Detail & Related papers (2021-09-18T23:13:44Z)
- Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation [56.264157127549446]
Speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction.
One of the main challenges in SER is data scarcity.
We propose a transfer learning strategy combined with spectrogram augmentation.
arXiv Detail & Related papers (2021-08-05T10:39:39Z) - Visualizing Classifier Adjacency Relations: A Case Study in Speaker
Verification and Voice Anti-Spoofing [72.4445825335561]
We propose a simple method to derive 2D representation from detection scores produced by an arbitrary set of binary classifiers.
Based upon rank correlations, our method facilitates a visual comparison of classifiers with arbitrary scores.
While the approach is fully versatile and can be applied to any detection task, we demonstrate the method using scores produced by automatic speaker verification and voice anti-spoofing systems.
arXiv Detail & Related papers (2021-06-11T13:03:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.