Hybrid Data Augmentation and Deep Attention-based Dilated
Convolutional-Recurrent Neural Networks for Speech Emotion Recognition
- URL: http://arxiv.org/abs/2109.09026v1
- Date: Sat, 18 Sep 2021 23:13:44 GMT
- Title: Hybrid Data Augmentation and Deep Attention-based Dilated
Convolutional-Recurrent Neural Networks for Speech Emotion Recognition
- Authors: Nhat Truong Pham, Duc Ngoc Minh Dang, Sy Dzung Nguyen
- Abstract summary: We investigate hybrid data augmentation (HDA) methods to generate and balance data based on traditional and generative adversarial networks (GAN) methods.
To evaluate the effectiveness of HDA methods, a deep learning framework, namely ADCRNN, is designed by integrating deep dilated convolutional-recurrent neural networks with an attention mechanism.
To validate our proposed methods, we use the EmoDB dataset, which consists of several emotions with imbalanced samples.
- Score: 1.1086440815804228
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Speech emotion recognition (SER) is one of the significant tasks in
Human-Computer Interaction (HCI) applications. However, it is hard to choose
the optimal features and to deal with imbalanced labeled data. In this article,
we investigate hybrid data augmentation (HDA) methods to generate and balance
data based on traditional and generative adversarial network (GAN) methods. To
evaluate the effectiveness of HDA methods, a deep learning framework, namely
ADCRNN, is designed by integrating deep dilated convolutional-recurrent neural
networks with an attention mechanism. In addition, we choose 3D log
Mel-spectrogram (MelSpec) features as the inputs for the deep learning
framework. Furthermore, we reconfigure the loss function by combining a softmax
loss and a center loss to classify the emotions. To validate our proposed
methods, we use the EmoDB dataset, which consists of several emotions with
imbalanced samples. Experimental results show that the proposed methods achieve
better accuracy than the state-of-the-art methods on EmoDB, with 87.12% and
88.47% for the traditional and GAN-based methods, respectively.
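The abstract mentions a loss that combines a softmax loss with a center loss. The snippet below is a minimal NumPy sketch of that general idea, not the authors' implementation. The weighting factor `lam`, the use of a batch mean, the feature dimension, and the class count of 7 are all assumptions made for illustration; the abstract specifies none of them.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy between softmax(logits) and integer class labels."""
    # Subtract the row maximum for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def center_loss(features, labels, centers):
    """Half the mean squared distance from each feature to its class center."""
    diffs = features - centers[labels]
    return 0.5 * (diffs ** 2).sum(axis=1).mean()


def joint_loss(logits, features, labels, centers, lam=0.1):
    """Softmax loss plus lam * center loss (lam is a hypothetical weight)."""
    return (softmax_cross_entropy(logits, labels)
            + lam * center_loss(features, labels, centers))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_classes, feat_dim, batch = 7, 16, 8      # illustrative sizes only
    logits = rng.normal(size=(batch, num_classes))
    features = rng.normal(size=(batch, feat_dim))
    labels = rng.integers(0, num_classes, size=batch)
    centers = np.zeros((num_classes, feat_dim))  # centers are learned in practice
    print(joint_loss(logits, features, labels, centers))
```

In a training setup the class centers would typically be updated alongside the network weights. Here they are fixed arrays so that the arithmetic of the combined objective is easy to follow.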
Related papers
- Unlocking Potential Binders: Multimodal Pretraining DEL-Fusion for Denoising DNA-Encoded Libraries [51.72836644350993]
Multimodal Pretraining DEL-Fusion model (MPDF)
We develop pretraining tasks applying contrastive objectives between different compound representations and their text descriptions.
We propose a novel DEL-fusion framework that amalgamates compound information at the atomic, submolecular, and molecular levels.
(2024-09-07T17:32:21Z)
- Feature Fusion Based on Mutual-Cross-Attention Mechanism for EEG Emotion Recognition [0.5985204759362747]
We propose a novel and effective feature fusion mechanism named Mutual-Cross-Attention (MCA)
MCA discovers the complementary relationship between time-domain and frequency-domain features in EEG data.
The proposed method eventually achieves 99.49% (valence) and 99.30% (arousal) accuracy on DEAP dataset.
(2024-06-20T06:08:52Z)
- CV-Attention UNet: Attention-based UNet for 3D Cerebrovascular Segmentation of Enhanced TOF-MRA Images [2.2265536092123006]
We propose the 3D cerebrovascular attention UNet method, named CV-AttentionUNet, for precise extraction of brain vessel images.
To combine the low and high semantics, we applied the attention mechanism.
We believe that the novelty of this algorithm lies in its ability to perform well on both labeled and unlabeled data.
(2023-11-16T22:31:05Z)
- Graph Convolutional Network with Connectivity Uncertainty for EEG-based Emotion Recognition [20.655367200006076]
This study introduces the distribution-based uncertainty method to represent spatial dependencies and temporal-spectral relativeness in EEG signals.
The graph mixup technique is employed to enhance latent connected edges and mitigate noisy label issues.
We evaluate our approach on two widely used datasets, namely SEED and SEEDIV, for emotion recognition tasks.
(2023-10-22T03:47:11Z)
- Heterogenous Memory Augmented Neural Networks [84.29338268789684]
We introduce a novel heterogeneous memory augmentation approach for neural networks.
By introducing learnable memory tokens with attention mechanism, we can effectively boost performance without huge computational overhead.
We show our approach on various image and graph-based tasks under both in-distribution (ID) and out-of-distribution (OOD) conditions.
(2023-10-17T01:05:28Z)
- DGSD: Dynamical Graph Self-Distillation for EEG-Based Auditory Spatial Attention Detection [49.196182908826565]
Auditory Attention Detection (AAD) aims to detect target speaker from brain signals in a multi-speaker environment.
Current approaches primarily rely on traditional convolutional neural network designed for processing Euclidean data like images.
This paper proposes a dynamical graph self-distillation (DGSD) approach for AAD, which does not require speech stimuli as input.
(2023-09-07T13:43:46Z)
- Brain Imaging-to-Graph Generation using Adversarial Hierarchical Diffusion Models for MCI Causality Analysis [44.45598796591008]
Brain imaging-to-graph generation (BIGG) framework is proposed to map functional magnetic resonance imaging (fMRI) into effective connectivity for mild cognitive impairment analysis.
The hierarchical transformers in the generator are designed to estimate the noise at multiple scales.
Evaluations of the ADNI dataset demonstrate the feasibility and efficacy of the proposed model.
(2023-05-18T06:54:56Z)
- Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation [56.264157127549446]
Speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction.
One of the main challenges in SER is data scarcity.
We propose a transfer learning strategy combined with spectrogram augmentation.
(2021-08-05T10:39:39Z)
- Towards Unbiased Visual Emotion Recognition via Causal Intervention [63.74095927462]
We propose a novel Emotion Recognition Network (IERN) to alleviate the negative effects brought by the dataset bias.
A series of designed tests validate the effectiveness of IERN, and experiments on three emotion benchmarks demonstrate that IERN outperforms other state-of-the-art approaches.
(2021-07-26T10:40:59Z)
- ScalingNet: extracting features from raw EEG data for emotion recognition [4.047737925426405]
We propose a novel convolutional layer allowing to adaptively extract effective data-driven spectrogram-like features from raw EEG signals.
The proposed neural network architecture based on the scaling layer, references as ScalingNet, has achieved the state-of-the-art result across the established DEAP benchmark dataset.
(2021-02-07T08:54:27Z)
- Emotional EEG Classification using Connectivity Features and Convolutional Neural Networks [81.74442855155843]
We introduce a new classification system that utilizes brain connectivity with a CNN and validate its effectiveness via the emotional video classification.
The level of concentration of the brain connectivity related to the emotional property of the target video is correlated with classification performance.
(2021-01-18T13:28:08Z)