A Comparative Study of Data Augmentation Techniques for Deep Learning
Based Emotion Recognition
- URL: http://arxiv.org/abs/2211.05047v1
- Date: Wed, 9 Nov 2022 17:27:03 GMT
- Title: A Comparative Study of Data Augmentation Techniques for Deep Learning
Based Emotion Recognition
- Authors: Ravi Shankar, Abdouh Harouna Kenfack, Arjun Somayazulu, Archana
Venkataraman
- Abstract summary: We conduct a comprehensive evaluation of popular deep learning approaches for emotion recognition.
We show that long-range dependencies in the speech signal are critical for emotion recognition.
Speed/rate augmentation offers the most robust performance gain across models.
- Score: 11.928873764689458
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automated emotion recognition in speech is a long-standing problem. While
early work on emotion recognition relied on hand-crafted features and simple
classifiers, the field has now embraced end-to-end feature learning and
classification using deep neural networks. In parallel to these models,
researchers have proposed several data augmentation techniques to increase the
size and variability of existing labeled datasets. Despite many seminal
contributions in the field, we still have a poor understanding of the interplay
between the network architecture and the choice of data augmentation. Moreover,
only a handful of studies demonstrate the generalizability of a particular
model across multiple datasets, which is a prerequisite for robust real-world
performance. In this paper, we conduct a comprehensive evaluation of popular
deep learning approaches for emotion recognition. To eliminate bias, we fix the
model architectures and optimization hyperparameters using the VESUS dataset
and then use repeated 5-fold cross validation to evaluate the performance on
the IEMOCAP and CREMA-D datasets. Our results demonstrate that long-range
dependencies in the speech signal are critical for emotion recognition and that
speed/rate augmentation offers the most robust performance gain across models.
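The paper does not include code here; as a minimal sketch of the winning augmentation, speed/rate perturbation can be implemented by resampling the waveform. The file name and perturbation factors below are illustrative assumptions, not details from the paper.
```python
# Minimal resampling-based speed/rate augmentation sketch.
# Perturbation factors and the input file are illustrative assumptions.
import librosa

def speed_perturb(waveform, sr, factor):
    """Treat the audio as if recorded at sr * factor, then resample to sr.
    factor > 1 shortens the signal: faster speech with higher pitch."""
    return librosa.resample(waveform, orig_sr=int(sr * factor), target_sr=sr)

y, sr = librosa.load("utterance.wav", sr=16000)  # hypothetical input file
augmented = {f: speed_perturb(y, sr, f) for f in (0.9, 1.1)}  # common factors
```
Under a protocol like the one above, such augmentation would be applied only to the training folds of the repeated 5-fold cross validation, never to the held-out test fold.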
Related papers
- A Hybrid End-to-End Spatio-Temporal Attention Neural Network with
Graph-Smooth Signals for EEG Emotion Recognition [1.6328866317851187]
We introduce a deep neural network that acquires interpretable representations through a hybrid structure of spatio-temporal encoding and recurrent attention blocks.
We demonstrate that our proposed architecture exceeds state-of-the-art results for emotion classification on the publicly available DEAP dataset.
arXiv Detail & Related papers (2023-07-06T15:35:14Z)
- Synthetic-to-Real Domain Adaptation for Action Recognition: A Dataset and Baseline Performances [76.34037366117234]
We introduce a new dataset called Robot Control Gestures (RoCoG-v2).
The dataset is composed of both real and synthetic videos from seven gesture classes.
We present results using state-of-the-art action recognition and domain adaptation algorithms.
arXiv Detail & Related papers (2023-03-17T23:23:55Z)
- Deep networks for system identification: a Survey [56.34005280792013]
System identification learns mathematical descriptions of dynamic systems from input-output data.
The main aim of the identified model is to predict new data from previous observations.
We discuss architectures commonly adopted in the literature, like feedforward, convolutional, and recurrent networks.
arXiv Detail & Related papers (2023-01-30T12:38:31Z)
- Machine Learning For Classification Of Antithetical Emotional States [1.1602089225841632]
This work analyses the performance of baseline machine learning classifiers on the DEAP dataset.
It provides results comparable to the state of the art, leveraging the performance boost of its deep learning architecture.
arXiv Detail & Related papers (2022-09-06T06:54:33Z)
- Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models [53.31917090073727]
We propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities.
We evaluate the effectiveness of our proposed multimodal approach on the interactive emotional dyadic motion capture dataset.
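As a rough illustration of the late-fusion idea (not the paper's actual architecture; the encoders, feature dimensions, and class count below are dummy stand-ins):
```python
# Hypothetical late-fusion sketch: each modality produces its own class
# scores, which are averaged afterwards. All modules are stand-ins.
import torch
import torch.nn as nn

class DummyEncoder(nn.Module):
    """Stand-in for a fine-tuned speech or text branch."""
    def __init__(self, in_dim, n_classes=4):
        super().__init__()
        self.head = nn.Linear(in_dim, n_classes)

    def forward(self, x):
        return self.head(x)  # (batch, n_classes) logits

speech_branch = DummyEncoder(in_dim=192)  # e.g. pooled speaker embeddings
text_branch = DummyEncoder(in_dim=768)    # e.g. pooled BERT features

speech_feat = torch.randn(8, 192)
text_feat = torch.randn(8, 768)

# Late fusion: combine per-modality predictions, not hidden features.
fused_logits = (speech_branch(speech_feat) + text_branch(text_feat)) / 2
pred = fused_logits.argmax(dim=-1)
```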
arXiv Detail & Related papers (2022-02-16T00:23:42Z)
- Facial Emotion Recognition using Deep Residual Networks in Real-World Environments [5.834678345946704]
We propose a facial feature extractor model trained on a massively collected, in-the-wild video dataset.
The dataset consists of a million labelled frames and 2,616 subjects.
As temporal information is important to the emotion recognition domain, we utilise LSTM cells to capture the temporal dynamics in the data.
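A minimal sketch of that idea, assuming pre-extracted per-frame face features (the dimensions and class count are illustrative, not from the paper):
```python
# Illustrative LSTM over per-frame features; the final hidden state
# summarizes the clip for emotion classification.
import torch
import torch.nn as nn

class FrameLSTMClassifier(nn.Module):
    def __init__(self, feat_dim=512, hidden=256, n_classes=7):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, frames):          # frames: (batch, time, feat_dim)
        _, (h_n, _) = self.lstm(frames)
        return self.head(h_n[-1])       # (batch, n_classes)

model = FrameLSTMClassifier()
logits = model(torch.randn(4, 16, 512))  # 4 clips of 16 frames each
```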
arXiv Detail & Related papers (2021-11-04T10:08:22Z)
- Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation [56.264157127549446]
Speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction.
One of the main challenges in SER is data scarcity.
We propose a transfer learning strategy combined with spectrogram augmentation.
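The summary does not specify the augmentation details; one common SpecAugment-style variant (Park et al., 2019) masks random frequency and time bands of the input spectrogram, as in this sketch (mask sizes are assumptions):
```python
# SpecAugment-style masking sketch: zero one random frequency band and
# one random time span of a (freq, time) spectrogram. Sizes are assumed.
import numpy as np

def spec_augment(spec, max_f=8, max_t=20, rng=None):
    rng = rng or np.random.default_rng()
    out = spec.copy()
    f0 = int(rng.integers(0, spec.shape[0] - max_f))
    t0 = int(rng.integers(0, spec.shape[1] - max_t))
    out[f0:f0 + int(rng.integers(1, max_f + 1)), :] = 0.0
    out[:, t0:t0 + int(rng.integers(1, max_t + 1))] = 0.0
    return out

masked = spec_augment(np.random.rand(80, 300))  # 80 mel bins x 300 frames
```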
arXiv Detail & Related papers (2021-08-05T10:39:39Z)
- Towards Unbiased Visual Emotion Recognition via Causal Intervention [63.74095927462]
We propose a novel Interventional Emotion Recognition Network (IERN) to alleviate the negative effects brought by the dataset bias.
A series of designed tests validate the effectiveness of IERN, and experiments on three emotion benchmarks demonstrate that IERN outperforms other state-of-the-art approaches.
arXiv Detail & Related papers (2021-07-26T10:40:59Z)
- Continuous Emotion Recognition via Deep Convolutional Autoencoder and Support Vector Regressor [70.2226417364135]
It is crucial that the machine be able to recognize the user's emotional state with high accuracy.
Deep neural networks have been used with great success in recognizing emotions.
We present a new model for continuous emotion recognition based on facial expression recognition.
arXiv Detail & Related papers (2020-01-31T17:47:16Z)