Temporal aggregation of audio-visual modalities for emotion recognition
- URL: http://arxiv.org/abs/2007.04364v1
- Date: Wed, 8 Jul 2020 18:44:15 GMT
- Title: Temporal aggregation of audio-visual modalities for emotion recognition
- Authors: Andreea Birhala, Catalin Nicolae Ristea, Anamaria Radoi, Liviu Cristian Dutu
- Abstract summary: We propose a multimodal fusion technique for emotion recognition based on combining audio-visual modalities from a temporal window with different temporal offsets for each modality.
Our proposed method outperforms other methods from the literature as well as human accuracy ratings.
- Score: 0.5352699766206808
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Emotion recognition has a pivotal role in affective computing and in
human-computer interaction. The current technological developments lead to
increased possibilities of collecting data about the emotional state of a
person. In general, human perception regarding the emotion transmitted by a
subject is based on vocal and visual information collected in the first seconds
of interaction with the subject. As a consequence, the integration of verbal
(i.e., speech) and non-verbal (i.e., image) information seems to be the
preferred choice in most of the current approaches towards emotion recognition.
In this paper, we propose a multimodal fusion technique for emotion recognition
based on combining audio-visual modalities from a temporal window with
different temporal offsets for each modality. We show that our proposed method
outperforms other methods from the literature as well as human accuracy ratings. The
experiments are conducted over the open-access multimodal dataset CREMA-D.
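The abstract does not spell out how the temporal window and the per-modality offsets are applied, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes per-frame feature arrays for each modality, mean pooling over the window, and fusion by concatenation; the function names, the pooling choice, and all parameter values are hypothetical.

```python
import numpy as np


def window_bounds(t, offset, length, rate, n_frames):
    """Map a window starting at (t + offset) seconds to frame indices.

    Indices are clipped to [0, n_frames] so the slice is always valid.
    """
    start = int(round((t + offset) * rate))
    start = max(0, min(start, n_frames))
    stop = min(start + int(round(length * rate)), n_frames)
    return start, stop


def fuse_window(audio_feats, video_feats, t, audio_offset, video_offset,
                length, audio_rate, video_rate):
    """Pool each modality over its own offset window, then concatenate.

    audio_feats: array of shape (n_audio_frames, d_audio)
    video_feats: array of shape (n_video_frames, d_video)
    Mean pooling and concatenation are assumptions made for illustration.
    """
    a0, a1 = window_bounds(t, audio_offset, length, audio_rate, len(audio_feats))
    v0, v1 = window_bounds(t, video_offset, length, video_rate, len(video_feats))
    if a1 <= a0 or v1 <= v0:
        raise ValueError("window falls outside the available frames")
    audio_vec = audio_feats[a0:a1].mean(axis=0)
    video_vec = video_feats[v0:v1].mean(axis=0)
    return np.concatenate([audio_vec, video_vec])


# Toy data: 1 s of audio features at 100 frames/s and video features at 30 frames/s.
audio = np.arange(100 * 2, dtype=float).reshape(100, 2)
video = np.arange(30 * 3, dtype=float).reshape(30, 3)

# The audio window starts 0.1 s later than the video window.
fused = fuse_window(audio, video, t=0.0, audio_offset=0.1, video_offset=0.0,
                    length=0.2, audio_rate=100, video_rate=30)
print(fused.shape)
```

In a sketch like this, the fused vector would be passed to a classifier, and the two offsets would be tuned on validation data.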
Related papers
- Emotion Recognition from the perspective of Activity Recognition [0.0]
Appraising human emotional states, behaviors, and reactions displayed in real-world settings can be accomplished using latent continuous dimensions.
For emotion recognition systems to be deployed and integrated into real-world mobile and computing devices, we need to consider data collected in the world.
We propose a novel three-stream end-to-end deep learning regression pipeline with an attention mechanism.
arXiv Detail & Related papers (2024-03-24T18:53:57Z)
- Dynamic Causal Disentanglement Model for Dialogue Emotion Detection [77.96255121683011]
We propose a Dynamic Causal Disentanglement Model based on hidden variable separation.
This model effectively decomposes the content of dialogues and investigates the temporal accumulation of emotions.
Specifically, we propose a dynamic temporal disentanglement model to infer the propagation of utterances and hidden variables.
arXiv Detail & Related papers (2023-09-13T12:58:09Z)
- Deep learning of segment-level feature representation for speech emotion recognition in conversations [9.432208348863336]
We propose a conversational speech emotion recognition method to deal with capturing attentive contextual dependency and speaker-sensitive interactions.
First, we use a pretrained VGGish model to extract segment-based audio representation in individual utterances.
Second, an attentive bi-directional gated recurrent unit (GRU) models context-sensitive information and jointly explores intra- and inter-speaker dependencies.
arXiv Detail & Related papers (2023-02-05T16:15:46Z)
- Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models [53.31917090073727]
We propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities.
We evaluate the effectiveness of our proposed multimodal approach on the interactive emotional dyadic motion capture dataset.
arXiv Detail & Related papers (2022-02-16T00:23:42Z)
- Multi-Cue Adaptive Emotion Recognition Network [4.570705738465714]
We propose a new deep learning approach for emotion recognition based on adaptive multi-cues.
We compare the proposed approach with state-of-the-art approaches on the CAER-S dataset.
arXiv Detail & Related papers (2021-11-03T15:08:55Z)
- SOLVER: Scene-Object Interrelated Visual Emotion Reasoning Network [83.27291945217424]
We propose a novel Scene-Object interreLated Visual Emotion Reasoning network (SOLVER) to predict emotions from images.
To mine the emotional relationships between distinct objects, we first build up an Emotion Graph based on semantic concepts and visual features.
We also design a Scene-Object Fusion Module to integrate scenes and objects, which exploits scene features to guide the fusion process of object features with the proposed scene-based attention mechanism.
arXiv Detail & Related papers (2021-10-24T02:41:41Z)
- Stimuli-Aware Visual Emotion Analysis [75.68305830514007]
We propose a stimuli-aware visual emotion analysis (VEA) method consisting of three stages, namely stimuli selection, feature extraction and emotion prediction.
To the best of our knowledge, this is the first work to introduce a stimuli selection process into VEA in an end-to-end network.
Experiments demonstrate that the proposed method consistently outperforms the state-of-the-art approaches on four public visual emotion datasets.
arXiv Detail & Related papers (2021-09-04T08:14:52Z)
- Emotion Recognition from Multiple Modalities: Fundamentals and Methodologies [106.62835060095532]
We discuss several key aspects of multi-modal emotion recognition (MER).
We begin with a brief introduction to widely used emotion representation models and affective modalities.
We then summarize existing emotion annotation strategies and corresponding computational tasks.
Finally, we outline several real-world applications and discuss some future directions.
arXiv Detail & Related papers (2021-08-18T21:55:20Z)
- Leveraging Recent Advances in Deep Learning for Audio-Visual Emotion Recognition [2.1485350418225244]
Spontaneous multi-modal emotion recognition has been extensively studied for human behavior analysis.
We propose a new deep learning-based approach for audio-visual emotion recognition.
arXiv Detail & Related papers (2021-03-16T15:49:15Z)
- Emotion Recognition From Gait Analyses: Current Research and Future Directions [48.93172413752614]
Gait conveys information about the walker's emotion.
The mapping between various emotions and gait patterns provides a new source for automated emotion recognition.
Gait is remotely observable, more difficult to imitate, and requires less cooperation from the subject.
arXiv Detail & Related papers (2020-03-13T08:22:33Z)
- Emotion Recognition System from Speech and Visual Information based on Convolutional Neural Networks [6.676572642463495]
We propose a system that is able to recognize emotions with a high accuracy rate and in real time.
To increase the accuracy of the recognition system, we also analyze the speech data and fuse the information coming from both sources.
arXiv Detail & Related papers (2020-02-29T22:09:46Z)