Recognizing Emotions evoked by Movies using Multitask Learning
- URL: http://arxiv.org/abs/2107.14529v1
- Date: Fri, 30 Jul 2021 10:21:40 GMT
- Title: Recognizing Emotions evoked by Movies using Multitask Learning
- Authors: Hassan Hayat, Carles Ventura, Agata Lapedriza
- Abstract summary: Methods for recognizing evoked emotions are usually trained on human annotated data.
We propose two deep learning architectures: a Single-Task (ST) architecture and a Multi-Task (MT) architecture.
Our results show that the MT approach can more accurately model each viewer and the aggregated annotation when compared to methods that are directly trained on the aggregated annotations.
- Score: 3.4290619267487488
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Understanding the emotional impact of movies has become important for
affective movie analysis, ranking, and indexing. Methods for recognizing evoked
emotions are usually trained on human annotated data. Concretely, viewers watch
video clips and have to manually annotate the emotions they experienced while
watching the videos. Then, the common practice is to aggregate the different
annotations, by computing average scores or majority voting, and train and test
models on these aggregated annotations. With this procedure, a single aggregated
evoked emotion annotation is obtained for each video. However, emotions
experienced while watching a video are subjective: different individuals might
experience different emotions. In this paper, we model the emotions evoked by
videos in a different manner: instead of modeling the aggregated value we
jointly model the emotions experienced by each viewer and the aggregated value
using a multi-task learning approach. Concretely, we propose two deep learning
architectures: a Single-Task (ST) architecture and a Multi-Task (MT)
architecture. Our results show that the MT approach can more accurately model
each viewer and the aggregated annotation when compared to methods that are
directly trained on the aggregated annotations. Furthermore, our approach
outperforms the current state-of-the-art results on the COGNIMUSE benchmark.
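To make the single-task vs. multi-task distinction concrete, below is a minimal multi-task sketch in PyTorch. It assumes a shared encoder over precomputed clip features, one regression head per viewer, and an extra head for the aggregated annotation; the layer sizes, viewer count, and feature dimensions are illustrative assumptions, not the architecture described in the paper.

```python
# Minimal multi-task sketch (illustrative assumptions, not the paper's exact model):
# a shared encoder feeds one regression head per viewer plus one head for the
# aggregated annotation, so all tasks are learned jointly.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskEvokedEmotion(nn.Module):
    def __init__(self, feat_dim=512, hidden_dim=128, num_viewers=7):
        super().__init__()
        # Shared representation of the video-clip features
        self.encoder = nn.Sequential(nn.Linear(feat_dim, hidden_dim), nn.ReLU())
        # One regression head per individual viewer (e.g., a valence score)
        self.viewer_heads = nn.ModuleList(
            [nn.Linear(hidden_dim, 1) for _ in range(num_viewers)]
        )
        # Additional head for the aggregated (averaged) annotation
        self.aggregate_head = nn.Linear(hidden_dim, 1)

    def forward(self, x):
        h = self.encoder(x)
        per_viewer = torch.cat([head(h) for head in self.viewer_heads], dim=-1)  # (B, num_viewers)
        aggregated = self.aggregate_head(h).squeeze(-1)                          # (B,)
        return per_viewer, aggregated

def multitask_loss(per_viewer_pred, agg_pred, viewer_targets, agg_target):
    # Joint objective: each viewer's annotation and the aggregated annotation both contribute.
    return F.mse_loss(per_viewer_pred, viewer_targets) + F.mse_loss(agg_pred, agg_target)
```

A single-task (ST) counterpart under the same assumptions would keep only one of these heads, for example training the aggregate head alone on the averaged annotations.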
Related papers
- EALD-MLLM: Emotion Analysis in Long-sequential and De-identity videos with Multi-modal Large Language Model [22.292581935835678]
We construct a dataset for Emotion Analysis in Long-sequential and De-identity videos called EALD.
We also provide the Non-Facial Body Language (NFBL) annotations for each player.
NFBL is an inner-driven emotional expression and can serve as an identity-free clue to understanding the emotional state.
arXiv Detail & Related papers (2024-05-01T15:25:54Z)
- Multimodal Emotion Recognition by Fusing Video Semantic in MOOC Learning Scenarios [6.987099464814016]
In Massive Open Online Courses (MOOC), semantic information of instructional videos has a crucial impact on learners' emotional state.
This paper proposes a multimodal emotion recognition method by fusing video semantic information and semantic signals.
The experimental results show that our method has significantly improved emotion recognition performance.
arXiv Detail & Related papers (2024-04-11T05:44:27Z)
- How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios [73.24092762346095]
We introduce two large-scale datasets with over 60,000 videos annotated for emotional response and subjective wellbeing.
The Video Cognitive Empathy dataset contains annotations for distributions of fine-grained emotional responses, allowing models to gain a detailed understanding of affective states.
The Video to Valence dataset contains annotations of relative pleasantness between videos, which enables predicting a continuous spectrum of wellbeing.
arXiv Detail & Related papers (2022-10-18T17:58:25Z)
- MAFW: A Large-scale, Multi-modal, Compound Affective Database for Dynamic Facial Expression Recognition in the Wild [56.61912265155151]
We propose MAFW, a large-scale compound affective database with 10,045 video-audio clips in the wild.
Each clip is annotated with a compound emotional category and a couple of sentences that describe the subjects' affective behaviors in the clip.
For the compound emotion annotation, each clip is categorized into one or more of the 11 widely-used emotions, i.e., anger, disgust, fear, happiness, neutral, sadness, surprise, contempt, anxiety, helplessness, and disappointment.
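As a small illustration of that annotation scheme, the snippet below (a hypothetical encoding, not MAFW's released format) maps a clip's compound annotation to a multi-hot vector over the 11 categories listed above.

```python
# Hypothetical multi-hot encoding of a MAFW-style compound annotation
# (illustrative only; not the database's released format).
MAFW_EMOTIONS = ["anger", "disgust", "fear", "happiness", "neutral", "sadness",
                 "surprise", "contempt", "anxiety", "helplessness", "disappointment"]

def encode_compound(labels):
    """Map a clip's compound annotation (one or more emotions) to a multi-hot vector."""
    return [1 if emotion in labels else 0 for emotion in MAFW_EMOTIONS]

# e.g., a clip annotated with both fear and anxiety
vector = encode_compound({"fear", "anxiety"})
```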
arXiv Detail & Related papers (2022-08-01T13:34:33Z)
- SOLVER: Scene-Object Interrelated Visual Emotion Reasoning Network [83.27291945217424]
We propose a novel Scene-Object interreLated Visual Emotion Reasoning network (SOLVER) to predict emotions from images.
To mine the emotional relationships between distinct objects, we first build up an Emotion Graph based on semantic concepts and visual features.
We also design a Scene-Object Fusion Module to integrate scenes and objects, which exploits scene features to guide the fusion process of object features with the proposed scene-based attention mechanism.
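To make the idea of scene-guided fusion concrete, here is a generic sketch (assumed tensor shapes and plain scaled dot-product attention, not SOLVER's actual module) in which the scene feature acts as a query that weights the detected objects' features before fusion.

```python
# Generic scene-guided attention sketch (assumed shapes; not SOLVER's actual module):
# the scene feature acts as a query that weights object features before fusion.
import torch
import torch.nn.functional as F

def scene_guided_fusion(scene_feat, object_feats):
    """scene_feat: (B, D); object_feats: (B, N, D) for N detected objects."""
    d = object_feats.size(-1)
    # Dot-product similarity between the scene and each object, scaled for stability
    scores = torch.einsum("bd,bnd->bn", scene_feat, object_feats) / d ** 0.5
    weights = F.softmax(scores, dim=-1)                      # (B, N) attention weights
    fused_objects = torch.einsum("bn,bnd->bd", weights, object_feats)
    # Concatenate scene and attention-fused object representation for emotion prediction
    return torch.cat([scene_feat, fused_objects], dim=-1)    # (B, 2D)
```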
arXiv Detail & Related papers (2021-10-24T02:41:41Z)
- Enhancing Cognitive Models of Emotions with Representation Learning [58.2386408470585]
We present a novel deep learning-based framework to generate embedding representations of fine-grained emotions.
Our framework integrates a contextualized embedding encoder with a multi-head probing model.
Our model is evaluated on the Empathetic Dialogue dataset and shows the state-of-the-art result for classifying 32 emotions.
arXiv Detail & Related papers (2021-04-20T16:55:15Z)
- Affect2MM: Affective Analysis of Multimedia Content Using Emotion Causality [84.69595956853908]
We present Affect2MM, a learning method for time-series emotion prediction for multimedia content.
Our goal is to automatically capture the varying emotions depicted by characters in real-life human-centric situations and behaviors.
arXiv Detail & Related papers (2021-03-11T09:07:25Z)
- Direct Classification of Emotional Intensity [4.360819666001918]
We train a model using videos of different people smiling that outputs an intensity score from 0-10.
Our model then employs an adaptive learning technique to improve performance when dealing with new subjects.
arXiv Detail & Related papers (2020-11-15T06:32:48Z)
- Modality-Transferable Emotion Embeddings for Low-Resource Multimodal Emotion Recognition [55.44502358463217]
We propose a modality-transferable model with emotion embeddings to tackle the aforementioned issues.
Our model achieves state-of-the-art performance on most of the emotion categories.
Our model also outperforms existing baselines in the zero-shot and few-shot scenarios for unseen emotions.
arXiv Detail & Related papers (2020-09-21T06:10:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.