Affect2MM: Affective Analysis of Multimedia Content Using Emotion Causality
- URL: http://arxiv.org/abs/2103.06541v1
- Date: Thu, 11 Mar 2021 09:07:25 GMT
- Title: Affect2MM: Affective Analysis of Multimedia Content Using Emotion Causality
- Authors: Trisha Mittal, Puneet Mathur, Aniket Bera, Dinesh Manocha
- Abstract summary: We present Affect2MM, a learning method for time-series emotion prediction for multimedia content.
Our goal is to automatically capture the varying emotions depicted by characters in real-life human-centric situations and behaviors.
- Score: 84.69595956853908
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present Affect2MM, a learning method for time-series emotion prediction
for multimedia content. Our goal is to automatically capture the varying
emotions depicted by characters in real-life human-centric situations and
behaviors. We use the ideas from emotion causation theories to computationally
model and determine the emotional state evoked in clips of movies. Affect2MM
explicitly models the temporal causality using attention-based methods and
Granger causality. We use a variety of components like facial features of
actors involved, scene understanding, visual aesthetics, action/situation
description, and movie script to obtain an affective-rich representation to
understand and perceive the scene. We use an LSTM-based learning model for
emotion perception. To evaluate our method, we analyze and compare our
performance on three datasets, SENDv1, MovieGraphs, and the LIRIS-ACCEDE
dataset, and observe an average 10-15% performance improvement over SOTA
methods across all three datasets.
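The abstract's Granger-causality component can be illustrated with a minimal, self-contained sketch. This is a generic lag-comparison Granger test in plain NumPy, not the authors' implementation; the function names, the default lag order, and the F-statistic formulation are assumptions:

```python
import numpy as np

def lagged_matrix(series, lags):
    # Columns are the series shifted by 1..lags, aligned with series[lags:].
    n = len(series)
    return np.column_stack([series[lags - k : n - k] for k in range(1, lags + 1)])

def granger_f_stat(x, y, lags=2):
    """F-statistic testing whether past values of x help predict y.

    Restricted model: y_t ~ const + y_{t-1..t-L}
    Full model:       y_t ~ const + y_{t-1..t-L} + x_{t-1..t-L}
    A large F suggests x Granger-causes y.
    """
    y_target = y[lags:]
    ones = np.ones((len(y_target), 1))
    Y_lags = lagged_matrix(y, lags)
    X_lags = lagged_matrix(x, lags)

    def rss(design):
        # Residual sum of squares of a least-squares fit.
        coef, *_ = np.linalg.lstsq(design, y_target, rcond=None)
        resid = y_target - design @ coef
        return resid @ resid

    rss_restricted = rss(np.hstack([ones, Y_lags]))
    rss_full = rss(np.hstack([ones, Y_lags, X_lags]))
    n, q, k = len(y_target), lags, 1 + 2 * lags
    return ((rss_restricted - rss_full) / q) / (rss_full / (n - k))
```

For example, on synthetic series where y lags x, `granger_f_stat(x, y)` comes out far larger than `granger_f_stat(y, x)`, recovering the causal direction. In Affect2MM this idea is applied to per-modality time series rather than scalar signals.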
Related papers
- UniEmoX: Cross-modal Semantic-Guided Large-Scale Pretraining for Universal Scene Emotion Perception [8.54013419046987]
We introduce UniEmoX, a cross-modal semantic-guided large-scale pretraining framework for visual emotion analysis.
By exploiting the similarity between paired and unpaired image-text samples, UniEmoX distills rich semantic knowledge from the CLIP model to enhance emotional embedding representations.
We develop a visual emotional dataset titled Emo8, covering nearly all common emotional scenes.
arXiv Detail & Related papers (2024-09-27T16:12:51Z)
- ProbTalk3D: Non-Deterministic Emotion Controllable Speech-Driven 3D Facial Animation Synthesis Using VQ-VAE [0.0]
We argue that emotions and non-determinism are crucial to generate diverse and emotionally-rich facial animations.
We propose ProbTalk3D, a non-deterministic neural network approach for emotion-controllable, speech-driven 3D facial animation synthesis.
arXiv Detail & Related papers (2024-09-12T11:53:05Z)
- Affective Behaviour Analysis via Integrating Multi-Modal Knowledge [24.74463315135503]
The 6th competition on Affective Behavior Analysis in-the-wild (ABAW) utilizes the Aff-Wild2, Hume-Vidmimic2, and C-EXPR-DB datasets.
We present our method designs for the five competitive tracks, i.e., Valence-Arousal (VA) Estimation, Expression (EXPR) Recognition, Action Unit (AU) Detection, Compound Expression (CE) Recognition, and Emotional Mimicry Intensity (EMI) Estimation.
arXiv Detail & Related papers (2024-03-16T06:26:43Z)
- Language Models (Mostly) Do Not Consider Emotion Triggers When Predicting Emotion [87.18073195745914]
We investigate how well human-annotated emotion triggers correlate with features deemed salient in their prediction of emotions.
Using EmoTrigger, we evaluate the ability of large language models to identify emotion triggers.
Our analysis reveals that emotion triggers are largely not considered salient features by emotion prediction models; instead, there is an intricate interplay between various features and the task of emotion detection.
arXiv Detail & Related papers (2023-11-16T06:20:13Z)
- SOLVER: Scene-Object Interrelated Visual Emotion Reasoning Network [83.27291945217424]
We propose a novel Scene-Object interreLated Visual Emotion Reasoning network (SOLVER) to predict emotions from images.
To mine the emotional relationships between distinct objects, we first build up an Emotion Graph based on semantic concepts and visual features.
We also design a Scene-Object Fusion Module to integrate scenes and objects, which exploits scene features to guide the fusion process of object features with the proposed scene-based attention mechanism.
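As a rough illustration of the scene-based attention idea described above, the sketch below uses generic scaled dot-product attention to weight object features by their similarity to a global scene feature. The feature shapes, the fusion-by-concatenation choice, and all names are assumptions for illustration, not SOLVER's actual module:

```python
import numpy as np

def scene_guided_fusion(scene_feat, object_feats):
    """Fuse per-object features into one vector, with attention weights
    derived from each object's similarity to the scene feature.

    scene_feat:   (d,)   global scene descriptor
    object_feats: (m, d) per-object descriptors
    """
    d = scene_feat.shape[0]
    scores = object_feats @ scene_feat / np.sqrt(d)  # scene-object similarity
    weights = np.exp(scores - scores.max())          # numerically stable softmax
    weights /= weights.sum()
    fused_objects = weights @ object_feats           # attention-weighted sum
    # Concatenate scene and attended-object representations.
    return np.concatenate([scene_feat, fused_objects])
```

With equal similarity scores the weights fall back to a uniform average of the object features; objects more similar to the scene dominate otherwise.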
arXiv Detail & Related papers (2021-10-24T02:41:41Z)
- Recognizing Emotions evoked by Movies using Multitask Learning [3.4290619267487488]
Methods for recognizing evoked emotions are usually trained on human annotated data.
We propose two deep learning architectures: a Single-Task (ST) architecture and a Multi-Task (MT) architecture.
Our results show that the MT approach can more accurately model each viewer and the aggregated annotation when compared to methods that are directly trained on the aggregated annotations.
arXiv Detail & Related papers (2021-07-30T10:21:40Z)
- Affective Image Content Analysis: Two Decades Review and New Perspectives [132.889649256384]
We will comprehensively review the development of affective image content analysis (AICA) in the recent two decades.
We will focus on the state-of-the-art methods with respect to three main challenges -- the affective gap, perception subjectivity, and label noise and absence.
We also discuss open challenges and promising future research directions, such as image content and context understanding, group emotion clustering, and viewer-image interaction.
arXiv Detail & Related papers (2021-06-30T15:20:56Z)
- MERLOT: Multimodal Neural Script Knowledge Models [74.05631672657452]
We introduce MERLOT, a model that learns multimodal script knowledge by watching millions of YouTube videos with transcribed speech.
MERLOT exhibits strong out-of-the-box representations of temporal commonsense, and achieves state-of-the-art performance on 12 different video QA datasets.
On Visual Commonsense Reasoning, MERLOT answers questions correctly with 80.6% accuracy, outperforming state-of-the-art models of similar size by over 3%.
arXiv Detail & Related papers (2021-06-04T17:57:39Z)
- Enhancing Cognitive Models of Emotions with Representation Learning [58.2386408470585]
We present a novel deep learning-based framework to generate embedding representations of fine-grained emotions.
Our framework integrates a contextualized embedding encoder with a multi-head probing model.
Our model is evaluated on the Empathetic Dialogue dataset and shows the state-of-the-art result for classifying 32 emotions.
arXiv Detail & Related papers (2021-04-20T16:55:15Z)
- Context Based Emotion Recognition using EMOTIC Dataset [22.631542327834595]
We present EMOTIC, a dataset of images of people annotated with their apparent emotion.
Using the EMOTIC dataset we train different CNN models for emotion recognition.
Our results show how scene context provides important information to automatically recognize emotional states.
arXiv Detail & Related papers (2020-03-30T12:38:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.