Related papers: Saliency-guided Emotion Modeling: Predicting Viewer Reactions from Video Stimuli

Saliency-guided Emotion Modeling: Predicting Viewer Reactions from Video Stimuli

URL: http://arxiv.org/abs/2505.19178v1
Date: Sun, 25 May 2025 14:52:36 GMT
Title: Saliency-guided Emotion Modeling: Predicting Viewer Reactions from Video Stimuli
Authors: Akhila Yaragoppa, Siddharth,
Abstract summary: We introduce a novel saliency-based approach to emotion prediction by extracting two key features: saliency area and number of salient regions.<n>Using the HD2S saliency model and OpenFace facial action unit analysis, we examine the relationship between video saliency and viewer emotions.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Understanding the emotional impact of videos is crucial for applications in content creation, advertising, and Human-Computer Interaction (HCI). Traditional affective computing methods rely on self-reported emotions, facial expression analysis, and biosensing data, yet they often overlook the role of visual saliency -- the naturally attention-grabbing regions within a video. In this study, we utilize deep learning to introduce a novel saliency-based approach to emotion prediction by extracting two key features: saliency area and number of salient regions. Using the HD2S saliency model and OpenFace facial action unit analysis, we examine the relationship between video saliency and viewer emotions. Our findings reveal three key insights: (1) Videos with multiple salient regions tend to elicit high-valence, low-arousal emotions, (2) Videos with a single dominant salient region are more likely to induce low-valence, high-arousal responses, and (3) Self-reported emotions often misalign with facial expression-based emotion detection, suggesting limitations in subjective reporting. By leveraging saliency-driven insights, this work provides a computationally efficient and interpretable alternative for emotion modeling, with implications for content creation, personalized media experiences, and affective computing research.

Related papers

EmoVid: A Multimodal Emotion Video Dataset for Emotion-Centric Video Understanding and Generation [9.502538906407098]
We introduce EmoVid, the first multimodal, emotion-annotated video dataset specifically designed for creative media.<n>We uncover spatial and temporal patterns linking visual features to emotional perceptions across diverse video forms.<n>We develop an emotion-conditioned video generation technique by fine-tuning the Wan2.1 model.
arXiv Detail & Related papers (2025-11-14T06:44:21Z)
VidEmo: Affective-Tree Reasoning for Emotion-Centric Video Foundation Models [46.591026037722436]
We propose a novel affective cues-guided reasoning framework that unifies fundamental attribute perception, expression analysis, and high-level emotional understanding.<n>At the core of our approach is a family of video emotion foundation models (VidEmo), specifically designed for emotion reasoning and instruction-following.<n>We establish a foundational data infrastructure and introduce a emotion-centric fine-grained dataset consisting of 2.1M diverse instruction-based samples.
arXiv Detail & Related papers (2025-11-04T16:31:09Z)
Enhancing the Prediction of Emotional Experience in Movies using Deep Neural Networks: The Significance of Audio and Language [0.0]
Our paper focuses on making use of deep neural network models to accurately predict the range of human emotions experienced during watching movies. In this certain setup, there exist three clear-cut input modalities that considerably influence the experienced emotions: visual cues derived from RGB video frames, auditory components encompassing sounds, speech, and music, and linguistic elements encompassing actors' dialogues.
arXiv Detail & Related papers (2023-06-17T17:40:27Z)
How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios [73.24092762346095]
We introduce two large-scale datasets with over 60,000 videos annotated for emotional response and subjective wellbeing. The Video Cognitive Empathy dataset contains annotations for distributions of fine-grained emotional responses, allowing models to gain a detailed understanding of affective states. The Video to Valence dataset contains annotations of relative pleasantness between videos, which enables predicting a continuous spectrum of wellbeing.
arXiv Detail & Related papers (2022-10-18T17:58:25Z)
Attention-based Region of Interest (ROI) Detection for Speech Emotion Recognition [4.610756199751138]
We propose to use attention mechanism in deep recurrentneural networks to detection the Regions-of-Interest (ROI) thatare more emotionally salient in human emotional speech/video. We comparethe performance of the proposed attention networks with thestate-of-the-art LSTM models on multi-class classification task ofrecognizing six basic human emotions.
arXiv Detail & Related papers (2022-03-03T22:01:48Z)
SOLVER: Scene-Object Interrelated Visual Emotion Reasoning Network [83.27291945217424]
We propose a novel Scene-Object interreLated Visual Emotion Reasoning network (SOLVER) to predict emotions from images. To mine the emotional relationships between distinct objects, we first build up an Emotion Graph based on semantic concepts and visual features. We also design a Scene-Object Fusion Module to integrate scenes and objects, which exploits scene features to guide the fusion process of object features with the proposed scene-based attention mechanism.
arXiv Detail & Related papers (2021-10-24T02:41:41Z)
Affective Image Content Analysis: Two Decades Review and New Perspectives [132.889649256384]
We will comprehensively review the development of affective image content analysis (AICA) in the recent two decades. We will focus on the state-of-the-art methods with respect to three main challenges -- the affective gap, perception subjectivity, and label noise and absence. We discuss some challenges and promising research directions in the future, such as image content and context understanding, group emotion clustering, and viewer-image interaction.
arXiv Detail & Related papers (2021-06-30T15:20:56Z)
Audio-Driven Emotional Video Portraits [79.95687903497354]
We present Emotional Video Portraits (EVP), a system for synthesizing high-quality video portraits with vivid emotional dynamics driven by audios. Specifically, we propose the Cross-Reconstructed Emotion Disentanglement technique to decompose speech into two decoupled spaces. With the disentangled features, dynamic 2D emotional facial landmarks can be deduced. Then we propose the Target-Adaptive Face Synthesis technique to generate the final high-quality video portraits.
arXiv Detail & Related papers (2021-04-15T13:37:13Z)
Affect2MM: Affective Analysis of Multimedia Content Using Emotion Causality [84.69595956853908]
We present Affect2MM, a learning method for time-series emotion prediction for multimedia content. Our goal is to automatically capture the varying emotions depicted by characters in real-life human-centric situations and behaviors.
arXiv Detail & Related papers (2021-03-11T09:07:25Z)
Modality-Transferable Emotion Embeddings for Low-Resource Multimodal Emotion Recognition [55.44502358463217]
We propose a modality-transferable model with emotion embeddings to tackle the aforementioned issues. Our model achieves state-of-the-art performance on most of the emotion categories. Our model also outperforms existing baselines in the zero-shot and few-shot scenarios for unseen emotions.
arXiv Detail & Related papers (2020-09-21T06:10:39Z)
Context Based Emotion Recognition using EMOTIC Dataset [22.631542327834595]
We present EMOTIC, a dataset of images of people annotated with their apparent emotion. Using the EMOTIC dataset we train different CNN models for emotion recognition. Our results show how scene context provides important information to automatically recognize emotional states.
arXiv Detail & Related papers (2020-03-30T12:38:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.