Visually grounded emotion regulation via diffusion models and user-driven reappraisal
- URL: http://arxiv.org/abs/2507.10861v1
- Date: Mon, 14 Jul 2025 23:28:59 GMT
- Title: Visually grounded emotion regulation via diffusion models and user-driven reappraisal
- Authors: Edoardo Pinzuti, Oliver Tüscher, André Ferreira Castro,
- Abstract summary: We propose a novel, visually based augmentation of cognitive reappraisal by integrating large-scale text-to-image diffusion models into the emotional regulation process.<n>Specifically, we introduce a system in which users reinterpret emotionally negative images via spoken reappraisals.<n>This generative transformation visually instantiates users' reappraisals while maintaining structural similarity to the original stimuli, externalizing and reinforcing regulatory intent.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Cognitive reappraisal is a key strategy in emotion regulation, involving reinterpretation of emotionally charged stimuli to alter affective responses. Despite its central role in clinical and cognitive science, real-world reappraisal interventions remain cognitively demanding, abstract, and primarily verbal. This reliance on higher-order cognitive and linguistic processes is often impaired in individuals with trauma or depression, limiting the effectiveness of standard approaches. Here, we propose a novel, visually based augmentation of cognitive reappraisal by integrating large-scale text-to-image diffusion models into the emotional regulation process. Specifically, we introduce a system in which users reinterpret emotionally negative images via spoken reappraisals, which are transformed into supportive, emotionally congruent visualizations using stable diffusion models with a fine-tuned IP-adapter. This generative transformation visually instantiates users' reappraisals while maintaining structural similarity to the original stimuli, externalizing and reinforcing regulatory intent. To test this approach, we conducted a within-subject experiment (N = 20) using a modified cognitive emotion regulation (CER) task. Participants reappraised or described aversive images from the International Affective Picture System (IAPS), with or without AI-generated visual feedback. Results show that AI-assisted reappraisal significantly reduced negative affect compared to both non-AI and control conditions. Further analyses reveal that sentiment alignment between participant reappraisals and generated images correlates with affective relief, suggesting that multimodal coherence enhances regulatory efficacy. These findings demonstrate that generative visual input can support cogitive reappraisal and open new directions at the intersection of generative AI, affective computing, and therapeutic technology.
Related papers
- R-CAGE: A Structural Model for Emotion Output Design in Human-AI Interaction [2.1828601975620257]
R-CAGE is a theoretical framework for restructuring emotional output in long-term human-AI interaction.<n>By structurally regulating emotional rhythm, sensory intensity, and interpretive affordances, R-CAGE frames emotion not as performative output but as sustainable design unit.
arXiv Detail & Related papers (2025-05-11T15:30:23Z) - A Cognitive Paradigm Approach to Probe the Perception-Reasoning Interface in VLMs [3.2228025627337864]
This paper introduces a structured evaluation framework to dissect the perception-reasoning interface in Vision-Language Models (VLMs)<n>We propose three distinct evaluation paradigms, mirroring human problem-solving strategies.<n>Applying this framework, we demonstrate that CA, leveraging powerful language models for reasoning over rich, independently generated descriptions, achieves new state-of-the-art (SOTA) performance.
arXiv Detail & Related papers (2025-01-23T12:42:42Z) - Regressor-Guided Image Editing Regulates Emotional Response to Reduce Online Engagement [40.65885791860718]
We propose three regressor-guided image editing approaches aimed at diminishing the emotional impact of images.<n>Our findings demonstrate that approaches can effectively alter the emotional properties of images while maintaining high visual quality.<n>Results from a behavioral study reveal that only the diffusion-based approach successfully elicits changes in viewers' emotional responses.
arXiv Detail & Related papers (2025-01-21T16:59:13Z) - Functional Graph Contrastive Learning of Hyperscanning EEG Reveals
Emotional Contagion Evoked by Stereotype-Based Stressors [1.8925617030516924]
This study focuses on the context of stereotype-based stress (SBS) during collaborative problem-solving tasks among female pairs.
Through an exploration of emotional contagion, this study seeks to unveil its underlying mechanisms and effects.
arXiv Detail & Related papers (2023-08-22T09:04:14Z) - A Hierarchical Regression Chain Framework for Affective Vocal Burst
Recognition [72.36055502078193]
We propose a hierarchical framework, based on chain regression models, for affective recognition from vocal bursts.
To address the challenge of data sparsity, we also use self-supervised learning (SSL) representations with layer-wise and temporal aggregation modules.
The proposed systems participated in the ACII Affective Vocal Burst (A-VB) Challenge 2022 and ranked first in the "TWO'' and "CULTURE" tasks.
arXiv Detail & Related papers (2023-03-14T16:08:45Z) - Seeking Subjectivity in Visual Emotion Distribution Learning [93.96205258496697]
Visual Emotion Analysis (VEA) aims to predict people's emotions towards different visual stimuli.
Existing methods often predict visual emotion distribution in a unified network, neglecting the inherent subjectivity in its crowd voting process.
We propose a novel textitSubjectivity Appraise-and-Match Network (SAMNet) to investigate the subjectivity in visual emotion distribution.
arXiv Detail & Related papers (2022-07-25T02:20:03Z) - Multimodal Emotion Recognition using Transfer Learning from Speaker
Recognition and BERT-based models [53.31917090073727]
We propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities.
We evaluate the effectiveness of our proposed multimodal approach on the interactive emotional dyadic motion capture dataset.
arXiv Detail & Related papers (2022-02-16T00:23:42Z) - Affect-DML: Context-Aware One-Shot Recognition of Human Affect using
Deep Metric Learning [29.262204241732565]
Existing methods assume that all emotions-of-interest are given a priori as annotated training examples.
We conceptualize one-shot recognition of emotions in context -- a new problem aimed at recognizing human affect states in finer particle level from a single support sample.
All variants of our model clearly outperform the random baseline, while leveraging the semantic scene context consistently improves the learnt representations.
arXiv Detail & Related papers (2021-11-30T10:35:20Z) - Towards Unbiased Visual Emotion Recognition via Causal Intervention [63.74095927462]
We propose a novel Emotion Recognition Network (IERN) to alleviate the negative effects brought by the dataset bias.
A series of designed tests validate the effectiveness of IERN, and experiments on three emotion benchmarks demonstrate that IERN outperforms other state-of-the-art approaches.
arXiv Detail & Related papers (2021-07-26T10:40:59Z) - Proactive Pseudo-Intervention: Causally Informed Contrastive Learning
For Interpretable Vision Models [103.64435911083432]
We present a novel contrastive learning strategy called it Proactive Pseudo-Intervention (PPI)
PPI leverages proactive interventions to guard against image features with no causal relevance.
We also devise a novel causally informed salience mapping module to identify key image pixels to intervene, and show it greatly facilitates model interpretability.
arXiv Detail & Related papers (2020-12-06T20:30:26Z) - Continuous Emotion Recognition via Deep Convolutional Autoencoder and
Support Vector Regressor [70.2226417364135]
It is crucial that the machine should be able to recognize the emotional state of the user with high accuracy.
Deep neural networks have been used with great success in recognizing emotions.
We present a new model for continuous emotion recognition based on facial expression recognition.
arXiv Detail & Related papers (2020-01-31T17:47:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.