Emotion-Director: Bridging Affective Shortcut in Emotion-Oriented Image Generation
- URL: http://arxiv.org/abs/2512.19479v1
- Date: Mon, 22 Dec 2025 15:32:18 GMT
- Title: Emotion-Director: Bridging Affective Shortcut in Emotion-Oriented Image Generation
- Authors: Guoli Jia, Junyao Hu, Xinwei Long, Kai Tian, Kaiyan Zhang, KaiKai Zhao, Ning Ding, Bowen Zhou
- Abstract summary: Emotion-Director is a cross-modal collaboration framework consisting of two modules. We propose a cross-Modal Collaborative diffusion model, abbreviated as MC-Diffusion. We also propose MC-Agent, a cross-Modal Collaborative Agent system that rewrites textual prompts to express the intended emotions.
- Score: 23.10502994564729
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Image generation based on diffusion models has demonstrated impressive capability, motivating exploration into diverse and specialized applications. Owing to the importance of emotion in advertising, emotion-oriented image generation has attracted increasing attention. However, current emotion-oriented methods suffer from an affective shortcut, in which emotions are approximated by semantics. As two decades of research have shown, emotion is not equivalent to semantics. To this end, we propose Emotion-Director, a cross-modal collaboration framework consisting of two modules. First, we propose a cross-Modal Collaborative diffusion model, abbreviated as MC-Diffusion. MC-Diffusion integrates visual prompts with textual prompts for guidance, enabling the generation of emotion-oriented images beyond semantics. Furthermore, we improve DPO optimization with a negative visual prompt, enhancing the model's sensitivity to different emotions under the same semantics. Second, we propose MC-Agent, a cross-Modal Collaborative Agent system that rewrites textual prompts to express the intended emotions. To avoid template-like rewrites, MC-Agent employs multiple agents to simulate human subjectivity toward emotions and adopts a chain-of-concept workflow that improves the visual expressiveness of the rewritten prompts. Extensive qualitative and quantitative experiments demonstrate the superiority of Emotion-Director in emotion-oriented image generation.
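The abstract's most distinctive mechanism is the DPO objective contrasted against a negative visual prompt. The sketch below illustrates that idea in minimal form; the denoiser signature, the omitted timestep input, and all dimensions are assumptions for illustration, not the paper's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyDenoiser(nn.Module):
    """Stand-in denoiser conditioned on a text embedding and a visual-prompt
    embedding (timestep input omitted for brevity; all sizes are assumed)."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim * 3, 128), nn.SiLU(),
                                 nn.Linear(128, dim))

    def forward(self, x_t, text_emb, vis_emb):
        return self.net(torch.cat([x_t, text_emb, vis_emb], dim=-1))

def dpo_negative_visual_loss(model, ref, x_t, noise, text_emb,
                             vis_pos, vis_neg, beta=0.1):
    # denoising error of a network under a given visual prompt
    def err(net, v):
        return ((net(x_t, text_emb, v) - noise) ** 2).mean(dim=-1)
    # DPO-style margin: prefer the emotion-matched (positive) visual prompt
    # over the negative one, measured relative to a frozen reference model
    margin = (err(ref, vis_pos) - err(model, vis_pos)) \
           - (err(ref, vis_neg) - err(model, vis_neg))
    return -F.logsigmoid(beta * margin).mean()

model, ref = TinyDenoiser(), TinyDenoiser()
x_t, noise, text_emb, vis_pos, vis_neg = (torch.randn(4, 64) for _ in range(5))
print(dpo_negative_visual_loss(model, ref, x_t, noise, text_emb,
                               vis_pos, vis_neg).item())
```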
Related papers
- Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation [63.94836524433559]
DICE-Talk is a framework for disentangling identity from emotion and cooperating emotions with similar characteristics. First, we develop a disentangled emotion embedder that jointly models audio-visual emotional cues through cross-modal attention. Second, we introduce a correlation-enhanced emotion conditioning module with learnable Emotion Banks. Third, we design an emotion discrimination objective that enforces affective consistency during the diffusion process.
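A minimal sketch of what such a cross-modal emotion embedder could look like, assuming audio features attend over visual features and are then pooled; the dimensions, attention direction, and pooling are illustrative choices, not DICE-Talk's actual design.

```python
import torch
import torch.nn as nn

class DisentangledEmotionEmbedder(nn.Module):
    """Audio features attend over visual features; only the pooled result is
    kept, discarding frame-level (identity-bearing) detail."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.a2v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(dim, dim)

    def forward(self, audio_feats, visual_feats):
        # audio queries attend over visual keys/values
        fused, _ = self.a2v(audio_feats, visual_feats, visual_feats)
        # mean-pool over time to obtain a single emotion code
        return self.proj(fused.mean(dim=1))

emb = DisentangledEmotionEmbedder()
e = emb(torch.randn(2, 50, 256), torch.randn(2, 30, 256))
print(e.shape)  # torch.Size([2, 256])
```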
arXiv Detail & Related papers (2025-04-25T05:28:21Z) - EmoSEM: Segment and Explain Emotion Stimuli in Visual Art [25.539022846134543]
Given an art image, the model pinpoints the pixel regions that trigger a specific human emotion and generates linguistic explanations for them. This paper proposes the Emotion Stimuli and Explanation Model (EmoSEM) to endow the segmentation framework with emotion comprehension capability. Our method realizes end-to-end modeling from low-level pixel features to high-level emotion interpretation, delivering the first interpretable fine-grained framework for visual emotion analysis.
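To make the pipeline concrete, here is a hypothetical sketch, not EmoSEM's architecture: pixels are scored against an emotion embedding, and the resulting soft region pools the features that a downstream text explainer could describe.

```python
import torch
import torch.nn as nn

class EmotionStimulusHead(nn.Module):
    """Hypothetical head: dot an emotion embedding against per-pixel features
    to score each pixel's contribution to that emotion, then pool the soft
    region for a downstream explainer."""
    def __init__(self, dim=256, num_emotions=8):
        super().__init__()
        self.emotion_table = nn.Embedding(num_emotions, dim)

    def forward(self, pixel_feats, emotion_id):
        # pixel_feats: (B, C, H, W); emotion embedding: (B, C)
        e = self.emotion_table(emotion_id)
        scores = torch.einsum("bchw,bc->bhw", pixel_feats, e)
        mask = scores.sigmoid()                          # soft stimulus region
        pooled = torch.einsum("bchw,bhw->bc", pixel_feats, mask)
        pooled = pooled / mask.sum(dim=(1, 2)).unsqueeze(1)
        return mask, pooled                              # pooled feeds the explainer

head = EmotionStimulusHead()
mask, pooled = head(torch.randn(2, 256, 32, 32), torch.tensor([3, 5]))
print(mask.shape, pooled.shape)  # (2, 32, 32) (2, 256)
```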
arXiv Detail & Related papers (2025-04-20T15:40:00Z) - EmoAgent: A Multi-Agent Framework for Diverse Affective Image Manipulation [11.29688638322966]
Affective Image Manipulation aims to alter visual elements within an image to evoke specific emotional responses from viewers. Existing AIM approaches rely on rigid one-to-one mappings between emotions and visual cues. We propose EmoAgent, the first multi-agent framework tailored specifically for D-AIM.
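The one-to-many idea can be illustrated with a toy planner that fans a single target emotion out to several candidate editing plans; the strategy names and interface below are invented for illustration and are not EmoAgent's agents.

```python
from dataclasses import dataclass
import random

@dataclass
class EditPlan:
    emotion: str
    strategy: str

# hypothetical strategy pools; the real system would have agents propose
# and critique plans rather than sample from a fixed table
STRATEGIES = {
    "melancholy": ["desaturate the palette", "add rain and fog", "dim the lighting"],
    "joy": ["warm color grading", "add sunlight", "brighten faces"],
}

def propose_plans(emotion: str, k: int = 2) -> list[EditPlan]:
    """Fan one target emotion out to several distinct editing plans."""
    pool = STRATEGIES.get(emotion, [])
    return [EditPlan(emotion, s) for s in random.sample(pool, min(k, len(pool)))]

print(propose_plans("melancholy"))
```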
arXiv Detail & Related papers (2025-03-14T10:55:56Z) - A Unified and Interpretable Emotion Representation and Expression Generation [38.321248253111776]
We propose an interpretable and unified emotion model, referred to as C2A2.
We show that our generated images are rich and capture subtle expressions.
arXiv Detail & Related papers (2024-04-01T17:03:29Z) - EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models [11.901294654242376]
We introduce Emotional Image Content Generation (EICG), a new task to generate semantically clear and emotion-faithful images given emotion categories.
Specifically, we propose an emotion space and construct a mapping network to align it with the powerful Contrastive Language-Image Pre-training (CLIP) space.
Our method outperforms the state-of-the-art text-to-image approaches both quantitatively and qualitatively.
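A minimal sketch of the mapping network described above, which projects a learnable emotion embedding into a CLIP-like space; the sizes and the simple cosine objective are assumptions, not the paper's actual losses.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmotionToCLIP(nn.Module):
    """Learnable emotion embeddings mapped into a CLIP-like joint space."""
    def __init__(self, num_emotions=8, emo_dim=128, clip_dim=512):
        super().__init__()
        self.emotions = nn.Embedding(num_emotions, emo_dim)
        self.mapper = nn.Sequential(nn.Linear(emo_dim, clip_dim), nn.GELU(),
                                    nn.Linear(clip_dim, clip_dim))

    def forward(self, emotion_id):
        return F.normalize(self.mapper(self.emotions(emotion_id)), dim=-1)

def alignment_loss(model, emotion_id, clip_image_feats):
    # pull the mapped emotion vector toward CLIP features of matching images
    pred = model(emotion_id)
    target = F.normalize(clip_image_feats, dim=-1)
    return 1.0 - (pred * target).sum(dim=-1).mean()

m = EmotionToCLIP()
print(alignment_loss(m, torch.tensor([0, 3]), torch.randn(2, 512)).item())
```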
arXiv Detail & Related papers (2024-01-09T15:23:21Z) - Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling [50.99252242917458]
Conversational Speech Synthesis (CSS) aims to accurately express an utterance with the appropriate prosody and emotional inflection within a conversational setting.
To address the issue of data scarcity, we meticulously create emotional labels in terms of category and intensity.
Our model outperforms the baseline models in understanding and rendering emotions.
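As a rough illustration of graph-based context modeling (the paper's heterogeneous graph is richer than this), one round of message passing over utterance nodes might look like the following; node features, the adjacency, and the update rule are all assumed.

```python
import torch
import torch.nn as nn

class GraphContextEncoder(nn.Module):
    """One message-passing round: each utterance node mixes in its
    neighbors' emotional cues before prosody prediction."""
    def __init__(self, dim=128):
        super().__init__()
        self.msg = nn.Linear(dim, dim)
        self.upd = nn.GRUCell(dim, dim)

    def forward(self, node_feats, adj):
        # node_feats: (N, D); adj: (N, N) row-normalized adjacency
        messages = adj @ self.msg(node_feats)   # aggregate neighbor cues
        return self.upd(messages, node_feats)   # update each utterance node

enc = GraphContextEncoder()
x = torch.randn(5, 128)
adj = torch.ones(5, 5) / 5.0                    # toy fully connected dialogue
print(enc(x, adj).shape)                        # torch.Size([5, 128])
```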
arXiv Detail & Related papers (2023-12-19T08:47:50Z) - Contrast and Generation Make BART a Good Dialogue Emotion Recognizer [38.18867570050835]
Long-range contextual emotional relationships with speaker dependency play a crucial part in dialogue emotion recognition.
We adopt supervised contrastive learning to make different emotions mutually exclusive, which helps the model distinguish similar emotions. We utilize an auxiliary response generation task to enhance the model's ability to handle context information.
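The contrastive component above can be illustrated with a standard supervised contrastive loss (Khosla et al.) in a simplified per-pair form: utterances with the same emotion label are pulled together and different emotions pushed apart. This is a generic sketch, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(feats, labels, tau=0.1):
    """Simplified supervised contrastive loss averaged over positive pairs."""
    z = F.normalize(feats, dim=-1)
    sim = z @ z.T / tau                                  # pairwise similarities
    n = z.size(0)
    eye = torch.eye(n, dtype=torch.bool)
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye
    # log-softmax over all other samples (self excluded from the denominator)
    log_prob = sim - torch.logsumexp(sim.masked_fill(eye, float("-inf")),
                                     dim=1, keepdim=True)
    return -(log_prob[pos]).mean()

f = torch.randn(8, 64)
y = torch.tensor([0, 0, 1, 1, 2, 2, 0, 1])
print(supervised_contrastive_loss(f, y).item())
```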
arXiv Detail & Related papers (2021-12-21T13:38:00Z) - SOLVER: Scene-Object Interrelated Visual Emotion Reasoning Network [83.27291945217424]
We propose a novel Scene-Object interreLated Visual Emotion Reasoning network (SOLVER) to predict emotions from images.
To mine the emotional relationships between distinct objects, we first build up an Emotion Graph based on semantic concepts and visual features.
We also design a Scene-Object Fusion Module to integrate scenes and objects, which exploits scene features to guide the fusion process of object features with the proposed scene-based attention mechanism.
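A small sketch of scene-guided attention over object features, in the spirit of the fusion module described above; the dimensions and fusion details are assumed rather than taken from SOLVER.

```python
import torch
import torch.nn as nn

class SceneObjectFusion(nn.Module):
    """The global scene feature scores each detected object; the attended
    object summary is fused back with the scene feature."""
    def __init__(self, dim=256):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.out = nn.Linear(2 * dim, dim)

    def forward(self, scene, objects):
        # scene: (B, D); objects: (B, K, D)
        attn = torch.einsum("bd,bkd->bk", self.q(scene), self.k(objects))
        w = attn.softmax(dim=-1)                      # scene-based attention
        obj_summary = torch.einsum("bk,bkd->bd", w, objects)
        return self.out(torch.cat([scene, obj_summary], dim=-1))

fuse = SceneObjectFusion()
print(fuse(torch.randn(2, 256), torch.randn(2, 7, 256)).shape)
```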
arXiv Detail & Related papers (2021-10-24T02:41:41Z) - Emotion Recognition from Multiple Modalities: Fundamentals and Methodologies [106.62835060095532]
We discuss several key aspects of multi-modal emotion recognition (MER).
We begin with a brief introduction on widely used emotion representation models and affective modalities.
We then summarize existing emotion annotation strategies and corresponding computational tasks.
Finally, we outline several real-world applications and discuss some future directions.
arXiv Detail & Related papers (2021-08-18T21:55:20Z) - A Circular-Structured Representation for Visual Emotion Distribution Learning [82.89776298753661]
We propose a well-grounded circular-structured representation to utilize the prior knowledge for visual emotion distribution learning.
To be specific, we first construct an Emotion Circle to unify any emotional state within it.
On the proposed Emotion Circle, each emotion distribution is represented with an emotion vector, which is defined with three attributes.
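One plausible reading of an emotion vector with three attributes is sketched below; the attribute names (polarity, emotion type as an angle on the circle, intensity as vector length) are assumptions for illustration, not the paper's definitions.

```python
import math
from dataclasses import dataclass

@dataclass
class EmotionVector:
    polarity: int      # +1 positive, -1 negative half of the circle (assumed)
    angle: float       # radians: which emotion category on the circle (assumed)
    intensity: float   # vector length in [0, 1] (assumed)

    def to_xy(self) -> tuple[float, float]:
        # place the emotion as a point inside the unit Emotion Circle
        return (self.intensity * math.cos(self.angle),
                self.intensity * math.sin(self.angle))

print(EmotionVector(polarity=+1, angle=math.pi / 6, intensity=0.8).to_xy())
```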
arXiv Detail & Related papers (2021-06-23T14:53:27Z) - Enhancing Cognitive Models of Emotions with Representation Learning [58.2386408470585]
We present a novel deep learning-based framework to generate embedding representations of fine-grained emotions.
Our framework integrates a contextualized embedding encoder with a multi-head probing model.
Our model is evaluated on the Empathetic Dialogue dataset and achieves state-of-the-art results in classifying 32 emotions.
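A shape-level sketch of the framework described above: an encoder read out by several small probing heads whose concatenated views classify 32 emotions. The randomly initialized encoder stands in for the contextualized encoder, and all sizes and the probing design are assumptions.

```python
import torch
import torch.nn as nn

class MultiHeadProbe(nn.Module):
    """Encoder stand-in plus several linear probing heads over the pooled
    representation; their concatenated views feed the emotion classifier."""
    def __init__(self, dim=128, heads=4, num_emotions=32):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.probes = nn.ModuleList(nn.Linear(dim, dim // heads)
                                    for _ in range(heads))
        self.cls = nn.Linear(dim, num_emotions)

    def forward(self, token_embs):
        h = self.encoder(token_embs).mean(dim=1)       # pooled utterance
        views = torch.cat([p(h) for p in self.probes], dim=-1)
        return self.cls(views)

probe = MultiHeadProbe()
print(probe(torch.randn(2, 16, 128)).shape)  # torch.Size([2, 32])
```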
arXiv Detail & Related papers (2021-04-20T16:55:15Z)