EmoCtrl: Controllable Emotional Image Content Generation
- URL: http://arxiv.org/abs/2512.22437v1
- Date: Sat, 27 Dec 2025 02:18:36 GMT
- Title: EmoCtrl: Controllable Emotional Image Content Generation
- Authors: Jingyuan Yang, Weibin Luo, Hui Huang
- Abstract summary: We introduce Controllable Emotional Image Content Generation (C-EICG). C-EICG aims to generate images that remain faithful to a given content description while expressing a target emotion. EmoCtrl is supported by a dataset annotated with content, emotion, and affective prompts.
- Score: 9.677863079897735
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An image conveys meaning through both its visual content and emotional tone, jointly shaping human perception. We introduce Controllable Emotional Image Content Generation (C-EICG), which aims to generate images that remain faithful to a given content description while expressing a target emotion. Existing text-to-image models ensure content consistency but lack emotional awareness, whereas emotion-driven models generate affective results at the cost of content distortion. To address this gap, we propose EmoCtrl, supported by a dataset annotated with content, emotion, and affective prompts, bridging abstract emotions to visual cues. EmoCtrl incorporates textual and visual emotion enhancement modules that enrich affective expression via descriptive semantics and perceptual cues. The learned emotion tokens exhibit complementary effects, as demonstrated through ablations and visualizations. Quantitative and qualitative experiments demonstrate that EmoCtrl achieves faithful content and expressive emotion control, outperforming existing methods across multiple aspects. User studies confirm EmoCtrl's strong alignment with human preference. Moreover, EmoCtrl generalizes well to creative applications, further demonstrating the robustness and adaptability of the learned emotion tokens.
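The abstract's core mechanism, learned emotion tokens that enrich a content prompt before generation, can be illustrated with a minimal sketch. All names here (`EMOTIONS`, `condition_prompt`, the embedding size) are hypothetical illustrations, not taken from the paper; a real implementation would inject the tokens into a diffusion model's text-encoder embedding sequence.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: one learnable embedding per emotion category
# (the eight Mikels categories commonly used in visual emotion work).
EMOTIONS = ["amusement", "awe", "contentment", "excitement",
            "anger", "disgust", "fear", "sadness"]
EMB_DIM = 16  # toy dimension; real CLIP text encoders use 512 or more

emotion_tokens = {e: rng.normal(size=EMB_DIM) for e in EMOTIONS}

def condition_prompt(text_embeddings: np.ndarray, emotion: str) -> np.ndarray:
    """Prepend the learned emotion token to the prompt's token embeddings,
    so the generator conditions on both the content description and the
    affect cue."""
    token = emotion_tokens[emotion][None, :]           # (1, EMB_DIM)
    return np.concatenate([token, text_embeddings], axis=0)

# A 5-token content prompt, e.g. "a quiet street at dusk".
prompt_emb = rng.normal(size=(5, EMB_DIM))
conditioned = condition_prompt(prompt_emb, "awe")
print(conditioned.shape)  # (6, 16): emotion token + 5 content tokens
```

During training, the emotion-token vectors would be optimized while the content embeddings stay fixed, which is one way the content/emotion separation claimed in the abstract could be enforced.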
Related papers
- EmoKGEdit: Training-free Affective Injection via Visual Cue Transformation [7.245162028678732]
EmoKGEdit is a novel training-free framework for precise, structure-preserving image emotion editing. We construct a Multimodal Sentiment Association Knowledge Graph to disentangle the relationships among objects, scenes, attributes, visual cues, and emotions. EmoKGEdit achieves excellent performance in both emotion fidelity and content preservation, outperforming state-of-the-art methods.
arXiv Detail & Related papers (2026-01-18T09:20:09Z)
- EmoVerse: A MLLMs-Driven Emotion Representation Dataset for Interpretable Visual Emotion Analysis [61.87711517626139]
EmoVerse is a large-scale open-source dataset that enables interpretable visual emotion analysis. With over 219k images, the dataset includes dual annotations in Categorical Emotion States (CES) and Dimensional Emotion Space (DES).
arXiv Detail & Related papers (2025-11-16T11:16:50Z)
- Affective Image Editing: Shaping Emotional Factors via Text Descriptions [46.13506671212571]
We introduce AIEdiT for Affective Image Editing using Text descriptions. We build a continuous emotional spectrum and extract nuanced emotional requests. AIEdiT achieves superior performance, effectively reflecting users' emotional requests.
arXiv Detail & Related papers (2025-05-24T13:46:57Z)
- Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation [63.94836524433559]
DICE-Talk is a framework for disentangling identity from emotion and cooperating emotions with similar characteristics. First, we develop a disentangled emotion embedder that jointly models audio-visual emotional cues through cross-modal attention. Second, we introduce a correlation-enhanced emotion conditioning module with learnable Emotion Banks. Third, we design an emotion discrimination objective that enforces affective consistency during the diffusion process.
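The cross-modal attention mentioned in this summary (audio queries attending over visual emotion cues) can be sketched in a few lines. The function name and all shapes are illustrative assumptions, not DICE-Talk's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(audio_feats: np.ndarray,
                          visual_feats: np.ndarray) -> np.ndarray:
    """Audio frames (queries) attend over visual emotion cues (keys/values),
    yielding one fused affect vector per audio frame."""
    d = audio_feats.shape[-1]
    scores = audio_feats @ visual_feats.T / np.sqrt(d)   # (Ta, Tv)
    weights = softmax(scores, axis=-1)                   # rows sum to 1
    return weights @ visual_feats                        # (Ta, d)

rng = np.random.default_rng(1)
audio = rng.normal(size=(4, 8))    # 4 audio frames, feature dim 8
visual = rng.normal(size=(6, 8))   # 6 visual emotion-cue vectors
fused = cross_modal_attention(audio, visual)
print(fused.shape)  # (4, 8)
```

In a full model, learned query/key/value projections would precede this step; here the raw features stand in for them to keep the sketch minimal.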
arXiv Detail & Related papers (2025-04-25T05:28:21Z)
- EmoSEM: Segment and Explain Emotion Stimuli in Visual Art [25.539022846134543]
This paper proposes the Emotion Stimuli and Explanation Model (EmoSEM) to endow the segmentation framework with emotion comprehension capability. Given an art image, the model pinpoints pixel regions that trigger a specific human emotion and generates linguistic explanations for them. Our method realizes end-to-end modeling from low-level pixel features to high-level emotion interpretation, delivering the first interpretable fine-grained framework for visual emotion analysis.
arXiv Detail & Related papers (2025-04-20T15:40:00Z)
- EmotiCrafter: Text-to-Emotional-Image Generation based on Valence-Arousal Model [23.26111054485357]
We introduce the new task of continuous emotional image content generation (C-EICG). We present EmotiCrafter, an emotional image generation model that generates images based on text prompts and Valence-Arousal values.
arXiv Detail & Related papers (2025-01-10T04:41:37Z)
- EmoEdit: Evoking Emotions through Image Manipulation [62.416345095776656]
Affective Image Manipulation (AIM) seeks to modify user-provided images to evoke specific emotional responses. We introduce EmoEdit, which extends AIM by incorporating content modifications to enhance emotional impact. Our method is evaluated both qualitatively and quantitatively, demonstrating superior performance compared to existing state-of-the-art techniques.
arXiv Detail & Related papers (2024-05-21T10:18:45Z)
- EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models [11.901294654242376]
We introduce Emotional Image Content Generation (EICG), a new task to generate semantic-clear and emotion-faithful images given emotion categories.
Specifically, we propose an emotion space and construct a mapping network to align it with the powerful Contrastive Language-Image Pre-training (CLIP) space.
Our method outperforms the state-of-the-art text-to-image approaches both quantitatively and qualitatively.
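The mapping network described above (aligning an emotion space with the CLIP embedding space) can be sketched as a small projection followed by L2 normalization, since CLIP similarities are cosine-based. The dimensions and weights here are toy assumptions, not EmoGen's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(2)

CLIP_DIM = 32   # toy stand-in for CLIP's 512-d (or larger) embedding space
EMO_DIM = 8     # toy emotion-space dimension

# Hypothetical two-layer MLP mapping emotion vectors into CLIP space.
W1 = rng.normal(scale=0.1, size=(EMO_DIM, 64))
W2 = rng.normal(scale=0.1, size=(64, CLIP_DIM))

def map_to_clip(emotion_vec: np.ndarray) -> np.ndarray:
    """Project an emotion-space vector into CLIP space and L2-normalize it,
    so it can be compared to CLIP text/image embeddings by cosine
    similarity."""
    h = np.maximum(emotion_vec @ W1, 0.0)   # ReLU hidden layer
    out = h @ W2
    return out / np.linalg.norm(out)

z = rng.normal(size=EMO_DIM)
clip_like = map_to_clip(z)
print(clip_like.shape)  # (32,), unit-norm
```

Training such a network would pull the mapped emotion vectors toward CLIP embeddings of emotionally matching images or captions, which is the alignment the summary describes.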
arXiv Detail & Related papers (2024-01-09T15:23:21Z)
- SOLVER: Scene-Object Interrelated Visual Emotion Reasoning Network [83.27291945217424]
We propose a novel Scene-Object interreLated Visual Emotion Reasoning network (SOLVER) to predict emotions from images.
To mine the emotional relationships between distinct objects, we first build up an Emotion Graph based on semantic concepts and visual features.
We also design a Scene-Object Fusion Module to integrate scenes and objects, which exploits scene features to guide the fusion process of object features with the proposed scene-based attention mechanism.
arXiv Detail & Related papers (2021-10-24T02:41:41Z)
- Enhancing Cognitive Models of Emotions with Representation Learning [58.2386408470585]
We present a novel deep learning-based framework to generate embedding representations of fine-grained emotions.
Our framework integrates a contextualized embedding encoder with a multi-head probing model.
Our model is evaluated on the Empathetic Dialogue dataset and achieves state-of-the-art results for classifying 32 emotions.
arXiv Detail & Related papers (2021-04-20T16:55:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.