EmoKGEdit: Training-free Affective Injection via Visual Cue Transformation
- URL: http://arxiv.org/abs/2601.12326v1
- Date: Sun, 18 Jan 2026 09:20:09 GMT
- Title: EmoKGEdit: Training-free Affective Injection via Visual Cue Transformation
- Authors: Jing Zhang, Bingjie Fan
- Abstract summary: EmoKGEdit is a novel training-free framework for precise and structure-preserving image emotion editing. We construct a Multimodal Sentiment Association Knowledge Graph to disentangle the relationships among objects, scenes, attributes, visual cues, and emotions. EmoKGEdit achieves excellent performance in both emotion fidelity and content preservation, and outperforms the state-of-the-art methods.
- Score: 7.245162028678732
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing image emotion editing methods struggle to disentangle emotional cues from latent content representations, often yielding weak emotional expression and distorted visual structures. To bridge this gap, we propose EmoKGEdit, a novel training-free framework for precise and structure-preserving image emotion editing. Specifically, we construct a Multimodal Sentiment Association Knowledge Graph (MSA-KG) to disentangle the intricate relationships among objects, scenes, attributes, visual cues, and emotions. MSA-KG explicitly encodes the causal object-attribute-emotion chain and serves as external knowledge to support chain-of-thought reasoning, guiding the multimodal large model to infer plausible emotion-related visual cues and generate coherent editing instructions. In addition, based on MSA-KG, we design a disentangled structure-emotion editing module that explicitly separates emotional attributes from layout features within the latent space, ensuring that the target emotion is effectively injected while visual spatial coherence is strictly maintained. Extensive experiments demonstrate that EmoKGEdit achieves excellent performance in both emotion fidelity and content preservation and outperforms state-of-the-art methods.
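The abstract gives no implementation details, but its two core ideas can be pictured with a small sketch: traversing the MSA-KG's object-attribute-emotion chains to turn a target emotion into concrete visual cues and an editing instruction, and injecting emotion in latent space while keeping layout components fixed. Everything below is hypothetical: the toy KG entries, the helper names (infer_visual_cues, build_edit_instruction), the prompt template, and the channel-mask latent split are illustrative assumptions, not the authors' pipeline.

```python
# Minimal sketch, assuming a toy MSA-KG of (object, attribute) -> [(cue, emotion)]
# chains. The real graph construction is not described in the abstract.
MSA_KG = {
    ("sky", "color"): [("warm golden light", "contentment"),
                       ("heavy storm clouds", "fear")],
    ("street", "condition"): [("rain-slicked pavement", "melancholy"),
                              ("sunlit market stalls", "joy")],
}

def infer_visual_cues(objects, target_emotion):
    """Walk the object-attribute-emotion chains and collect visual cues
    that plausibly evoke the target emotion for objects in the image."""
    return [(obj, attr, cue)
            for (obj, attr), candidates in MSA_KG.items() if obj in objects
            for cue, emotion in candidates if emotion == target_emotion]

def build_edit_instruction(objects, target_emotion):
    """Assemble an editing instruction from the retrieved cues; the prompt
    template handed to the multimodal model is an assumption."""
    steps = [f"change the {attr} of the {obj} to {cue}"
             for obj, attr, cue in infer_visual_cues(objects, target_emotion)]
    return f"To evoke {target_emotion}: " + "; ".join(steps) + "."

print(build_edit_instruction({"sky", "street"}, "melancholy"))
# To evoke melancholy: change the condition of the street to rain-slicked pavement.
```

The disentangled structure-emotion edit can likewise be caricatured as a masked blend of two latents; a fixed boolean mask is an assumption, since the paper operates on diffusion latents rather than this toy vector:

```python
import numpy as np

# Toy stand-in for the disentangled structure-emotion edit: layout components
# come from the source latent, emotion components from the edited latent.
rng = np.random.default_rng(0)
z_src = rng.normal(size=8)                         # latent of the original image
z_emo = rng.normal(size=8)                         # latent after emotion injection
emotion_mask = np.array([0, 0, 0, 0, 1, 1, 1, 1], bool)
z_out = np.where(emotion_mask, z_emo, z_src)       # emotion injected, layout kept
```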
Related papers
- EmoLat: Text-driven Image Sentiment Transfer via Emotion Latent Space [8.453871826832478]
We propose EmoLat, a novel emotion latent space that enables fine-grained, text-driven image sentiment transfer. Within EmoLat, an emotion semantic graph is constructed to capture the relational structure among emotions, objects, and visual attributes. Building upon EmoLat, a cross-modal sentiment transfer framework is proposed to manipulate image sentiment via joint embedding of text and EmoLat features.
arXiv Detail & Related papers (2026-01-17T15:07:36Z)
- EmoCtrl: Controllable Emotional Image Content Generation [9.677863079897735]
We introduce Controllable Emotional Image Content Generation (C-EICG). C-EICG aims to generate images that remain faithful to a given content description while expressing a target emotion. EmoCtrl is supported by a dataset annotated with content, emotion, and affective prompts.
arXiv Detail & Related papers (2025-12-27T02:18:36Z)
- EmoVerse: A MLLMs-Driven Emotion Representation Dataset for Interpretable Visual Emotion Analysis [61.87711517626139]
EmoVerse is a large-scale open-source dataset that enables interpretable visual emotion analysis. With over 219k images, the dataset further includes dual annotations in Categorical Emotion States (CES) and Dimensional Emotion Space (DES).
arXiv Detail & Related papers (2025-11-16T11:16:50Z)
- Incorporating Scene Context and Semantic Labels for Enhanced Group-level Emotion Recognition [39.138182195807424]
Group-level emotion recognition (GER) aims to identify holistic emotions within a scene involving multiple individuals. Existing methods underestimate the importance of visual scene context in modeling individual relationships. We propose a novel framework that incorporates visual scene context and label-guided semantic information to improve GER performance.
arXiv Detail & Related papers (2025-09-26T01:25:39Z)
- EmoCAST: Emotional Talking Portrait via Emotive Text Description [56.42674612728354]
EmoCAST is a diffusion-based framework for precise text-driven emotional synthesis. In appearance modeling, emotional prompts are integrated through a text-guided decoupled emotive module. EmoCAST achieves state-of-the-art performance in generating realistic, emotionally expressive, and audio-synchronized talking-head videos.
arXiv Detail & Related papers (2025-08-28T10:02:06Z)
- Moodifier: MLLM-Enhanced Emotion-Driven Image Editing [0.9208007322096533]
First, we introduce MoodArchive, an 8M+ image dataset with detailed hierarchical emotional annotations generated by LLaVA. Second, we develop MoodifyCLIP, a vision-language model fine-tuned on MoodArchive to translate abstract emotions into specific visual attributes. Third, we propose Moodifier, a training-free editing model leveraging MoodifyCLIP and multimodal large language models (MLLMs) to enable precise emotional transformations.
arXiv Detail & Related papers (2025-07-18T15:52:39Z)
- KEVER^2: Knowledge-Enhanced Visual Emotion Reasoning and Retrieval [35.77379981826482]
We propose K-EVER^2, a knowledge-enhanced framework for emotion reasoning and retrieval. Our approach introduces a semantically structured formulation of visual emotion cues and integrates external affective knowledge through multimodal alignment. We validate our framework on three representative benchmarks, Emotion6, EmoSet, and M-Disaster, covering social media imagery, human-centric scenes, and disaster contexts.
arXiv Detail & Related papers (2025-05-30T08:33:32Z)
- Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation [63.94836524433559]
DICE-Talk is a framework for disentangling identity from emotion and cooperating emotions with similar characteristics. First, we develop a disentangled emotion embedder that jointly models audio-visual emotional cues through cross-modal attention. Second, we introduce a correlation-enhanced emotion conditioning module with learnable Emotion Banks. Third, we design an emotion discrimination objective that enforces affective consistency during the diffusion process.
arXiv Detail & Related papers (2025-04-25T05:28:21Z)
- EmoEdit: Evoking Emotions through Image Manipulation [62.416345095776656]
Affective Image Manipulation (AIM) seeks to modify user-provided images to evoke specific emotional responses. We introduce EmoEdit, which extends AIM by incorporating content modifications to enhance emotional impact. Our method is evaluated both qualitatively and quantitatively, demonstrating superior performance compared to existing state-of-the-art techniques.
arXiv Detail & Related papers (2024-05-21T10:18:45Z)
- SOLVER: Scene-Object Interrelated Visual Emotion Reasoning Network [83.27291945217424]
We propose a novel Scene-Object interreLated Visual Emotion Reasoning network (SOLVER) to predict emotions from images.
To mine the emotional relationships between distinct objects, we first build up an Emotion Graph based on semantic concepts and visual features.
We also design a Scene-Object Fusion Module to integrate scenes and objects, which exploits scene features to guide the fusion process of object features with the proposed scene-based attention mechanism.
arXiv Detail & Related papers (2021-10-24T02:41:41Z)
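A minimal sketch of the SOLVER entry above, assuming a toy Emotion Graph: object nodes carry visual features, an assumed adjacency links semantically related objects, one message-passing step propagates emotional evidence, and a scene feature attends over the updated objects. The node count, features, links, and attention form are all illustrative assumptions, not the paper's architecture.

```python
import numpy as np

# Toy Emotion Graph: 4 object nodes with 8-d visual features, linked by an
# assumed adjacency standing in for semantic-concept relationships.
rng = np.random.default_rng(1)
num_objects, dim = 4, 8
node_feats = rng.normal(size=(num_objects, dim))    # per-object visual features
adjacency = np.array([[0, 1, 1, 0],                 # assumed semantic links
                      [1, 0, 0, 1],
                      [1, 0, 0, 1],
                      [0, 1, 1, 0]], dtype=float)
adjacency /= adjacency.sum(axis=1, keepdims=True)   # row-normalize

updated = adjacency @ node_feats                    # one message-passing step

# Scene-based attention (assumed form): a scene feature scores each object,
# and the fused representation is the attention-weighted sum of objects.
scene_feat = rng.normal(size=dim)
scores = updated @ scene_feat
weights = np.exp(scores - scores.max()); weights /= weights.sum()  # softmax
fused = weights @ updated                           # scene-guided object fusion
```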
- A Circular-Structured Representation for Visual Emotion Distribution Learning [82.89776298753661]
We propose a well-grounded circular-structured representation to utilize the prior knowledge for visual emotion distribution learning.
To be specific, we first construct an Emotion Circle to unify any emotional state within it.
On the proposed Emotion Circle, each emotion distribution is represented with an emotion vector, which is defined with three attributes.
arXiv Detail & Related papers (2021-06-23T14:53:27Z)
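The "three attributes" in the last entry are not spelled out in the summary; in the published paper they are, to our reading, emotion polarity, emotion type, and emotion intensity. The sketch below treats an emotion vector as a point on the circle, with an assumed angle per emotion type (polarity splitting the circle into positive and negative halves) and intensity as the radius; this parameterization is an illustrative assumption, not the paper's exact formulation.

```python
import math

# Assumed angular placement of emotion types on the Emotion Circle (degrees).
EMOTION_ANGLES = {
    "joy": 30, "surprise": 80, "anger": 200, "sadness": 250,
}

def emotion_vector(emotion: str, intensity: float):
    """Return the (x, y) coordinates of one emotion state on the circle."""
    theta = math.radians(EMOTION_ANGLES[emotion])
    return (intensity * math.cos(theta), intensity * math.sin(theta))

def distribution_vector(distribution: dict):
    """Sum per-emotion vectors of a distribution into a single emotion vector."""
    xs, ys = zip(*(emotion_vector(e, p) for e, p in distribution.items()))
    return (sum(xs), sum(ys))

print(distribution_vector({"joy": 0.7, "sadness": 0.3}))
```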