Related papers: EmoEdit: Evoking Emotions through Image Manipulation

URL: http://arxiv.org/abs/2405.12661v1
Date: Tue, 21 May 2024 10:18:45 GMT
Title: EmoEdit: Evoking Emotions through Image Manipulation
Authors: Jingyuan Yang, Jiawei Feng, Weibin Luo, Dani Lischinski, Daniel Cohen-Or, Hui Huang
Abstract summary: We introduce EmoEdit, a novel two-stage framework comprising emotion attribution and image editing. In the emotion attribution stage, we leverage a Vision-Language Model (VLM) to create hierarchies of semantic factors that represent abstract emotions. In the image editing stage, the VLM identifies the most relevant factors for the provided image, and guides a generative editing model to perform affective modifications.
Score: 62.416345095776656
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Affective Image Manipulation (AIM) seeks to modify user-provided images to evoke specific emotional responses. This task is inherently complex due to its twofold objective: significantly evoking the intended emotion, while preserving the original image composition. Existing AIM methods primarily adjust color and style, often failing to elicit precise and profound emotional shifts. Drawing on psychological insights, we extend AIM by incorporating content modifications to enhance emotional impact. We introduce EmoEdit, a novel two-stage framework comprising emotion attribution and image editing. In the emotion attribution stage, we leverage a Vision-Language Model (VLM) to create hierarchies of semantic factors that represent abstract emotions. In the image editing stage, the VLM identifies the most relevant factors for the provided image, and guides a generative editing model to perform affective modifications. A ranking technique that we developed selects the best edit, balancing between emotion fidelity and structure integrity. To validate EmoEdit, we assembled a dataset of 416 images, categorized into positive, negative, and neutral classes. Our method is evaluated both qualitatively and quantitatively, demonstrating superior performance compared to existing state-of-the-art techniques. Additionally, we showcase EmoEdit's potential in various manipulation tasks, including emotion-oriented and semantics-oriented editing.

Related papers

Moodifier: MLLM-Enhanced Emotion-Driven Image Editing [0.9208007322096533]
We introduce MoodArchive, an 8M+ image dataset with detailed hierarchical emotional annotations generated by LLaVA. Second, we develop MoodifyCLIP, a vision-language model fine-tuned on MoodArchive to translate abstract emotions into specific visual attributes. Third, we propose Moodifier, a training-free editing model leveraging MoodifyCLIP and multimodal large language models (MLLMs) to enable precise emotional transformations.
arXiv Detail & Related papers (2025-07-18T15:52:39Z)
Affective Image Editing: Shaping Emotional Factors via Text Descriptions [46.13506671212571]
We introduce AIEdiT for Affective Image Editing using Text descriptions. We build the continuous emotional spectrum and extract nuanced emotional requests. AIEdiT achieves superior performance, effectively reflecting users' emotional requests.
arXiv Detail & Related papers (2025-05-24T13:46:57Z)
Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation [63.94836524433559]
DICE-Talk is a framework for disentangling identity from emotion and letting emotions with similar characteristics cooperate. First, we develop a disentangled emotion embedder that jointly models audio-visual emotional cues through cross-modal attention. Second, we introduce a correlation-enhanced emotion conditioning module with learnable Emotion Banks. Third, we design an emotion discrimination objective that enforces affective consistency during the diffusion process.
arXiv Detail & Related papers (2025-04-25T05:28:21Z)
EmoSEM: Segment and Explain Emotion Stimuli in Visual Art [25.539022846134543]
This paper focuses on a key challenge in visual art understanding: given an art image, the model pinpoints pixel regions that trigger a specific human emotion. Despite recent advances in art understanding, pixel-level emotion understanding still faces a dual challenge. This paper proposes the Emotion stimuli and Explanation Model (EmoSEM) to endow the segmentation model SAM with emotion comprehension capability.
arXiv Detail & Related papers (2025-04-20T15:40:00Z)
EmoAgent: Multi-Agent Collaboration of Plan, Edit, and Critic, for Affective Image Manipulation [11.29688638322966]
Affective Image Manipulation (AIM) aims to alter an image's emotional impact by adjusting multiple visual elements to evoke specific feelings. We introduce EmoAgent, the first multi-agent collaboration framework for AIM. We develop an emotion-factor knowledge retriever, a decision-making tree space, and a tool library to enhance EmoAgent's effectiveness.
arXiv Detail & Related papers (2025-03-14T10:55:56Z)
EmoLLM: Multimodal Emotional Understanding Meets Large Language Models [61.179731667080326]
Multi-modal large language models (MLLMs) have achieved remarkable performance on objective multimodal perception tasks, but their ability to interpret subjective, emotionally nuanced multimodal content remains largely unexplored. EmoLLM is a novel model for multimodal emotional understanding that incorporates two core techniques.
arXiv Detail & Related papers (2024-06-24T08:33:02Z)
Make Me Happier: Evoking Emotions Through Image Diffusion Models [36.40067582639123]
We present a novel challenge of emotion-evoked image generation, aiming to synthesize images that evoke target emotions while retaining the semantics and structures of the original scenes. Due to the lack of emotion editing datasets, we provide a unique dataset consisting of 340,000 pairs of images and their emotion annotations.
arXiv Detail & Related papers (2024-03-13T05:13:17Z)
EmoTalker: Emotionally Editable Talking Face Generation via Diffusion Model [39.14430238946951]
EmoTalker is an emotionally editable portrait animation approach based on the diffusion model. An Emotion Intensity Block is introduced to analyze fine-grained emotions and strengths derived from prompts. Experiments show the effectiveness of EmoTalker in generating high-quality, emotionally customizable facial expressions.
arXiv Detail & Related papers (2024-01-16T02:02:44Z)
EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models [11.901294654242376]
We introduce Emotional Image Content Generation (EICG), a new task to generate semantically clear and emotion-faithful images given emotion categories. Specifically, we propose an emotion space and construct a mapping network to align it with the powerful Contrastive Language-Image Pre-training (CLIP) space. Our method outperforms the state-of-the-art text-to-image approaches both quantitatively and qualitatively.
arXiv Detail & Related papers (2024-01-09T15:23:21Z)
Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling [50.99252242917458]
Conversational Speech Synthesis (CSS) aims to accurately express an utterance with the appropriate prosody and emotional inflection within a conversational setting. To address the issue of data scarcity, we meticulously create emotional labels in terms of category and intensity. Our model outperforms the baseline models in understanding and rendering emotions.
arXiv Detail & Related papers (2023-12-19T08:47:50Z)
EmoSet: A Large-scale Visual Emotion Dataset with Rich Attributes [53.95428298229396]
We introduce EmoSet, the first large-scale visual emotion dataset annotated with rich attributes. EmoSet comprises 3.3 million images in total, with 118,102 of these images carefully labeled by human annotators. Motivated by psychological studies, in addition to emotion category, each image is also annotated with a set of describable emotion attributes.
arXiv Detail & Related papers (2023-07-16T06:42:46Z)
SOLVER: Scene-Object Interrelated Visual Emotion Reasoning Network [83.27291945217424]
We propose a novel Scene-Object interreLated Visual Emotion Reasoning network (SOLVER) to predict emotions from images. To mine the emotional relationships between distinct objects, we first build up an Emotion Graph based on semantic concepts and visual features. We also design a Scene-Object Fusion Module to integrate scenes and objects, which exploits scene features to guide the fusion process of object features with the proposed scene-based attention mechanism.
arXiv Detail & Related papers (2021-10-24T02:41:41Z)
Stimuli-Aware Visual Emotion Analysis [75.68305830514007]
We propose a stimuli-aware visual emotion analysis (VEA) method consisting of three stages, namely stimuli selection, feature extraction and emotion prediction. To the best of our knowledge, this is the first work to introduce a stimuli selection process into VEA in an end-to-end network. Experiments demonstrate that the proposed method consistently outperforms the state-of-the-art approaches on four public visual emotion datasets.
arXiv Detail & Related papers (2021-09-04T08:14:52Z)

This list is automatically generated from the titles and abstracts of the papers on this site.