EmoAgent: Multi-Agent Collaboration of Plan, Edit, and Critic, for Affective Image Manipulation
- URL: http://arxiv.org/abs/2503.11290v1
- Date: Fri, 14 Mar 2025 10:55:56 GMT
- Title: EmoAgent: Multi-Agent Collaboration of Plan, Edit, and Critic, for Affective Image Manipulation
- Authors: Qi Mao, Haobo Hu, Yujie He, Difei Gao, Haokun Chen, Libiao Jin,
- Abstract summary: Affective Image Manipulation (AIM) aims to alter an image's emotional impact by adjusting multiple visual elements to evoke specific feelings. We introduce EmoAgent, the first multi-agent collaboration framework for AIM. We develop an emotion-factor knowledge retriever, a decision-making tree space, and a tool library to enhance EmoAgent's effectiveness.
- Score: 11.29688638322966
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Affective Image Manipulation (AIM) aims to alter an image's emotional impact by adjusting multiple visual elements to evoke specific feelings. Effective AIM is inherently complex, necessitating a collaborative approach that involves identifying semantic cues within source images, manipulating these elements to elicit desired emotional responses, and verifying that the combined adjustments successfully evoke the target emotion. To address these challenges, we introduce EmoAgent, the first multi-agent collaboration framework for AIM. By emulating the cognitive behaviors of a human painter, EmoAgent incorporates three specialized agents responsible for planning, editing, and critical evaluation. Furthermore, we develop an emotion-factor knowledge retriever, a decision-making tree space, and a tool library to enhance EmoAgent's effectiveness in handling AIM. Experiments demonstrate that the proposed multi-agent framework outperforms existing methods, offering more reasonable and effective emotional expression.
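To make the pipeline described in the abstract concrete, below is a minimal sketch of a plan-edit-critique loop in Python. All class names, method signatures, and the stopping rule are illustrative assumptions; the actual EmoAgent agents, emotion-factor knowledge retriever, decision-making tree space, and tool library are not specified here.

```python
# Hypothetical sketch of a plan / edit / critique loop for affective image
# manipulation. Class and method names are illustrative assumptions, not the
# actual EmoAgent implementation.
from dataclasses import dataclass, field


@dataclass
class EditPlan:
    target_emotion: str
    steps: list = field(default_factory=list)  # e.g. ["darken sky", "add rain"]


class PlannerAgent:
    def plan(self, image_desc: str, target_emotion: str) -> EditPlan:
        # A real planner would query an emotion-factor knowledge retriever
        # and search a decision tree of candidate edits.
        return EditPlan(target_emotion, steps=[f"adjust scene toward {target_emotion}"])


class EditorAgent:
    def edit(self, image, plan: EditPlan):
        # A real editor would dispatch each step to a tool library
        # (inpainting, recoloring, object insertion, ...).
        for step in plan.steps:
            image = image + [step]          # placeholder "edit"
        return image


class CriticAgent:
    def evaluate(self, image, target_emotion: str) -> bool:
        # A real critic would score the evoked emotion (e.g. with a VLM)
        # and accept only if it matches the target.
        return len(image) > 0


def run_emoagent_loop(image, image_desc, target_emotion, max_rounds=3):
    planner, editor, critic = PlannerAgent(), EditorAgent(), CriticAgent()
    for _ in range(max_rounds):
        plan = planner.plan(image_desc, target_emotion)
        image = editor.edit(image, plan)
        if critic.evaluate(image, target_emotion):   # critic approves the result
            break
    return image


print(run_emoagent_loop([], "a quiet street at dusk", "melancholy"))
```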
Related papers
- Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation [63.94836524433559]
DICE-Talk is a framework that disentangles identity from emotion and exploits correlations among emotions with similar characteristics.
First, we develop a disentangled emotion embedder that jointly models audio-visual emotional cues through cross-modal attention.
Second, we introduce a correlation-enhanced emotion conditioning module with learnable Emotion Banks.
Third, we design an emotion discrimination objective that enforces affective consistency during the diffusion process.
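As a rough illustration of the cross-modal attention mentioned above, here is a minimal PyTorch sketch in which audio tokens attend to visual tokens to form a joint emotion embedding; the dimensions, pooling, and projection are assumptions rather than DICE-Talk's actual embedder.

```python
# Minimal sketch of cross-modal attention between audio and visual emotion
# features; dimensions and layer choices are assumptions, not DICE-Talk's design.
import torch
import torch.nn as nn


class CrossModalEmotionEmbedder(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(dim, dim)

    def forward(self, audio_feats, visual_feats):
        # Audio tokens query visual tokens, so the emotion embedding
        # reflects cues from both modalities.
        fused, _ = self.attn(query=audio_feats, key=visual_feats, value=visual_feats)
        return self.proj(fused.mean(dim=1))   # one emotion embedding per clip


embedder = CrossModalEmotionEmbedder()
audio = torch.randn(2, 50, 256)    # (batch, audio tokens, dim)
visual = torch.randn(2, 30, 256)   # (batch, visual tokens, dim)
print(embedder(audio, visual).shape)  # torch.Size([2, 256])
```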
arXiv Detail & Related papers (2025-04-25T05:28:21Z)
- MAGIC: Mastering Physical Adversarial Generation in Context through Collaborative LLM Agents [18.1558732924808]
We reformulate physical adversarial attacks as a one-shot patch generation problem.
Our approach generates adversarial patches through a deep generative model.
We propose MAGIC, a novel framework powered by multi-modal LLM agents.
arXiv Detail & Related papers (2024-12-11T01:41:19Z)
- Smile upon the Face but Sadness in the Eyes: Emotion Recognition based on Facial Expressions and Eye Behaviors [63.194053817609024]
We introduce eye behaviors as important emotional cues for the creation of a new Eye-behavior-aided Multimodal Emotion Recognition (EMER) dataset.
For the first time, we provide annotations for both Emotion Recognition (ER) and Facial Expression Recognition (FER) in the EMER dataset.
We specifically design a new EMERT architecture to concurrently enhance performance in both ER and FER.
arXiv Detail & Related papers (2024-11-08T04:53:55Z)
- Emotion-Agent: Unsupervised Deep Reinforcement Learning with Distribution-Prototype Reward for Continuous Emotional EEG Analysis [2.1645626994550664]
Continuous electroencephalography (EEG) signals are widely used in affective brain-computer interface (aBCI) applications.
We propose a novel unsupervised deep reinforcement learning framework, called Emotion-Agent, to automatically identify relevant and informative emotional moments from EEG signals.
Emotion-Agent is trained using Proximal Policy Optimization (PPO) to achieve stable and efficient convergence.
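As a loose illustration of the distribution-prototype idea in the title, the sketch below scores a candidate EEG segment by how close it is to the nearest emotion prototype relative to an emotion-irrelevant baseline; the features, prototypes, and scaling are assumptions, not the paper's reward or its PPO training loop.

```python
# Hypothetical sketch of a prototype-based reward for selecting emotional EEG
# segments; the features, prototypes, and scaling are illustrative assumptions.
import numpy as np


def segment_reward(segment_feats: np.ndarray,
                   prototypes: np.ndarray,
                   background_mean: np.ndarray) -> float:
    """Higher reward when the segment is close to some emotion prototype
    and far from the background (emotion-irrelevant) distribution."""
    dist_to_protos = np.linalg.norm(prototypes - segment_feats, axis=1)
    dist_to_background = np.linalg.norm(segment_feats - background_mean)
    return float(dist_to_background - dist_to_protos.min())


rng = np.random.default_rng(0)
prototypes = rng.normal(size=(4, 16))        # one prototype per emotion class
background = np.zeros(16)                    # emotion-irrelevant baseline
segment = rng.normal(loc=0.5, size=16)       # features of one EEG window
print(segment_reward(segment, prototypes, background))
```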
arXiv Detail & Related papers (2024-08-22T04:29:25Z)
- Infusing Emotions into Task-oriented Dialogue Systems: Understanding, Management, and Generation [6.377334634656281]
Emotions are indispensable in human communication, but are often overlooked in task-oriented dialogue (ToD) modelling.
In this work, we incorporate emotion into the complete ToD processing loop, involving understanding, management, and generation.
We demonstrate that our proposed framework significantly enhances the user's emotional experience as well as the task success.
arXiv Detail & Related papers (2024-08-05T12:21:04Z)
- EmoLLM: Multimodal Emotional Understanding Meets Large Language Models [61.179731667080326]
Multi-modal large language models (MLLMs) have achieved remarkable performance on objective multimodal perception tasks.
But their ability to interpret subjective, emotionally nuanced multimodal content remains largely unexplored.
EmoLLM is a novel model for multimodal emotional understanding that incorporates two core techniques.
arXiv Detail & Related papers (2024-06-24T08:33:02Z)
- EmoEdit: Evoking Emotions through Image Manipulation [62.416345095776656]
Affective Image Manipulation (AIM) seeks to modify user-provided images to evoke specific emotional responses.
We introduce EmoEdit, which extends AIM by incorporating content modifications to enhance emotional impact.
Our method is evaluated both qualitatively and quantitatively, demonstrating superior performance compared to existing state-of-the-art techniques.
arXiv Detail & Related papers (2024-05-21T10:18:45Z)
- UniMEEC: Towards Unified Multimodal Emotion Recognition and Emotion Cause [18.99103120856208]
We propose a Unified Multimodal Emotion recognition and Emotion-Cause analysis framework (UniMEEC) to explore the causality between emotion and emotion cause.
UniMEEC reformulates the MERC and MECPE tasks as mask prediction problems and unifies them with a causal prompt template.
Experimental results on four public benchmark datasets verify the model's performance on the MERC and MECPE tasks.
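To illustrate the mask-prediction reformulation, the sketch below builds a shared causal prompt with a [MASK] slot for the emotion (MERC) or its cause (MECPE); the template wording is an assumption, not UniMEEC's actual prompt.

```python
# Illustrative sketch of casting emotion recognition (MERC) and emotion-cause
# extraction (MECPE) as mask prediction with a shared prompt; the wording of
# the template is an assumption, not UniMEEC's actual prompt.
EMOTION_PROMPT = "Utterance: {utterance} The speaker feels [MASK]."
CAUSE_PROMPT = ("Utterance: {utterance} The speaker feels {emotion} "
                "because of [MASK].")


def build_prompts(utterance, emotion=None):
    """Return the mask-prediction prompt for MERC, or for MECPE once the
    emotion is known, so one masked-LM head can serve both tasks."""
    if emotion is None:
        return EMOTION_PROMPT.format(utterance=utterance)
    return CAUSE_PROMPT.format(utterance=utterance, emotion=emotion)


print(build_prompts("I finally got the job!"))
print(build_prompts("I finally got the job!", emotion="joy"))
```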
arXiv Detail & Related papers (2024-03-30T15:59:17Z)
- Enhancing Emotional Generation Capability of Large Language Models via Emotional Chain-of-Thought [50.13429055093534]
Large Language Models (LLMs) have shown remarkable performance in various emotion recognition tasks.
We propose the Emotional Chain-of-Thought (ECoT) to enhance the performance of LLMs on various emotional generation tasks.
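As a hypothetical illustration of chain-of-thought prompting applied to emotional generation, the sketch below assembles a staged prompt; the specific reasoning steps are assumptions, not ECoT's published template.

```python
# Hypothetical sketch of an emotional chain-of-thought style prompt; the
# specific reasoning steps are assumptions, not ECoT's published template.
ECOT_STEPS = [
    "1. Identify the emotional state expressed in the context.",
    "2. Infer why the speaker feels this way.",
    "3. Decide which emotion the response should convey.",
    "4. Write the response, making that emotion explicit.",
]


def build_ecot_prompt(context: str) -> str:
    steps = "\n".join(ECOT_STEPS)
    return (f"Context: {context}\n"
            f"Think through the following steps before answering:\n{steps}\n"
            f"Response:")


print(build_ecot_prompt("My cat has been missing for two days."))
```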
arXiv Detail & Related papers (2024-01-12T16:42:10Z)
- SOLVER: Scene-Object Interrelated Visual Emotion Reasoning Network [83.27291945217424]
We propose a novel Scene-Object interreLated Visual Emotion Reasoning network (SOLVER) to predict emotions from images.
To mine the emotional relationships between distinct objects, we first build up an Emotion Graph based on semantic concepts and visual features.
We also design a Scene-Object Fusion Module to integrate scenes and objects, which exploits scene features to guide the fusion process of object features with the proposed scene-based attention mechanism.
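As a rough sketch of scene-guided attention, the PyTorch module below lets a scene feature weight the object features before fusion; the layer sizes and bilinear scoring function are assumptions, not SOLVER's exact Scene-Object Fusion Module.

```python
# Minimal sketch of scene-guided attention over object features; the layer
# sizes and scoring function are assumptions, not SOLVER's exact module.
import torch
import torch.nn as nn


class SceneObjectFusion(nn.Module):
    def __init__(self, dim: int = 128):
        super().__init__()
        self.score = nn.Bilinear(dim, dim, 1)   # scene-conditioned score per object

    def forward(self, scene_feat, object_feats):
        # scene_feat: (batch, dim); object_feats: (batch, num_objects, dim)
        b, n, d = object_feats.shape
        scene_exp = scene_feat.unsqueeze(1).expand(b, n, d)
        weights = torch.softmax(self.score(scene_exp, object_feats).squeeze(-1), dim=-1)
        fused_objects = (weights.unsqueeze(-1) * object_feats).sum(dim=1)
        return torch.cat([scene_feat, fused_objects], dim=-1)   # fused representation


fusion = SceneObjectFusion()
scene = torch.randn(2, 128)
objects = torch.randn(2, 5, 128)
print(fusion(scene, objects).shape)   # torch.Size([2, 256])
```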
arXiv Detail & Related papers (2021-10-24T02:41:41Z)
- Multi-Agent Embodied Visual Semantic Navigation with Scene Prior Knowledge [42.37872230561632]
In visual semantic navigation, the robot navigates to a target object using egocentric visual observations, given the class label of the target.
Most of the existing models are only effective for single-agent navigation, and a single agent has low efficiency and poor fault tolerance when completing more complicated tasks.
We propose multi-agent visual semantic navigation, in which multiple agents collaborate to find multiple target objects.
arXiv Detail & Related papers (2021-09-20T13:31:03Z)