EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models
- URL: http://arxiv.org/abs/2401.04608v1
- Date: Tue, 9 Jan 2024 15:23:21 GMT
- Title: EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models
- Authors: Jingyuan Yang, Jiawei Feng, Hui Huang
- Abstract summary: We introduce Emotional Image Content Generation (EICG), a new task to generate semantically clear and emotion-faithful images given emotion categories.
Specifically, we propose an emotion space and construct a mapping network to align it with the powerful Contrastive Language-Image Pre-training (CLIP) space.
Our method outperforms the state-of-the-art text-to-image approaches both quantitatively and qualitatively.
- Score: 11.901294654242376
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent years have witnessed remarkable progress in the image generation task, where users can create visually astonishing, high-quality images. However, existing text-to-image diffusion models are proficient at generating concrete concepts (e.g., dogs) but encounter challenges with more abstract ones (e.g., emotions). Several efforts have been made to modify image emotions through color and style adjustments, but these are limited in effectively conveying emotions when the image content is fixed. In this work, we introduce Emotional Image Content Generation (EICG), a new task to generate semantically clear and emotion-faithful images given emotion categories. Specifically, we propose an emotion space and construct a mapping network to align it with the powerful Contrastive Language-Image Pre-training (CLIP) space, providing a concrete interpretation of abstract emotions. An attribute loss and an emotion confidence measure are further proposed to ensure the semantic diversity and emotion fidelity of the generated images. Our method outperforms state-of-the-art text-to-image approaches both quantitatively and qualitatively, as measured by three custom metrics: emotion accuracy, semantic clarity, and semantic diversity. Beyond generation, our method can aid emotion understanding and inspire emotional art design.
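The abstract describes a learnable emotion space that is aligned to the CLIP space through a mapping network, together with an attribute loss and an emotion confidence term. Below is a minimal, illustrative PyTorch sketch of that idea; the class and function names, dimensions, emotion list, and the exact loss formulation are assumptions made for illustration, not the authors' released implementation.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

EMOTIONS = ["amusement", "awe", "contentment", "excitement",
            "anger", "disgust", "fear", "sadness"]  # assumed eight-category emotion set
CLIP_DIM = 768  # assumed CLIP text-embedding size


class EmotionMapper(nn.Module):
    """Learnable emotion space plus a mapping network into CLIP space (sketch)."""

    def __init__(self, num_emotions: int, emo_dim: int = 256, clip_dim: int = CLIP_DIM):
        super().__init__()
        # "Emotion space": one learnable vector per emotion category.
        self.emotion_space = nn.Embedding(num_emotions, emo_dim)
        # Mapping network that projects emotion vectors into CLIP space.
        self.mapper = nn.Sequential(
            nn.Linear(emo_dim, clip_dim),
            nn.GELU(),
            nn.Linear(clip_dim, clip_dim),
        )

    def forward(self, emotion_ids: torch.Tensor) -> torch.Tensor:
        return self.mapper(self.emotion_space(emotion_ids))  # (batch, clip_dim)


def attribute_loss(mapped: torch.Tensor, attribute_embs: torch.Tensor) -> torch.Tensor:
    """Assumed formulation: pull the mapped emotion embedding toward CLIP
    embeddings of concrete attributes associated with that emotion, so the
    condition stays semantically clear rather than abstract."""
    sims = F.cosine_similarity(mapped.unsqueeze(1), attribute_embs, dim=-1)
    return (1.0 - sims).mean()


# Usage: map "excitement" into CLIP space and score it against two placeholder
# attribute embeddings (random stand-ins for real CLIP text features).
model = EmotionMapper(num_emotions=len(EMOTIONS))
ids = torch.tensor([EMOTIONS.index("excitement")])
emotion_condition = model(ids)  # would condition a text-to-image diffusion model
loss = attribute_loss(emotion_condition, torch.randn(1, 2, CLIP_DIM))
print(emotion_condition.shape, loss.item())
```
In the pipeline the abstract suggests, the mapped embedding would stand in for (or augment) a CLIP text embedding that conditions a pretrained text-to-image diffusion model; the emotion confidence term mentioned in the abstract is not sketched here.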
Related papers
- EmoEdit: Evoking Emotions through Image Manipulation [62.416345095776656]
We introduce EmoEdit, a novel two-stage framework comprising emotion attribution and image editing.
In the emotion attribution stage, we leverage a Vision-Language Model (VLM) to create hierarchies of semantic factors that represent abstract emotions.
In the image editing stage, the VLM identifies the most relevant factors for the provided image, and guides a generative editing model to perform affective modifications.
arXiv Detail & Related papers (2024-05-21T10:18:45Z)
- Make Me Happier: Evoking Emotions Through Image Diffusion Models [36.40067582639123]
We present a novel challenge of emotion-evoked image generation, aiming to synthesize images that evoke target emotions while retaining the semantics and structures of the original scenes.
Due to the lack of emotion editing datasets, we provide a unique dataset consisting of 340,000 pairs of images and their emotion annotations.
arXiv Detail & Related papers (2024-03-13T05:13:17Z)
- EmoTalker: Emotionally Editable Talking Face Generation via Diffusion Model [39.14430238946951]
EmoTalker is an emotionally editable portrait animation approach based on the diffusion model.
An Emotion Intensity Block is introduced to analyze fine-grained emotions and their strengths derived from prompts.
Experiments show the effectiveness of EmoTalker in generating high-quality, emotionally customizable facial expressions.
arXiv Detail & Related papers (2024-01-16T02:02:44Z)
- StyleEDL: Style-Guided High-order Attention Network for Image Emotion Distribution Learning [69.06749934902464]
We propose a style-guided high-order attention network for image emotion distribution learning termed StyleEDL.
StyleEDL interactively learns stylistic-aware representations of images by exploring the hierarchical stylistic information of visual contents.
In addition, we introduce a stylistic graph convolutional network to dynamically generate the content-dependent emotion representations.
arXiv Detail & Related papers (2023-08-06T03:22:46Z)
- EmoSet: A Large-scale Visual Emotion Dataset with Rich Attributes [53.95428298229396]
We introduce EmoSet, the first large-scale visual emotion dataset annotated with rich attributes.
EmoSet comprises 3.3 million images in total, with 118,102 of these images carefully labeled by human annotators.
Motivated by psychological studies, in addition to emotion category, each image is also annotated with a set of describable emotion attributes.
arXiv Detail & Related papers (2023-07-16T06:42:46Z)
- High-Level Context Representation for Emotion Recognition in Images [4.987022981158291]
We propose an approach for high-level context representation extraction from images.
The model relies on a single cue and a single encoding stream to correlate this representation with emotions.
Our approach is more efficient than previous models and can be easily deployed to address real-world problems related to emotion recognition.
arXiv Detail & Related papers (2023-05-05T13:20:41Z)
- High-fidelity Generalized Emotional Talking Face Generation with Multi-modal Emotion Space Learning [43.09015109281053]
We propose a more flexible and generalized framework for talking face generation.
Specifically, we supplement the emotion style in text prompts and use an Aligned Multi-modal Emotion encoder to embed the text, image, and audio emotion modality into a unified space.
An Emotion-aware Audio-to-3DMM Convertor is proposed to connect the emotion condition and the audio sequence to structural representation.
arXiv Detail & Related papers (2023-05-04T05:59:34Z)
- ViNTER: Image Narrative Generation with Emotion-Arc-Aware Transformer [59.05857591535986]
We propose a model called ViNTER to generate image narratives that focus on time series representing varying emotions as "emotion arcs".
We present experimental results of both manual and automatic evaluations.
arXiv Detail & Related papers (2022-02-15T10:53:08Z)
- Emotion Intensity and its Control for Emotional Voice Conversion [77.05097999561298]
Emotional voice conversion (EVC) seeks to convert the emotional state of an utterance while preserving the linguistic content and speaker identity.
In this paper, we aim to explicitly characterize and control the intensity of emotion.
We propose to disentangle the speaker style from linguistic content and encode the speaker style into a style embedding in a continuous space that forms the prototype of emotion embedding.
arXiv Detail & Related papers (2022-01-10T02:11:25Z)
- SOLVER: Scene-Object Interrelated Visual Emotion Reasoning Network [83.27291945217424]
We propose a novel Scene-Object interreLated Visual Emotion Reasoning network (SOLVER) to predict emotions from images.
To mine the emotional relationships between distinct objects, we first build up an Emotion Graph based on semantic concepts and visual features.
We also design a Scene-Object Fusion Module to integrate scenes and objects, which exploits scene features to guide the fusion process of object features with the proposed scene-based attention mechanism.
arXiv Detail & Related papers (2021-10-24T02:41:41Z)