EmojiDiff: Advanced Facial Expression Control with High Identity Preservation in Portrait Generation
- URL: http://arxiv.org/abs/2412.01254v1
- Date: Mon, 02 Dec 2024 08:24:11 GMT
- Title: EmojiDiff: Advanced Facial Expression Control with High Identity Preservation in Portrait Generation
- Authors: Liangwei Jiang, Ruida Li, Zhifeng Zhang, Shuo Fang, Chenguang Ma
- Abstract summary: Existing methods tend to synthesize portraits with either neutral or stereotypical expressions.
EmojiDiff is an end-to-end solution to facilitate simultaneous dual control of fine expression and identity.
- Score: 8.314556078632412
- Abstract: This paper aims to bring fine-grained expression control to identity-preserving portrait generation. Existing methods tend to synthesize portraits with either neutral or stereotypical expressions. Even when supplemented with control signals like facial landmarks, these models struggle to generate accurate and vivid expressions following user instructions. To solve this, we introduce EmojiDiff, an end-to-end solution to facilitate simultaneous dual control of fine expression and identity. Unlike conventional methods using coarse control signals, our method directly accepts RGB expression images as input templates to provide extremely accurate and fine-grained expression control in the diffusion process. At its core, an innovative decoupled scheme is proposed to disentangle expression features in the expression template from other extraneous information, such as identity, skin, and style. On one hand, we introduce ID-irrelevant Data Iteration (IDI) to synthesize extremely high-quality cross-identity expression pairs for decoupled training, which is the crucial foundation for filtering out identity information hidden in the expressions. On the other hand, we meticulously investigate network layer function and select expression-sensitive layers to inject reference expression features, effectively preventing style leakage from expression signals. To further improve identity fidelity, we propose a novel fine-tuning strategy named ID-enhanced Contrast Alignment (ICA), which eliminates the negative impact of expression control on original identity preservation. Experimental results demonstrate that our method remarkably outperforms counterparts, achieves precise expression control with highly maintained identity, and generalizes well to various diffusion models.
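The abstract's core mechanism, injecting reference expression features only into expression-sensitive layers, can be sketched in a few lines. Below is a minimal, hypothetical PyTorch illustration (not the authors' released code): a gated cross-attention branch over expression-template tokens, added only to blocks one has identified as expression-sensitive. The block names, dimensions, and gating scheme are assumptions for illustration.

```python
# Hypothetical sketch of expression-sensitive layer injection; not EmojiDiff's code.
import torch
import torch.nn as nn

class ExpressionInjection(nn.Module):
    """Gated cross-attention over reference-expression tokens.

    Adding this branch only to expression-sensitive layers is one way to
    admit expression information while limiting style leakage.
    """
    def __init__(self, dim: int, expr_dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.to_kv = nn.Linear(expr_dim, dim)      # project expression tokens
        self.gate = nn.Parameter(torch.zeros(1))   # starts as an identity mapping

    def forward(self, hidden: torch.Tensor, expr_tokens: torch.Tensor) -> torch.Tensor:
        kv = self.to_kv(expr_tokens)
        out, _ = self.attn(hidden, kv, kv, need_weights=False)
        return hidden + self.gate * out            # residual, gated injection

# Inject only into a chosen subset of UNet blocks (names are made up).
expression_sensitive = {"up_block_2", "up_block_3"}

hidden = torch.randn(2, 77, 320)       # UNet hidden states (batch, tokens, dim)
expr_tokens = torch.randn(2, 16, 512)  # features of the RGB expression template
inject = ExpressionInjection(dim=320, expr_dim=512)
print(inject(hidden, expr_tokens).shape)  # torch.Size([2, 77, 320])
```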
Related papers
- WEM-GAN: Wavelet transform based facial expression manipulation [2.0918868193463207]
We propose WEM-GAN, short for wavelet-based expression manipulation GAN.
We take advantage of the wavelet transform technique and combine it with a generator built on a U-net autoencoder backbone.
Our model performs better in preserving identity features, editing capability, and image generation quality on the AffectNet dataset.
arXiv Detail & Related papers (2024-12-03T16:23:02Z)
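To make the wavelet idea above concrete, here is a small sketch using PyWavelets: one level of 2-D DWT splits a face image into a low-frequency approximation and three detail bands that a generator like WEM-GAN's could edit separately. The image and wavelet choice are placeholders, not the paper's setup.

```python
# Illustrative only: a 2-D discrete wavelet transform of a face image.
import numpy as np
import pywt

face = np.random.rand(128, 128).astype(np.float32)  # stand-in grayscale face

# One DWT level: low-frequency approximation plus three detail bands.
cA, (cH, cV, cD) = pywt.dwt2(face, "haar")
print(cA.shape, cH.shape)  # (64, 64) (64, 64)

# Expression edits could be applied per band, then inverted losslessly.
reconstructed = pywt.idwt2((cA, (cH, cV, cD)), "haar")
print(np.allclose(face, reconstructed, atol=1e-6))  # True
```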
- Towards Localized Fine-Grained Control for Facial Expression Generation [54.82883891478555]
Humans, particularly their faces, are central to content generation due to their ability to convey rich expressions and intent.
Current generative models mostly generate flat neutral expressions and characterless smiles without authenticity.
We propose the use of AUs (action units) for facial expression control in face generation.
arXiv Detail & Related papers (2024-07-25T18:29:48Z)
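As a toy illustration of AU-based control, the sketch below encodes named FACS action-unit intensities into a fixed-size conditioning vector that a generator could consume. The AU subset and indexing are hypothetical, not the paper's.

```python
# Hypothetical sketch: facial action units (AUs) as a conditioning vector.
import torch

AU_INDEX = {"AU1": 0, "AU2": 1, "AU4": 2, "AU6": 3, "AU12": 4, "AU15": 5}

def au_condition(active: dict[str, float], dim: int = len(AU_INDEX)) -> torch.Tensor:
    """Map named AU intensities (in [0, 1]) to a fixed-size conditioning vector."""
    cond = torch.zeros(dim)
    for name, intensity in active.items():
        cond[AU_INDEX[name]] = float(intensity)
    return cond

# AU6 (cheek raiser) + AU12 (lip corner puller) ~ a Duchenne smile.
smile = au_condition({"AU6": 0.8, "AU12": 1.0})
print(smile)  # nonzero only at the AU6 and AU12 slots
```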
- Towards a Simultaneous and Granular Identity-Expression Control in Personalized Face Generation [34.72612800373437]
In human-centric content generation, pre-trained text-to-image models struggle to produce the portrait images users want.
We propose a novel multi-modal face generation framework, capable of simultaneous identity-expression control and more fine-grained expression synthesis.
arXiv Detail & Related papers (2024-01-02T13:28:39Z)
- When StyleGAN Meets Stable Diffusion: a $\mathscr{W}_+$ Adapter for Personalized Image Generation [60.305112612629465]
Text-to-image diffusion models have excelled in producing diverse, high-quality, and photo-realistic images.
We present a novel use of the extended StyleGAN embedding space $\mathcal{W}_+$ to achieve enhanced identity preservation and disentanglement for diffusion models.
Our method adeptly generates personalized text-to-image outputs that are not only compatible with prompt descriptions but also amenable to common StyleGAN editing directions.
arXiv Detail & Related papers (2023-11-29T09:05:14Z)
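A minimal sketch of the adapter concept, under assumed dimensions: a small MLP maps a StyleGAN $\mathcal{W}_+$ code (18 x 512 for a 1024-px generator) to a few pseudo-text tokens sized for a Stable-Diffusion-style cross-attention context (768-d). This is the general shape of such an adapter, not the paper's architecture.

```python
# Hypothetical W+ -> pseudo-text-token adapter; dimensions are illustrative.
import torch
import torch.nn as nn

class WPlusAdapter(nn.Module):
    def __init__(self, w_layers: int = 18, w_dim: int = 512,
                 n_tokens: int = 4, ctx_dim: int = 768):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(w_layers * w_dim, 1024),
            nn.GELU(),
            nn.Linear(1024, n_tokens * ctx_dim),
        )
        self.n_tokens, self.ctx_dim = n_tokens, ctx_dim

    def forward(self, w_plus: torch.Tensor) -> torch.Tensor:
        tokens = self.proj(w_plus.flatten(1))
        return tokens.view(-1, self.n_tokens, self.ctx_dim)

w_plus = torch.randn(2, 18, 512)    # inverted identity code from StyleGAN
id_tokens = WPlusAdapter()(w_plus)  # would be appended to the text embeddings
print(id_tokens.shape)              # torch.Size([2, 4, 768])
```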
- GaFET: Learning Geometry-aware Facial Expression Translation from In-The-Wild Images [55.431697263581626]
We introduce a novel Geometry-aware Facial Expression Translation framework, which is based on parametric 3D facial representations and can stably decouple expression.
We achieve higher-quality and more accurate facial expression transfer results than state-of-the-art methods, and demonstrate applicability to various poses and complex textures.
arXiv Detail & Related papers (2023-08-07T09:03:35Z)
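The decoupling that parametric 3D representations afford can be shown with a toy example: in a 3DMM-style coefficient vector, identity and expression occupy separate blocks, so expression transfer reduces to a block swap. The dimensions below are illustrative conventions, not GaFET's.

```python
# Toy sketch of expression transfer via 3DMM-style coefficient blocks.
import numpy as np

ID_DIM, EXPR_DIM = 80, 64  # identity / expression coefficient counts (assumed)

def split(params: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    return params[:ID_DIM], params[ID_DIM:]

source = np.random.randn(ID_DIM + EXPR_DIM)   # fitted to the source face
driving = np.random.randn(ID_DIM + EXPR_DIM)  # fitted to the driving face

src_id, _ = split(source)
_, drv_expr = split(driving)
transferred = np.concatenate([src_id, drv_expr])  # source identity, new expression
print(transferred.shape)  # (144,)
```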
- Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
- Disentangling Identity and Pose for Facial Expression Recognition [54.50747989860957]
We propose an identity and pose disentangled facial expression recognition (IPD-FER) model to learn more discriminative feature representations.
For the identity encoder, a well-pretrained face recognition model is used and kept fixed during training, which alleviates the reliance on expression-specific training data.
By comparing the difference between synthesized neutral and expressional images of the same individual, the expression component is further disentangled from identity and pose.
arXiv Detail & Related papers (2022-08-17T06:48:13Z)
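A hedged sketch of the difference trick described above: with a frozen identity encoder, subtracting the features of a synthesized neutral image from those of an expressional image of the same person cancels identity (and pose) terms, leaving an expression component. The encoder here is a stand-in, not a real face recognition model.

```python
# Stand-in sketch of feature-difference disentanglement; not IPD-FER's code.
import torch
import torch.nn as nn

encoder = nn.Sequential(  # placeholder for a pretrained, frozen face encoder
    nn.Flatten(), nn.Linear(3 * 112 * 112, 256)
)
for p in encoder.parameters():
    p.requires_grad_(False)  # identity branch stays fixed during training

expressional = torch.randn(4, 3, 112, 112)  # same person, with expression
neutral = torch.randn(4, 3, 112, 112)       # synthesized neutral counterpart

# Identity (and pose) terms cancel in the difference, leaving expression.
expr_feature = encoder(expressional) - encoder(neutral)
print(expr_feature.shape)  # torch.Size([4, 256])
```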
- Mutual Information Regularized Identity-aware Facial Expression Recognition in Compressed Video [27.602648102881535]
We propose a novel collaborative min-min game for mutual information (MI) minimization in latent space.
We do not need the identity label or multiple expression samples from the same person for identity elimination.
Our solution can achieve comparable or better performance than the recent decoded image-based methods.
arXiv Detail & Related papers (2020-10-20T21:42:18Z)
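To ground the MI-minimization idea, here is an illustrative sketch using a CLUB-style variational upper bound (Cheng et al., 2020) as a stand-in for the paper's collaborative min-min game: an estimator network models q(expression | identity), and the encoder then minimizes the resulting bound. All dimensions and the shuffled-negative approximation are assumptions.

```python
# CLUB-style MI upper bound between identity and expression latents (sketch).
import torch
import torch.nn as nn

class CLUBEstimator(nn.Module):
    """q(expr | id) as a unit-variance Gaussian; its log-ratio upper-bounds MI."""
    def __init__(self, id_dim: int = 64, expr_dim: int = 64, hidden: int = 128):
        super().__init__()
        self.mu = nn.Sequential(nn.Linear(id_dim, hidden), nn.ReLU(),
                                nn.Linear(hidden, expr_dim))

    def mi_upper_bound(self, z_id: torch.Tensor, z_expr: torch.Tensor) -> torch.Tensor:
        mu = self.mu(z_id)
        positive = -((z_expr - mu) ** 2).sum(-1)              # matched pairs
        negative = -((z_expr.roll(1, 0) - mu) ** 2).sum(-1)   # shuffled "marginal" pairs
        return (positive - negative).mean()

z_id, z_expr = torch.randn(8, 64), torch.randn(8, 64)
club = CLUBEstimator()
# In the full min-min game the estimator is also fit by maximizing `positive`;
# the encoder adds this bound to its loss to squeeze identity out of expression.
loss_mi = club.mi_upper_bound(z_id, z_expr)
print(float(loss_mi))
```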
- LEED: Label-Free Expression Editing via Disentanglement [57.09545215087179]
The LEED framework is capable of editing the expression of both frontal and profile facial images without requiring any expression label.
Two novel losses are designed for optimal expression disentanglement and consistent synthesis.
arXiv Detail & Related papers (2020-07-17T13:36:15Z)
- Fine-Grained Expression Manipulation via Structured Latent Space [30.789513209376032]
We propose an end-to-end expression-guided generative adversarial network (EGGAN) to manipulate fine-grained expressions.
Our method can manipulate fine-grained expressions, and generate continuous intermediate expressions between source and target expressions.
arXiv Detail & Related papers (2020-04-21T06:18:34Z)
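Finally, a toy sketch of what a structured latent space enables: linearly interpolating between source and target expression codes yields continuous intermediate expressions once decoded. The codes and the (omitted) generator are placeholders, not EGGAN's.

```python
# Toy sketch: continuous in-between expressions via latent interpolation.
import numpy as np

expr_src = np.zeros(16)          # e.g. a neutral expression code
expr_tgt = np.random.randn(16)   # e.g. a broad smile

for alpha in np.linspace(0.0, 1.0, 5):
    expr_mid = (1 - alpha) * expr_src + alpha * expr_tgt
    # generator(identity_code, expr_mid) would render the in-between face
    print(f"alpha={alpha:.2f}, norm={np.linalg.norm(expr_mid):.3f}")
```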
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.