Related papers: Towards a Simultaneous and Granular Identity-Expression Control in Personalized Face Generation

URL: http://arxiv.org/abs/2401.01207v2
Date: Sun, 7 Apr 2024 03:44:59 GMT
Title: Towards a Simultaneous and Granular Identity-Expression Control in Personalized Face Generation
Authors: Renshuai Liu, Bowen Ma, Wei Zhang, Zhipeng Hu, Changjie Fan, Tangjie Lv, Yu Ding, Xuan Cheng
Abstract summary: In human-centric content generation, pre-trained text-to-image models struggle to produce user-wanted portrait images. We propose a novel multi-modal face generation framework, capable of simultaneous identity-expression control and more fine-grained expression synthesis.
Score: 34.72612800373437
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In human-centric content generation, pre-trained text-to-image models struggle to produce user-wanted portrait images, which retain the identity of individuals while exhibiting diverse expressions. This paper introduces our efforts towards personalized face generation. To this end, we propose a novel multi-modal face generation framework, capable of simultaneous identity-expression control and more fine-grained expression synthesis. Our expression control is so sophisticated that it can be specialized by a fine-grained emotional vocabulary. We devise a novel diffusion model that can undertake the task of simultaneous face swapping and reenactment. Due to the entanglement of identity and expression, controlling them separately and precisely in one framework is nontrivial and has not been explored yet. To overcome this, we propose several innovative designs in the conditional diffusion model, including a balancing identity and expression encoder, improved midpoint sampling, and explicit background conditioning. Extensive experiments have demonstrated the controllability and scalability of the proposed framework, in comparison with state-of-the-art text-to-image, face swapping, and face reenactment methods.

Related papers

ID-Consistent, Precise Expression Generation with Blendshape-Guided Diffusion [40.50436862878818]
We present a diffusion-based framework that faithfully reimagines any subject under any particular facial expression. Our adapter generalizes beyond basic emotions to subtle micro-expressions and expressive transitions, overlooked by prior works. In addition, a pluggable Reference Adapter enables expression editing in real images by transferring the appearance from a reference frame during synthesis.
arXiv Detail & Related papers (2025-10-06T11:20:56Z)
Gen-AFFECT: Generation of Avatar Fine-grained Facial Expressions with Consistent identiTy [15.26953477181137]
GEN-AFFECT is a novel framework for personalized avatar generation. It generates expressive and identity-consistent avatars with a diverse set of facial expressions.
arXiv Detail & Related papers (2025-08-13T03:35:35Z)
EmojiDiff: Advanced Facial Expression Control with High Identity Preservation in Portrait Generation [8.314556078632412]
We introduce EmojiDiff, the first end-to-end solution that enables simultaneous control of extremely detailed expression (RGB-level) and high-fidelity identity in portrait generation. For decoupled training, we innovate ID-irrelevant Data Iteration (IDI) to synthesize cross-identity expression pairs. We also present ID-enhanced Contrast Alignment (ICA) for further fine-tuning.
arXiv Detail & Related papers (2024-12-02T08:24:11Z)
Towards Localized Fine-Grained Control for Facial Expression Generation [54.82883891478555]
Humans, particularly their faces, are central to content generation due to their ability to convey rich expressions and intent. Current generative models mostly generate flat neutral expressions and characterless smiles without authenticity. We propose the use of AUs (action units) for facial expression control in face generation.
arXiv Detail & Related papers (2024-07-25T18:29:48Z)
PortraitBooth: A Versatile Portrait Model for Fast Identity-preserved Personalization [92.90392834835751]
PortraitBooth is designed for high efficiency, robust identity preservation, and expression-editable text-to-image generation. PortraitBooth eliminates computational overhead and mitigates identity distortion. It incorporates emotion-aware cross-attention control for diverse facial expressions in generated images.
arXiv Detail & Related papers (2023-12-11T13:03:29Z)
When StyleGAN Meets Stable Diffusion: a $\mathscr{W}_+$ Adapter for Personalized Image Generation [60.305112612629465]
Text-to-image diffusion models have excelled in producing diverse, high-quality, and photo-realistic images. We present a novel use of the extended StyleGAN embedding space $\mathcal{W}_+$ to achieve enhanced identity preservation and disentanglement for diffusion models. Our method adeptly generates personalized text-to-image outputs that are not only compatible with prompt descriptions but also amenable to common StyleGAN editing directions.
arXiv Detail & Related papers (2023-11-29T09:05:14Z)
GaFET: Learning Geometry-aware Facial Expression Translation from In-The-Wild Images [55.431697263581626]
We introduce a novel Geometry-aware Facial Expression Translation framework, which is based on parametric 3D facial representations and can stably decouple expression. We achieve higher-quality and more accurate facial expression transfer results compared to state-of-the-art methods, and demonstrate applicability to various poses and complex textures.
arXiv Detail & Related papers (2023-08-07T09:03:35Z)
DreamIdentity: Improved Editability for Efficient Face-identity Preserved Image Generation [69.16517915592063]
We propose a novel face-identity encoder to learn an accurate representation of human faces. We also propose self-augmented editability learning to enhance the editability of models. Our methods can generate identity-preserved images under different scenes at a much faster speed.
arXiv Detail & Related papers (2023-07-01T11:01:17Z)
VariTex: Variational Neural Face Textures [0.0]
VariTex is a method that learns a variational latent feature space of neural face textures. To generate images of complete human heads, we propose an additive decoder that generates plausible additional details such as hair. The resulting method can generate geometrically consistent images of novel identities allowing fine-grained control over head pose, face shape, and facial expressions.
arXiv Detail & Related papers (2021-04-13T07:47:53Z)
LEED: Label-Free Expression Editing via Disentanglement [57.09545215087179]
The LEED framework is capable of editing the expression of both frontal and profile facial images without requiring any expression label. Two novel losses are designed for optimal expression disentanglement and consistent synthesis.
arXiv Detail & Related papers (2020-07-17T13:36:15Z)

This list is automatically generated from the titles and abstracts of the papers on this site.