Face-MakeUpV2: Facial Consistency Learning for Controllable Text-to-Image Generation
- URL: http://arxiv.org/abs/2510.21775v1
- Date: Fri, 17 Oct 2025 09:31:08 GMT
- Title: Face-MakeUpV2: Facial Consistency Learning for Controllable Text-to-Image Generation
- Authors: Dawei Dai, Yinxiu Zhou, Chenghang Li, Guolai Jiang, Chengfang Zhang,
- Abstract summary: Face-MakeUpV2 is a facial image generation model that aims to maintain consistency of face ID and physical characteristics with the reference image. In experiments, Face-MakeUpV2 achieves the best overall performance in terms of preserving face ID and maintaining the physical consistency of the reference images.
- Score: 4.383815913901858
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In facial image generation, current text-to-image models often suffer from facial attribute leakage and insufficient physical consistency when responding to local semantic instructions. In this study, we propose Face-MakeUpV2, a facial image generation model that aims to maintain the consistency of face ID and physical characteristics with the reference image. First, we constructed a large-scale dataset FaceCaptionMask-1M comprising approximately one million image-text-mask pairs that provide precise spatial supervision for the local semantic instructions. Second, we employed a general text-to-image pretrained model as the backbone and introduced two complementary facial information injection channels: a 3D facial rendering channel to incorporate the physical characteristics of the image and a global facial feature channel. Third, we formulated two optimization objectives for the supervised learning of our model: semantic alignment in the model's embedding space to mitigate the attribute leakage problem and a perceptual loss on facial images to preserve ID consistency. Extensive experiments demonstrated that our Face-MakeUpV2 achieves the best overall performance in terms of preserving face ID and maintaining the physical consistency of the reference images. These results highlight the practical potential of Face-MakeUpV2 for reliable and controllable facial editing in diverse applications.
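The two optimization objectives described in the abstract can be sketched in a minimal, illustrative way. The sketch below is an assumption-laden toy, not the authors' code: `semantic_alignment_loss` pulls the local instruction embedding toward the masked-region embedding (curbing attribute leakage), and `perceptual_id_loss` penalizes the feature distance between the generated and reference faces (preserving ID). The function names, plain-list "embeddings", and the weighting factor `lam` are all illustrative choices.

```python
import math

# Hedged sketch of the paper's two objectives; all names and the
# weighting factor `lam` are illustrative assumptions.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def semantic_alignment_loss(text_emb, region_emb):
    # Encourage the local instruction embedding to align with the
    # masked-region embedding, discouraging attributes from leaking
    # outside the targeted region.
    return 1.0 - cosine(text_emb, region_emb)

def perceptual_id_loss(gen_feats, ref_feats):
    # Squared distance between face features of the generated and
    # reference images, keeping identity consistent.
    return sum((g - r) ** 2 for g, r in zip(gen_feats, ref_feats))

def total_loss(text_emb, region_emb, gen_feats, ref_feats, lam=0.5):
    # Weighted sum of the two objectives.
    return (semantic_alignment_loss(text_emb, region_emb)
            + lam * perceptual_id_loss(gen_feats, ref_feats))
```

In a real training loop these inputs would be CLIP-style text/region embeddings and face-recognition features rather than plain lists, and `lam` would be tuned; the sketch only shows how the two supervision signals combine into one scalar objective.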
Related papers
- GPTFace: Generative Pre-training of Facial-Linguistic Transformer by Span Masking and Weakly Correlated Text-image Data [53.92883885331805]
We present a generative pre-training model for facial knowledge learning that leverages large-scale web-built data for training. Our approach is also applicable to a wide range of face editing tasks, including face attribute editing, expression manipulation, mask removal, and photo inpainting.
arXiv Detail & Related papers (2025-10-21T06:55:44Z) - My Emotion on your face: The use of Facial Keypoint Detection to preserve Emotions in Latent Space Editing [40.24695765468971]
We propose an addition to the loss function of a Facial Keypoint Detection model to restrict changes to the facial expressions. Our approach achieves up to 49% reduction in the change of emotion in our experiments.
arXiv Detail & Related papers (2025-05-09T21:10:27Z) - InstaFace: Identity-Preserving Facial Editing with Single Image Inference [13.067402877443902]
We introduce a novel diffusion-based framework, InstaFace, to generate realistic images while preserving identity using only a single image. InstaFace harnesses 3D perspectives by integrating multiple 3DMM-based conditionals without introducing additional trainable parameters. Our method outperforms several state-of-the-art approaches in terms of identity preservation, photorealism, and effective control of pose, expression, and lighting.
arXiv Detail & Related papers (2025-02-27T22:37:09Z) - Face-MakeUp: Multimodal Facial Prompts for Text-to-Image Generation [0.0]
We build a dataset of 4 million high-quality face image-text pairs (FaceCaptionHQ-4M) based on LAION-Face. We extract/learn multi-scale content features and pose features for the facial image, integrating these into the diffusion model to enhance the preservation of facial identity features.
arXiv Detail & Related papers (2025-01-05T12:46:31Z) - OSDFace: One-Step Diffusion Model for Face Restoration [72.5045389847792]
Diffusion models have demonstrated impressive performance in face restoration. We propose OSDFace, a novel one-step diffusion model for face restoration. Results demonstrate that OSDFace surpasses current state-of-the-art (SOTA) methods in both visual quality and quantitative metrics.
arXiv Detail & Related papers (2024-11-26T07:07:48Z) - Face Anonymization Made Simple [44.24233169815565]
Current face anonymization techniques often depend on identity loss calculated by face recognition models, which can be inaccurate and unreliable.
In contrast, our approach uses diffusion models with only a reconstruction loss, eliminating the need for facial landmarks or masks.
Our model achieves state-of-the-art performance in three key areas: identity anonymization, facial preservation, and image quality.
arXiv Detail & Related papers (2024-11-01T17:45:21Z) - A Generalist FaceX via Learning Unified Facial Representation [77.74407008931486]
FaceX is a novel facial generalist model capable of handling diverse facial tasks simultaneously.
Our versatile FaceX achieves competitive performance compared to elaborate task-specific models on popular facial editing tasks.
arXiv Detail & Related papers (2023-12-31T17:41:48Z) - DreamIdentity: Improved Editability for Efficient Face-identity Preserved Image Generation [69.16517915592063]
We propose a novel face-identity encoder to learn an accurate representation of human faces.
We also propose self-augmented editability learning to enhance the editability of models.
Our methods can generate identity-preserved images under different scenes at a much faster speed.
arXiv Detail & Related papers (2023-07-01T11:01:17Z) - Learning to Aggregate and Personalize 3D Face from In-the-Wild Photo Collection [65.92058628082322]
Non-parametric face modeling aims to reconstruct 3D face only from images without shape assumptions.
This paper presents a novel Learning to Aggregate and Personalize framework for unsupervised robust 3D face modeling.
arXiv Detail & Related papers (2021-06-15T03:10:17Z) - Image-to-Video Generation via 3D Facial Dynamics [78.01476554323179]
We present a versatile model, FaceAnime, for various video generation tasks from still images.
Our model is versatile for various AR/VR and entertainment applications, such as face video and face video prediction.
arXiv Detail & Related papers (2021-05-31T02:30:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed here and is not responsible for any consequences.