Face-MakeUp: Multimodal Facial Prompts for Text-to-Image Generation
- URL: http://arxiv.org/abs/2501.02523v1
- Date: Sun, 05 Jan 2025 12:46:31 GMT
- Title: Face-MakeUp: Multimodal Facial Prompts for Text-to-Image Generation
- Authors: Dawei Dai, Mingming Jia, Yinxiu Zhou, Hang Xing, Chenghang Li,
- Abstract summary: We build a dataset of 4 million high-quality face image-text pairs (FaceCaptionHQ-4M) based on LAION-Face.
We extract/learn multi-scale content features and pose features for the facial image, integrating these into the diffusion model to enhance the preservation of facial identity features.
- Abstract: Facial images have extensive practical applications. Although current large-scale text-to-image diffusion models exhibit strong generation capabilities, it is challenging to generate the desired facial images using only a text prompt. Image prompts are a logical choice; however, current methods of this type generally focus on the general domain. In this paper, we aim to optimize image-makeup techniques to generate the desired facial images. Specifically, (1) we built a dataset of 4 million high-quality face image-text pairs (FaceCaptionHQ-4M) based on LAION-Face to train our Face-MakeUp model; (2) to maintain consistency with the reference facial image, we extract/learn multi-scale content features and pose features for the facial image, integrating these into the diffusion model to enhance the preservation of facial identity features. Validation on two face-related test datasets demonstrates that Face-MakeUp achieves the best comprehensive performance. All code is available at: https://github.com/ddw2AIGROUP2CQUPT/Face-MakeUp
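The identity-injection step the abstract describes (fusing multi-scale content features and pose features from the reference face into the diffusion model) can be sketched as follows. This is a minimal illustration, assuming the features are injected as extra cross-attention tokens added residually to the UNet hidden states (an IP-Adapter-style design); the actual Face-MakeUp mechanism, feature dimensions, and token counts here are placeholders, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(latent, face_tokens, d):
    """Attend from UNet latent tokens (queries) to face-prompt tokens (keys/values)."""
    scores = latent @ face_tokens.T / np.sqrt(d)   # (n_latent, n_face)
    return softmax(scores, axis=-1) @ face_tokens  # (n_latent, d)

rng = np.random.default_rng(0)
d = 64                                              # hypothetical feature dimension
latent = rng.standard_normal((16, d))               # spatial latent tokens in one UNet layer

# Multi-scale content features (3 scales) plus pose features from the reference face.
content = [rng.standard_normal((4, d)) for _ in range(3)]
pose = rng.standard_normal((2, d))
face_tokens = np.concatenate(content + [pose], axis=0)  # (14, d)

# Residual injection: latent states are augmented with identity-conditioned attention output.
out = latent + cross_attention(latent, face_tokens, d)
print(out.shape)  # (16, 64)
```

The residual form means the injection can be disabled (or scaled) without retraining the base diffusion model, which is the usual motivation for this kind of adapter design.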
Related papers
- DynamicFace: High-Quality and Consistent Video Face Swapping using Composable 3D Facial Priors [24.721887093958284]
Face swapping transfers the identity of a source face to a target face while retaining the attributes like expression, pose, hair, and background of the target face.
We propose DynamicFace that leverages the power of diffusion model and plug-and-play temporal layers for video face swapping.
Our method achieves state-of-the-art results in face swapping, showcasing superior image quality, identity preservation, and expression accuracy.
arXiv Detail & Related papers (2025-01-15T03:28:14Z)
- OSDFace: One-Step Diffusion Model for Face Restoration [72.5045389847792]
Diffusion models have demonstrated impressive performance in face restoration.
We propose OSDFace, a novel one-step diffusion model for face restoration.
Results demonstrate that OSDFace surpasses current state-of-the-art (SOTA) methods in both visual quality and quantitative metrics.
arXiv Detail & Related papers (2024-11-26T07:07:48Z)
- 15M Multimodal Facial Image-Text Dataset [5.552727861734425]
FaceCaption-15M comprises over 15 million pairs of facial images and their corresponding natural language descriptions of facial features.
We conducted a comprehensive analysis of image quality, text naturalness, text complexity, and text-image relevance to demonstrate the superiority of FaceCaption-15M.
arXiv Detail & Related papers (2024-07-11T14:00:14Z)
- FlashFace: Human Image Personalization with High-fidelity Identity Preservation [59.76645602354481]
FlashFace allows users to easily personalize their own photos by providing one or a few reference face images and a text prompt.
Our approach is distinguished from existing human photo customization methods by higher-fidelity identity preservation and better instruction following.
arXiv Detail & Related papers (2024-03-25T17:59:57Z)
- Arc2Face: A Foundation Model for ID-Consistent Human Faces [95.00331107591859]
Arc2Face is an identity-conditioned face foundation model.
It can generate diverse photo-realistic images with a higher degree of face similarity than existing models.
arXiv Detail & Related papers (2024-03-18T10:32:51Z)
- Face Swap via Diffusion Model [4.026688121914668]
This report presents a diffusion model based framework for face swapping between two portrait images.
The basic framework consists of three components: face feature encoding, multi-conditional generation, and face inpainting.
arXiv Detail & Related papers (2024-03-02T07:02:17Z)
- A Generalist FaceX via Learning Unified Facial Representation [77.74407008931486]
FaceX is a novel facial generalist model capable of handling diverse facial tasks simultaneously.
Our versatile FaceX achieves competitive performance compared to elaborate task-specific models on popular facial editing tasks.
arXiv Detail & Related papers (2023-12-31T17:41:48Z)
- FaceChain: A Playground for Human-centric Artificial Intelligence Generated Content [36.48960592782015]
FaceChain is a personalized portrait generation framework that combines a series of customized image-generation models with a rich set of face-related perceptual understanding models.
We inject several SOTA face models into the generation procedure, achieving more efficient label tagging, data processing, and model post-processing than previous solutions.
Based on FaceChain, we further develop several applications to better showcase its value, including virtual try-on and a 2D talking head.
arXiv Detail & Related papers (2023-08-28T02:20:44Z)
- DreamIdentity: Improved Editability for Efficient Face-identity Preserved Image Generation [69.16517915592063]
We propose a novel face-identity encoder to learn an accurate representation of human faces.
We also propose self-augmented editability learning to enhance the editability of models.
Our methods can generate identity-preserved images under different scenes at a much faster speed.
arXiv Detail & Related papers (2023-07-01T11:01:17Z)
- AnyFace: Free-style Text-to-Face Synthesis and Manipulation [41.61972206254537]
This paper proposes the first free-style text-to-face method, namely AnyFace.
AnyFace enables much wider open world applications such as metaverse, social media, cosmetics, forensics, etc.
arXiv Detail & Related papers (2022-03-29T08:27:38Z)
- One Shot Face Swapping on Megapixels [65.47443090320955]
This paper proposes the first megapixel-level method for one-shot face swapping (MegaFS for short).
Complete face representation, stable training, and limited memory usage are the three novel contributions to the success of our method.
arXiv Detail & Related papers (2021-05-11T10:41:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.