Dense-Face: Personalized Face Generation Model via Dense Annotation Prediction
- URL: http://arxiv.org/abs/2412.18149v1
- Date: Tue, 24 Dec 2024 04:05:21 GMT
- Title: Dense-Face: Personalized Face Generation Model via Dense Annotation Prediction
- Authors: Xiao Guo, Manh Tran, Jiaxin Cheng, Xiaoming Liu
- Abstract summary: We propose a new T2I personalization diffusion model, Dense-Face, which can generate face images whose identity is consistent with the given reference subject.
Our method achieves state-of-the-art or competitive generation performance in image-text alignment, identity preservation, and pose control.
- Score: 12.938413724185388
- Abstract: The text-to-image (T2I) personalization diffusion model can generate images of a novel concept based on the user's input text caption. However, existing T2I personalization methods either require test-time fine-tuning or fail to generate images that align well with the given text caption. In this work, we propose a new T2I personalization diffusion model, Dense-Face, which can generate face images whose identity is consistent with the given reference subject and which align well with the text caption. Specifically, we introduce a pose-controllable adapter for high-fidelity image generation while maintaining the text-based editing ability of the pre-trained Stable Diffusion (SD). Additionally, we use internal features of the SD UNet to predict dense face annotations, enabling the proposed method to gain domain knowledge in face generation. Empirically, our method achieves state-of-the-art or competitive generation performance in image-text alignment, identity preservation, and pose control.
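To make the two mechanisms in the abstract concrete, here is a minimal PyTorch sketch; the module names, channel sizes, and yaw/pitch/roll-style pose vector are assumptions for illustration, not the authors' implementation. A zero-initialized pose adapter adds a pose-dependent residual to frozen UNet features, and a small head decodes dense face annotations (e.g., landmark heatmaps) from those same features.
```python
# Illustrative sketch only: names, shapes, and the pose parameterization are
# assumptions, not Dense-Face's actual code.
import torch
import torch.nn as nn


class PoseAdapter(nn.Module):
    """Adds a pose-conditioned residual to a frozen UNet feature map."""

    def __init__(self, pose_dim: int = 6, feat_channels: int = 320):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(pose_dim, feat_channels),
            nn.SiLU(),
            nn.Linear(feat_channels, feat_channels),
        )
        # Zero-init the last layer so training starts from unmodified SD behavior.
        nn.init.zeros_(self.proj[-1].weight)
        nn.init.zeros_(self.proj[-1].bias)

    def forward(self, unet_feat: torch.Tensor, pose: torch.Tensor) -> torch.Tensor:
        # unet_feat: (B, C, H, W); pose: (B, pose_dim), e.g. yaw/pitch/roll.
        residual = self.proj(pose)[:, :, None, None]
        return unet_feat + residual


class DenseAnnotationHead(nn.Module):
    """Predicts dense face annotations (here, landmark heatmaps) from UNet features."""

    def __init__(self, feat_channels: int = 320, num_maps: int = 68):
        super().__init__()
        self.decode = nn.Sequential(
            nn.Conv2d(feat_channels, 128, kernel_size=3, padding=1),
            nn.SiLU(),
            nn.Conv2d(128, num_maps, kernel_size=1),
        )

    def forward(self, unet_feat: torch.Tensor) -> torch.Tensor:
        return self.decode(unet_feat)  # (B, num_maps, H, W)


# Toy usage with random stand-ins for UNet features and a pose vector.
feat = torch.randn(2, 320, 32, 32)
pose = torch.randn(2, 6)
maps = DenseAnnotationHead()(PoseAdapter()(feat, pose))  # (2, 68, 32, 32)
```
Zero-initializing the adapter output is a common choice in ControlNet-style designs, since it preserves the pre-trained model's behavior at the start of training.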
Related papers
- PreciseControl: Enhancing Text-To-Image Diffusion Models with Fine-Grained Attribute Control [24.569528214869113]
StyleGAN models learn a rich face prior and enable smooth, fine-grained attribute editing via latent manipulation.
This work uses the disentangled $\mathcal{W}+$ space of StyleGANs to condition the T2I model.
We perform extensive experiments to validate our method for face personalization and fine-grained attribute editing.
arXiv Detail & Related papers (2024-07-24T07:10:25Z)
- MasterWeaver: Taming Editability and Face Identity for Personalized Text-to-Image Generation [59.13765130528232]
We present MasterWeaver, a test-time tuning-free method designed to generate personalized images with both high identity fidelity and flexible editability.
Specifically, MasterWeaver adopts an encoder to extract identity features and steers image generation through additionally introduced cross-attention.
To improve editability while maintaining identity fidelity, we propose an editing direction loss for training, which aligns the editing directions of our MasterWeaver with those of the original T2I model.
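A minimal sketch of such a loss, assuming the editing direction is measured on the denoising network's noise predictions for a base versus an edited prompt; the function name and this feature choice are assumptions, not necessarily MasterWeaver's exact formulation:
```python
# Hedged illustration of an editing-direction loss; not MasterWeaver's code.
import torch
import torch.nn.functional as F


def editing_direction_loss(
    eps_pers_base: torch.Tensor,  # personalized model, base prompt
    eps_pers_edit: torch.Tensor,  # personalized model, edited prompt
    eps_orig_base: torch.Tensor,  # frozen original model, base prompt
    eps_orig_edit: torch.Tensor,  # frozen original model, edited prompt
) -> torch.Tensor:
    """Align how a prompt edit shifts the personalized model's prediction
    with how the same edit shifts the frozen original T2I model's prediction."""
    dir_pers = eps_pers_edit - eps_pers_base
    dir_orig = (eps_orig_edit - eps_orig_base).detach()
    return F.mse_loss(dir_pers, dir_orig)
```
Matching directions rather than raw predictions lets the personalized model keep its identity conditioning while inheriting the original model's response to edits.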
arXiv Detail & Related papers (2024-05-09T14:42:16Z)
- ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning [57.91881829308395]
Identity-preserving text-to-image generation (ID-T2I) has received significant attention due to its wide range of application scenarios, such as AI portraits and advertising.
We present ID-Aligner, a general feedback learning framework to enhance ID-T2I performance.
arXiv Detail & Related papers (2024-04-23T18:41:56Z)
- FlashFace: Human Image Personalization with High-fidelity Identity Preservation [59.76645602354481]
FlashFace allows users to easily personalize their own photos by providing one or a few reference face images and a text prompt.
Our approach is distinguished from existing human photo customization methods by higher-fidelity identity preservation and better instruction following.
arXiv Detail & Related papers (2024-03-25T17:59:57Z)
- Face2Diffusion for Fast and Editable Face Personalization [33.65484538815936]
We propose Face2Diffusion (F2D) for high-editability face personalization.
The core idea behind F2D is that removing identity-irrelevant information from the training pipeline prevents the overfitting problem.
F2D consists of three novel components.
arXiv Detail & Related papers (2024-03-08T06:46:01Z)
- Discriminative Probing and Tuning for Text-to-Image Generation [129.39674951747412]
Text-to-image generation (T2I) often faces text-image misalignment problems such as relation confusion in generated images.
We propose bolstering the discriminative abilities of T2I models to achieve more precise text-to-image alignment for generation.
We present a discriminative adapter built on T2I models to probe their discriminative abilities on two representative tasks and leverage discriminative fine-tuning to improve their text-image alignment.
arXiv Detail & Related papers (2024-03-07T08:37:33Z)
- Beyond Inserting: Learning Identity Embedding for Semantic-Fidelity Personalized Diffusion Generation [21.739328335601716]
This paper focuses on inserting accurate and interactive ID embedding into the Stable Diffusion Model for personalized generation.
We propose a face-wise attention loss so that the embedding fits the face region rather than entangling ID-unrelated information, such as face layout and background.
Our results exhibit superior ID accuracy, text-based manipulation ability, and generalization compared to previous methods.
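One plausible way to instantiate such a face-wise attention loss, assuming access to the ID token's cross-attention map and a binary face mask; this is an illustrative sketch, not the paper's exact formulation:
```python
# Illustrative face-wise attention loss; the exact formulation may differ.
import torch


def face_attention_loss(attn_map: torch.Tensor, face_mask: torch.Tensor) -> torch.Tensor:
    """attn_map: (B, H, W) cross-attention weights of the ID token;
    face_mask: (B, H, W) binary mask, 1 inside the face region."""
    # Normalize so each sample's attention sums to 1.
    attn = attn_map / (attn_map.sum(dim=(1, 2), keepdim=True) + 1e-8)
    # Penalize attention mass that lands outside the face region.
    return (attn * (1.0 - face_mask)).sum(dim=(1, 2)).mean()
```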
arXiv Detail & Related papers (2024-01-31T11:52:33Z)
- When StyleGAN Meets Stable Diffusion: a $\mathscr{W}_+$ Adapter for Personalized Image Generation [60.305112612629465]
Text-to-image diffusion models have excelled in producing diverse, high-quality, and photo-realistic images.
We present a novel use of the extended StyleGAN embedding space $\mathcal{W}_+$ to achieve enhanced identity preservation and disentanglement for diffusion models.
Our method adeptly generates personalized text-to-image outputs that are not only compatible with prompt descriptions but also amenable to common StyleGAN editing directions.
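A rough sketch of how an extended StyleGAN latent could condition a diffusion model's cross-attention, assuming SD-like token dimensions; the adapter name, MLP shape, and token count are assumptions:
```python
# Illustrative W+ adapter sketch; dimensions and naming are assumptions.
import torch
import torch.nn as nn


class WPlusAdapter(nn.Module):
    """Projects a StyleGAN w+ code into pseudo text tokens for cross-attention."""

    def __init__(self, w_dim: int = 512, n_styles: int = 18,
                 token_dim: int = 768, n_tokens: int = 4):
        super().__init__()
        self.n_tokens, self.token_dim = n_tokens, token_dim
        self.mlp = nn.Sequential(
            nn.Linear(w_dim * n_styles, 1024),
            nn.SiLU(),
            nn.Linear(1024, n_tokens * token_dim),
        )

    def forward(self, w_plus: torch.Tensor) -> torch.Tensor:
        # w_plus: (B, n_styles, w_dim) extended StyleGAN latent.
        tokens = self.mlp(w_plus.flatten(1))
        return tokens.view(-1, self.n_tokens, self.token_dim)


# The resulting (B, n_tokens, 768) tensor could be concatenated with text
# embeddings before the UNet's cross-attention layers.
tokens = WPlusAdapter()(torch.randn(2, 18, 512))  # (2, 4, 768)
```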
arXiv Detail & Related papers (2023-11-29T09:05:14Z)
- DreamIdentity: Improved Editability for Efficient Face-identity Preserved Image Generation [69.16517915592063]
We propose a novel face-identity encoder to learn an accurate representation of human faces.
We also propose self-augmented editability learning to enhance the editability of models.
Our method generates identity-preserved images in different scenes at a much faster speed.
arXiv Detail & Related papers (2023-07-01T11:01:17Z)
- S2FGAN: Semantically Aware Interactive Sketch-to-Face Translation [11.724779328025589]
This paper proposes a sketch-to-image generation framework called S2FGAN.
We employ two latent spaces to control the face appearance and adjust the desired attributes of the generated face.
Our method successfully outperforms state-of-the-art methods on attribute manipulation by exploiting greater control of attribute intensity.
arXiv Detail & Related papers (2020-11-30T13:42:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.