PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models
- URL: http://arxiv.org/abs/2309.05793v1
- Date: Mon, 11 Sep 2023 19:59:43 GMT
- Title: PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models
- Authors: Li Chen, Mengyi Zhao, Yiheng Liu, Mingxu Ding, Yangyang Song, Shizun Wang, Xu Wang, Hao Yang, Jing Liu, Kang Du, Min Zheng
- Abstract summary: PhotoVerse is an innovative methodology that incorporates a dual-branch conditioning mechanism in both text and image domains.
After a single training phase, our approach enables generating high-quality images within only a few seconds.
- Score: 19.519789922033034
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Personalized text-to-image generation has emerged as a powerful and
sought-after tool, empowering users to create customized images based on their
specific concepts and prompts. However, existing approaches to personalization
encounter multiple challenges, including long tuning times, large storage
requirements, the necessity for multiple input images per identity, and
limitations in preserving identity and editability. To address these obstacles,
we present PhotoVerse, an innovative methodology that incorporates a
dual-branch conditioning mechanism in both text and image domains, providing
effective control over the image generation process. Furthermore, we introduce
facial identity loss as a novel component to enhance the preservation of
identity during training. Remarkably, our proposed PhotoVerse eliminates the
need for test-time tuning and relies solely on a single facial photo of the
target identity, significantly reducing the resource cost associated with image
generation. After a single training phase, our approach enables generating
high-quality images within only a few seconds. Moreover, our method can produce
diverse images that encompass various scenes and styles. The extensive
evaluation demonstrates the superior performance of our approach, which
achieves the dual objectives of preserving identity and facilitating
editability. Project page: https://photoverse2d.github.io/
Related papers
- Fusion is all you need: Face Fusion for Customized Identity-Preserving Image Synthesis [7.099258248662009]
Text-to-image (T2I) models have significantly advanced the development of artificial intelligence.
However, existing T2I-based methods often struggle to accurately reproduce the appearance of individuals from a reference image.
We leverage the pre-trained UNet from Stable Diffusion to incorporate the target face image directly into the generation process.
arXiv Detail & Related papers (2024-09-27T19:31:04Z)
- Layout-and-Retouch: A Dual-stage Framework for Improving Diversity in Personalized Image Generation [40.969861849933444]
We propose a novel P-T2I method called Layout-and-Retouch, consisting of two stages: 1) layout generation and 2) retouch.
In the first stage, our step-blended inference utilizes the inherent sample diversity of vanilla T2I models to produce diversified layout images.
In the second stage, multi-source attention swaps the context image from the first stage with the reference image, leveraging the structure from the context image and extracting visual features from the reference image.
arXiv Detail & Related papers (2024-07-13T05:28:45Z)
- ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning [57.91881829308395]
Identity-preserving text-to-image generation (ID-T2I) has received significant attention due to its wide range of application scenarios like AI portrait and advertising.
We present ID-Aligner, a general feedback learning framework to enhance ID-T2I performance.
arXiv Detail & Related papers (2024-04-23T18:41:56Z)
- Identity Decoupling for Multi-Subject Personalization of Text-to-Image Models [66.05234562835136]
We present MuDI, a novel framework that enables multi-subject personalization.
Our main idea is to utilize segmented subjects produced by a segmentation foundation model.
Experimental results show that our MuDI can produce high-quality personalized images without identity mixing.
arXiv Detail & Related papers (2024-04-05T17:45:22Z)
- IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models [31.762112403595612]
IDAdapter is a tuning-free approach that enhances the diversity and identity preservation in personalized image generation from a single face image.
During the training phase, we incorporate mixed features from multiple reference images of a specific identity to enrich identity-related content details.
arXiv Detail & Related papers (2024-03-20T12:13:04Z)
- PortraitBooth: A Versatile Portrait Model for Fast Identity-preserved Personalization [92.90392834835751]
PortraitBooth is designed for high efficiency, robust identity preservation, and expression-editable text-to-image generation.
PortraitBooth eliminates computational overhead and mitigates identity distortion.
It incorporates emotion-aware cross-attention control for diverse facial expressions in generated images.
arXiv Detail & Related papers (2023-12-11T13:03:29Z)
- FaceStudio: Put Your Face Everywhere in Seconds [23.381791316305332]
Identity-preserving image synthesis seeks to maintain a subject's identity while adding a personalized, stylistic touch.
Traditional methods, such as Textual Inversion and DreamBooth, have made strides in custom image creation.
Our research introduces a novel approach to identity-preserving synthesis, with a particular focus on human images.
arXiv Detail & Related papers (2023-12-05T11:02:45Z)
- DreamIdentity: Improved Editability for Efficient Face-identity Preserved Image Generation [69.16517915592063]
We propose a novel face-identity encoder to learn an accurate representation of human faces.
We also propose self-augmented editability learning to enhance the editability of models.
Our methods can generate identity-preserved images under different scenes at a much faster speed.
arXiv Detail & Related papers (2023-07-01T11:01:17Z)
- Identity Encoder for Personalized Diffusion [57.1198884486401]
We propose an encoder-based approach for personalization.
We learn an identity encoder which can extract an identity representation from a set of reference images of a subject.
We show that our approach consistently outperforms existing fine-tuning-based approaches in both image generation and reconstruction (see the sketch after this list).
arXiv Detail & Related papers (2023-04-14T23:32:24Z)
- Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models [55.04969603431266]
This paper proposes a method for generating images of customized objects specified by users.
The method is based on a general framework that bypasses the lengthy optimization required by previous approaches.
We demonstrate through experiments that our proposed method is able to synthesize images with compelling output quality, appearance diversity, and object fidelity.
arXiv Detail & Related papers (2023-04-05T17:59:32Z)
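Several entries above (Identity Encoder for Personalized Diffusion, IDAdapter, Taming Encoder) share an encoder-based, tuning-free recipe: encode one or more reference images into an identity representation once, then condition generation on it with no per-subject fine-tuning. The sketch below illustrates that pattern; the backbone, mean-pooling, and dimensions are placeholder assumptions, not any of these papers' actual architectures.

```python
import torch
import torch.nn as nn


class IdentityEncoder(nn.Module):
    """Pool per-image features from N reference photos into one identity
    vector (placeholder backbone; real systems use a pretrained encoder)."""

    def __init__(self, feat_dim=512):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )

    def forward(self, refs):
        # refs: (B, N, 3, H, W) -- N reference images per subject.
        b, n, c, h, w = refs.shape
        feats = self.backbone(refs.reshape(b * n, c, h, w))
        # Mean-pooling mixes features across references so a single vector
        # represents the identity; no per-subject tuning step is needed.
        return feats.reshape(b, n, -1).mean(dim=1)  # (B, feat_dim)


encoder = IdentityEncoder()
identity = encoder(torch.randn(2, 4, 3, 64, 64))  # (2, 512) identity vectors
```

At inference such a vector is injected as extra conditioning (for example, via cross-attention), which is what lets these methods skip the lengthy per-identity optimization that DreamBooth-style tuning requires.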