FaceStudio: Put Your Face Everywhere in Seconds
- URL: http://arxiv.org/abs/2312.02663v2
- Date: Wed, 6 Dec 2023 12:23:36 GMT
- Title: FaceStudio: Put Your Face Everywhere in Seconds
- Authors: Yuxuan Yan, Chi Zhang, Rui Wang, Yichao Zhou, Gege Zhang, Pei Cheng,
Gang Yu, Bin Fu
- Abstract summary: Identity-preserving image synthesis seeks to maintain a subject's identity while adding a personalized, stylistic touch.
Traditional methods, such as Textual Inversion and DreamBooth, have made strides in custom image creation.
Our research introduces a novel approach to identity-preserving synthesis, with a particular focus on human images.
- Score: 23.381791316305332
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: This study investigates identity-preserving image synthesis, an intriguing
task in image generation that seeks to maintain a subject's identity while
adding a personalized, stylistic touch. Traditional methods, such as Textual
Inversion and DreamBooth, have made strides in custom image creation, but they
come with significant drawbacks. These include the need for extensive resources
and time for fine-tuning, as well as the requirement for multiple reference
images. To overcome these challenges, our research introduces a novel approach
to identity-preserving synthesis, with a particular focus on human images. Our
model leverages a direct feed-forward mechanism, circumventing the need for
intensive fine-tuning, thereby facilitating quick and efficient image
generation. Central to our innovation is a hybrid guidance framework, which
combines stylized images, facial images, and textual prompts to guide the image
generation process. This unique combination enables our model to produce a
variety of applications, such as artistic portraits and identity-blended
images. Our experimental results, including both qualitative and quantitative
evaluations, demonstrate the superiority of our method over existing baseline
models and previous works, particularly in its remarkable efficiency and
ability to preserve the subject's identity with high fidelity.
Related papers
- Imagine yourself: Tuning-Free Personalized Image Generation [39.63411174712078]
We introduce Imagine yourself, a state-of-the-art model designed for personalized image generation.
It operates as a tuning-free model, enabling all users to leverage a shared framework without individualized adjustments.
Our study demonstrates that Imagine yourself surpasses the state-of-the-art personalization model, exhibiting superior capabilities in identity preservation, visual quality, and text alignment.
arXiv Detail & Related papers (2024-09-20T09:21:49Z)
- JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation [49.997839600988875]
Existing personalization methods rely on finetuning a text-to-image foundation model on a user's custom dataset.
We propose Joint-Image Diffusion (JeDi), an effective technique for learning a finetuning-free personalization model.
Our model achieves state-of-the-art generation quality, both quantitatively and qualitatively, significantly outperforming both the prior finetuning-based and finetuning-free personalization baselines.
arXiv Detail & Related papers (2024-07-08T17:59:02Z)
- Identity Decoupling for Multi-Subject Personalization of Text-to-Image Models [66.05234562835136]
We present MuDI, a novel framework that enables multi-subject personalization.
Our main idea is to utilize segmented subjects generated by a foundation model for segmentation.
Experimental results show that our MuDI can produce high-quality personalized images without identity mixing.
arXiv Detail & Related papers (2024-04-05T17:45:22Z)
- PortraitBooth: A Versatile Portrait Model for Fast Identity-preserved Personalization [92.90392834835751]
PortraitBooth is designed for high efficiency, robust identity preservation, and expression-editable text-to-image generation.
PortraitBooth eliminates computational overhead and mitigates identity distortion.
It incorporates emotion-aware cross-attention control for diverse facial expressions in generated images.
arXiv Detail & Related papers (2023-12-11T13:03:29Z)
- The Chosen One: Consistent Characters in Text-to-Image Diffusion Models [71.15152184631951]
We propose a fully automated solution for consistent character generation with the sole input being a text prompt.
Our method strikes a better balance between prompt alignment and identity consistency compared to the baseline methods.
arXiv Detail & Related papers (2023-11-16T18:59:51Z)
- PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models [19.519789922033034]
PhotoVerse is an innovative methodology that incorporates a dual-branch conditioning mechanism in both text and image domains.
After a single training phase, our approach enables generating high-quality images within only a few seconds.
arXiv Detail & Related papers (2023-09-11T19:59:43Z)
- Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models [55.04969603431266]
This paper proposes a method for generating images of customized objects specified by users.
The method is based on a general framework that bypasses the lengthy optimization required by previous approaches.
We demonstrate through experiments that our proposed method is able to synthesize images with compelling output quality, appearance diversity, and object fidelity.
arXiv Detail & Related papers (2023-04-05T17:59:32Z)
- Few-shots Portrait Generation with Style Enhancement and Identity Preservation [3.6937810031393123]
The StyleIdentityGAN model preserves both the identity and the artistry of the generated portrait.
Its style-enhanced module decouples and transfers artistic style features to improve the artistry of the generated virtual face images.
Experiments demonstrate the superiority of StyleIdentityGAN over state-of-the-art methods in both artistry and identity preservation.
arXiv Detail & Related papers (2023-03-01T10:02:12Z)
- Improving Generation and Evaluation of Visual Stories via Semantic Consistency [72.00815192668193]
Given a series of natural language captions, an agent must generate a sequence of images that correspond to the captions.
Prior work has introduced recurrent generative models which outperform text-to-image synthesis models on this task.
We present a number of improvements to prior modeling approaches, including the addition of a dual learning framework.
arXiv Detail & Related papers (2021-05-20T20:42:42Z)
- Quality Guided Sketch-to-Photo Image Synthesis [12.617078020344618]
We propose a generative adversarial network that synthesizes a single sketch into multiple synthetic images with unique attributes like hair color, sex, etc.
Our approach is aimed at improving the visual appeal of the synthesised images while incorporating multiple attribute assignment to the generator without compromising the identity of the synthesised image.
arXiv Detail & Related papers (2020-04-20T16:00:01Z)