Inserting Anybody in Diffusion Models via Celeb Basis
- URL: http://arxiv.org/abs/2306.00926v1
- Date: Thu, 1 Jun 2023 17:30:24 GMT
- Title: Inserting Anybody in Diffusion Models via Celeb Basis
- Authors: Ge Yuan, Xiaodong Cun, Yong Zhang, Maomao Li, Chenyang Qi, Xintao
Wang, Ying Shan, Huicheng Zheng
- Abstract summary: We propose a new personalization method that allows for the seamless integration of a unique individual into the pre-trained diffusion model.
To achieve this, we first analyze and build a well-defined celeb basis from the embedding space of the pre-trained large text encoder.
Empowered by the proposed celeb basis, the new identity in our customized model showcases a better concept combination ability than previous personalization methods.
- Score: 29.51292196851589
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There is strong demand for customizing pretrained large text-to-image
models, $\textit{e.g.}$, Stable Diffusion, to generate innovative concepts, such
as the users themselves. However, the newly added concept from previous
customization methods often shows weaker combination ability than the
original ones, even when several images are given during training. We therefore
propose a new personalization method that allows for the seamless integration of
a unique individual into the pre-trained diffusion model using just $\textbf{one
facial photograph}$ and only $\textbf{1024 learnable parameters}$, in under
$\textbf{3 minutes}$. This lets us effortlessly generate stunning images of this
person in any pose or position, interacting with anyone and doing anything
imaginable from text prompts. To achieve this, we first analyze and build a
well-defined celeb basis from the embedding space of the pre-trained large text
encoder. Then, given one facial photo as the target identity, we generate its
embedding by optimizing the weights of this basis while locking all other
parameters. Empowered by the proposed celeb basis, the new identity in our
customized model shows better concept combination ability than previous
personalization methods. Moreover, our model can learn several new identities
at once, and they can interact with each other, where previous customization
models fail. The code will be released.
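The abstract describes representing a new identity as a small set of coefficients over a fixed celeb basis, optimized while everything else stays frozen. Below is a minimal, self-contained PyTorch sketch of that idea; it assumes the basis is obtained by PCA over text-encoder embeddings of celebrity names and that the 1024 parameters correspond to two pseudo-word tokens times 512 basis coefficients. The sizes, names (`celeb_embeddings`, `identity_embedding`), and the stand-in loss are illustrative, not the authors' code.

```python
import torch

# Stand-ins for CLIP text-encoder embeddings of celebrity names; in the paper
# these would come from the frozen text encoder, here they are random so the
# sketch runs on its own.
num_celebs, embed_dim, num_basis = 1000, 768, 512   # illustrative sizes
celeb_embeddings = torch.randn(num_celebs, embed_dim)

# One plausible way to build a "celeb basis": PCA over the name embeddings.
mean = celeb_embeddings.mean(dim=0, keepdim=True)
_, _, v = torch.pca_lowrank(celeb_embeddings - mean, q=num_basis, center=False)
basis = v.T                                          # (num_basis, embed_dim)

# A new identity is only a coefficient vector over the fixed basis:
# 2 pseudo-word tokens x 512 coefficients = 1024 learnable parameters,
# matching the count quoted in the abstract (the 2 x 512 split is an assumption).
coeffs = torch.zeros(2, num_basis, requires_grad=True)
optimizer = torch.optim.Adam([coeffs], lr=5e-3)

def identity_embedding(c: torch.Tensor) -> torch.Tensor:
    """Map basis coefficients to pseudo-word embeddings for the text prompt."""
    return c @ basis + mean                          # (2, embed_dim)

for _ in range(10):
    emb = identity_embedding(coeffs)
    # Placeholder objective: the real method would plug `emb` into the prompt
    # and minimize the usual diffusion denoising loss with all other
    # parameters of Stable Diffusion frozen.
    loss = torch.nn.functional.mse_loss(emb, torch.randn_like(emb))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Only `coeffs` receives gradients; the basis, the text encoder, and the diffusion model would all stay frozen, which is what keeps the per-identity footprint at 1024 parameters.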
Related papers
- JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation [49.997839600988875]
Existing personalization methods rely on finetuning a text-to-image foundation model on a user's custom dataset.
We propose Joint-Image Diffusion (JeDi), an effective technique for learning a finetuning-free personalization model.
Our model achieves state-of-the-art generation quality, both quantitatively and qualitatively, significantly outperforming both the prior finetuning-based and finetuning-free personalization baselines.
arXiv Detail & Related papers (2024-07-08T17:59:02Z)
- Training-free Editioning of Text-to-Image Models [47.32550822603952]
We propose a novel task, namely, training-free editioning, for text-to-image models.
We aim to create variations of a base text-to-image model without retraining.
Our proposed editioning paradigm enables a service provider to customize the base model into, for example, a "cat edition".
arXiv Detail & Related papers (2024-05-27T11:40:50Z)
- Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition [47.07564907486087]
Recent text-to-image diffusion models are able to learn and synthesize images containing novel, personalized concepts.
This paper tackles two interconnected issues within this realm of personalizing text-to-image diffusion models.
arXiv Detail & Related papers (2024-02-23T18:55:09Z)
- CatVersion: Concatenating Embeddings for Diffusion-Based Text-to-Image Personalization [56.892032386104006]
CatVersion is an inversion-based method that learns the personalized concept through a handful of examples.
Users can utilize text prompts to generate images that embody the personalized concept.
arXiv Detail & Related papers (2023-11-24T17:55:10Z)
- Break-A-Scene: Extracting Multiple Concepts from a Single Image [80.47666266017207]
We introduce the task of textual scene decomposition.
We propose augmenting the input image with masks that indicate the presence of target concepts.
We then present a novel two-phase customization process.
arXiv Detail & Related papers (2023-05-25T17:59:04Z)
- Continual Diffusion: Continual Customization of Text-to-Image Diffusion with C-LoRA [64.10981296843609]
We show that recent state-of-the-art customization methods for text-to-image models suffer from catastrophic forgetting when new concepts arrive sequentially.
We propose a new method, C-LoRA, composed of a continually self-regularized low-rank adaptation in the cross-attention layers of the popular Stable Diffusion model (a minimal sketch follows this entry).
We show that C-LoRA not only outperforms several baselines in our proposed setting of text-to-image continual customization, but also achieves a new state of the art in the well-established rehearsal-free continual learning setting for image classification.
arXiv Detail & Related papers (2023-04-12T17:59:41Z)
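The C-LoRA entry above describes a continually self-regularized low-rank adaptation applied to the cross-attention layers of Stable Diffusion. A minimal PyTorch sketch of that idea follows: a generic LoRA wrapper around one linear projection, plus one plausible reading of the self-regularization (penalize new low-rank updates where earlier concepts already changed the weights). The layer sizes, rank, penalty form, and toy loss are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a learnable low-rank update (generic LoRA,
    not the authors' exact code)."""

    def __init__(self, base: nn.Linear, rank: int = 4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)              # keep the pretrained weight frozen
        out_f, in_f = base.weight.shape
        self.down = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.up = nn.Parameter(torch.zeros(out_f, rank))
        # Accumulated updates from concepts learned earlier (buffer, not trained).
        self.register_buffer("past_delta", torch.zeros(out_f, in_f))

    def delta(self) -> torch.Tensor:
        return self.up @ self.down               # current low-rank weight update

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + x @ (self.past_delta + self.delta()).T

    def forgetting_penalty(self) -> torch.Tensor:
        # Hypothetical reading of "self-regularized": discourage the new update
        # from touching weights already changed by earlier concepts.
        return (self.past_delta.abs() * self.delta()).pow(2).sum()

    @torch.no_grad()
    def consolidate(self):
        """After a concept is learned, fold its update into the history."""
        self.past_delta += self.delta()
        self.up.zero_()

# Usage: wrap a cross-attention key projection and train only the LoRA factors.
key_proj = LoRALinear(nn.Linear(768, 320), rank=4)
x = torch.randn(2, 77, 768)                      # toy text-token features
out = key_proj(x)
# Toy objective; the real training loss would be the diffusion denoising loss.
loss = out.pow(2).mean() + 0.1 * key_proj.forgetting_penalty()
loss.backward()
```

Calling `consolidate()` after each concept is what makes the penalty act continually: the next concept is discouraged from overwriting the slots already used by earlier ones.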
- InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning [20.127745565621616]
We propose InstantBooth, a novel approach built upon pre-trained text-to-image models.
Our model can generate competitive results on unseen concepts in terms of language-image alignment, image fidelity, and identity preservation.
arXiv Detail & Related papers (2023-04-06T23:26:38Z)
- Designing an Encoder for Fast Personalization of Text-to-Image Models [57.62449900121022]
We propose an encoder-based domain-tuning approach for text-to-image personalization.
We employ two components: first, an encoder that takes as input a single image of a target concept from a given domain; second, a set of regularized weight-offsets for the text-to-image model that learn how to effectively ingest additional concepts (a minimal sketch follows this entry).
arXiv Detail & Related papers (2023-02-23T18:46:41Z)
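The encoder entry above names two components: an encoder that reads a single image of the target concept, and a set of regularized weight-offsets for the text-to-image model. The toy module below only illustrates that split; the feature size, the two linear heads, and the L2 penalty on the offsets are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class PersonalizationEncoder(nn.Module):
    """Toy stand-in for the two components described above
    (names and dimensions are assumptions)."""

    def __init__(self, feat_dim: int = 512, embed_dim: int = 768, offset_dim: int = 1024):
        super().__init__()
        self.to_embedding = nn.Linear(feat_dim, embed_dim)  # image feature -> pseudo-word embedding
        self.to_offsets = nn.Linear(feat_dim, offset_dim)   # image feature -> flattened weight offsets

    def forward(self, image_features: torch.Tensor):
        word_embedding = self.to_embedding(image_features)
        weight_offsets = self.to_offsets(image_features)
        # "Regularized weight-offsets": keep the predicted offsets small so the
        # tuned model stays close to the pretrained one.
        offset_reg = weight_offsets.pow(2).mean()
        return word_embedding, weight_offsets, offset_reg

encoder = PersonalizationEncoder()
feats = torch.randn(1, 512)          # stands in for image features of one concept photo
emb, offsets, reg = encoder(feats)
```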
- Multi-Concept Customization of Text-to-Image Diffusion [51.8642043743222]
We propose Custom Diffusion, an efficient method for augmenting existing text-to-image models.
We find that optimizing only a few parameters in the text-to-image conditioning mechanism is sufficiently powerful to represent new concepts (see the sketch after this entry).
Our model generates variations of multiple new concepts and seamlessly composes them with existing concepts in novel settings.
arXiv Detail & Related papers (2022-12-08T18:57:02Z)
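The Custom Diffusion entry above says that optimizing only a few parameters in the conditioning mechanism suffices; concretely, that method trains the cross-attention key and value projections. A small helper in that spirit, written against the diffusers naming convention (cross-attention modules are called `attn2`, with `to_k`/`to_v` projections), might look like this; the name filter is an assumption about the surrounding codebase rather than code from the paper.

```python
import torch.nn as nn

def select_custom_diffusion_params(unet: nn.Module) -> list[nn.Parameter]:
    """Freeze the UNet except for the cross-attention key/value projections
    and return the parameters that remain trainable."""
    trainable = []
    for name, param in unet.named_parameters():
        if "attn2.to_k" in name or "attn2.to_v" in name:
            param.requires_grad_(True)
            trainable.append(param)
        else:
            param.requires_grad_(False)
    return trainable

# Usage (illustrative): hand only these parameters to the optimizer so the
# rest of the text-to-image model stays fixed during customization.
# optimizer = torch.optim.AdamW(select_custom_diffusion_params(unet), lr=1e-5)
```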