Multi-Concept Customization of Text-to-Image Diffusion
- URL: http://arxiv.org/abs/2212.04488v2
- Date: Tue, 20 Jun 2023 16:26:38 GMT
- Title: Multi-Concept Customization of Text-to-Image Diffusion
- Authors: Nupur Kumari, Bingliang Zhang, Richard Zhang, Eli Shechtman, Jun-Yan Zhu
- Abstract summary: We propose Custom Diffusion, an efficient method for augmenting existing text-to-image models.
We find that only optimizing a few parameters in the text-to-image conditioning mechanism is sufficiently powerful to represent new concepts.
Our model generates variations of multiple new concepts and seamlessly composes them with existing concepts in novel settings.
- Score: 51.8642043743222
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While generative models produce high-quality images of concepts learned from
a large-scale database, a user often wishes to synthesize instantiations of
their own concepts (for example, their family, pets, or items). Can we teach a
model to quickly acquire a new concept, given a few examples? Furthermore, can
we compose multiple new concepts together? We propose Custom Diffusion, an
efficient method for augmenting existing text-to-image models. We find that
only optimizing a few parameters in the text-to-image conditioning mechanism is
sufficiently powerful to represent new concepts while enabling fast tuning (~6
minutes). Additionally, we can jointly train for multiple concepts or combine
multiple fine-tuned models into one via closed-form constrained optimization.
Our fine-tuned model generates variations of multiple new concepts and
seamlessly composes them with existing concepts in novel settings. Our method
outperforms or performs on par with several baselines and concurrent works in
both qualitative and quantitative evaluations while being memory and
computationally efficient.
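Concretely, the "few parameters" are the key and value projection matrices that map the text embedding into each cross-attention layer. Below is a minimal PyTorch-style sketch of selecting only those parameters; module names such as attn2, to_k, and to_v follow common Stable Diffusion implementations and are assumptions here, not the authors' released code.

```python
import torch

def select_kv_params(unet: torch.nn.Module):
    """Freeze everything, then unfreeze only cross-attention K/V projections."""
    trainable = []
    for name, param in unet.named_parameters():
        # "attn2" is the cross-attention block in common SD UNet code;
        # to_k/to_v project the text embedding into keys and values.
        if "attn2" in name and ("to_k" in name or "to_v" in name):
            param.requires_grad = True
            trainable.append(param)
        else:
            param.requires_grad = False
    return trainable

# Usage sketch: optimize only those parameters with the usual diffusion loss.
# params = select_kv_params(unet)
# opt = torch.optim.AdamW(params, lr=1e-5)
```

For merging several fine-tuned models, the closed-form constrained optimization can be read as an equality-constrained least-squares problem per weight matrix, roughly

$$\hat{W} = \arg\min_{W} \| W C_{\text{reg}}^{\top} - W_{0} C_{\text{reg}}^{\top} \|_{F}^{2} \quad \text{s.t.} \quad C W^{\top} = V,$$

where $W_0$ is the pretrained weight, $C$ stacks the target concepts' text embeddings, $V$ stacks the corresponding outputs of the individually fine-tuned models, and $C_{\text{reg}}$ holds embeddings of regularization captions; a standard Lagrange-multiplier argument then yields $\hat{W}$ in closed form. The variable names are a reconstruction from the abstract's description rather than a quotation of the paper.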
Related papers
- FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition [49.2208591663092]
FreeCustom is a tuning-free method for generating customized images that compose multiple reference concepts.
We introduce a new multi-reference self-attention (MRSA) mechanism and a weighted mask strategy (a minimal sketch follows this entry).
Our method outperforms or performs on par with other training-based methods in terms of multi-concept composition and single-concept customization.
arXiv Detail & Related papers (2024-05-22T17:53:38Z)
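A minimal sketch of the multi-reference self-attention (MRSA) idea described above: queries from the generated image attend over its own tokens plus tokens from the reference images, with a weighted mask up-weighting concept regions. Shapes and the weighting scheme are assumptions, not the paper's exact formulation.

```python
import torch

def mrsa(q, k_self, v_self, k_refs, v_refs, ref_masks, w=3.0):
    """q: (B, N, d); k_refs/v_refs: lists of (B, M, d); ref_masks: list of (B, M)."""
    # Concatenate the image's own keys/values with those of each reference.
    k = torch.cat([k_self] + k_refs, dim=1)
    v = torch.cat([v_self] + v_refs, dim=1)
    logits = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    # Weighted mask: up-weight reference tokens inside the concept masks
    # (mask value 1); self tokens and background tokens stay at weight 1.
    weights = [torch.ones(q.shape[0], k_self.shape[1], device=q.device)]
    weights += [1.0 + (w - 1.0) * m for m in ref_masks]
    logits = logits + torch.cat(weights, dim=1).log().unsqueeze(1)
    return logits.softmax(dim=-1) @ v
```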
- MC$^2$: Multi-concept Guidance for Customized Multi-concept Generation [49.935634230341904]
We introduce Multi-concept Guidance for Multi-concept customization, termed MC$^2$, for improved flexibility and fidelity.
MC$2$ decouples the requirements for model architecture via inference time optimization.
It adaptively refines the attention weights between visual and textual tokens, directing image regions to focus on their associated words (a generic sketch follows this entry).
arXiv Detail & Related papers (2024-04-08T07:59:04Z)
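A generic sketch of the inference-time attention refinement MC$^2$ describes: penalize cross-attention maps in which an image region places little mass on its own concept's words, then follow the gradient at each denoising step. The loss and update rule are assumptions in the spirit of test-time guidance, not the paper's exact algorithm.

```python
import torch

def attention_focus_loss(attn, region_masks, token_ids):
    """attn: (B, N_pix, N_txt) cross-attention maps; region_masks: list of
    (B, N_pix) masks, one per concept; token_ids: per-concept word indices."""
    loss = 0.0
    for mask, ids in zip(region_masks, token_ids):
        # Attention mass each pixel puts on its own concept's words.
        focus = attn[:, :, ids].sum(-1)
        # Penalize pixels inside the concept region that look elsewhere.
        loss = loss + ((1.0 - focus) * mask).mean()
    return loss

# At each denoising step one could backpropagate this loss to the latent:
# latent = latent - step_size * torch.autograd.grad(loss, latent)[0]
```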
- Visual Concept-driven Image Generation with Text-to-Image Diffusion Model [65.96212844602866]
Text-to-image (TTI) models have demonstrated impressive results in generating high-resolution images of complex scenes.
Recent approaches have extended these methods with personalization techniques that allow them to integrate user-illustrated concepts.
However, the ability to generate images with multiple interacting concepts, such as human subjects, as well as concepts that may be entangled in one, or across multiple, image illustrations, remains elusive.
We propose a concept-driven TTI personalization framework that addresses these core challenges.
arXiv Detail & Related papers (2024-02-18T07:28:37Z)
- Orthogonal Adaptation for Modular Customization of Diffusion Models [39.62438974450659]
We address a new problem called Modular Customization, with the goal of efficiently merging customized models.
We introduce Orthogonal Adaptation, a method that encourages customized models, which do not have access to each other during fine-tuning, to have orthogonal residual weights so that they can be merged with minimal interference (illustrated numerically after this entry).
Our proposed method is both simple and versatile, applicable to nearly all optimizable weights in the model architecture.
arXiv Detail & Related papers (2023-12-05T02:17:48Z)
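A tiny numerical illustration of why orthogonal residual weights merge cleanly: if two concepts' weight deltas act on disjoint input subspaces, summing them causes no cross-talk on either concept's inputs. All dimensions are toy values chosen for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
W0 = rng.normal(size=(d, d))                      # "pretrained" weight

# Split an orthonormal basis into two disjoint input subspaces.
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
basis_a, basis_b = Q[:, :4], Q[:, 4:]

# Residual weights whose row spaces live in the two subspaces.
dWa = rng.normal(size=(d, 4)) @ basis_a.T
dWb = rng.normal(size=(d, 4)) @ basis_b.T

x_a = basis_a @ rng.normal(size=4)                # an input "owned" by concept A
merged = W0 + dWa + dWb
# On concept A's inputs, the merged model matches A's individual model,
# because dWb @ x_a vanishes by orthogonality.
print(np.allclose(merged @ x_a, (W0 + dWa) @ x_a))  # True
```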
- Break-A-Scene: Extracting Multiple Concepts from a Single Image [80.47666266017207]
We introduce the task of textual scene decomposition.
We propose augmenting the input image with masks that indicate the presence of target concepts.
We then present a novel two-phase customization process.
arXiv Detail & Related papers (2023-05-25T17:59:04Z)
- Designing an Encoder for Fast Personalization of Text-to-Image Models [57.62449900121022]
We propose an encoder-based domain-tuning approach for text-to-image personalization.
We employ two components: First, an encoder that takes as input a single image of a target concept from a given domain.
Second, a set of regularized weight-offsets for the text-to-image model that learn how to effectively ingest additional concepts (sketched after this entry).
arXiv Detail & Related papers (2023-02-23T18:46:41Z)
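A sketch of the second component described above: per-layer weight offsets applied on top of frozen base weights and regularized toward zero, so the base model's prior is preserved. Module and shape choices here are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class WeightOffset(nn.Module):
    """Wraps a frozen linear layer so its weight becomes W0 + alpha * dW."""
    def __init__(self, base: nn.Linear, alpha: float = 0.1):
        super().__init__()
        self.base = base.requires_grad_(False)    # frozen pretrained layer
        self.dW = nn.Parameter(torch.zeros_like(base.weight))
        self.alpha = alpha

    def forward(self, x):
        weight = self.base.weight + self.alpha * self.dW
        return nn.functional.linear(x, weight, self.base.bias)

    def reg_loss(self):
        # Keep offsets small so the pretrained prior is preserved.
        return self.dW.pow(2).mean()
```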
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.