Training-free Editioning of Text-to-Image Models
- URL: http://arxiv.org/abs/2405.17069v1
- Date: Mon, 27 May 2024 11:40:50 GMT
- Title: Training-free Editioning of Text-to-Image Models
- Authors: Jinqi Wang, Yunfei Fu, Zhangcan Ding, Bailin Deng, Yu-Kun Lai, Yipeng Qin
- Abstract summary: We propose a novel task, namely, training-free editioning, for text-to-image models.
We aim to create variations of a base text-to-image model without retraining.
Our proposed editioning paradigm enables a service provider to customize the base model into its "cat edition" (or other editions) that restricts image generation to cats, regardless of the user's prompt.
- Score: 47.32550822603952
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Inspired by the software industry's practice of offering different editions or versions of a product tailored to specific user groups or use cases, we propose a novel task, namely, training-free editioning, for text-to-image models. Specifically, we aim to create variations of a base text-to-image model without retraining, enabling the model to cater to the diverse needs of different user groups or to offer distinct features and functionalities. To achieve this, we propose that different editions of a given text-to-image model can be formulated as concept subspaces in the latent space of its text encoder (e.g., CLIP). In such a concept subspace, all points satisfy a specific user need (e.g., generating images of a cat lying on the grass/ground/falling leaves). Technically, we apply Principal Component Analysis (PCA) to obtain the desired concept subspaces from representative text embeddings that correspond to a specific user need or requirement. Projecting the text embedding of a given prompt into these low-dimensional subspaces enables efficient model editioning without retraining. Intuitively, our proposed editioning paradigm enables a service provider to customize the base model into its "cat edition" (or other editions) that restricts image generation to cats, regardless of the user's prompt (e.g., dogs, people, etc.). This introduces a new dimension for product differentiation, targeted functionality, and pricing strategies, unlocking novel business models for text-to-image generators. Extensive experimental results demonstrate the validity of our approach and its potential to enable a wide range of customized text-to-image model editions across various domains and applications.
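The PCA-and-projection step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: random vectors stand in for real text-encoder (e.g., CLIP) embeddings, and the function names, the mean-centering, the use of one pooled vector per prompt, and the subspace dimension `k` are all hypothetical choices that the abstract does not specify.

```python
import numpy as np


def fit_concept_subspace(embeddings, k):
    """Fit a k-dimensional PCA subspace to representative embeddings of shape (n, d)."""
    mean = embeddings.mean(axis=0)
    centered = embeddings - mean
    # Rows of vt are orthonormal principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]


def project_to_subspace(embedding, mean, basis):
    """Project one embedding of shape (d,) onto the affine subspace mean + span(basis)."""
    coords = (embedding - mean) @ basis.T  # coordinates inside the subspace
    return mean + coords @ basis           # mapped back to the embedding space


# Assumption: random stand-ins for embeddings of prompts that describe the
# desired edition (e.g., many different "cat" prompts).
rng = np.random.default_rng(0)
n, d, k = 40, 16, 3
concept_embeddings = rng.normal(size=(n, d))

mean, basis = fit_concept_subspace(concept_embeddings, k)

# Assumption: a random stand-in for the embedding of an arbitrary user prompt.
user_embedding = rng.normal(size=d)
edited_embedding = project_to_subspace(user_embedding, mean, basis)
```

In an actual pipeline, `edited_embedding` would replace the prompt embedding passed to the image generator, so the base model's weights stay untouched. Projecting an already projected embedding leaves it unchanged, which is a quick sanity check for the projection.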
Related papers
- JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation [49.997839600988875]
Existing personalization methods rely on finetuning a text-to-image foundation model on a user's custom dataset.
We propose Joint-Image Diffusion (JeDi), an effective technique for learning a finetuning-free personalization model.
Our model achieves state-of-the-art generation quality, both quantitatively and qualitatively, significantly outperforming both the prior finetuning-based and finetuning-free personalization baselines.
arXiv Detail & Related papers (2024-07-08T17:59:02Z)
- Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition [47.07564907486087]
Recent text-to-image diffusion models are able to learn and synthesize images containing novel, personalized concepts.
This paper tackles two interconnected issues within this realm of personalizing text-to-image diffusion models.
arXiv Detail & Related papers (2024-02-23T18:55:09Z)
- Training-Free Consistent Text-to-Image Generation [80.4814768762066]
Text-to-image models can portray the same subject across diverse prompts.
Existing approaches fine-tune the model to teach it new words that describe specific user-provided subjects.
We present ConsiStory, a training-free approach that enables consistent subject generation by sharing the internal activations of the pretrained model.
arXiv Detail & Related papers (2024-02-05T18:42:34Z)
- Customization Assistant for Text-to-image Generation [40.76198867803018]
We propose a new framework consisting of a new model design and a novel training strategy.
The resulting assistant can perform customized generation in 2-5 seconds without any test-time fine-tuning.
arXiv Detail & Related papers (2023-12-05T16:54:42Z)
- Inserting Anybody in Diffusion Models via Celeb Basis [29.51292196851589]
We propose a new personalization method that allows for the seamless integration of a unique individual into the pre-trained diffusion model.
To achieve this, we first analyze and build a well-defined celeb basis from the embedding space of the pre-trained large text encoder.
Empowered by the proposed celeb basis, the new identity in our customized model showcases a better concept combination ability than previous personalization methods.
arXiv Detail & Related papers (2023-06-01T17:30:24Z)
- Break-A-Scene: Extracting Multiple Concepts from a Single Image [80.47666266017207]
We introduce the task of textual scene decomposition.
We propose augmenting the input image with masks that indicate the presence of target concepts.
We then present a novel two-phase customization process.
arXiv Detail & Related papers (2023-05-25T17:59:04Z)
- ProSpect: Prompt Spectrum for Attribute-Aware Personalization of Diffusion Models [77.03361270726944]
Current personalization methods can invert an object or concept into the textual conditioning space and compose new natural sentences for text-to-image diffusion models.
We propose a novel approach that leverages the step-by-step generation process of diffusion models, which generate images from low to high frequency information.
We apply ProSpect in various personalized attribute-aware image generation applications, such as image-guided or text-driven manipulations of materials, style, and layout.
arXiv Detail & Related papers (2023-05-25T16:32:01Z)
- Enhancing Detail Preservation for Customized Text-to-Image Generation: A Regularization-Free Approach [43.53330622723175]
We propose a novel framework for customized text-to-image generation without the use of regularization.
With the proposed framework, we are able to customize a large-scale text-to-image generation model within half a minute on a single GPU.
arXiv Detail & Related papers (2023-05-23T01:14:53Z)
- Designing an Encoder for Fast Personalization of Text-to-Image Models [57.62449900121022]
We propose an encoder-based domain-tuning approach for text-to-image personalization.
We employ two components: First, an encoder that takes as an input a single image of a target concept from a given domain.
Second, a set of regularized weight-offsets for the text-to-image model that learn how to effectively ingest additional concepts.
arXiv Detail & Related papers (2023-02-23T18:46:41Z)
- DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation [26.748667878221568]
We present a new approach for "personalization" of text-to-image models.
We fine-tune a pretrained text-to-image model to bind a unique identifier with that specific subject.
The unique identifier can then be used to synthesize fully novel photorealistic images of the subject contextualized in different scenes.
arXiv Detail & Related papers (2022-08-25T17:45:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.