Learning to Customize Text-to-Image Diffusion In Diverse Context
- URL: http://arxiv.org/abs/2410.10058v1
- Date: Mon, 14 Oct 2024 00:53:59 GMT
- Title: Learning to Customize Text-to-Image Diffusion In Diverse Context
- Authors: Taewook Kim, Wei Chen, Qiang Qiu,
- Abstract summary: Most text-to-image customization techniques fine-tune models on a small set of emphpersonal concept images captured in minimal contexts.
We resort to diversifying the context of these personal concepts by simply creating a contextually rich set of text prompts.
Surprisingly, this straightforward and cost-effective method significantly improves semantic alignment in the textual space.
Our approach does not require any architectural modifications, making it highly compatible with existing text-to-image customization methods.
- Score: 23.239646132590043
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Most text-to-image customization techniques fine-tune models on a small set of \emph{personal concept} images captured in minimal contexts. This often results in the model becoming overfitted to these training images and unable to generalize to new contexts in future text prompts. Existing customization methods are built on the success of effectively representing personal concepts as textual embeddings. Thus, in this work, we resort to diversifying the context of these personal concepts \emph{solely} within the textual space by simply creating a contextually rich set of text prompts, together with a widely used self-supervised learning objective. Surprisingly, this straightforward and cost-effective method significantly improves semantic alignment in the textual space, and this effect further extends to the image space, resulting in higher prompt fidelity for generated images. Additionally, our approach does not require any architectural modifications, making it highly compatible with existing text-to-image customization methods. We demonstrate the broad applicability of our approach by combining it with four different baseline methods, achieving notable CLIP score improvements.
Related papers
- Efficient Personalized Text-to-image Generation by Leveraging Textual Subspace [52.24866347353916]
We propose an efficient method to explore the target embedding in a textual subspace.
We also propose an efficient selection strategy for determining the basis of the textual subspace.
Our method opens the door to more efficient representation learning for personalized text-to-image generation.
arXiv Detail & Related papers (2024-06-30T06:41:21Z) - ARTIST: Improving the Generation of Text-rich Images with Disentangled Diffusion Models [52.23899502520261]
We introduce a new framework named ARTIST to focus on the learning of text structures.
We finetune a visual diffusion model, enabling it to assimilate textual structure information from the pretrained textual model.
Empirical results on the MARIO-Eval benchmark underscore the effectiveness of the proposed method, showing an improvement of up to 15% in various metrics.
arXiv Detail & Related papers (2024-06-17T19:31:24Z) - Tuning-Free Image Customization with Image and Text Guidance [65.9504243633169]
We introduce a tuning-free framework for simultaneous text-image-guided image customization.
Our approach preserves the semantic features of the reference image subject while allowing modification of detailed attributes based on text descriptions.
Our approach outperforms previous methods in both human and quantitative evaluations.
arXiv Detail & Related papers (2024-03-19T11:48:35Z) - Training-Free Consistent Text-to-Image Generation [80.4814768762066]
Text-to-image models can portray the same subject across diverse prompts.
Existing approaches fine-tune the model to teach it new words that describe specific user-provided subjects.
We present ConsiStory, a training-free approach that enables consistent subject generation by sharing the internal activations of the pretrained model.
arXiv Detail & Related papers (2024-02-05T18:42:34Z) - PALP: Prompt Aligned Personalization of Text-to-Image Models [68.91005384187348]
Existing personalization methods compromise personalization ability or the alignment to complex prompts.
We propose a new approach focusing on personalization methods for a emphsingle prompt to address this issue.
Our method excels in improving text alignment, enabling the creation of images with complex and intricate prompts.
arXiv Detail & Related papers (2024-01-11T18:35:33Z) - Decoupled Textual Embeddings for Customized Image Generation [62.98933630971543]
Customized text-to-image generation aims to learn user-specified concepts with a few images.
Existing methods usually suffer from overfitting issues and entangle the subject-unrelated information with the learned concept.
We propose the DETEX, a novel approach that learns the disentangled concept embedding for flexible customized text-to-image generation.
arXiv Detail & Related papers (2023-12-19T03:32:10Z) - InstructBooth: Instruction-following Personalized Text-to-Image
Generation [30.89054609185801]
InstructBooth is a novel method designed to enhance image-text alignment in personalized text-to-image models.
Our approach first personalizes text-to-image models with a small number of subject-specific images using a unique identifier.
After personalization, we fine-tune personalized text-to-image models using reinforcement learning to maximize a reward that quantifies image-text alignment.
arXiv Detail & Related papers (2023-12-04T20:34:46Z) - LoCo: Locally Constrained Training-Free Layout-to-Image Synthesis [24.925757148750684]
We propose a training-free approach for layout-to-image Synthesis that excels in producing high-quality images aligned with both textual prompts and layout instructions.
LoCo seamlessly integrates into existing text-to-image and layout-to-image models, enhancing their performance in spatial control and addressing semantic failures observed in prior methods.
arXiv Detail & Related papers (2023-11-21T04:28:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.