Pick-and-Draw: Training-free Semantic Guidance for Text-to-Image
Personalization
- URL: http://arxiv.org/abs/2401.16762v1
- Date: Tue, 30 Jan 2024 05:56:12 GMT
- Title: Pick-and-Draw: Training-free Semantic Guidance for Text-to-Image
Personalization
- Authors: Henglei Lv, Jiayu Xiao, Liang Li, Qingming Huang
- Abstract summary: Pick-and-Draw is a training-free semantic guidance approach to boost identity consistency and generative diversity for personalization methods.
The proposed approach can be applied to any personalized diffusion models and requires as few as a single reference image.
- Score: 56.12990759116612
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion-based text-to-image personalization has achieved great success in
generating user-specified subjects across various contexts. Even so,
existing finetuning-based methods still suffer from model overfitting, which
greatly harms the generative diversity, especially when given subject images
are few. To this end, we propose Pick-and-Draw, a training-free semantic
guidance approach to boost identity consistency and generative diversity for
personalization methods. Our approach consists of two components: appearance
picking guidance and layout drawing guidance. As for the former, we construct
an appearance palette with visual features from the reference image, where we
pick local patterns for generating the specified subject with consistent
identity. As for layout drawing, we outline the subject's contour by referring
to a generative template from the vanilla diffusion model, and inherit its
strong image prior in order to synthesize diverse contexts under different text
conditions. The proposed approach can be applied to any personalized diffusion
models and requires as few as a single reference image. Qualitative and
quantitative experiments show that Pick-and-Draw consistently improves identity
consistency and generative diversity, pushing the trade-off between subject
fidelity and image-text fidelity to a new Pareto frontier.
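The abstract only sketches the two guidance terms, so the following is a minimal, hypothetical illustration of how such training-free guidance could be wired into a denoising loop. The feature extractor stand-ins, energy functions, weights, and the toy update rule are assumptions made for illustration; they are not the authors' implementation.

```python
# Hypothetical sketch of training-free semantic guidance during denoising,
# in the spirit of Pick-and-Draw's appearance-picking and layout-drawing terms.
# All components below are illustrative assumptions, not the paper's code.
import torch
import torch.nn.functional as F

def appearance_energy(x, palette):
    """Toy 'appearance picking' term: pull generated features toward the
    closest entries of an appearance palette built from the reference image."""
    feats = x.flatten(2).transpose(1, 2)             # (B, HW, C) stand-in features
    sims = F.normalize(feats, dim=-1) @ F.normalize(palette, dim=-1).T
    return -sims.max(dim=-1).values.mean()           # reward best-match similarity

def layout_energy(x, template_mask):
    """Toy 'layout drawing' term: keep the subject's spatial support close to
    a contour/mask obtained from a vanilla (non-personalized) generation."""
    saliency = x.abs().mean(dim=1, keepdim=True)     # crude stand-in for a subject map
    saliency = saliency / (saliency.amax(dim=(2, 3), keepdim=True) + 1e-8)
    return F.mse_loss(saliency, template_mask)

@torch.no_grad()
def guided_step(denoiser, x_t, t, palette, template_mask,
                w_app=1.0, w_layout=1.0, step_size=0.1):
    """One reverse-diffusion step steered by gradient-based semantic guidance."""
    eps = denoiser(x_t, t)                           # personalized model's noise estimate
    with torch.enable_grad():
        x = x_t.detach().requires_grad_(True)
        energy = w_app * appearance_energy(x, palette) \
               + w_layout * layout_energy(x, template_mask)
        grad = torch.autograd.grad(energy, x)[0]
    eps_guided = eps + step_size * grad              # steer prediction without any training
    return x_t - eps_guided                          # placeholder update rule (toy scheduler)

# Toy usage with random tensors standing in for latents and reference features.
B, C, H, W = 1, 4, 16, 16
denoiser = lambda x, t: torch.randn_like(x)          # stands in for a personalized diffusion UNet
palette = torch.randn(32, C)                         # "picked" local patterns from the reference image
template_mask = (torch.rand(B, 1, H, W) > 0.5).float()
x_t = torch.randn(B, C, H, W)
for t in reversed(range(5)):
    x_t = guided_step(denoiser, x_t, t, palette, template_mask)
```

In this reading, both terms act only at inference time: the appearance term would draw local patterns from the reference palette, while the layout term would anchor the subject to a contour taken from the vanilla model's generation, which matches the abstract's training-free framing.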
Related papers
- Layout-and-Retouch: A Dual-stage Framework for Improving Diversity in Personalized Image Generation [40.969861849933444]
We propose a novel P-T2I method called Layout-and-Retouch, consisting of two stages: 1) layout generation and 2) retouch.
In the first stage, our step-blended inference utilizes the inherent sample diversity of vanilla T2I models to produce diversified layout images.
In the second stage, multi-source attention swaps the context image from the first stage with the reference image, leveraging the structure from the context image and extracting visual features from the reference image.
arXiv Detail & Related papers (2024-07-13T05:28:45Z)
- Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning [40.06403155373455]
We propose a novel reinforcement learning framework for personalized text-to-image generation.
Our proposed approach outperforms existing state-of-the-art methods by a large margin on visual fidelity while maintaining text-alignment.
arXiv Detail & Related papers (2024-07-09T08:11:53Z)
- JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation [49.997839600988875]
Existing personalization methods rely on finetuning a text-to-image foundation model on a user's custom dataset.
We propose Joint-Image Diffusion (JeDi), an effective technique for learning a finetuning-free personalization model.
Our model achieves state-of-the-art generation quality, both quantitatively and qualitatively, significantly outperforming both the prior finetuning-based and finetuning-free personalization baselines.
arXiv Detail & Related papers (2024-07-08T17:59:02Z)
- MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance [6.4680449907623006]
This research introduces the MS-Diffusion framework for layout-guided zero-shot image personalization with multi-subjects.
The proposed multi-subject cross-attention orchestrates inter-subject compositions while preserving the control of texts.
arXiv Detail & Related papers (2024-06-11T12:32:53Z)
- Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models [85.14042557052352]
We introduce Concept Weaver, a method for composing customized text-to-image diffusion models at inference time.
We show that Concept Weaver can generate multiple custom concepts with higher identity fidelity compared to alternative approaches.
arXiv Detail & Related papers (2024-04-05T06:41:27Z)
- Training-Free Consistent Text-to-Image Generation [80.4814768762066]
Text-to-image models struggle to consistently portray the same subject across diverse prompts.
Existing approaches fine-tune the model to teach it new words that describe specific user-provided subjects.
We present ConsiStory, a training-free approach that enables consistent subject generation by sharing the internal activations of the pretrained model.
arXiv Detail & Related papers (2024-02-05T18:42:34Z)
- Decoupled Textual Embeddings for Customized Image Generation [62.98933630971543]
Customized text-to-image generation aims to learn user-specified concepts with a few images.
Existing methods usually suffer from overfitting issues and entangle the subject-unrelated information with the learned concept.
We propose the DETEX, a novel approach that learns the disentangled concept embedding for flexible customized text-to-image generation.
arXiv Detail & Related papers (2023-12-19T03:32:10Z)
- Paste, Inpaint and Harmonize via Denoising: Subject-Driven Image Editing with Pre-Trained Diffusion Model [22.975965453227477]
We introduce a new framework called Paste, Inpaint and Harmonize via Denoising (PhD).
In our experiments, we apply PhD to both subject-driven image editing tasks and explore text-driven scene generation given a reference subject.
arXiv Detail & Related papers (2023-06-13T07:43:10Z)
- ProSpect: Prompt Spectrum for Attribute-Aware Personalization of Diffusion Models [77.03361270726944]
Current personalization methods can invert an object or concept into the textual conditioning space and compose new natural sentences for text-to-image diffusion models.
We propose a novel approach that leverages the step-by-step generation process of diffusion models, which generate images from low to high frequency information.
We apply ProSpect in various personalized attribute-aware image generation applications, such as image-guided or text-driven manipulations of materials, style, and layout.
arXiv Detail & Related papers (2023-05-25T16:32:01Z)