DreamArtist: Towards Controllable One-Shot Text-to-Image Generation via
Positive-Negative Prompt-Tuning
- URL: http://arxiv.org/abs/2211.11337v3
- Date: Wed, 5 Apr 2023 13:38:28 GMT
- Title: DreamArtist: Towards Controllable One-Shot Text-to-Image Generation via
Positive-Negative Prompt-Tuning
- Authors: Ziyi Dong, Pengxu Wei, Liang Lin
- Abstract summary: Large-scale text-to-image generation models have achieved remarkable progress in synthesizing high-quality, feature-rich images with high resolution guided by texts.
Recent attempts have employed fine-tuning or prompt-tuning strategies to teach the pre-trained diffusion model novel concepts from a reference image set.
We present a simple yet effective method called DreamArtist, which employs a positive-negative prompt-tuning learning strategy.
- Score: 85.10894272034135
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large-scale text-to-image generation models have achieved remarkable progress
in synthesizing high-quality, feature-rich images with high resolution guided
by texts. However, these models often struggle with novel concepts, e.g., new
styles, object entities, etc. Although recent attempts have employed
fine-tuning or prompt-tuning strategies to teach the pre-trained diffusion
model novel concepts from a reference image set, they tend to overfit to the
given reference images, particularly in one-shot applications, which hampers
the generation of diverse, high-quality images while maintaining generation
controllability.
To tackle this challenge, we present a simple yet effective method called
DreamArtist, which employs a positive-negative prompt-tuning learning strategy.
Specifically, DreamArtist incorporates both positive and negative embeddings
and jointly trains them. The positive embedding aggressively captures the
salient characteristics of the reference image to drive diversified
generation, while the negative embedding rectifies the inadequacies of the
positive embedding. In this way, DreamArtist learns not only what is correct,
but also what should be avoided or improved.
We have conducted extensive experiments and evaluated the proposed method in
terms of image similarity and diversity, generation controllability, and style
cloning, and DreamArtist achieves superior generation performance over
existing methods. In addition, our evaluation on extended tasks, including
concept composition and prompt-guided image editing, demonstrates its
effectiveness in a broader range of applications.
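The abstract outlines how the positive and negative embeddings are trained jointly on a single reference image; the sketch below gives one plausible reading of that procedure in PyTorch. It assumes a Stable-Diffusion-style latent diffusion backbone, and every interface name (`unet`, `vae`, `text_encoder`, `scheduler`, `embed_prompt`) is a hypothetical placeholder rather than the authors' released code; composing the two noise predictions in a classifier-free-guidance form is likewise an assumption about the method, not a detail stated in the abstract.

```python
# Minimal sketch of positive-negative prompt-tuning in the spirit of DreamArtist.
# All model/pipeline arguments are hypothetical stand-ins for a pre-trained
# latent diffusion model; only the two pseudo-word embeddings are trained.
import torch
import torch.nn.functional as F


def positive_negative_prompt_tuning(unet, vae, text_encoder, scheduler, embed_prompt,
                                    reference_image, steps=3000, guidance_scale=5.0,
                                    n_tokens=3, emb_dim=768, lr=5e-3):
    """Jointly tune positive and negative pseudo-word embeddings on one image."""
    pos_emb = torch.randn(n_tokens, emb_dim, requires_grad=True)  # captures salient traits
    neg_emb = torch.randn(n_tokens, emb_dim, requires_grad=True)  # rectifies the positive branch
    optimizer = torch.optim.AdamW([pos_emb, neg_emb], lr=lr)

    latents = vae.encode(reference_image)  # one-shot reference image -> latent code
    for _ in range(steps):
        noise = torch.randn_like(latents)
        t = torch.randint(0, scheduler.num_timesteps, (1,))
        noisy = scheduler.add_noise(latents, noise, t)

        # Condition the frozen denoiser on each pseudo-prompt separately.
        cond_pos = text_encoder(embed_prompt(pos_emb))
        cond_neg = text_encoder(embed_prompt(neg_emb))
        eps_pos = unet(noisy, t, cond_pos)
        eps_neg = unet(noisy, t, cond_neg)

        # Classifier-free-guidance-style composition (assumed): the negative
        # prediction steers the combined estimate away from the positive
        # branch's residual errors.
        eps = eps_neg + guidance_scale * (eps_pos - eps_neg)

        loss = F.mse_loss(eps, noise)  # standard denoising objective
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()  # only pos_emb / neg_emb are updated; the backbone is untouched

    return pos_emb, neg_emb
```

After tuning, the two embeddings can be spliced into prompts as pseudo-words, so that ordinary text prompts drive diversified generation around the learned concept while the negative embedding continues to suppress its failure modes.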
Related papers
- Ada-adapter: Fast Few-shot Style Personalization of Diffusion Model with Pre-trained Image Encoder [57.574544285878794]
Ada-Adapter is a novel framework for few-shot style personalization of diffusion models.
Our method enables efficient zero-shot style transfer utilizing a single reference image.
We demonstrate the effectiveness of our approach on various artistic styles, including flat art, 3D rendering, and logo design.
arXiv Detail & Related papers (2024-07-08T02:00:17Z) - Inpaint Biases: A Pathway to Accurate and Unbiased Image Generation [0.0]
We introduce the Inpaint Biases framework, which employs user-defined masks and inpainting techniques to enhance the accuracy of image generation.
We demonstrate how this framework significantly improves the fidelity of generated images to the user's intent, thereby expanding the models' creative capabilities.
arXiv Detail & Related papers (2024-05-29T05:04:07Z) - Active Generation for Image Classification [45.93535669217115]
We propose to address the efficiency of image generation by focusing on the specific needs and characteristics of the model.
With a central tenet of active learning, our method, named ActGen, takes a training-aware approach to image generation.
arXiv Detail & Related papers (2024-03-11T08:45:31Z) - Training-Free Consistent Text-to-Image Generation [80.4814768762066]
Text-to-image models struggle to consistently portray the same subject across diverse prompts.
Existing approaches fine-tune the model to teach it new words that describe specific user-provided subjects.
We present ConsiStory, a training-free approach that enables consistent subject generation by sharing the internal activations of the pretrained model.
arXiv Detail & Related papers (2024-02-05T18:42:34Z) - DreamDrone: Text-to-Image Diffusion Models are Zero-shot Perpetual View Generators [56.994967294931286]
We introduce DreamDrone, a novel zero-shot and training-free pipeline for generating flythrough scenes from textual prompts.
We advocate explicitly warping the intermediate latent code of the pre-trained text-to-image diffusion model for high-quality image generation and unbounded generalization ability.
arXiv Detail & Related papers (2023-12-14T08:42:26Z) - FaceStudio: Put Your Face Everywhere in Seconds [23.381791316305332]
Identity-preserving image synthesis seeks to maintain a subject's identity while adding a personalized, stylistic touch.
Traditional methods, such as Textual Inversion and DreamBooth, have made strides in custom image creation.
Our research introduces a novel approach to identity-preserving synthesis, with a particular focus on human images.
arXiv Detail & Related papers (2023-12-05T11:02:45Z) - Multi-Concept T2I-Zero: Tweaking Only The Text Embeddings and Nothing
Else [75.6806649860538]
We consider a more ambitious goal: natural multi-concept generation using a pre-trained diffusion model.
We observe concept dominance and non-localized contribution that severely degrade multi-concept generation performance.
We design a minimal low-cost solution that overcomes the above issues by tweaking the text embeddings for more realistic multi-concept text-to-image generation.
arXiv Detail & Related papers (2023-10-11T12:05:44Z) - ReGeneration Learning of Diffusion Models with Rich Prompts for
Zero-Shot Image Translation [8.803251014279502]
Large-scale text-to-image models have demonstrated amazing ability to synthesize diverse and high-fidelity images.
Current models can impose significant changes to the original image content during the editing process.
We propose ReGeneration learning in an image-to-image Diffusion model (ReDiffuser)
arXiv Detail & Related papers (2023-05-08T12:08:12Z) - Language Does More Than Describe: On The Lack Of Figurative Speech in
Text-To-Image Models [63.545146807810305]
Text-to-image diffusion models can generate high-quality pictures from textual input prompts.
These models have been trained using text data collected from content-based labelling protocols.
We characterise the sentimentality, objectiveness and degree of abstraction of publicly available text data used to train current text-to-image diffusion models.
arXiv Detail & Related papers (2022-10-19T14:20:05Z)