DreamDistribution: Prompt Distribution Learning for Text-to-Image
Diffusion Models
- URL: http://arxiv.org/abs/2312.14216v1
- Date: Thu, 21 Dec 2023 12:11:00 GMT
- Title: DreamDistribution: Prompt Distribution Learning for Text-to-Image
Diffusion Models
- Authors: Brian Nlong Zhao, Yuhang Xiao, Jiashu Xu, Xinyang Jiang, Yifan Yang,
Dongsheng Li, Laurent Itti, Vibhav Vineet, Yunhao Ge
- Abstract summary: We introduce a solution that allows a pretrained T2I diffusion model to learn a set of soft prompts.
These prompts offer text-guided editing capabilities and additional flexibility in controlling variation and mixing between multiple distributions.
We also show the adaptability of the learned prompt distribution to other tasks, such as text-to-3D.
- Score: 53.17454737232668
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The popularization of Text-to-Image (T2I) diffusion models enables the
generation of high-quality images from text descriptions. However, generating
diverse customized images with reference visual attributes remains challenging.
This work focuses on personalizing T2I diffusion models at a more abstract
concept or category level, adapting commonalities from a set of reference
images while creating new instances with sufficient variations. We introduce a
solution that allows a pretrained T2I diffusion model to learn a set of soft
prompts, enabling the generation of novel images by sampling prompts from the
learned distribution. These prompts offer text-guided editing capabilities and
additional flexibility in controlling variation and mixing between multiple
distributions. We also show the adaptability of the learned prompt distribution
to other tasks, such as text-to-3D. Finally, we demonstrate the effectiveness of
our approach through quantitative analysis, including automatic evaluation and
human assessment. Project website: https://briannlongzhao.github.io/DreamDistribution
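The sketch below (not the authors' released implementation) illustrates the core idea as the abstract describes it: keep a set of learnable soft prompts, treat them as samples from a prompt distribution, and draw new prompt embeddings from that distribution to condition a frozen T2I diffusion model. The diagonal-Gaussian parameterization, the class and parameter names, and the embedding dimensions (77 tokens x 768 channels, as in Stable Diffusion v1.x text conditioning) are illustrative assumptions.

    # Minimal sketch of prompt distribution learning (assumptions noted above).
    import torch
    import torch.nn as nn

    class PromptDistribution(nn.Module):
        def __init__(self, num_prompts: int = 32, seq_len: int = 77, embed_dim: int = 768):
            super().__init__()
            # K learnable soft prompts, each a sequence of token embeddings.
            self.prompts = nn.Parameter(torch.randn(num_prompts, seq_len, embed_dim) * 0.02)

        def distribution(self):
            # One possible reading of "the learned distribution": fit a diagonal
            # Gaussian to the K learned soft prompts.
            mean = self.prompts.mean(dim=0)
            std = self.prompts.std(dim=0) + 1e-6
            return mean, std

        def sample(self, batch_size: int = 1, temperature: float = 1.0) -> torch.Tensor:
            # Reparameterized sampling keeps the draw differentiable during training;
            # `temperature` scales the variance to control variation at inference time.
            mean, std = self.distribution()
            eps = torch.randn(batch_size, *mean.shape)
            return mean + temperature * std * eps

    if __name__ == "__main__":
        dist = PromptDistribution()
        prompt_embeds = dist.sample(batch_size=4, temperature=0.8)
        # Shape (4, 77, 768): would be passed as text conditioning to a frozen T2I model.
        print(prompt_embeds.shape)

Mixing multiple learned distributions, as mentioned in the abstract, could then amount to combining the means and variances of several such PromptDistribution instances before sampling; that detail is likewise an assumption here.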
Related papers
- Reason out Your Layout: Evoking the Layout Master from Large Language
Models for Text-to-Image Synthesis [47.27044390204868]
We introduce a novel approach to improving T2I diffusion models using Large Language Models (LLMs) as layout generators.
Our experiments demonstrate significant improvements in image quality and layout accuracy.
arXiv Detail & Related papers (2023-11-28T14:51:13Z)
- LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models [62.75006608940132]
This work proposes to enhance prompt understanding capabilities in text-to-image diffusion models.
Our method leverages a pretrained large language model for grounded generation in a novel two-stage process.
Our method significantly outperforms the base diffusion model and several strong baselines in accurately generating images.
arXiv Detail & Related papers (2023-05-23T03:59:06Z)
- Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners [88.07317175639226]
We propose a novel approach, Discriminative Stable Diffusion (DSD), which turns pre-trained text-to-image diffusion models into few-shot discriminative learners.
Our approach mainly uses the cross-attention score of a Stable Diffusion model to capture the mutual influence between visual and textual information.
arXiv Detail & Related papers (2023-05-18T05:41:36Z)
- SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models [56.88192537044364]
We propose a simple-yet-effective parameter-efficient fine-tuning approach called the Semantic Understanding and Reasoning adapter (SUR-adapter) for pre-trained diffusion models.
Our approach makes text-to-image diffusion models easier to use and improves the user experience.
arXiv Detail & Related papers (2023-05-09T05:48:38Z)
- In-Context Learning Unlocked for Diffusion Models [163.54453915874402]
We present Prompt Diffusion, a framework for enabling in-context learning in diffusion-based generative models.
We propose a vision-language prompt that can model a wide range of vision-language tasks and a diffusion model that takes it as input.
The resulting Prompt Diffusion model is the first diffusion-based vision-language foundation model capable of in-context learning.
arXiv Detail & Related papers (2023-05-01T23:03:37Z)
- Unleashing Text-to-Image Diffusion Models for Visual Perception [84.41514649568094]
VPD (Visual Perception with a pre-trained diffusion model) is a new framework that exploits the semantic information of a pre-trained text-to-image diffusion model in visual perception tasks.
We show that models can be adapted faster to downstream visual perception tasks using the proposed VPD framework.
arXiv Detail & Related papers (2023-03-03T18:59:47Z)