Controllable Generation with Text-to-Image Diffusion Models: A Survey
- URL: http://arxiv.org/abs/2403.04279v1
- Date: Thu, 7 Mar 2024 07:24:18 GMT
- Title: Controllable Generation with Text-to-Image Diffusion Models: A Survey
- Authors: Pu Cao, Feng Zhou, Qing Song, Lu Yang
- Abstract summary: Controllable generation studies aim to control pre-trained text-to-image (T2I) models to support novel conditions.
Our review begins with a brief introduction to the basics of denoising diffusion probabilistic models.
We then reveal the controlling mechanisms of diffusion models, theoretically analyzing how novel conditions are introduced into the denoising process.
- Score: 8.394970202694529
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the rapidly advancing realm of visual generation, diffusion models have
revolutionized the landscape, marking a significant shift in capabilities with
their impressive text-guided generative functions. However, relying solely on
text for conditioning these models does not fully cater to the varied and
complex requirements of different applications and scenarios. Acknowledging
this shortfall, a variety of studies aim to control pre-trained text-to-image
(T2I) models to support novel conditions. In this survey, we undertake a
thorough review of the literature on controllable generation with T2I diffusion
models, covering both the theoretical foundations and practical advancements in
this domain. Our review begins with a brief introduction to the basics of
denoising diffusion probabilistic models (DDPMs) and widely used T2I diffusion
models. We then reveal the controlling mechanisms of diffusion models,
theoretically analyzing how novel conditions are introduced into the denoising
process for conditional generation. Additionally, we offer a detailed overview
of research in this area, organizing it into distinct categories from the
condition perspective: generation with specific conditions, generation with
multiple conditions, and universal controllable generation. For an exhaustive
list of the controllable generation literature surveyed, please refer to our
curated repository at
https://github.com/PRIV-Creation/Awesome-Controllable-T2I-Diffusion-Models.
Related papers
- Minority-Focused Text-to-Image Generation via Prompt Optimization [57.319845580050924]
We investigate the generation of minority samples using pretrained text-to-image (T2I) latent diffusion models.
We develop an online prompt optimization framework that can encourage the emergence of desired properties.
We then tailor this generic prompt into a specialized solver that promotes the generation of minority features.
arXiv Detail & Related papers (2024-10-10T11:56:09Z)
- CAR: Controllable Autoregressive Modeling for Visual Generation [100.33455832783416]
Controllable AutoRegressive Modeling (CAR) is a novel, plug-and-play framework that integrates conditional control into multi-scale latent variable modeling.
CAR progressively refines and captures control representations, which are injected into each autoregressive step of the pre-trained model to guide the generation process.
Our approach demonstrates excellent controllability across various types of conditions and delivers higher image quality compared to previous methods.
arXiv Detail & Related papers (2024-10-07T00:55:42Z)
- Table-to-Text Generation with Pretrained Diffusion Models [0.0]
Diffusion models have demonstrated significant potential in achieving state-of-the-art performance across various text generation tasks.
We investigate their application to the table-to-text problem by adapting the diffusion model to the task and conducting an in-depth analysis.
Our findings reveal that diffusion models achieve comparable results in the table-to-text domain.
arXiv Detail & Related papers (2024-09-10T15:36:53Z)
- Diffusion Models in Low-Level Vision: A Survey [82.77962165415153]
Diffusion model-based solutions have emerged as widely acclaimed for their ability to produce samples of superior quality and diversity.
We present three generic diffusion modeling frameworks and explore their correlations with other deep generative models.
We summarize extended diffusion models applied in other tasks, including medical, remote sensing, and video scenarios.
arXiv Detail & Related papers (2024-06-17T01:49:27Z)
- Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation [59.184980778643464]
Fine-tuning diffusion models remains an underexplored frontier in generative artificial intelligence (GenAI).
In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion).
Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment.
arXiv Detail & Related papers (2024-02-15T18:59:18Z)
- Diffusion Models in NLP: A Survey [1.5138755188783584]
Diffusion models have become a powerful family of deep generative models, with record-breaking performance in many applications.
This paper first gives an overview and derivation of the basic theory of diffusion models, then reviews the research results of diffusion models in the field of natural language processing.
arXiv Detail & Related papers (2023-03-14T01:53:49Z)
- Diffusion Models for Non-autoregressive Text Generation: A Survey [94.4634088113513]
Non-autoregressive (NAR) text generation has attracted much attention in the field of natural language processing.
Recently, diffusion models have been introduced into NAR text generation, showing an improved text generation quality.
arXiv Detail & Related papers (2023-03-12T05:11:09Z)
- Self-conditioned Embedding Diffusion for Text Generation [28.342735885752493]
Self-conditioned Embedding Diffusion is a continuous diffusion mechanism that operates on token embeddings.
We show that our text diffusion models generate samples comparable with those produced by standard autoregressive language models.
arXiv Detail & Related papers (2022-11-08T13:30:27Z)
- A Survey on Generative Diffusion Model [75.93774014861978]
Diffusion models are an emerging class of deep generative models.
They have certain limitations, including a time-consuming iterative generation process and confinement to high-dimensional Euclidean space.
This survey presents a plethora of advanced techniques aimed at enhancing diffusion models.
arXiv Detail & Related papers (2022-09-06T16:56:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.