UFOGen: You Forward Once Large Scale Text-to-Image Generation via
Diffusion GANs
- URL: http://arxiv.org/abs/2311.09257v5
- Date: Thu, 7 Dec 2023 03:56:56 GMT
- Title: UFOGen: You Forward Once Large Scale Text-to-Image Generation via
Diffusion GANs
- Authors: Yanwu Xu, Yang Zhao, Zhisheng Xiao, Tingbo Hou
- Abstract summary: We present UFOGen, a novel generative model designed for ultra-fast, one-step text-to-image synthesis.
Unlike conventional approaches, UFOGen adopts a hybrid methodology, integrating diffusion models with a GAN objective.
UFOGen excels in efficiently generating high-quality images conditioned on textual descriptions in a single step.
- Score: 16.121569507866848
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-to-image diffusion models have demonstrated remarkable capabilities in
transforming textual prompts into coherent images, yet the computational cost
of their inference remains a persistent challenge. To address this issue, we
present UFOGen, a novel generative model designed for ultra-fast, one-step
text-to-image synthesis. In contrast to conventional approaches that focus on
improving samplers or employing distillation techniques for diffusion models,
UFOGen adopts a hybrid methodology, integrating diffusion models with a GAN
objective. Leveraging a newly introduced diffusion-GAN objective and
initialization with pre-trained diffusion models, UFOGen excels in efficiently
generating high-quality images conditioned on textual descriptions in a single
step. Beyond traditional text-to-image generation, UFOGen showcases versatility
in applications. Notably, UFOGen stands among the pioneering models enabling
one-step text-to-image generation and diverse downstream tasks, presenting a
significant advancement in the landscape of efficient generative models.
Related papers
- Bi-LORA: A Vision-Language Approach for Synthetic Image Detection [14.448350657613364]
Deep image synthesis techniques, such as generative adversarial networks (GANs) and diffusion models (DMs) have ushered in an era of generating highly realistic images.
This paper takes inspiration from the potent convergence capabilities between vision and language, coupled with the zero-shot nature of vision-language models (VLMs)
We introduce an innovative method called Bi-LORA that leverages VLMs, combined with low-rank adaptation (LORA) tuning techniques, to enhance the precision of synthetic image detection for unseen model-generated images.
arXiv Detail & Related papers (2024-04-02T13:54:22Z) - GANTASTIC: GAN-based Transfer of Interpretable Directions for Disentangled Image Editing in Text-to-Image Diffusion Models [4.710921988115686]
We introduce GANTASTIC, a novel framework that takes existing directions from pre-trained GAN models and transfers these directions into diffusion-based models.
This approach not only maintains the generative quality and diversity that diffusion models are known for but also significantly enhances their capability to perform precise, targeted image edits.
arXiv Detail & Related papers (2024-03-28T17:55:16Z) - Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation [59.184980778643464]
Fine-tuning Diffusion Models remains an underexplored frontier in generative artificial intelligence (GenAI)
In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion)
Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment.
arXiv Detail & Related papers (2024-02-15T18:59:18Z) - E$^{2}$GAN: Efficient Training of Efficient GANs for Image-to-Image Translation [69.72194342962615]
We introduce and address a novel research direction: can the process of distilling GANs from diffusion models be made significantly more efficient?
First, we construct a base GAN model with generalized features, adaptable to different concepts through fine-tuning, eliminating the need for training from scratch.
Second, we identify crucial layers within the base GAN model and employ Low-Rank Adaptation (LoRA) with a simple yet effective rank search process, rather than fine-tuning the entire base model.
Third, we investigate the minimal amount of data necessary for fine-tuning, further reducing the overall training time.
arXiv Detail & Related papers (2024-01-11T18:59:14Z) - DreamDrone: Text-to-Image Diffusion Models are Zero-shot Perpetual View Generators [56.994967294931286]
We introduce DreamDrone, a novel zero-shot and training-free pipeline for generating flythrough scenes from textual prompts.
We advocate explicitly warping the intermediate latent code of the pre-trained text-to-image diffusion model for high-quality image generation and unbounded generalization ability.
arXiv Detail & Related papers (2023-12-14T08:42:26Z) - MobileDiffusion: Instant Text-to-Image Generation on Mobile Devices [13.923293508790122]
We propose textbfMobileDiffusion, a highly efficient text-to-image diffusion model.
We employ distillation and diffusion-GAN finetuning techniques on MobileDiffusion to achieve 8-step and 1-step inference respectively.
MobileDiffusion achieves a remarkable textbfsub-second inference speed for generating a $512times512$ image on mobile devices.
arXiv Detail & Related papers (2023-11-28T07:14:41Z) - Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional
Image Synthesis [62.07413805483241]
Steered Diffusion is a framework for zero-shot conditional image generation using a diffusion model trained for unconditional generation.
We present experiments using steered diffusion on several tasks including inpainting, colorization, text-guided semantic editing, and image super-resolution.
arXiv Detail & Related papers (2023-09-30T02:03:22Z) - DiffuGen: Adaptable Approach for Generating Labeled Image Datasets using
Stable Diffusion Models [2.0935496890864207]
"DiffuGen" is a simple and adaptable approach that harnesses the power of stable diffusion models to create labeled image datasets efficiently.
By leveraging stable diffusion models, our approach not only ensures the quality of generated datasets but also provides a versatile solution for label generation.
arXiv Detail & Related papers (2023-09-01T04:42:03Z) - Controlling Text-to-Image Diffusion by Orthogonal Finetuning [74.21549380288631]
We introduce a principled finetuning method -- Orthogonal Finetuning (OFT) for adapting text-to-image diffusion models to downstream tasks.
Unlike existing methods, OFT can provably preserve hyperspherical energy which characterizes the pairwise neuron relationship on the unit hypersphere.
We empirically show that our OFT framework outperforms existing methods in generation quality and convergence speed.
arXiv Detail & Related papers (2023-06-12T17:59:23Z) - SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with
Large Language Models [56.88192537044364]
We propose a simple-yet-effective parameter-efficient fine-tuning approach called the Semantic Understanding and Reasoning adapter (SUR-adapter) for pre-trained diffusion models.
Our approach can make text-to-image diffusion models easier to use with better user experience.
arXiv Detail & Related papers (2023-05-09T05:48:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.