Controllable and Compositional Generation with Latent-Space Energy-Based
Models
- URL: http://arxiv.org/abs/2110.10873v1
- Date: Thu, 21 Oct 2021 03:31:45 GMT
- Title: Controllable and Compositional Generation with Latent-Space Energy-Based
Models
- Authors: Weili Nie, Arash Vahdat, Anima Anandkumar
- Abstract summary: Controllable generation is one of the key requirements for successful adoption of deep generative models in real-world applications.
In this work, we use energy-based models (EBMs) to handle compositional generation over a set of attributes.
By composing energy functions with logical operators, this work is the first to achieve such compositionality in generating photo-realistic images of resolution 1024x1024.
- Score: 60.87740144816278
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Controllable generation is one of the key requirements for successful
adoption of deep generative models in real-world applications, but it still
remains a great challenge. In particular, the compositional ability to
generate novel concept combinations is out of reach for most current models. In
this work, we use energy-based models (EBMs) to handle compositional generation
over a set of attributes. To make them scalable to high-resolution image
generation, we introduce an EBM in the latent space of a pre-trained generative
model such as StyleGAN. We propose a novel EBM formulation representing the
joint distribution of data and attributes together, and we show how sampling
from it is formulated as solving an ordinary differential equation (ODE). Given
a pre-trained generator, all we need for controllable generation is to train an
attribute classifier. Sampling with ODEs is done efficiently in the latent
space and is robust to hyperparameters. Thus, our method is simple, fast to
train, and efficient to sample. Experimental results show that our method
outperforms the state-of-the-art in both conditional sampling and sequential
editing. In compositional generation, our method excels at zero-shot generation
of unseen attribute combinations. Also, by composing energy functions with
logical operators, this work is the first to achieve such compositionality in
generating photo-realistic images of resolution 1024x1024.
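To make the recipe above concrete, here is a minimal PyTorch sketch of latent-space EBM sampling with composed attribute energies. It is an illustration under stated assumptions, not the authors' implementation: the standard Gaussian prior on the latent code, the `attr_logprobs` classifier heads, the fixed-step Euler integration (standing in for the off-the-shelf ODE solver the paper relies on), and all dimensions and step sizes are assumptions.

```python
import torch

def joint_energy(z, attr_logprobs):
    """Energy of the joint model p(z, c): -log p(z) - sum_i log p(c_i | z).
    Summing the per-attribute terms realizes a logical AND over attributes;
    in the usual EBM composition recipe, OR is a logsumexp over alternatives
    and NOT negates a term (the paper's exact weighting may differ)."""
    log_prior = -0.5 * (z ** 2).sum(dim=-1)       # standard Gaussian prior on z (assumption)
    log_cond = sum(f(z) for f in attr_logprobs)   # classifier heads giving log p(c_i | z)
    return -(log_prior + log_cond)

def sample_latent(attr_logprobs, dim=512, steps=200, dt=5e-3):
    """Deterministic gradient flow on the log-density: a crude fixed-step
    Euler stand-in for the ODE-based sampler described in the abstract."""
    z = torch.randn(1, dim, requires_grad=True)   # initialize from the prior
    for _ in range(steps):
        e = joint_energy(z, attr_logprobs).sum()
        (grad,) = torch.autograd.grad(e, z)
        z = (z - dt * grad).detach().requires_grad_(True)
    return z.detach()

# Hypothetical usage: sample a latent satisfying "smiling AND young", then
# decode with a frozen pre-trained generator such as StyleGAN.
#   z = sample_latent([smiling_logprob, young_logprob])
#   image = stylegan.synthesis(z)
```

Because the dynamics run entirely in the low-dimensional latent space and the generator stays frozen, only the attribute classifier needs training, which is what makes the method fast to train and efficient to sample.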
Related papers
- Flow Generator Matching [35.371071097381346]
Flow Generator Matching (FGM) is designed to accelerate the sampling of flow-matching models to one-step generation.
On the CIFAR10 unconditional generation benchmark, our one-step FGM model achieves a new record Fréchet Inception Distance (FID) score of 3.08.
Our MM-DiT-FGM one-step text-to-image model demonstrates outstanding industry-level performance.
arXiv Detail & Related papers (2024-10-25T05:41:28Z)
- A Simple Approach to Unifying Diffusion-based Conditional Generation [63.389616350290595]
We introduce a simple, unified framework to handle diverse conditional generation tasks.
Our approach enables versatile capabilities via different inference-time sampling schemes.
Our model supports additional capabilities like non-spatially aligned and coarse conditioning.
arXiv Detail & Related papers (2024-10-15T09:41:43Z)
- MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models [34.611309081801345]
Large diffusion-based Text-to-Image (T2I) models have shown impressive generative capabilities.
In this paper, we propose a novel strategy to scale a generative model across new tasks with minimal compute.
arXiv Detail & Related papers (2024-04-15T17:55:56Z)
- Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis [62.07413805483241]
Steered Diffusion is a framework for zero-shot conditional image generation using a diffusion model trained for unconditional generation.
We present experiments using steered diffusion on several tasks including inpainting, colorization, text-guided semantic editing, and image super-resolution.
arXiv Detail & Related papers (2023-09-30T02:03:22Z)
- Learning Energy-Based Models by Cooperative Diffusion Recovery Likelihood [64.95663299945171]
Training energy-based models (EBMs) on high-dimensional data can be both challenging and time-consuming.
There exists a noticeable gap in sample quality between EBMs and other generative frameworks like GANs and diffusion models.
We propose cooperative diffusion recovery likelihood (CDRL), an effective approach to tractably learn and sample from a series of EBMs.
arXiv Detail & Related papers (2023-09-10T22:05:24Z)
- Your Autoregressive Generative Model Can be Better If You Treat It as an Energy-Based One [83.5162421521224]
We propose a unique method termed E-ARM for training autoregressive generative models.
E-ARM takes advantage of a well-designed energy-based learning objective.
We show that E-ARM can be trained efficiently and is capable of alleviating the exposure bias problem.
arXiv Detail & Related papers (2022-06-26T10:58:41Z)
- Image Generation with Multimodal Priors using Denoising Diffusion Probabilistic Models [54.1843419649895]
A major challenge in using generative models to accomplish this task is the lack of paired data containing all modalities and corresponding outputs.
We propose a solution based on denoising diffusion probabilistic models to generate images under multimodal priors.
arXiv Detail & Related papers (2022-06-10T12:23:05Z)
- Energy-Based Models for Code Generation under Compilability Constraints [2.9176992922046923]
In this work, we pose the problem of learning to generate compilable code as constraint satisfaction.
We define an Energy-Based Model (EBM) representing a pre-trained generative model with an imposed constraint of generating only compilable sequences.
We then use the KL-Adaptive Distributional Policy Gradient algorithm to train a generative model approximating the EBM (see the sketch after this list).
arXiv Detail & Related papers (2021-06-09T11:06:32Z)
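As a loose sketch of the construction in the last entry (all names here, such as `pretrained_logprob`, `compiles`, and `model.log_prob`, are hypothetical, and the actual KL-Adaptive DPG algorithm is more involved): the target EBM multiplies the pretrained model's likelihood by a binary compilability filter, and a distributional-policy-gradient step reweights the trainee's log-likelihood by the importance ratio between target and trainee.

```python
import math
import torch

def target_logdensity(x, pretrained_logprob, compiles):
    """Unnormalized EBM: log P(x) = log a(x) + log b(x), where a(x) is the
    pretrained model's likelihood and b(x) = 1 iff the sequence compiles."""
    return pretrained_logprob(x) + (0.0 if compiles(x) else -math.inf)

def dpg_step(model, samples, pretrained_logprob, compiles, optimizer):
    """One simplified distributional-policy-gradient update on samples drawn
    from `model`: grad = E_{x ~ q}[ (P(x) / q(x)) * grad log q(x) ]."""
    losses = []
    for x in samples:
        lq = model.log_prob(x)                                   # log q_theta(x), a 0-dim tensor
        lp = target_logdensity(x, pretrained_logprob, compiles)  # log of unnormalized P(x)
        w = math.exp(min(lp - lq.item(), 20.0))                  # clipped importance weight
        losses.append(-w * lq)                                   # non-compiling x gets w = 0
    optimizer.zero_grad()
    torch.stack(losses).sum().backward()
    optimizer.step()
```

Sequences that fail to compile receive zero weight, so the update only reinforces compilable samples, while the importance ratio keeps the trained model close to the pretrained distribution.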
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.