Controllable and Compositional Generation with Latent-Space Energy-Based
Models
- URL: http://arxiv.org/abs/2110.10873v1
- Date: Thu, 21 Oct 2021 03:31:45 GMT
- Title: Controllable and Compositional Generation with Latent-Space Energy-Based
Models
- Authors: Weili Nie, Arash Vahdat, Anima Anandkumar
- Abstract summary: Controllable generation is one of the key requirements for successful adoption of deep generative models in real-world applications.
In this work, we use energy-based models (EBMs) to handle compositional generation over a set of attributes.
By composing energy functions with logical operators, this work is the first to achieve such compositionality in generating photo-realistic images of resolution 1024x1024.
- Score: 60.87740144816278
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Controllable generation is one of the key requirements for successful
adoption of deep generative models in real-world applications, but it still
remains a great challenge. In particular, the compositional ability to
generate novel concept combinations is out of reach for most current models. In
this work, we use energy-based models (EBMs) to handle compositional generation
over a set of attributes. To make them scalable to high-resolution image
generation, we introduce an EBM in the latent space of a pre-trained generative
model such as StyleGAN. We propose a novel EBM formulation representing the
joint distribution of data and attributes together, and we show how sampling
from it is formulated as solving an ordinary differential equation (ODE). Given
a pre-trained generator, all we need for controllable generation is to train an
attribute classifier. Sampling with ODEs is done efficiently in the latent
space and is robust to hyperparameters. Thus, our method is simple, fast to
train, and efficient to sample. Experimental results show that our method
outperforms the state-of-the-art in both conditional sampling and sequential
editing. In compositional generation, our method excels at zero-shot generation
of unseen attribute combinations. Also, by composing energy functions with
logical operators, this work is the first to achieve such compositionality in
generating photo-realistic images of resolution 1024x1024.
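The abstract's two key ideas, composing attribute energies with logical operators and sampling by solving an ODE in the latent space, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the quadratic attribute energies, target vectors, step size, and plain Euler integrator below are toy stand-ins for the classifier-derived energies and ODE solver the paper uses.

```python
import numpy as np

def energy_attr(z, target):
    # Toy attribute energy: low when latent z is near the attribute's target.
    return 0.5 * np.sum((z - target) ** 2)

def grad_attr(z, target):
    # Gradient of the quadratic attribute energy above.
    return z - target

def energy_and(z, e_fns):
    # Logical AND: sum the energies, so all attributes must hold.
    return sum(e(z) for e in e_fns)

def energy_or(z, e_fns):
    # Logical OR: soft-min via -logsumexp(-E_i), so either attribute may hold.
    es = np.array([e(z) for e in e_fns])
    return -np.log(np.sum(np.exp(-es)))

def sample_ode(grad_fn, z0, step=0.1, n_steps=200):
    # Deterministic gradient flow dz/dt = -dE/dz, integrated with Euler steps;
    # the paper solves a related ODE in the latent space of a pre-trained
    # generator such as StyleGAN.
    z = z0.copy()
    for _ in range(n_steps):
        z = z - step * grad_fn(z)
    return z

# Two hypothetical attribute targets in a 2-D latent space.
t1, t2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
grad_and = lambda z: grad_attr(z, t1) + grad_attr(z, t2)
z_star = sample_ode(grad_and, np.zeros(2))
# The AND of two quadratic energies is minimized at the targets' midpoint,
# so z_star converges to approximately [0.5, 0.5].
print(np.round(z_star, 3))
```

In a real setting the attribute energies would come from a trained classifier on the latent code, and the decoded image would be `G(z_star)` for the pre-trained generator `G`; the logical composition rules (sum for AND, soft-min for OR, negation for NOT) carry over unchanged.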
Related papers
- MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models [34.611309081801345]
Large diffusion-based Text-to-Image (T2I) models have shown impressive generative powers for text-to-image generation.
In this paper, we propose a novel strategy to scale a generative model across new tasks with minimal compute.
arXiv Detail & Related papers (2024-04-15T17:55:56Z)
- Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis [62.07413805483241]
Steered Diffusion is a framework for zero-shot conditional image generation using a diffusion model trained for unconditional generation.
We present experiments using steered diffusion on several tasks including inpainting, colorization, text-guided semantic editing, and image super-resolution.
arXiv Detail & Related papers (2023-09-30T02:03:22Z)
- Learning Energy-Based Models by Cooperative Diffusion Recovery Likelihood [64.95663299945171]
Training energy-based models (EBMs) on high-dimensional data can be both challenging and time-consuming.
There exists a noticeable gap in sample quality between EBMs and other generative frameworks like GANs and diffusion models.
We propose cooperative diffusion recovery likelihood (CDRL), an effective approach to tractably learn and sample from a series of EBMs.
arXiv Detail & Related papers (2023-09-10T22:05:24Z)
- Effective Dynamics of Generative Adversarial Networks [16.51305515824504]
Generative adversarial networks (GANs) are a class of machine-learning models that use adversarial training to generate new samples.
One major form of training failure, known as mode collapse, involves the generator failing to reproduce the full diversity of modes in the target probability distribution.
We present an effective model of GAN training, which captures the learning dynamics by replacing the generator neural network with a collection of particles in the output space.
arXiv Detail & Related papers (2022-12-08T22:04:01Z)
- Generative Visual Prompt: Unifying Distributional Control of Pre-Trained Generative Models [77.47505141269035]
Generative Visual Prompt (PromptGen) is a framework for distributional control over pre-trained generative models.
PromptGen approximates an energy-based model (EBM) and samples images in a feed-forward manner.
Code is available at https://github.com/ChenWu98/Generative-Visual-Prompt.
arXiv Detail & Related papers (2022-09-14T22:55:18Z)
- Your Autoregressive Generative Model Can be Better If You Treat It as an Energy-Based One [83.5162421521224]
We propose a unique method termed E-ARM for training autoregressive generative models.
E-ARM takes advantage of a well-designed energy-based learning objective.
We show that E-ARM can be trained efficiently and is capable of alleviating the exposure bias problem.
arXiv Detail & Related papers (2022-06-26T10:58:41Z)
- Image Generation with Multimodal Priors using Denoising Diffusion Probabilistic Models [54.1843419649895]
A major challenge in using generative models to accomplish this task is the lack of paired data containing all modalities and corresponding outputs.
We propose a solution based on denoising diffusion probabilistic models to generate images under multimodal priors.
arXiv Detail & Related papers (2022-06-10T12:23:05Z)
- Energy-Based Models for Code Generation under Compilability Constraints [2.9176992922046923]
In this work, we pose the problem of learning to generate compilable code as constraint satisfaction.
We define an Energy-Based Model (EBM) representing a pre-trained generative model with an imposed constraint of generating only compilable sequences.
We then use the KL-Adaptive Distributional Policy Gradient algorithm to train a generative model approximating the EBM.
arXiv Detail & Related papers (2021-06-09T11:06:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.