Exploring Compositional Visual Generation with Latent Classifier Guidance
- URL: http://arxiv.org/abs/2304.12536v2
- Date: Wed, 24 May 2023 06:17:11 GMT
- Title: Exploring Compositional Visual Generation with Latent Classifier Guidance
- Authors: Changhao Shi, Haomiao Ni, Kai Li, Shaobo Han, Mingfu Liang, Martin Renqiang Min
- Abstract summary: We train latent diffusion models and auxiliary latent classifiers to facilitate non-linear navigation of latent representation generation.
We show that such conditional generation achieved by latent classifier guidance provably maximizes a lower bound of the conditional log probability during training.
We show that this paradigm based on latent classifier guidance is agnostic to pre-trained generative models, and present competitive results for both image generation and sequential manipulation of real and synthetic images.
- Score: 19.48538300223431
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion probabilistic models have achieved enormous success in the field of
image generation and manipulation. In this paper, we explore a novel paradigm
of using the diffusion model and classifier guidance in the latent semantic
space for compositional visual tasks. Specifically, we train latent diffusion
models and auxiliary latent classifiers to facilitate non-linear navigation of
latent representation generation for any pre-trained generative model with a
semantic latent space. We demonstrate that such conditional generation achieved
by latent classifier guidance provably maximizes a lower bound of the
conditional log probability during training. To maintain the original semantics
during manipulation, we introduce a new guidance term, which we show is crucial
for achieving compositionality. With additional assumptions, we show that the
non-linear manipulation reduces to a simple latent arithmetic approach. We show
that this paradigm based on latent classifier guidance is agnostic to
pre-trained generative models, and present competitive results for both image
generation and sequential manipulation of real and synthetic images. Our
findings suggest that latent classifier guidance is a promising approach that
merits further exploration, even in the presence of other strong competing
methods.
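As a toy illustration of the guidance mechanism the abstract describes: at each step of a reverse (here, annealed Langevin-style) process, the gradient of an auxiliary latent classifier's log-probability is added to the unconditional score, steering the latent toward codes that satisfy the condition. The Gaussian "classifier", step size, and dimensions below are illustrative assumptions, not the paper's actual models.

```python
import numpy as np

def classifier_grad(z, target):
    # Hypothetical latent classifier: a Gaussian centered on the target
    # attribute code, so grad_z log p(y|z) = target - z.
    return target - z

def guided_step(z, t, target, guidance_scale, step, rng):
    # Unconditional score of a toy standard-normal latent prior.
    prior_score = -z
    # Classifier guidance: add the scaled classifier gradient to the score.
    score = prior_score + guidance_scale * classifier_grad(z, target)
    noise = np.sqrt(2.0 * step) * rng.standard_normal(z.shape)
    return z + step * score + (noise if t > 0 else 0.0)

def sample(target, steps=200, guidance_scale=4.0, step=0.1, seed=0):
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(target.shape)  # start from the prior
    for t in reversed(range(steps)):       # guided Langevin-style updates
        z = guided_step(z, t, target, guidance_scale, step, rng)
    return z

target = np.array([2.0, -1.0])  # latent code the toy "classifier" prefers
z = sample(target)
# z settles between the prior mode (origin) and the classifier's target,
# as dictated by the guidance scale.
```

Note that with a Gaussian classifier the guided update is linear in z, which loosely mirrors the abstract's observation that, under additional assumptions, the non-linear manipulation reduces to simple latent arithmetic.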
Related papers
- A Bayesian Approach to Weakly-supervised Laparoscopic Image Segmentation [1.9639956888747314]
We study weakly-supervised laparoscopic image segmentation with sparse annotations.
We introduce a novel Bayesian deep learning approach designed to enhance both the accuracy and interpretability of the model's segmentation.
arXiv Detail & Related papers (2024-10-11T04:19:48Z) - Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation [56.87049651707208]
Few-shot Semantic Segmentation has evolved into an in-context task, becoming a crucial element in assessing generalist segmentation models.
Our initial focus lies in understanding how to facilitate interaction between the query image and the support image, resulting in the proposal of a KV fusion method within the self-attention framework.
Based on our analysis, we establish a simple and effective framework named DiffewS, maximally retaining the original Latent Diffusion Model's generative framework.
arXiv Detail & Related papers (2024-10-03T10:33:49Z) - InterHandGen: Two-Hand Interaction Generation via Cascaded Reverse Diffusion [53.90516061351706]
We present InterHandGen, a novel framework that learns the generative prior of two-hand interaction.
For sampling, we combine anti-penetration and classifier-free guidance to enable plausible generation.
Our method significantly outperforms baseline generative models in terms of plausibility and diversity.
arXiv Detail & Related papers (2024-03-26T06:35:55Z) - Bridging Generative and Discriminative Models for Unified Visual Perception with Diffusion Priors [56.82596340418697]
We propose a simple yet effective framework comprising a pre-trained Stable Diffusion (SD) model containing rich generative priors, a unified head (U-head) capable of integrating hierarchical representations, and an adapted expert providing discriminative priors.
Comprehensive investigations unveil potential characteristics of Vermouth, such as varying granularity of perception concealed in latent variables at distinct time steps and various U-net stages.
The promising results demonstrate the potential of diffusion models as formidable learners, establishing their significance in furnishing informative and robust visual representations.
arXiv Detail & Related papers (2024-01-29T10:36:57Z) - Manifold Contrastive Learning with Variational Lie Group Operators [5.0741409008225755]
We propose a contrastive learning approach that directly models the latent manifold using Lie group operators parameterized by coefficients with a sparsity-promoting prior.
A variational distribution over these coefficients provides a generative model of the manifold, with samples which provide feature augmentations applicable both during contrastive training and downstream tasks.
arXiv Detail & Related papers (2023-06-23T15:07:01Z) - Latent Traversals in Generative Models as Potential Flows [113.4232528843775]
We propose to model latent structures with a learned dynamic potential landscape.
Inspired by physics, optimal transport, and neuroscience, these potential landscapes are learned as physically realistic partial differential equations.
Our method achieves latent trajectories that are both qualitatively and quantitatively more disentangled than those of state-of-the-art baselines.
arXiv Detail & Related papers (2023-04-25T15:53:45Z) - Learning Data Representations with Joint Diffusion Models [20.25147743706431]
Joint machine learning models that allow synthesizing and classifying data often offer uneven performance between those tasks or are unstable to train.
We extend the vanilla diffusion model with a classifier that allows for stable joint end-to-end training with shared parameterization between those objectives.
The resulting joint diffusion model outperforms recent state-of-the-art hybrid methods in terms of both classification and generation quality on all evaluated benchmarks.
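The shared-parameterization idea above can be sketched as a single feature extractor feeding both a noise-prediction head and a classification head, with the two losses summed for joint end-to-end training. Everything below (the tanh feature map, the simplified DDPM-style noise-prediction loss, all shapes) is an illustrative assumption, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def shared_features(x, W):
    # Shared parameterization: one feature extractor feeds both heads.
    return np.tanh(x @ W)

def joint_loss(x, eps, labels, params):
    W, W_eps, W_cls = params
    h = shared_features(x, W)
    # Denoising head: predict the injected noise (simplified DDPM loss).
    eps_pred = h @ W_eps
    diffusion_loss = np.mean((eps_pred - eps) ** 2)
    # Classifier head on the same features: cross-entropy via log-softmax.
    logits = h @ W_cls
    logp = logits - np.log(np.sum(np.exp(logits), axis=1, keepdims=True))
    class_loss = -np.mean(logp[np.arange(len(labels)), labels])
    return diffusion_loss + class_loss

x = rng.standard_normal((8, 16))        # noised inputs
eps = rng.standard_normal((8, 16))      # the noise to be predicted
labels = rng.integers(0, 3, size=8)     # class labels
params = (rng.standard_normal((16, 32)) * 0.1,
          rng.standard_normal((32, 16)) * 0.1,
          rng.standard_normal((32, 3)) * 0.1)
loss = joint_loss(x, eps, labels, params)
```

Because both heads backpropagate into the shared weights, the classifier shapes the same representation that the diffusion objective trains, which is the stability argument the summary makes.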
arXiv Detail & Related papers (2023-01-31T13:29:19Z) - Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z) - Closed-Form Factorization of Latent Semantics in GANs [65.42778970898534]
A rich set of interpretable dimensions has been shown to emerge in the latent space of the Generative Adversarial Networks (GANs) trained for synthesizing images.
In this work, we examine the internal representation learned by GANs to reveal the underlying variation factors in an unsupervised manner.
We propose a closed-form factorization algorithm for latent semantic discovery by directly decomposing the pre-trained weights.
arXiv Detail & Related papers (2020-07-13T18:05:36Z)
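The closed-form factorization idea can be sketched in a few lines: taking A as the weight of the generator's first affine layer, candidate semantic directions are the eigenvectors of A^T A (equivalently, the right singular vectors of A), ranked by eigenvalue, with no training or sampling required. The random matrix below is a stand-in for real pre-trained generator weights.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the first affine layer of a pre-trained GAN generator,
# mapping a 64-d latent code to a 512-d activation.
A = rng.standard_normal((512, 64))

# Eigendecomposition of A^T A; eigh returns eigenvalues in ascending
# order, so reorder to put the strongest direction first.
eigvals, eigvecs = np.linalg.eigh(A.T @ A)
order = np.argsort(eigvals)[::-1]
directions = eigvecs[:, order].T   # k-th row = k-th semantic direction

# Editing a latent code along the top direction: z' = z + alpha * n_1.
z = rng.standard_normal(64)
z_edit = z + 3.0 * directions[0]
```

The top-ranked directions are those along which the first layer's output changes fastest, which is why they tend to correspond to the dominant factors of variation in the synthesized images.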
This list is automatically generated from the titles and abstracts of the papers in this site.