InfoDiffusion: Representation Learning Using Information Maximizing
Diffusion Models
- URL: http://arxiv.org/abs/2306.08757v1
- Date: Wed, 14 Jun 2023 21:48:38 GMT
- Title: InfoDiffusion: Representation Learning Using Information Maximizing
Diffusion Models
- Authors: Yingheng Wang, Yair Schiff, Aaron Gokaslan, Weishen Pan, Fei Wang,
Christopher De Sa, Volodymyr Kuleshov
- Abstract summary: InfoDiffusion is an algorithm that augments diffusion models with low-dimensional latent variables.
InfoDiffusion relies on a learning objective regularized with the mutual information between observed and hidden variables.
We find that InfoDiffusion learns disentangled and human-interpretable latent representations that are competitive with state-of-the-art generative and contrastive methods.
- Score: 35.566528358691336
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While diffusion models excel at generating high-quality samples, their latent
variables typically lack semantic meaning and are not suitable for
representation learning. Here, we propose InfoDiffusion, an algorithm that
augments diffusion models with low-dimensional latent variables that capture
high-level factors of variation in the data. InfoDiffusion relies on a learning
objective regularized with the mutual information between observed and hidden
variables, which improves latent space quality and prevents the latents from
being ignored by expressive diffusion-based decoders. Empirically, we find that
InfoDiffusion learns disentangled and human-interpretable latent
representations that are competitive with state-of-the-art generative and
contrastive methods, while retaining the high sample quality of diffusion
models. Our method enables manipulating the attributes of generated images and
has the potential to assist tasks that require exploring a learned latent space
to generate quality samples, e.g., generative design.
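To make the objective concrete, here is a minimal PyTorch sketch of the general recipe: a noise predictor conditioned on a low-dimensional latent inferred by an encoder, trained with the standard denoising loss plus a regularizer on the latent marginal. The MMD term below stands in for the paper's mutual-information regularizer (the two play a similar role in InfoVAE-style objectives); all architectures, schedules, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only -- not the InfoDiffusion reference code.
import torch
import torch.nn as nn

class LatentConditionedDenoiser(nn.Module):
    """Toy noise predictor conditioned on a low-dimensional latent z."""
    def __init__(self, x_dim=784, z_dim=32, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + z_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, x_dim),
        )

    def forward(self, x_t, z, t):
        t_emb = t.float().unsqueeze(-1) / 1000.0  # crude timestep embedding
        return self.net(torch.cat([x_t, z, t_emb], dim=-1))

def mmd_rbf(a, b, sigma=1.0):
    """Biased RBF-kernel MMD^2 between two sample sets."""
    k = lambda u, v: torch.exp(-torch.cdist(u, v) ** 2 / (2 * sigma ** 2))
    return k(a, a).mean() + k(b, b).mean() - 2 * k(a, b).mean()

def training_step(x0, encoder, denoiser, alphas_bar, lam=10.0):
    z = encoder(x0)                                      # latent code for x0
    t = torch.randint(0, len(alphas_bar), (x0.shape[0],))
    a_bar = alphas_bar[t].unsqueeze(-1)
    eps = torch.randn_like(x0)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * eps   # forward diffusion
    denoise_loss = ((denoiser(x_t, z, t) - eps) ** 2).mean()
    # Match the aggregate posterior q(z) to the prior N(0, I) via MMD
    # (assumption: MMD as a stand-in for the paper's MI regularizer).
    reg = mmd_rbf(z, torch.randn_like(z))
    return denoise_loss + lam * reg

# Usage with placeholder modules and a placeholder noise schedule.
encoder = nn.Sequential(nn.Linear(784, 256), nn.SiLU(), nn.Linear(256, 32))
denoiser = LatentConditionedDenoiser()
alphas_bar = torch.linspace(0.999, 0.01, 1000)  # alpha-bar: ~1 (clean) -> ~0
training_step(torch.randn(16, 784), encoder, denoiser, alphas_bar).backward()
```

In the paper it is the mutual-information term that prevents the expressive diffusion decoder from ignoring z; the MMD stand-in here only shapes the latent marginal, which is why it is labeled an assumption.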
Related papers
- Data-Free Federated Class Incremental Learning with Diffusion-Based Generative Memory [27.651921957220004]
We introduce a novel data-free federated class incremental learning framework with diffusion-based generative memory (DFedDGM).
We design a new balanced sampler for training the diffusion models, which alleviates the common non-IID problem in FL.
We also introduce an entropy-based sample filtering technique, motivated by information theory, to enhance the quality of generated samples; a rough sketch of such a filter follows this entry.
arXiv Detail & Related papers (2024-05-22T20:59:18Z)
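The entropy-based filter named above might look like the following sketch: score each generated sample by the predictive entropy of a frozen classifier and keep only confident (low-entropy) ones. The classifier, threshold, and function name are illustrative assumptions, not the DFedDGM implementation.

```python
# Illustrative sketch of entropy-based sample filtering (assumed details).
import torch
import torch.nn.functional as F

@torch.no_grad()
def entropy_filter(samples, classifier, max_entropy=0.5):
    """Keep generated samples whose predictive entropy is below a threshold.

    Low entropy means the classifier is confident about a single class,
    a rough information-theoretic proxy for generative sample quality.
    """
    probs = F.softmax(classifier(samples), dim=-1)             # (N, C)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(-1)  # (N,)
    return samples[entropy < max_entropy]
```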
- DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception [78.26734070960886]
Current perceptive models heavily depend on resource-intensive datasets.
We introduce perception-aware loss (P.A. loss) through segmentation, improving both quality and controllability.
Our method customizes data augmentation by extracting and utilizing perception-aware attribute (P.A. Attr) during generation.
arXiv Detail & Related papers (2024-03-20T04:58:03Z)
- Theoretical Insights for Diffusion Guidance: A Case Study for Gaussian Mixture Models [59.331993845831946]
Diffusion models benefit from instillation of task-specific information into the score function to steer the sample generation towards desired properties.
This paper provides the first theoretical study of how guidance influences diffusion models in the context of Gaussian mixture models; a closed-form sketch of guided scores in that setting follows this entry.
arXiv Detail & Related papers (2024-03-03T23:15:48Z)
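The paper itself is theoretical, but the setting is easy to make concrete: for an isotropic Gaussian mixture, both the diffusion score and the classifier-guidance gradient have closed forms. The sketch below is a generic classifier-guidance construction under those assumptions, not the paper's analysis.

```python
# Closed-form (classifier-)guided scores for an isotropic Gaussian mixture.
import torch

def gmm_score(x, means, sigma, weights):
    """Score of p(x) = sum_k w_k N(x; mu_k, sigma^2 I) at points x: (N, D)."""
    d2 = torch.cdist(x, means) ** 2                  # (N, K) squared distances
    resp = (weights.log() - d2 / (2 * sigma ** 2)).softmax(-1)  # p(k | x)
    comp = (means.unsqueeze(0) - x.unsqueeze(1)) / sigma ** 2   # (N, K, D)
    return (resp.unsqueeze(-1) * comp).sum(1)        # mixture score (N, D)

def guided_score(x, means, sigma, weights, k, gamma=2.0):
    """score + gamma * grad log p(y = k | x); exact for a GMM."""
    s = gmm_score(x, means, sigma, weights)
    cls_grad = (means[k] - x) / sigma ** 2 - s       # grad log p(k | x)
    return s + gamma * cls_grad
```

With gamma = 1 the guided score equals the score of the class-conditional p(x | y = k); larger gamma over-weights the class evidence, the regime whose effect on sampling such a study examines.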
- Bridging Generative and Discriminative Models for Unified Visual Perception with Diffusion Priors [56.82596340418697]
We propose a simple yet effective framework comprising a pre-trained Stable Diffusion (SD) model containing rich generative priors, a unified head (U-head) capable of integrating hierarchical representations, and an adapted expert providing discriminative priors.
Comprehensive investigations unveil potential characteristics of the proposed framework, dubbed Vermouth, such as the varying granularity of perception concealed in latent variables at distinct time steps and various U-net stages.
The promising results demonstrate the potential of diffusion models as formidable learners, establishing their significance in furnishing informative and robust visual representations.
arXiv Detail & Related papers (2024-01-29T10:36:57Z)
- DiffAugment: Diffusion based Long-Tailed Visual Relationship Recognition [43.01467525231004]
We introduce DiffAugment, a method that augments the tail classes in the linguistic space by making use of WordNet.
We demonstrate the effectiveness of hardness-aware diffusion in generating visual embeddings for the tail classes.
We also propose a novel subject and object based seeding strategy for diffusion sampling which improves the discriminative capability of the generated visual embeddings.
arXiv Detail & Related papers (2024-01-01T21:20:43Z)
- DiffAug: Enhance Unsupervised Contrastive Learning with Domain-Knowledge-Free Diffusion-based Data Augmentation [48.25619775814776]
This paper proposes DiffAug, a novel unsupervised contrastive learning technique with diffusion model-based positive data generation.
DiffAug consists of a semantic encoder and a conditional diffusion model; the diffusion model generates new positive samples conditioned on the semantic encoding, as sketched after this entry.
Experimental evaluations show that DiffAug outperforms hand-designed and SOTA model-based augmentation methods on DNA sequence, visual, and bio-feature datasets.
arXiv Detail & Related papers (2023-09-10T13:28:46Z)
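The positive-generation step might look like the following sketch: encode x into a semantic condition, then run a deterministic (DDIM-style, eta = 0) reverse loop on a conditional noise predictor to synthesize a positive x+ for contrastive learning. The sem_encoder and eps_model callables and the schedule convention are assumptions for illustration.

```python
# Illustrative sketch of diffusion-based positive generation (assumed API).
import torch

@torch.no_grad()
def generate_positive(x, sem_encoder, eps_model, alphas_bar):
    """Sample x+ from a conditional diffusion model given the semantics of x.

    alphas_bar is assumed to run from ~1 (clean, index 0) down to ~0 (noise).
    """
    c = sem_encoder(x)                        # semantic condition from x
    x_t = torch.randn_like(x)                 # start the reverse chain at noise
    for i in reversed(range(len(alphas_bar))):
        t = torch.full((x.shape[0],), i, dtype=torch.long)
        a = alphas_bar[i]
        eps = eps_model(x_t, c, t)            # predicted noise, conditioned on c
        x0_hat = (x_t - (1 - a).sqrt() * eps) / a.sqrt()  # implied clean sample
        a_prev = alphas_bar[i - 1] if i > 0 else torch.tensor(1.0)
        x_t = a_prev.sqrt() * x0_hat + (1 - a_prev).sqrt() * eps  # DDIM, eta=0
    return x_t                                # x+ shares semantics with x
```

The pair (x, x+) would then enter a standard contrastive objective such as InfoNCE in place of hand-designed augmentations.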
- Diffusion Models as Masked Autoencoders [52.442717717898056]
We revisit generatively pre-training visual representations in light of recent interest in denoising diffusion models.
While directly pre-training with diffusion models does not produce strong representations, we condition diffusion models on masked input and formulate them as masked autoencoders (DiffMAE); a minimal sketch of the masked-conditioning loss follows this entry.
We perform a comprehensive study on the pros and cons of design choices and build connections between diffusion models and masked autoencoders.
arXiv Detail & Related papers (2023-04-06T17:59:56Z)
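A minimal sketch of the masked-conditioning idea, under stated assumptions: keep the visible pixels clean as conditioning, diffuse only the masked region, and score the noise prediction on masked pixels alone. In DiffMAE proper the conditioning is architectural (an encoder over visible patches), not the simple pixel mix used here.

```python
# Illustrative sketch of a masked-autoencoder-flavored denoising loss.
import torch

def masked_denoising_loss(x0, mask, eps_model, alphas_bar):
    """mask is 1 on masked pixels, 0 on visible ones; eps_model is assumed."""
    t = torch.randint(0, len(alphas_bar), (x0.shape[0],))
    a = alphas_bar[t].unsqueeze(-1)
    eps = torch.randn_like(x0)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * eps       # forward diffusion
    model_in = torch.where(mask.bool(), x_t, x0)     # visible pixels stay clean
    pred = eps_model(model_in, t)
    # Only the masked region contributes to the loss, as in an MAE.
    return ((pred - eps) ** 2 * mask).sum() / mask.sum()
```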
- Your Diffusion Model is Secretly a Zero-Shot Classifier [90.40799216880342]
We show that density estimates from large-scale text-to-image diffusion models can be leveraged to perform zero-shot classification.
Our generative approach to classification attains strong results on a variety of benchmarks.
Our results are a step toward using generative rather than discriminative models for downstream tasks; the classification recipe is sketched after this entry.
arXiv Detail & Related papers (2023-03-28T17:59:56Z)
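The recipe reduces to comparing class-conditional denoising errors, which serve as Monte Carlo proxies for the class-conditional likelihood bounds. The sketch below assumes a generic conditional noise predictor and per-class condition embeddings; names and trial counts are illustrative, not the paper's implementation.

```python
# Illustrative sketch of diffusion-based zero-shot classification.
import torch

@torch.no_grad()
def classify(x0, eps_model, class_embeds, alphas_bar, n_trials=32):
    """Return the label whose conditioning best denoises x0: (D,)."""
    errors = []
    for c in class_embeds:                  # one embedding per candidate label
        t = torch.randint(0, len(alphas_bar), (n_trials,))
        a = alphas_bar[t].unsqueeze(-1)
        eps = torch.randn(n_trials, x0.shape[0])
        x_t = a.sqrt() * x0 + (1 - a).sqrt() * eps   # noised copies of x0
        cond = c.expand(n_trials, -1)
        err = ((eps_model(x_t, cond, t) - eps) ** 2).mean()
        errors.append(err)                  # low error ~ high conditional ELBO
    return torch.stack(errors).argmin()     # index of the winning label
```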