Denoising Diffusion Autoencoders are Unified Self-supervised Learners
- URL: http://arxiv.org/abs/2303.09769v2
- Date: Sat, 19 Aug 2023 11:12:29 GMT
- Title: Denoising Diffusion Autoencoders are Unified Self-supervised Learners
- Authors: Weilai Xiang, Hongyu Yang, Di Huang, Yunhong Wang
- Abstract summary: This paper shows that the networks in diffusion models, namely denoising diffusion autoencoders (DDAE), are unified self-supervised learners.
DDAE has already learned strongly linear-separable representations within its intermediate layers without auxiliary encoders.
Our diffusion-based approach achieves 95.9% and 50.0% linear evaluation accuracies on CIFAR-10 and Tiny-ImageNet, respectively.
- Score: 58.194184241363175
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Inspired by recent advances in diffusion models, which are reminiscent of
denoising autoencoders, we investigate whether they can acquire discriminative
representations for classification via generative pre-training. This paper
shows that the networks in diffusion models, namely denoising diffusion
autoencoders (DDAE), are unified self-supervised learners: by pre-training on
unconditional image generation, DDAE has already learned strongly
linear-separable representations within its intermediate layers without
auxiliary encoders, thus making diffusion pre-training emerge as a general
approach for generative-and-discriminative dual learning. To validate this, we
conduct linear probe and fine-tuning evaluations. Our diffusion-based approach
achieves 95.9% and 50.0% linear evaluation accuracies on CIFAR-10 and
Tiny-ImageNet, respectively, and is comparable to contrastive learning and
masked autoencoders for the first time. Transfer learning from ImageNet also
confirms the suitability of DDAE for Vision Transformers, suggesting the
potential to scale DDAEs as unified foundation models. Code is available at
github.com/FutureXiang/ddae.
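The linear-probe evaluation described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: a real DDAE would be a denoising network pre-trained on image generation, whereas `frozen_features` here is a hypothetical stand-in for one frozen intermediate layer, and the two-class toy data, the noise level `alpha_bar`, and all function names are assumptions made for illustration. The sketch only shows the order of operations: noise the input, read features from a frozen layer, and train nothing but a linear classifier on top.

```python
import math
import random


def noise_input(x0, alpha_bar, rng):
    """Forward diffusion: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * v + s * rng.gauss(0.0, 1.0) for v in x0]


def frozen_features(x_t):
    """Hypothetical stand-in for a frozen intermediate layer: [ReLU(x), ReLU(-x)]."""
    return [max(0.0, v) for v in x_t] + [max(0.0, -v) for v in x_t]


def logit(w, b, f):
    """Linear score of feature vector f under weights w and bias b."""
    return sum(wi * fi for wi, fi in zip(w, f)) + b


def train_linear_probe(feats, labels, epochs=50, lr=0.1):
    """Fit a logistic-regression probe with SGD; the features are never updated."""
    w, b = [0.0] * len(feats[0]), 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            z = max(-30.0, min(30.0, logit(w, b, f)))  # clamp for numerical safety
            g = 1.0 / (1.0 + math.exp(-z)) - y         # gradient of log loss w.r.t. z
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
            b -= lr * g
    return w, b


def probe_accuracy(w, b, feats, labels):
    """Fraction of samples whose predicted class (logit > 0) matches the label."""
    hits = sum((logit(w, b, f) > 0.0) == (y == 1) for f, y in zip(feats, labels))
    return hits / len(labels)


def run_demo(seed=0, n_per_class=20, dim=4, alpha_bar=0.9):
    """Toy two-class data -> noise -> frozen features -> linear probe accuracy."""
    rng = random.Random(seed)
    feats, labels = [], []
    for label, mean in ((1, 1.0), (0, -1.0)):
        for _ in range(n_per_class):
            x0 = [mean + 0.3 * rng.gauss(0.0, 1.0) for _ in range(dim)]
            feats.append(frozen_features(noise_input(x0, alpha_bar, rng)))
            labels.append(label)
    w, b = train_linear_probe(feats, labels)
    return probe_accuracy(w, b, feats, labels)


if __name__ == "__main__":
    print(f"linear probe accuracy on toy data: {run_demo():.2f}")
```

In practice the features would come from a chosen layer of the pre-trained network at a chosen noise level, and the probe would be scored on held-out data rather than on the training samples used in this toy example.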
Related papers
- Unified Auto-Encoding with Masked Diffusion [15.264296748357157]
We propose a unified self-supervised objective, dubbed Unified Masked Diffusion (UMD).
UMD combines patch-based and noise-based corruption techniques within a single auto-encoding framework.
It achieves strong performance in downstream generative and representation learning tasks.
arXiv Detail & Related papers (2024-06-25T16:24:34Z)
- Denoising Autoregressive Representation Learning [13.185567468951628]
Our method, DARL, employs a decoder-only Transformer to predict image patches autoregressively.
We show that the learned representation can be improved by using tailored noise schedules and longer training in larger models.
arXiv Detail & Related papers (2024-03-08T10:19:00Z)
- Guided Diffusion from Self-Supervised Diffusion Features [49.78673164423208]
Guidance serves as a key concept in diffusion models, yet its effectiveness is often limited by the need for extra data annotation or pretraining.
We propose a framework to extract guidance from, and specifically for, diffusion models.
arXiv Detail & Related papers (2023-12-14T11:19:11Z)
- SODA: Bottleneck Diffusion Models for Representation Learning [75.7331354734152]
We introduce SODA, a self-supervised diffusion model, designed for representation learning.
The model incorporates an image encoder that distills a source view into a compact representation, which in turn guides the generation of related novel views.
We show that by imposing a tight bottleneck between the encoder and a denoising decoder, we can turn diffusion models into strong representation learners.
arXiv Detail & Related papers (2023-11-29T18:53:34Z)
- DiffAug: Enhance Unsupervised Contrastive Learning with Domain-Knowledge-Free Diffusion-based Data Augmentation [48.25619775814776]
This paper proposes DiffAug, a novel unsupervised contrastive learning technique with diffusion model-based positive data generation.
DiffAug consists of a semantic encoder and a conditional diffusion model; the conditional diffusion model generates new positive samples conditioned on the semantic encoding.
Experimental evaluations show that DiffAug outperforms hand-designed and SOTA model-based augmentation methods on DNA sequence, visual, and bio-feature datasets.
arXiv Detail & Related papers (2023-09-10T13:28:46Z)
- Recovering high-quality FODs from a reduced number of diffusion-weighted images using a model-driven deep learning architecture [0.0]
We propose a model-driven deep learning FOD reconstruction architecture.
It ensures intermediate and output FODs produced by the network are consistent with the input DWI signals.
Our results show that the model-based deep learning architecture achieves competitive performance compared to a state-of-the-art FOD super-resolution network, FOD-Net.
arXiv Detail & Related papers (2023-07-28T02:47:34Z)
- Training Diffusion Models with Reinforcement Learning [82.29328477109826]
Diffusion models are trained with an approximation to the log-likelihood objective.
In this paper, we investigate reinforcement learning methods for directly optimizing diffusion models for downstream objectives.
We describe how posing denoising as a multi-step decision-making problem enables a class of policy gradient algorithms.
arXiv Detail & Related papers (2023-05-22T17:57:41Z)
- Diffusion Models as Masked Autoencoders [52.442717717898056]
We revisit generatively pre-training visual representations in light of recent interest in denoising diffusion models.
While directly pre-training with diffusion models does not produce strong representations, we condition diffusion models on masked input and formulate diffusion models as masked autoencoders (DiffMAE).
We perform a comprehensive study on the pros and cons of design choices and build connections between diffusion models and masked autoencoders.
arXiv Detail & Related papers (2023-04-06T17:59:56Z)
- End-to-End Diffusion Latent Optimization Improves Classifier Guidance [81.27364542975235]
Direct Optimization of Diffusion Latents (DOODL) is a novel guidance method.
It enables plug-and-play guidance by optimizing diffusion latents.
It outperforms one-step classifier guidance on computational and human evaluation metrics.
arXiv Detail & Related papers (2023-03-23T22:43:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.