Diffusion Models as Masked Autoencoders
- URL: http://arxiv.org/abs/2304.03283v1
- Date: Thu, 6 Apr 2023 17:59:56 GMT
- Title: Diffusion Models as Masked Autoencoders
- Authors: Chen Wei, Karttikeya Mangalam, Po-Yao Huang, Yanghao Li, Haoqi Fan, Hu Xu, Huiyu Wang, Cihang Xie, Alan Yuille, Christoph Feichtenhofer
- Abstract summary: We revisit generatively pre-training visual representations in light of recent interest in denoising diffusion models.
While directly pre-training with diffusion models does not produce strong representations, we condition diffusion models on masked input and formulate diffusion models as masked autoencoders (DiffMAE).
We perform a comprehensive study on the pros and cons of design choices and build connections between diffusion models and masked autoencoders.
- Score: 52.442717717898056
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There has been a longstanding belief that generation can facilitate a true
understanding of visual data. In line with this, we revisit generatively
pre-training visual representations in light of recent interest in denoising
diffusion models. While directly pre-training with diffusion models does not
produce strong representations, we condition diffusion models on masked input
and formulate diffusion models as masked autoencoders (DiffMAE). Our approach
is capable of (i) serving as a strong initialization for downstream recognition
tasks, (ii) conducting high-quality image inpainting, and (iii) being
effortlessly extended to video where it produces state-of-the-art
classification accuracy. We further perform a comprehensive study on the pros
and cons of design choices and build connections between diffusion models and
masked autoencoders.
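The idea of conditioning a diffusion model on masked input can be illustrated with a minimal sketch. It assumes, as one plausible reading of the abstract, that visible patches are passed to the model unchanged while masked patches are corrupted with the standard forward-diffusion step x_t = sqrt(a) * x_0 + sqrt(1 - a) * eps. The function name `make_diffmae_style_input`, the list-of-lists patch layout, and the parameters `mask_ratio` and `alpha_bar_t` are hypothetical choices for illustration and are not taken from the paper.

```python
import math
import random


def make_diffmae_style_input(patches, mask_ratio, alpha_bar_t, rng):
    """Return (model_input, masked_indices) for one training example.

    patches:      list of patches, each a list of floats (hypothetical layout).
    mask_ratio:   fraction of patches to mask (assumed hyperparameter).
    alpha_bar_t:  cumulative signal-retention coefficient at timestep t, in [0, 1].
    rng:          a random.Random instance, for reproducibility.
    """
    n = len(patches)
    n_masked = int(round(mask_ratio * n))
    masked_idx = set(rng.sample(range(n), n_masked))

    signal = math.sqrt(alpha_bar_t)
    noise = math.sqrt(1.0 - alpha_bar_t)

    model_input = []
    for i, patch in enumerate(patches):
        if i in masked_idx:
            # Forward diffusion on masked patches only:
            # x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
            model_input.append(
                [signal * v + noise * rng.gauss(0.0, 1.0) for v in patch]
            )
        else:
            # Visible patches stay clean and act as the conditioning signal.
            model_input.append(list(patch))
    return model_input, sorted(masked_idx)


rng = random.Random(0)
patches = [[float(i)] * 4 for i in range(8)]  # 8 toy patches of 4 values each
noisy, masked = make_diffmae_style_input(
    patches, mask_ratio=0.75, alpha_bar_t=0.5, rng=rng
)
```

In this sketch a denoising network would receive `noisy` and be trained to recover the original values at the positions listed in `masked`; the network and loss are left out because the abstract does not specify them.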
Related papers
- Denoising Autoregressive Representation Learning [13.185567468951628]
Our method, DARL, employs a decoder-only Transformer to predict image patches autoregressively.
We show that the learned representation can be improved by using tailored noise schedules and longer training in larger models.
arXiv Detail & Related papers (2024-03-08T10:19:00Z)
- Neural Network Parameter Diffusion [50.85251415173792]
Diffusion models have achieved remarkable success in image and video generation.
In this work, we demonstrate that diffusion models can also generate high-performing neural network parameters.
arXiv Detail & Related papers (2024-02-20T16:59:03Z)
- Guided Diffusion from Self-Supervised Diffusion Features [49.78673164423208]
Guidance serves as a key concept in diffusion models, yet its effectiveness is often limited by the need for extra data annotation or pretraining.
We propose a framework to extract guidance from, and specifically for, diffusion models.
arXiv Detail & Related papers (2023-12-14T11:19:11Z)
- SODA: Bottleneck Diffusion Models for Representation Learning [75.7331354734152]
We introduce SODA, a self-supervised diffusion model, designed for representation learning.
The model incorporates an image encoder, which distills a source view into a compact representation that guides the generation of related novel views.
We show that by imposing a tight bottleneck between the encoder and a denoising decoder, we can turn diffusion models into strong representation learners.
arXiv Detail & Related papers (2023-11-29T18:53:34Z)
- InfoDiffusion: Representation Learning Using Information Maximizing Diffusion Models [35.566528358691336]
InfoDiffusion is an algorithm that augments diffusion models with low-dimensional latent variables.
InfoDiffusion relies on a learning objective regularized with the mutual information between observed and hidden variables.
We find that InfoDiffusion learns disentangled and human-interpretable latent representations that are competitive with state-of-the-art generative and contrastive methods.
arXiv Detail & Related papers (2023-06-14T21:48:38Z)
- Denoising Diffusion Autoencoders are Unified Self-supervised Learners [58.194184241363175]
This paper shows that the networks in diffusion models, namely denoising diffusion autoencoders (DDAE), are unified self-supervised learners.
DDAE has already learned strongly linear-separable representations within its intermediate layers without auxiliary encoders.
Our diffusion-based approach achieves 95.9% and 50.0% linear evaluation accuracies on CIFAR-10 and Tiny-ImageNet.
arXiv Detail & Related papers (2023-03-17T04:20:47Z)
- Diffusion Models in Vision: A Survey [80.82832715884597]
A diffusion model is a deep generative model that is based on two stages, a forward diffusion stage and a reverse diffusion stage.
Diffusion models are widely appreciated for the quality and diversity of the generated samples, despite their known computational burdens.
arXiv Detail & Related papers (2022-09-10T22:00:30Z)