SODA: Bottleneck Diffusion Models for Representation Learning
- URL: http://arxiv.org/abs/2311.17901v1
- Date: Wed, 29 Nov 2023 18:53:34 GMT
- Title: SODA: Bottleneck Diffusion Models for Representation Learning
- Authors: Drew A. Hudson, Daniel Zoran, Mateusz Malinowski, Andrew K. Lampinen,
Andrew Jaegle, James L. McClelland, Loic Matthey, Felix Hill, Alexander
Lerchner
- Abstract summary: We introduce SODA, a self-supervised diffusion model, designed for representation learning.
The model incorporates an image encoder, which distills a source view into a compact representation that guides the generation of related novel views.
We show that by imposing a tight bottleneck between the encoder and a denoising decoder, we can turn diffusion models into strong representation learners.
- Score: 75.7331354734152
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce SODA, a self-supervised diffusion model, designed for
representation learning. The model incorporates an image encoder, which
distills a source view into a compact representation that, in turn, guides the
generation of related novel views. We show that by imposing a tight bottleneck
between the encoder and a denoising decoder, and leveraging novel view
synthesis as a self-supervised objective, we can turn diffusion models into
strong representation learners, capable of capturing visual semantics in an
unsupervised manner. To the best of our knowledge, SODA is the first diffusion
model to succeed at ImageNet linear-probe classification, and, at the same
time, it accomplishes reconstruction, editing and synthesis tasks across a wide
range of datasets. Further investigation reveals the disentangled nature of its
emergent latent space, that serves as an effective interface to control and
manipulate the model's produced images. All in all, we aim to shed light on the
exciting and promising potential of diffusion models, not only for image
generation, but also for learning rich and robust representations.
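To make the setup concrete, here is a minimal PyTorch sketch of the training step the abstract describes: an encoder compresses a source view into a low-dimensional latent, and a denoising decoder learns to reconstruct a noised novel view conditioned only on that latent. The module names, dimensions, and epsilon-prediction objective are illustrative assumptions, not the authors' released code; the `denoiser` is an assumed latent-conditioned U-Net-style network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BottleneckEncoder(nn.Module):
    """Distills a source view into a compact latent (the 'tight bottleneck')."""
    def __init__(self, latent_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.proj = nn.Linear(128, latent_dim)  # the low-dimensional bottleneck

    def forward(self, x):
        return self.proj(self.backbone(x))

def training_step(encoder, denoiser, source_view, novel_view, alphas_cumprod):
    """One self-supervised step: denoise a noised *novel* view, guided only
    by the compact latent of the *source* view."""
    z = encoder(source_view)                                # (B, latent_dim)
    t = torch.randint(0, len(alphas_cumprod), (novel_view.size(0),))
    a = alphas_cumprod[t].view(-1, 1, 1, 1)
    noise = torch.randn_like(novel_view)
    x_t = a.sqrt() * novel_view + (1 - a).sqrt() * noise    # forward process
    eps_pred = denoiser(x_t, t, z)        # assumed latent-conditioned denoiser
    return F.mse_loss(eps_pred, noise)    # standard epsilon objective
```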
Related papers
- Denoising Autoregressive Representation Learning [13.185567468951628]
Our method, DARL, employs a decoder-only Transformer to predict image patches autoregressively.
We show that the learned representation can be improved by using tailored noise schedules and longer training in larger models.
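As a hedged illustration of the mechanism summarized above, the sketch below shows a causally masked Transformer regressing the next raw-pixel patch from all previous ones; the patch size, model width, and MSE loss are assumptions, not DARL's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchAR(nn.Module):
    """Decoder-only Transformer: predict patch i+1 from patches 0..i.
    (Positional embeddings omitted for brevity.)"""
    def __init__(self, patch_dim=16 * 16 * 3, d_model=256, n_layers=4):
        super().__init__()
        self.embed = nn.Linear(patch_dim, d_model)
        block = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.blocks = nn.TransformerEncoder(block, n_layers)
        self.head = nn.Linear(d_model, patch_dim)

    def forward(self, patches):  # (B, N, patch_dim), raster-ordered patches
        causal = nn.Transformer.generate_square_subsequent_mask(patches.size(1))
        return self.head(self.blocks(self.embed(patches), mask=causal))

model = PatchAR()
patches = torch.randn(2, 196, 16 * 16 * 3)  # e.g. a 14x14 grid of 16x16 patches
loss = F.mse_loss(model(patches[:, :-1]), patches[:, 1:])  # next-patch target
```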
arXiv Detail & Related papers (2024-03-08T10:19:00Z)
- Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis [62.07413805483241]
Steered Diffusion is a framework for zero-shot conditional image generation using a diffusion model trained for unconditional generation.
We present experiments using steered diffusion on several tasks including inpainting, colorization, text-guided semantic editing, and image super-resolution.
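The steering idea can be sketched as generic guided sampling: at each reverse step, differentiate a task loss through the denoised estimate and nudge the predicted noise. This assumes an epsilon-prediction model `eps_model` and a DDIM-style deterministic update; it illustrates the plug-and-play principle, not the paper's exact update rule.

```python
import torch

@torch.no_grad()
def steered_sample(eps_model, guidance_loss, shape, alphas_cumprod, scale=1.0):
    """Guided DDIM-style sampling from an *unconditional* epsilon-model."""
    x = torch.randn(shape)
    for t in reversed(range(len(alphas_cumprod))):
        a = alphas_cumprod[t]
        with torch.enable_grad():                      # local grad for steering
            x_in = x.detach().requires_grad_(True)
            eps = eps_model(x_in, t)
            x0_hat = (x_in - (1 - a).sqrt() * eps) / a.sqrt()
            grad = torch.autograd.grad(guidance_loss(x0_hat).sum(), x_in)[0]
        eps = eps.detach() + scale * (1 - a).sqrt() * grad   # steer the noise
        x0_hat = (x - (1 - a).sqrt() * eps) / a.sqrt()       # denoised estimate
        a_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)
        x = a_prev.sqrt() * x0_hat + (1 - a_prev).sqrt() * eps  # DDIM update
    return x
```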
arXiv Detail & Related papers (2023-09-30T02:03:22Z)
- SlotDiffusion: Object-Centric Generative Modeling with Diffusion Models [47.986381326169166]
We introduce SlotDiffusion -- an object-centric Latent Diffusion Model (LDM) designed for both image and video data.
Thanks to the powerful modeling capacity of LDMs, SlotDiffusion surpasses previous slot models in unsupervised object segmentation and visual generation.
Our learned object features can be utilized by existing object-centric dynamics models, improving video prediction quality and downstream temporal reasoning tasks.
arXiv Detail & Related papers (2023-05-18T19:56:20Z)
- Diffusion Models as Masked Autoencoders [52.442717717898056]
We revisit generatively pre-training visual representations in light of recent interest in denoising diffusion models.
While directly pre-training with diffusion models does not produce strong representations, we condition diffusion models on masked input and formulate them as masked autoencoders (DiffMAE).
We perform a comprehensive study on the pros and cons of design choices and build connections between diffusion models and masked autoencoders.
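A hedged sketch of the masked-conditioning idea: visible patches stay clean, masked patches receive diffusion noise, and the loss is computed on the masked region only. The masking ratio, epsilon target, and the model's signature are assumptions, not DiffMAE's actual interface.

```python
import torch
import torch.nn.functional as F

def masked_diffusion_step(model, patches, alphas_cumprod, mask_ratio=0.75):
    """patches: (B, N, D) tokenized image; model: assumed denoiser."""
    B, N, _ = patches.shape
    mask = torch.rand(B, N) < mask_ratio                     # True = masked
    t = torch.randint(0, len(alphas_cumprod), (B,))
    a = alphas_cumprod[t].view(B, 1, 1)
    noise = torch.randn_like(patches)
    noised = a.sqrt() * patches + (1 - a).sqrt() * noise
    x_in = torch.where(mask.unsqueeze(-1), noised, patches)  # visible stay clean
    eps_pred = model(x_in, t, mask)                          # assumed signature
    return F.mse_loss(eps_pred[mask], noise[mask])           # masked tokens only
```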
arXiv Detail & Related papers (2023-04-06T17:59:56Z)
- Object-Centric Slot Diffusion [30.722428924152382]
We introduce Latent Slot Diffusion (LSD), a novel model that serves dual purposes.
We demonstrate that LSD significantly outperforms state-of-the-art transformer-based decoders.
We also conduct a preliminary investigation into the integration of pre-trained diffusion models in LSD.
arXiv Detail & Related papers (2023-03-20T02:40:16Z)
- DiVAE: Photorealistic Images Synthesis with Denoising Diffusion Decoder [73.1010640692609]
We propose a VQ-VAE architecture with a diffusion decoder (DiVAE) to serve as the reconstructing component in image synthesis.
Our model achieves state-of-the-art results and, in particular, generates more photorealistic images.
arXiv Detail & Related papers (2022-06-01T10:39:12Z)
- Decoupling Global and Local Representations via Invertible Generative Flows [47.366299240738094]
Experimental results on standard image benchmarks demonstrate the effectiveness of our model in terms of density estimation, image generation and unsupervised representation learning.
This work demonstrates that a generative model with a likelihood-based objective is capable of learning decoupled representations, requiring no explicit supervision.
arXiv Detail & Related papers (2020-04-12T03:18:13Z)
- High-Fidelity Synthesis with Disentangled Representation [60.19657080953252]
We propose an Information-Distillation Generative Adversarial Network (ID-GAN) for disentanglement learning and high-fidelity synthesis.
Our method learns a disentangled representation with a VAE-based model and distills it, together with an additional nuisance variable, into a separate GAN-based generator for high-fidelity synthesis.
Despite its simplicity, the proposed method is highly effective, achieving image generation quality comparable to state-of-the-art methods while using the disentangled representation.
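A tiny sketch of the distillation interface this describes: the GAN generator consumes the frozen VAE's disentangled code concatenated with an extra nuisance noise vector. The dimensions and function names below are placeholders, not ID-GAN's actual code.

```python
import torch

def idgan_generator_input(vae_encode, images, nuisance_dim=112):
    """Concatenate a frozen disentangled code with nuisance noise."""
    with torch.no_grad():
        c = vae_encode(images)                     # disentangled code (frozen VAE)
    z = torch.randn(images.size(0), nuisance_dim)  # nuisance variable: fidelity
    return torch.cat([c, z], dim=1)                # input to the GAN generator
```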
arXiv Detail & Related papers (2020-01-13T14:39:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.