Representation Learning with Diffusion Models
- URL: http://arxiv.org/abs/2210.11058v1
- Date: Thu, 20 Oct 2022 07:26:47 GMT
- Title: Representation Learning with Diffusion Models
- Authors: Jeremias Traub
- Abstract summary: Diffusion models (DMs) have achieved state-of-the-art results for image synthesis tasks as well as density estimation.
We introduce a framework for learning semantically meaningful representations with diffusion models (LRDM).
In particular, the DM and the representation encoder are trained jointly in order to learn rich representations specific to the generative denoising process.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models (DMs) have achieved state-of-the-art results for image
synthesis tasks as well as density estimation. Applied in the latent space of a
powerful pretrained autoencoder (LDM), their immense computational requirements
can be significantly reduced without sacrificing sampling quality. However, DMs
and LDMs lack a semantically meaningful representation space as the diffusion
process gradually destroys information in the latent variables. We introduce a
framework for learning such representations with diffusion models (LRDM). To
that end, an LDM is conditioned on the representation extracted from the clean
image by a separate encoder. In particular, the DM and the representation
encoder are trained jointly in order to learn rich representations specific to
the generative denoising process. By introducing a tractable representation
prior, we can efficiently sample from the representation distribution for
unconditional image synthesis without training any additional model. We
demonstrate that i) competitive image generation results can be achieved with
image-parameterized LDMs, and ii) LRDMs are capable of learning semantically
meaningful representations, allowing for faithful image reconstructions and
semantic interpolations. Our implementation is available at
https://github.com/jeremiastraub/diffusion.
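
The joint training idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation (the linked repository is the reference for that). It is a minimal pure-Python illustration under several assumptions that the abstract does not confirm: the "tractable representation prior" is taken to be a standard Gaussian enforced with a KL penalty, the noise schedule is linear, and the encoder and denoiser are trivial placeholders. All function names and parameters are hypothetical.

```python
import math
import random


def alpha_bar(t, num_steps=1000, beta_min=1e-4, beta_max=2e-2):
    """Cumulative signal level after t steps of an (assumed) linear beta schedule."""
    prod = 1.0
    for s in range(1, t + 1):
        beta = beta_min + (beta_max - beta_min) * (s - 1) / (num_steps - 1)
        prod *= 1.0 - beta
    return prod


def q_sample(z0, t, eps):
    """Forward diffusion: noise the clean latent z0 up to step t."""
    ab = alpha_bar(t)
    return [math.sqrt(ab) * z + math.sqrt(1.0 - ab) * e for z, e in zip(z0, eps)]


def encode(x0, weights):
    """Hypothetical linear representation encoder on the clean image x0.

    Returns (mean, log-variance); the variance is fixed to 1 for simplicity.
    """
    mu = [sum(w * x for w, x in zip(row, x0)) for row in weights]
    logvar = [0.0 for _ in mu]
    return mu, logvar


def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, exp(logvar)) || N(0, I) ), the assumed prior regularizer."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def predict_noise(z_t, t, rep, scale):
    """Placeholder denoiser conditioned on the representation `rep`.

    A real model would be a neural network; this only shows the data flow.
    """
    shift = sum(rep) / len(rep)
    level = math.sqrt(1.0 - alpha_bar(t))
    return [scale * level * z - shift for z in z_t]


def joint_loss(x0, z0, t, eps, enc_weights, scale, rng, kl_weight=1e-3):
    """Denoising loss plus prior penalty, so encoder and denoiser share one objective."""
    mu, logvar = encode(x0, enc_weights)
    # Reparameterized sample of the representation.
    rep = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]
    z_t = q_sample(z0, t, eps)
    eps_hat = predict_noise(z_t, t, rep, scale)
    mse = sum((e - h) ** 2 for e, h in zip(eps, eps_hat)) / len(eps)
    return mse + kl_weight * kl_to_standard_normal(mu, logvar)


def sample_representation(dim, rng):
    """Unconditional synthesis: draw the representation from the assumed prior."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]
```

In this sketch, gradients of `joint_loss` would flow into both the denoiser parameters and the encoder weights, which is what "trained jointly" means here. At sampling time, `sample_representation` replaces the encoder, so no extra model has to be trained to generate images unconditionally.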
Related papers
- MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling [64.09238330331195]
We propose a novel Multi-Modal Auto-Regressive (MMAR) probabilistic modeling framework.
Unlike discretization-based methods, MMAR takes in continuous-valued image tokens to avoid information loss.
We show that MMAR performs markedly better than other joint multi-modal models.
arXiv Detail & Related papers (2024-10-14T17:57:18Z)
- IIDM: Image-to-Image Diffusion Model for Semantic Image Synthesis [8.080248399002663]
In this paper, semantic image synthesis is treated as an image denoising task.
The style reference is first contaminated with random noise and then progressively denoised by IIDM.
Three techniques (refinement, color transfer, and model ensembles) are proposed to further boost the generation quality.
arXiv Detail & Related papers (2024-03-20T08:21:00Z)
- Unlocking Pre-trained Image Backbones for Semantic Image Synthesis [29.688029979801577]
We propose a new class of GAN discriminators for semantic image synthesis that generates highly realistic images.
Our model, which we dub DP-SIMS, achieves state-of-the-art results in terms of image quality and consistency with the input label maps on ADE-20K, COCO-Stuff, and Cityscapes.
arXiv Detail & Related papers (2023-12-20T09:39:19Z)
- Learned representation-guided diffusion models for large-image generation [58.192263311786824]
We introduce a novel approach that trains diffusion models conditioned on embeddings from self-supervised learning (SSL).
Our diffusion models successfully project these features back to high-quality histopathology and remote sensing images.
Augmenting real data by generating variations of real images improves downstream accuracy for patch-level and larger, image-scale classification tasks.
arXiv Detail & Related papers (2023-12-12T14:45:45Z)
- SODA: Bottleneck Diffusion Models for Representation Learning [75.7331354734152]
We introduce SODA, a self-supervised diffusion model, designed for representation learning.
The model incorporates an image encoder, which distills a source view into a compact representation that guides the generation of related novel views.
We show that by imposing a tight bottleneck between the encoder and a denoising decoder, we can turn diffusion models into strong representation learners.
arXiv Detail & Related papers (2023-11-29T18:53:34Z)
- Hierarchical Integration Diffusion Model for Realistic Image Deblurring [71.76410266003917]
Diffusion models (DMs) have been introduced to image deblurring and have exhibited promising performance.
We propose the Hierarchical Integration Diffusion Model (HI-Diff) for realistic image deblurring.
Experiments on synthetic and real-world blur datasets demonstrate that our HI-Diff outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-05-22T12:18:20Z)
- SinDDM: A Single Image Denoising Diffusion Model [28.51951207066209]
We introduce a framework for training a denoising diffusion model on a single image.
Our method, which we coin SinDDM, learns the internal statistics of the training image by using a multi-scale diffusion process.
It is applicable in a wide array of tasks, including style transfer and harmonization.
arXiv Detail & Related papers (2022-11-29T20:44:25Z)
- Person Image Synthesis via Denoising Diffusion Model [116.34633988927429]
We show how denoising diffusion models can be applied for high-fidelity person image synthesis.
Our results on two large-scale benchmarks and a user study demonstrate the photorealism of our proposed approach under challenging scenarios.
arXiv Detail & Related papers (2022-11-22T18:59:50Z)
- f-DM: A Multi-stage Diffusion Model via Progressive Signal Transformation [56.04628143914542]
Diffusion models (DMs) have recently emerged as SoTA tools for generative modeling in various domains.
We propose f-DM, a generalized family of DMs which allows progressive signal transformation.
We apply f-DM in image generation tasks with a range of functions, including down-sampling, blurring, and learned transformations.
arXiv Detail & Related papers (2022-10-10T18:49:25Z)
- High-Resolution Image Synthesis with Latent Diffusion Models [14.786952412297808]
Training diffusion models on autoencoders makes it possible, for the first time, to reach a near-optimal point between complexity reduction and detail preservation.
Our latent diffusion models (LDMs) achieve a new state of the art for image inpainting and highly competitive performance on various tasks.
arXiv Detail & Related papers (2021-12-20T18:55:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.