f-DM: A Multi-stage Diffusion Model via Progressive Signal
Transformation
- URL: http://arxiv.org/abs/2210.04955v1
- Date: Mon, 10 Oct 2022 18:49:25 GMT
- Title: f-DM: A Multi-stage Diffusion Model via Progressive Signal
Transformation
- Authors: Jiatao Gu, Shuangfei Zhai, Yizhe Zhang, Miguel Angel Bautista, Josh
Susskind
- Abstract summary: Diffusion models (DMs) have recently emerged as SoTA tools for generative modeling in various domains.
We propose f-DM, a generalized family of DMs which allows progressive signal transformation.
We apply f-DM in image generation tasks with a range of functions, including down-sampling, blurring, and learned transformations.
- Score: 56.04628143914542
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Diffusion models (DMs) have recently emerged as SoTA tools for generative
modeling in various domains. Standard DMs can be viewed as an instantiation of
hierarchical variational autoencoders (VAEs) where the latent variables are
inferred from input-centered Gaussian distributions with fixed scales and
variances. Unlike VAEs, this formulation prevents DMs from changing their latent
spaces or learning abstract representations. In this work, we propose f-DM, a
generalized family of DMs which allows progressive signal transformation. More
precisely, we extend DMs to incorporate a set of (hand-designed or learned)
transformations, where the transformed input is the mean of each diffusion
step. We propose a generalized formulation and derive the corresponding
denoising objective together with a modified sampling algorithm. As a demonstration, we
apply f-DM in image generation tasks with a range of functions, including
down-sampling, blurring, and learned transformations based on the encoder of
pretrained VAEs. In addition, we identify the importance of adjusting the noise
levels whenever the signal is sub-sampled and propose a simple rescaling
recipe. f-DM can produce high-quality samples on standard image generation
benchmarks like FFHQ, AFHQ, LSUN, and ImageNet with better efficiency and
semantic interpretation.
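The central idea in the abstract, a forward process whose mean is a (possibly down-sampling) transformation of the input, with noise levels rescaled after sub-sampling, can be illustrated with a toy sketch. Everything here is an illustrative assumption rather than the paper's exact formulation: the 1-D average-pool `downsample`, the function name `fdm_forward_step`, and the sqrt-of-factor rescaling recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

def downsample(x, factor=2):
    # Average-pool a 1-D signal; a toy stand-in for the down-sampling
    # transformation f applied between diffusion stages.
    return x.reshape(-1, factor).mean(axis=1)

def fdm_forward_step(x, sigma, factor=2, rescale=True):
    # One noising step in which the *transformed* signal is the mean:
    #   x_t = f(x) + sigma_t * eps.
    # Down-sampling by averaging shrinks i.i.d. noise std by sqrt(factor),
    # so one plausible rescaling (assumed here; the paper's exact recipe
    # may differ) divides sigma accordingly to keep noise levels consistent.
    z = downsample(x, factor)
    if rescale:
        sigma = sigma / np.sqrt(factor)
    return z + sigma * rng.standard_normal(z.shape)

x = np.arange(8, dtype=float)
noisy = fdm_forward_step(x, sigma=0.1)
print(noisy.shape)  # (4,)
```

The sketch only shows the shape of one stage transition; the paper additionally derives the matching denoising objective and sampler for such transformed processes.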
Related papers
- Unsupervised Modality Adaptation with Text-to-Image Diffusion Models for Semantic Segmentation [54.96563068182733]
We propose Modality Adaptation with text-to-image Diffusion Models (MADM) for semantic segmentation task.
MADM utilizes text-to-image diffusion models pre-trained on extensive image-text pairs to enhance the model's cross-modality capabilities.
We show that MADM achieves state-of-the-art adaptation performance across various modality tasks, including images to depth, infrared, and event modalities.
arXiv Detail & Related papers (2024-10-29T03:49:40Z)
- Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations [41.87051958934507]
This paper addresses two key tasks: (i) inversion and (ii) editing of a real image using rectified flow models (such as Flux).
Our inversion method allows for state-of-the-art performance in zero-shot inversion and editing, outperforming prior works in stroke-to-image synthesis and semantic image editing.
arXiv Detail & Related papers (2024-10-14T17:56:24Z)
- Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction [88.65168366064061]
We introduce Discrete Denoising Posterior Prediction (DDPP), a novel framework that casts the task of steering pre-trained MDMs as a problem of probabilistic inference.
Our framework leads to a family of three novel objectives that are all simulation-free, and thus scalable.
We substantiate our designs via wet-lab validation, where we observe transient expression of reward-optimized protein sequences.
arXiv Detail & Related papers (2024-10-10T17:18:30Z)
- Efficient Distribution Matching of Representations via Noise-Injected Deep InfoMax [73.03684002513218]
We enhance Deep InfoMax (DIM) to enable automatic matching of learned representations to a selected prior distribution.
We show that such modification allows for learning uniformly and normally distributed representations.
The results indicate a moderate trade-off between the performance on the downstream tasks and quality of DM.
arXiv Detail & Related papers (2024-10-09T15:40:04Z)
- Self-Consistent Recursive Diffusion Bridge for Medical Image Translation [6.850683267295248]
Denoising diffusion models (DDMs) have recently gained traction in medical image translation given their improved training stability over adversarial models.
We propose a novel self-consistent iterative diffusion bridge (SelfRDB) for improved performance in medical image translation.
Comprehensive analyses in multi-contrast MRI and MRI-CT translation indicate that SelfRDB offers superior performance against competing methods.
arXiv Detail & Related papers (2024-05-10T19:39:55Z)
- AdjointDPM: Adjoint Sensitivity Method for Gradient Backpropagation of Diffusion Probabilistic Models [103.41269503488546]
Existing customization methods require access to multiple reference examples to align pre-trained diffusion probabilistic models with user-provided concepts.
This paper aims to address the challenge of DPM customization when the only available supervision is a differentiable metric defined on the generated contents.
We propose a novel method AdjointDPM, which first generates new samples from diffusion models by solving the corresponding probability-flow ODEs.
It then uses the adjoint sensitivity method to backpropagate the gradients of the loss to the models' parameters.
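The adjoint sensitivity idea can be sketched on a toy scalar ODE. This is a generic adjoint illustration under assumed dynamics dx/dt = theta * x with loss L = x(T), not AdjointDPM's actual probability-flow ODE or code:

```python
import numpy as np

def solve_ode(theta, x0, T=1.0, steps=100):
    # Forward Euler integration of dx/dt = theta * x, keeping the
    # trajectory so the backward (adjoint) pass can reuse it.
    dt = T / steps
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] + dt * theta * xs[-1])
    return xs, dt

def adjoint_grad(theta, x0, T=1.0, steps=100):
    # Adjoint sensitivity: propagate a = dL/dx backward through the
    # solver, accumulating dL/dtheta along the stored trajectory.
    xs, dt = solve_ode(theta, x0, T, steps)
    a, grad = 1.0, 0.0          # loss L = x(T), so a(T) = 1
    for x in reversed(xs[:-1]):
        grad += dt * a * x      # df/dtheta = x for f = theta * x
        a += dt * a * theta     # backward step for the adjoint state
    return grad

g = adjoint_grad(theta=0.5, x0=1.0)
# close to the analytic value x0 * T * exp(theta * T) ~= 1.6487
print(g)
```

The same pattern, solving the ODE forward and integrating an adjoint state backward, is what lets gradients of a metric on the generated sample reach the model parameters without storing every intermediate activation.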
arXiv Detail & Related papers (2023-07-20T09:06:21Z)
- Hierarchical Integration Diffusion Model for Realistic Image Deblurring [71.76410266003917]
Diffusion models (DMs) have been introduced in image deblurring and exhibited promising performance.
We propose the Hierarchical Integration Diffusion Model (HI-Diff), for realistic image deblurring.
Experiments on synthetic and real-world blur datasets demonstrate that our HI-Diff outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-05-22T12:18:20Z)
- Representation Learning with Diffusion Models [0.0]
Diffusion models (DMs) have achieved state-of-the-art results for image synthesis tasks as well as density estimation.
We introduce a framework for learning such representations with diffusion models (LRDM).
In particular, the DM and the representation encoder are trained jointly in order to learn rich representations specific to the generative denoising process.
arXiv Detail & Related papers (2022-10-20T07:26:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.