LMD: Faster Image Reconstruction with Latent Masking Diffusion
- URL: http://arxiv.org/abs/2312.07971v1
- Date: Wed, 13 Dec 2023 08:36:51 GMT
- Title: LMD: Faster Image Reconstruction with Latent Masking Diffusion
- Authors: Zhiyuan Ma, Zhihuan Yu, Jianjun Li, Bowen Zhou
- Abstract summary: Masked autoencoders (MAEs), as popular self-supervised vision learners, have demonstrated simpler and more effective image reconstruction and transfer capabilities on downstream tasks.
This paper presents LMD, a faster image reconstruction framework with latent masking diffusion.
- Score: 28.54828478259779
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As a class of fruitful approaches, diffusion probabilistic models (DPMs) have
shown excellent advantages in high-resolution image reconstruction. On the
other hand, masked autoencoders (MAEs), as popular self-supervised vision
learners, have demonstrated simpler and more effective image reconstruction and
transfer capabilities on downstream tasks. However, they all require extremely
high training costs, either due to inherent high temporal-dependence (i.e.,
excessively long diffusion steps) or due to artificially low spatial-dependence
(i.e., human-formulated high mask ratio, such as 0.75). To this end, this paper
presents LMD, a faster image reconstruction framework with latent masking
diffusion. First, we propose to project and reconstruct images in latent space
through a pre-trained variational autoencoder, which is theoretically more
efficient than in the pixel-based space. Then, we combine the advantages of
MAEs and DPMs to design a progressive masking diffusion model, which gradually
increases the masking proportion by three different schedulers and reconstructs
the latent features from simple to difficult, without sequentially performing
denoising diffusion as in DPMs or using fixed high masking ratio as in MAEs, so
as to alleviate the high training time-consumption predicament. Our approach
allows for learning high-capacity models and accelerates their training (by 3x
or more) while barely reducing the original accuracy. Inference speed in
downstream tasks also significantly outperforms the previous approaches.
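The abstract describes a masking proportion that increases gradually under three different schedulers, so that reconstruction moves from simple to difficult. The sketch below is only an illustration of that idea, not the authors' implementation: the abstract does not name the three schedulers, so the `linear`, `cosine`, and `exp` curves, the function names, the starting ratio of 0.15, and the 256-token latent grid are all assumptions. Only the 0.75 end ratio is taken from the MAE setting mentioned in the abstract.

```python
import math
import random


def mask_ratio(step, total_steps, schedule="linear", start=0.15, end=0.75):
    """Mask ratio at `step`, rising from `start` to `end`.

    All names and defaults are hypothetical; only the 0.75 end value
    echoes the fixed MAE mask ratio quoted in the abstract.
    """
    t = step / max(total_steps - 1, 1)  # training progress in [0, 1]
    if schedule == "linear":
        w = t
    elif schedule == "cosine":
        # Slow start and slow finish.
        w = 0.5 * (1.0 - math.cos(math.pi * t))
    elif schedule == "exp":
        # Slow start, fast finish; the growth factor 3.0 is arbitrary.
        w = (math.exp(3.0 * t) - 1.0) / (math.exp(3.0) - 1.0)
    else:
        raise ValueError(f"unknown schedule: {schedule}")
    return start + (end - start) * w


def random_mask(num_tokens, ratio, rng):
    """Return a list of booleans (True = masked) with
    round(ratio * num_tokens) masked positions chosen uniformly."""
    num_masked = round(ratio * num_tokens)
    masked = set(rng.sample(range(num_tokens), num_masked))
    return [i in masked for i in range(num_tokens)]


if __name__ == "__main__":
    rng = random.Random(0)
    num_tokens = 256  # e.g. a 16x16 latent grid (assumed size)
    for name in ("linear", "cosine", "exp"):
        ratios = [mask_ratio(s, 5, name) for s in range(5)]
        print(name, [f"{r:.2f}" for r in ratios])
    mask = random_mask(num_tokens, mask_ratio(4, 5), rng)
    print("masked tokens at final step:", sum(mask))
```

In a training loop, the mask produced at each step would select which latent tokens (from the pre-trained variational autoencoder) the model must reconstruct, so early steps present an easy task and later steps approach the fixed high ratio used by MAEs.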
Related papers
- MAN: Latent Diffusion Enhanced Multistage Anti-Noise Network for Efficient and High-Quality Low-Dose CT Image Denoising [8.912550844312177]
We introduce MAN, a Latent Diffusion Enhanced Multistage Anti-Noise Network for Efficient and High-Quality Low-Dose CT Image Denoising task.
Our method operates in a compressed latent space via a perceptually-optimized autoencoder.
Our work demonstrates a practical path forward for advanced generative models in medical imaging.
arXiv Detail & Related papers (2025-09-28T03:13:39Z) - LAFR: Efficient Diffusion-based Blind Face Restoration via Latent Codebook Alignment Adapter [52.93785843453579]
Blind face restoration from low-quality (LQ) images is a challenging task that requires high-fidelity image reconstruction and the preservation of facial identity.
We propose LAFR, a novel codebook-based latent space adapter that aligns the latent distribution of LQ images with that of HQ counterparts.
We show that lightweight finetuning of diffusion prior on just 0.9% of FFHQ dataset is sufficient to achieve results comparable to state-of-the-art methods.
arXiv Detail & Related papers (2025-05-29T14:11:16Z) - ZipIR: Latent Pyramid Diffusion Transformer for High-Resolution Image Restoration [75.0053551643052]
We introduce ZipIR, a novel framework that enhances efficiency, scalability, and long-range modeling for high-res image restoration.
ZipIR employs a highly compressed latent representation that compresses image 32x, effectively reducing the number of spatial tokens.
ZipIR surpasses existing diffusion-based methods, offering unmatched speed and quality in restoring high-resolution images from severely degraded inputs.
arXiv Detail & Related papers (2025-04-11T14:49:52Z) - TD-BFR: Truncated Diffusion Model for Efficient Blind Face Restoration [17.79398314291093]
We propose a novel Truncated Diffusion model for efficient Blind Face Restoration (TD-BFR).
TD-BFR utilizes an innovative truncated sampling method, starting from low-quality (LQ) images at low resolution to enhance sampling speed.
Our method efficiently restores high-quality images in a coarse-to-fine manner, and experimental results demonstrate that TD-BFR is, on average, 4.75× faster than current state-of-the-art diffusion-based BFR methods.
arXiv Detail & Related papers (2025-03-26T13:35:43Z) - Single-Step Latent Consistency Model for Remote Sensing Image Super-Resolution [7.920423405957888]
We propose a novel single-step diffusion approach designed to enhance both efficiency and visual quality in RSISR tasks.
The proposed LCMSR reduces the iterative steps of traditional diffusion models from 50-1000 or more to just a single step.
Experimental results demonstrate that LCMSR effectively balances efficiency and performance, achieving inference times comparable to non-diffusion models.
arXiv Detail & Related papers (2025-03-25T09:56:21Z) - Masked Autoencoders Are Effective Tokenizers for Diffusion Models [56.08109308294133]
MAETok is an autoencoder that learns semantically rich latent space while maintaining reconstruction fidelity.
MAETok achieves significant practical improvements, enabling a gFID of 1.69 with 76x faster training and 31x higher inference throughput for 512x512 generation.
arXiv Detail & Related papers (2025-02-05T18:42:04Z) - AP-LDM: Attentive and Progressive Latent Diffusion Model for Training-Free High-Resolution Image Generation [12.564266865237343]
Latent diffusion models (LDMs) often experience significant structural distortions when directly generating high-resolution (HR) images.
We propose an Attentive and Progressive LDM (AP-LDM) aimed at enhancing HR image quality while accelerating the generation process.
AP-LDM decomposes the denoising process of LDMs into two stages: (i) attentive training-resolution denoising, and (ii) progressive high-resolution denoising.
arXiv Detail & Related papers (2024-10-08T13:56:28Z) - Effective Diffusion Transformer Architecture for Image Super-Resolution [63.254644431016345]
We design an effective diffusion transformer for image super-resolution (DiT-SR).
In practice, DiT-SR leverages an overall U-shaped architecture, and adopts a uniform isotropic design for all the transformer blocks.
We analyze the limitation of the widely used AdaLN, and present a frequency-adaptive time-step conditioning module.
arXiv Detail & Related papers (2024-09-29T07:14:16Z) - Taming Diffusion Prior for Image Super-Resolution with Domain Shift SDEs [30.973473583364832]
DoSSR is a Domain Shift diffusion-based SR model that capitalizes on the generative powers of pretrained diffusion models.
At the core of our approach is a domain shift equation that integrates seamlessly with existing diffusion models.
Our proposed method achieves state-of-the-art performance on synthetic and real-world datasets, while notably requiring only 5 sampling steps.
arXiv Detail & Related papers (2024-09-26T12:16:11Z) - One-step Generative Diffusion for Realistic Extreme Image Rescaling [47.89362819768323]
We propose a novel framework called One-Step Image Rescaling Diffusion (OSIRDiff) for extreme image rescaling.
OSIRDiff performs rescaling operations in the latent space of a pre-trained autoencoder.
It effectively leverages powerful natural image priors learned by a pre-trained text-to-image diffusion model.
arXiv Detail & Related papers (2024-08-17T09:51:42Z) - One Step Diffusion-based Super-Resolution with Time-Aware Distillation [60.262651082672235]
Diffusion-based image super-resolution (SR) methods have shown promise in reconstructing high-resolution images with fine details from low-resolution counterparts.
Recent techniques have been devised to enhance the sampling efficiency of diffusion-based SR models via knowledge distillation.
We propose a time-aware diffusion distillation method, named TAD-SR, to accomplish effective and efficient image super-resolution.
arXiv Detail & Related papers (2024-08-14T11:47:22Z) - Efficient Diffusion Model for Image Restoration by Residual Shifting [63.02725947015132]
This study proposes a novel and efficient diffusion model for image restoration.
Our method avoids the need for post-acceleration during inference, thereby avoiding the associated performance deterioration.
Our method achieves superior or comparable performance to current state-of-the-art methods on three classical IR tasks.
arXiv Detail & Related papers (2024-03-12T05:06:07Z) - ResShift: Efficient Diffusion Model for Image Super-resolution by Residual Shifting [70.83632337581034]
Diffusion-based image super-resolution (SR) methods are mainly limited by the low inference speed.
We propose a novel and efficient diffusion model for SR that significantly reduces the number of diffusion steps.
Our method constructs a Markov chain that transfers between the high-resolution image and the low-resolution image by shifting the residual.
arXiv Detail & Related papers (2023-07-23T15:10:02Z) - Low-Light Image Enhancement with Wavelet-based Diffusion Models [50.632343822790006]
Diffusion models have achieved promising results in image restoration tasks, yet suffer from long inference times, excessive computational resource consumption, and unstable restoration.
We propose a robust and efficient Diffusion-based Low-Light image enhancement approach, dubbed DiffLL.
arXiv Detail & Related papers (2023-06-01T03:08:28Z) - Diffusion Probabilistic Model Made Slim [128.2227518929644]
We introduce a customized design for slim diffusion probabilistic models (DPM) for light-weight image synthesis.
We achieve 8-18x computational complexity reduction as compared to the latent diffusion models on a series of conditional and unconditional image generation tasks.
arXiv Detail & Related papers (2022-11-27T16:27:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.