AP-LDM: Attentive and Progressive Latent Diffusion Model for Training-Free High-Resolution Image Generation
- URL: http://arxiv.org/abs/2410.06055v1
- Date: Tue, 8 Oct 2024 13:56:28 GMT
- Title: AP-LDM: Attentive and Progressive Latent Diffusion Model for Training-Free High-Resolution Image Generation
- Authors: Boyuan Cao, Jiaxin Ye, Yujie Wei, Hongming Shan
- Abstract summary: Latent diffusion models (LDMs) often experience significant structural distortions when directly generating high-resolution (HR) images.
We propose an Attentive and Progressive LDM (AP-LDM) aimed at enhancing HR image quality while accelerating the generation process.
AP-LDM decomposes the denoising process of LDMs into two stages: (i) attentive training-resolution denoising, and (ii) progressive high-resolution denoising.
- Score: 12.564266865237343
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Latent diffusion models (LDMs), such as Stable Diffusion, often experience significant structural distortions when directly generating high-resolution (HR) images that exceed their original training resolutions. A straightforward and cost-effective solution is to adapt pre-trained LDMs for HR image generation; however, existing methods often suffer from poor image quality and long inference time. In this paper, we propose an Attentive and Progressive LDM (AP-LDM), a novel, training-free framework aimed at enhancing HR image quality while accelerating the generation process. AP-LDM decomposes the denoising process of LDMs into two stages: (i) attentive training-resolution denoising, and (ii) progressive high-resolution denoising. The first stage generates a latent representation of a higher-quality training-resolution image through the proposed attentive guidance, which utilizes a novel parameter-free self-attention mechanism to enhance the structural consistency. The second stage progressively performs upsampling in pixel space, alleviating the severe artifacts caused by latent space upsampling. Leveraging the effective initialization from the first stage enables denoising at higher resolutions with significantly fewer steps, enhancing overall efficiency. Extensive experimental results demonstrate that AP-LDM significantly outperforms state-of-the-art methods, delivering up to a 5x speedup in HR image generation, thereby highlighting its substantial advantages for real-world applications. Code is available at https://github.com/kmittle/AP-LDM.
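A minimal sketch of the two-stage idea described in the abstract, assuming off-the-shelf Hugging Face diffusers SDXL pipelines: stage one denoises at the training resolution, stage two upsamples in pixel space and refines each larger image with a short img2img pass. The attentive guidance, resolution schedule, and strength values below are illustrative assumptions, not the released AP-LDM implementation (see the linked repository for that).
```python
# Sketch of the two-stage control flow (not the official AP-LDM code):
#   stage (i):  denoise at the training resolution,
#   stage (ii): upsample in PIXEL space, then run a few refinement steps per stage.
# Assumes Hugging Face `diffusers` SDXL pipelines; attentive guidance is omitted.
import torch
from PIL import Image
from diffusers import AutoPipelineForImage2Image, StableDiffusionXLPipeline

device = "cuda"
base = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to(device)
refiner = AutoPipelineForImage2Image.from_pipe(base)  # reuse the same weights

prompt = "a photo of an alpine village at sunrise"

# Stage (i): training-resolution denoising (here: plain SDXL sampling).
image = base(prompt, height=1024, width=1024, num_inference_steps=50).images[0]

# Stage (ii): progressive high-resolution denoising.
# Upsampling happens on the decoded image (pixel space); a low-strength
# img2img pass then repairs detail with only a handful of denoising steps.
for size in (1536, 2048):                               # illustrative schedule
    image = image.resize((size, size), Image.Resampling.LANCZOS)
    image = refiner(
        prompt, image=image, strength=0.3, num_inference_steps=30
    ).images[0]                                          # ~strength*steps real steps

image.save("apldm_sketch_2048.png")
```
Because the second stage starts from a well-structured image rather than from noise, each higher resolution needs only a fraction of the usual denoising steps, which is where the reported speedup comes from.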
Related papers
- Single-Step Latent Consistency Model for Remote Sensing Image Super-Resolution [7.920423405957888]
We propose a novel single-step diffusion approach designed to enhance both efficiency and visual quality in remote sensing image super-resolution (RSISR) tasks.
The proposed LCMSR reduces the iterative steps of traditional diffusion models from 50-1000 or more to just a single step.
Experimental results demonstrate that LCMSR effectively balances efficiency and performance, achieving inference times comparable to non-diffusion models.
arXiv Detail & Related papers (2025-03-25T09:56:21Z)
- Latent Diffusion, Implicit Amplification: Efficient Continuous-Scale Super-Resolution for Remote Sensing Images [7.920423405957888]
E$^2$DiffSR achieves superior objective metrics and visual quality compared to the state-of-the-art SR methods.
It reduces the inference time of diffusion-based SR methods to a level comparable to that of non-diffusion methods.
arXiv Detail & Related papers (2024-10-30T09:14:13Z)
- A Flow-based Truncated Denoising Diffusion Model for Super-resolution Magnetic Resonance Spectroscopic Imaging [34.32290273033808]
This work introduces a Flow-based Truncated Denoising Diffusion Model (FTDDM) for super-resolution MRSI.
It shortens the diffusion process by truncating the diffusion chain, and the truncated steps are estimated using a normalizing flow-based network.
We demonstrate that FTDDM outperforms existing generative models while speeding up the sampling process by over 9-fold.
arXiv Detail & Related papers (2024-10-25T03:42:35Z)
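A purely illustrative sketch of the truncation idea in the entry above, assuming a DDPM-style sampler: a hypothetical normalizing-flow network proposes the noisy state at an intermediate timestep, and only the remaining reverse steps are executed. The noise schedule, truncation point, and module interfaces are placeholders, not the paper's FTDDM architecture.
```python
# Illustrative truncated reverse diffusion (DDPM-style), not the paper's FTDDM code.
# A hypothetical flow network maps the conditioning input (+ noise) to an estimate
# of x_{T_TRUNC}; only the last T_TRUNC reverse steps are then executed.
import torch

T, T_TRUNC = 1000, 100                               # full chain vs. truncated start
betas = torch.linspace(1e-4, 2e-2, T)                # assumed linear noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

@torch.no_grad()
def truncated_sample(flow_net, denoiser, cond):
    """flow_net: proposes x_{T_TRUNC} from the conditioning image (hypothetical).
    denoiser: predicts the noise eps_theta(x_t, t, cond) (hypothetical)."""
    x = flow_net(cond, torch.randn_like(cond))       # skips steps T .. T_TRUNC+1
    for t in range(T_TRUNC, 0, -1):                  # only T_TRUNC steps remain
        t_batch = torch.full((x.shape[0],), t, device=x.device)
        eps = denoiser(x, t_batch, cond)
        coef = betas[t - 1] / torch.sqrt(1.0 - alpha_bars[t - 1])
        mean = (x - coef * eps) / torch.sqrt(alphas[t - 1])
        noise = torch.randn_like(x) if t > 1 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t - 1]) * noise  # standard DDPM ancestral step
    return x
```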
- Degradation-Guided One-Step Image Super-Resolution with Diffusion Priors [75.24313405671433]
Diffusion-based image super-resolution (SR) methods have achieved remarkable success by leveraging large pre-trained text-to-image diffusion models as priors.
We introduce a novel one-step SR model that directly addresses the efficiency issue of diffusion-based SR methods.
Unlike existing fine-tuning strategies, we design a degradation-guided Low-Rank Adaptation (LoRA) module specifically for SR.
arXiv Detail & Related papers (2024-09-25T16:15:21Z)
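The entry above mentions a degradation-guided LoRA module. The sketch below shows a generic low-rank adapter on a frozen linear layer whose update is gated by a per-sample degradation embedding; the gating scheme, dimensions, and names are assumptions for illustration, not the authors' design.
```python
# Generic LoRA adapter with a (hypothetical) degradation-conditioned gate.
# The frozen base weight W gets a low-rank update: W x + g(d) * B(A x) * (alpha/r).
import torch
import torch.nn as nn

class DegradationGuidedLoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0, deg_dim: int = 64):
        super().__init__()
        self.base = base
        for p in self.base.parameters():            # base weights stay frozen
            p.requires_grad_(False)
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)          # adapter starts as a no-op
        self.gate = nn.Linear(deg_dim, 1)           # hypothetical degradation gate
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor, deg_embed: torch.Tensor) -> torch.Tensor:
        # deg_embed: (batch, deg_dim) embedding of the estimated degradation.
        g = torch.sigmoid(self.gate(deg_embed)).unsqueeze(1)   # (batch, 1, 1)
        return self.base(x) + g * self.lora_b(self.lora_a(x)) * self.scaling

# Usage: wrap a projection inside a (hypothetical) SR network.
layer = DegradationGuidedLoRALinear(nn.Linear(320, 320))
x = torch.randn(2, 77, 320)                         # (batch, tokens, features)
d = torch.randn(2, 64)                              # degradation embedding
y = layer(x, d)                                     # (2, 77, 320)
```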
- One-step Generative Diffusion for Realistic Extreme Image Rescaling [47.89362819768323]
We propose a novel framework called One-Step Image Rescaling Diffusion (OSIRDiff) for extreme image rescaling.
OSIRDiff performs rescaling operations in the latent space of a pre-trained autoencoder.
It effectively leverages powerful natural image priors learned by a pre-trained text-to-image diffusion model.
arXiv Detail & Related papers (2024-08-17T09:51:42Z)
- One Step Diffusion-based Super-Resolution with Time-Aware Distillation [60.262651082672235]
Diffusion-based image super-resolution (SR) methods have shown promise in reconstructing high-resolution images with fine details from low-resolution counterparts.
Recent techniques have been devised to enhance the sampling efficiency of diffusion-based SR models via knowledge distillation.
We propose a time-aware diffusion distillation method, named TAD-SR, to accomplish effective and efficient image super-resolution.
arXiv Detail & Related papers (2024-08-14T11:47:22Z)
- ACDMSR: Accelerated Conditional Diffusion Models for Single Image Super-Resolution [84.73658185158222]
We propose a diffusion model-based super-resolution method called ACDMSR.
Our method adapts the standard diffusion model to perform super-resolution through a deterministic iterative denoising process.
Our approach generates more visually realistic counterparts for low-resolution images, emphasizing its effectiveness in practical scenarios.
arXiv Detail & Related papers (2023-07-03T06:49:04Z)
- Implicit Diffusion Models for Continuous Super-Resolution [65.45848137914592]
This paper introduces an Implicit Diffusion Model (IDM) for high-fidelity continuous image super-resolution.
IDM integrates an implicit neural representation and a denoising diffusion model in a unified end-to-end framework.
The scaling factor regulates the resolution and accordingly modulates the proportion of low-resolution (LR) information and generated features in the final output.
arXiv Detail & Related papers (2023-03-29T07:02:20Z)
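As a toy reading of the last sentence of the entry above, the snippet below blends upsampled low-resolution features with generated features using a weight derived from the continuous scaling factor, so larger scales rely more on generated content. The scale-to-weight mapping is an assumption for illustration, not IDM's actual scale-adaptive mechanism.
```python
# Toy scale-adaptive blending (illustrative only, not the IDM architecture):
# the continuous scale factor s sets how much of the output comes from
# LR-derived features vs. features produced by the generative branch.
import torch
import torch.nn.functional as F

def scale_adaptive_blend(lr_feat: torch.Tensor, gen_feat: torch.Tensor,
                         scale: float, max_scale: float = 8.0) -> torch.Tensor:
    """lr_feat / gen_feat: (batch, channels, H, W) at the target resolution.
    The weighting rule below is an assumed placeholder."""
    w = min(max(scale - 1.0, 0.0) / (max_scale - 1.0), 1.0)  # s=1 -> keep LR, s=max -> generate
    return (1.0 - w) * lr_feat + w * gen_feat

# Usage with a 3.5x target scale: upsample LR features to the target grid first.
lr = torch.randn(1, 64, 32, 32)
gen = torch.randn(1, 64, 112, 112)
lr_up = F.interpolate(lr, size=gen.shape[-2:], mode="bilinear", align_corners=False)
out = scale_adaptive_blend(lr_up, gen, scale=3.5)            # (1, 64, 112, 112)
```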
- Real-World Image Super-Resolution by Exclusionary Dual-Learning [98.36096041099906]
Real-world image super-resolution is a practical image restoration problem that aims to obtain high-quality images from in-the-wild input.
Deep learning-based methods have achieved promising restoration quality on real-world image super-resolution datasets.
We propose Real-World image Super-Resolution by Exclusionary Dual-Learning (RWSR-EDL) to address the feature diversity in perceptual- and L1-based cooperative learning.
arXiv Detail & Related papers (2022-06-06T13:28:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.