Refusion: Enabling Large-Size Realistic Image Restoration with
Latent-Space Diffusion Models
- URL: http://arxiv.org/abs/2304.08291v1
- Date: Mon, 17 Apr 2023 14:06:49 GMT
- Title: Refusion: Enabling Large-Size Realistic Image Restoration with
Latent-Space Diffusion Models
- Authors: Ziwei Luo, Fredrik K. Gustafsson, Zheng Zhao, Jens Sjölund, Thomas B. Schön
- Abstract summary: We enhance the diffusion model in several aspects such as network architecture, noise level, denoising steps, training image size, and optimizer/scheduler, improving both distortion and perceptual scores.
We also propose a U-Net based latent diffusion model which performs diffusion in a low-resolution latent space while preserving high-resolution information from the original input for the decoding process.
These modifications allow us to apply diffusion models to various image restoration tasks, including real-world shadow removal, HR non-homogeneous dehazing, stereo super-resolution, and bokeh effect transformation.
- Score: 9.245782611878752
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work aims to improve the applicability of diffusion models in realistic
image restoration. Specifically, we enhance the diffusion model in several
aspects such as network architecture, noise level, denoising steps, training
image size, and optimizer/scheduler. We show that tuning these hyperparameters
allows us to achieve better performance on both distortion and perceptual
scores. We also propose a U-Net based latent diffusion model which performs
diffusion in a low-resolution latent space while preserving high-resolution
information from the original input for the decoding process. Compared to the
previous latent-diffusion model which trains a VAE-GAN to compress the image,
our proposed U-Net compression strategy is significantly more stable and can
recover highly accurate images without relying on adversarial optimization.
Importantly, these modifications allow us to apply diffusion models to various
image restoration tasks, including real-world shadow removal, HR
non-homogeneous dehazing, stereo super-resolution, and bokeh effect
transformation. By simply replacing the datasets and slightly changing the
noise network, our model, named Refusion, is able to deal with large-size
images (e.g., 6000 x 4000 x 3 in HR dehazing) and produces good results on all
the above restoration problems. Our Refusion achieves the best perceptual
performance in the NTIRE 2023 Image Shadow Removal Challenge and wins 2nd place
overall.
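The U-Net compression strategy described in the abstract (perform diffusion in a low-resolution latent while keeping high-resolution information from the input for decoding) can be sketched in miniature. Everything below is illustrative only: the average-pooling encoder, the toy one-shot "denoiser", and the 50/50 fusion in `decode` are stand-ins for Refusion's actual learned networks.

```python
import numpy as np

def encode(img, factor=4):
    """Downsample to a low-resolution latent by average pooling,
    keeping the full-resolution input as a skip connection."""
    h, w, c = img.shape
    latent = img.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))
    return latent, img  # (latent, high-res skip)

def denoise_in_latent(latent, steps=10, rng=None):
    """Stand-in for the diffusion process: noise the latent once,
    then iteratively pull the noisy latent back toward the clean one."""
    rng = np.random.default_rng(0) if rng is None else rng
    noisy = latent + rng.normal(scale=0.5, size=latent.shape)
    for _ in range(steps):
        noisy = noisy + (latent - noisy) * 0.5  # toy "denoiser" step
    return noisy

def decode(latent, skip, factor=4):
    """Upsample the restored latent and fuse it with the high-resolution
    skip so fine detail from the original input survives decoding."""
    up = latent.repeat(factor, axis=0).repeat(factor, axis=1)
    return 0.5 * up + 0.5 * skip

img = np.random.default_rng(1).random((64, 64, 3))
latent, skip = encode(img)                       # 16 x 16 x 3 latent
restored = decode(denoise_in_latent(latent), skip)
print(latent.shape, restored.shape)              # (16, 16, 3) (64, 64, 3)
```

The point of the sketch is the data flow: diffusion touches only the small latent (cheap even for 6000 x 4000 inputs once downsampled), while the decoder receives the original-resolution signal directly, which is why no adversarial (VAE-GAN) training is needed to recover detail.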
Related papers
- Diff-Restorer: Unleashing Visual Prompts for Diffusion-based Universal Image Restoration [19.87693298262894]
We propose Diff-Restorer, a universal image restoration method based on the diffusion model.
We utilize the pre-trained visual language model to extract visual prompts from degraded images.
We also design a Degradation-aware Decoder to perform structural correction and convert the latent code to the pixel domain.
arXiv Detail & Related papers (2024-07-04T05:01:10Z)
- Blind Image Restoration via Fast Diffusion Inversion [17.139433082780037]
Blind Image Restoration via fast Diffusion (BIRD) is a blind IR method that jointly optimizes the degradation model parameters and the restored image.
A key idea in our method is not to modify the reverse sampling, i.e., not to alter all the intermediate latents, once an initial noise is sampled.
We experimentally validate BIRD on several image restoration tasks and show that it achieves state-of-the-art performance on all of them.
arXiv Detail & Related papers (2024-05-29T23:38:12Z)
- Lossy Image Compression with Foundation Diffusion Models [10.407650300093923]
In this work we formulate the removal of quantization error as a denoising task, using diffusion to recover lost information in the transmitted image latent.
Our approach allows us to perform less than 10% of the full diffusion generative process and requires no architectural changes to the diffusion model.
arXiv Detail & Related papers (2024-04-12T16:23:42Z)
- Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation [112.08287900261898]
This paper proposes a novel self-cascade diffusion model for rapid adaptation to higher-resolution image and video generation.
Our approach achieves a 5X training speed-up and requires only an additional 0.002M tuning parameters.
Experiments demonstrate that our approach can quickly adapt to higher resolution image and video synthesis by fine-tuning for just 10k steps, with virtually no additional inference time.
arXiv Detail & Related papers (2024-02-16T07:48:35Z)
- HiDiffusion: Unlocking Higher-Resolution Creativity and Efficiency in Pretrained Diffusion Models [13.68666823175341]
HiDiffusion is a tuning-free higher-resolution framework for image synthesis.
RAU-Net dynamically adjusts the feature map size to resolve object duplication.
MSW-MSA engages optimized window attention to reduce computations.
arXiv Detail & Related papers (2023-11-29T11:01:38Z)
- Image Super-resolution Via Latent Diffusion: A Sampling-space Mixture Of Experts And Frequency-augmented Decoder Approach [17.693287544860638]
A latent-based diffusion approach to image super-resolution, improved by pre-trained text-image models.
Latent-based methods utilize a feature encoder to transform the image and then perform SR image generation in a compact latent space.
We propose a frequency compensation module that enhances the frequency components from latent space to pixel space.
arXiv Detail & Related papers (2023-10-18T14:39:25Z)
- ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models [126.35334860896373]
We investigate the capability of generating images from pre-trained diffusion models at much higher resolutions than the training image sizes.
Existing works for higher-resolution generation, such as attention-based and joint-diffusion approaches, cannot well address these issues.
We propose a simple yet effective re-dilation that can dynamically adjust the convolutional perception field during inference.
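The re-dilation idea can be illustrated with a toy example: grow a convolution's dilation in proportion to the inference/training resolution ratio, so the kernel's receptive field covers the same fraction of the image at the larger size. The `redilate` rule and the naive convolution below are simplified stand-ins, not ScaleCrafter's actual mechanism.

```python
import numpy as np

def dilated_conv2d(x, kernel, dilation):
    """Naive 'same'-padded 2-D convolution with an adjustable dilation."""
    k = kernel.shape[0]
    pad = dilation * (k // 2)
    xp = np.pad(x, pad)
    out = np.zeros_like(x)
    for i in range(k):
        for j in range(k):
            di, dj = i * dilation, j * dilation
            out += kernel[i, j] * xp[di:di + x.shape[0], dj:dj + x.shape[1]]
    return out

def redilate(base_dilation, train_res, test_res):
    """Toy re-dilation rule: scale the dilation with the
    test/train resolution ratio, applied at inference only."""
    return max(1, round(base_dilation * test_res / train_res))

kernel = np.full((3, 3), 1 / 9)
d = redilate(1, train_res=512, test_res=1024)  # dilation 2 at 2x resolution
x = np.random.default_rng(0).random((8, 8))
y = dilated_conv2d(x, kernel, d)
print(d, y.shape)  # 2 (8, 8)
```

Because only the effective perception field changes, no weights are retrained, which is what makes the approach tuning-free.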
arXiv Detail & Related papers (2023-10-11T17:52:39Z)
- Low-Light Image Enhancement with Wavelet-based Diffusion Models [50.632343822790006]
Diffusion models have achieved promising results in image restoration tasks, yet suffer from time-consuming inference, excessive computational resource consumption, and unstable restoration.
We propose a robust and efficient Diffusion-based Low-Light image enhancement approach, dubbed DiffLL.
arXiv Detail & Related papers (2023-06-01T03:08:28Z)
- Hierarchical Integration Diffusion Model for Realistic Image Deblurring [71.76410266003917]
Diffusion models (DMs) have been introduced in image deblurring and exhibited promising performance.
We propose the Hierarchical Integration Diffusion Model (HI-Diff), for realistic image deblurring.
Experiments on synthetic and real-world blur datasets demonstrate that our HI-Diff outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-05-22T12:18:20Z)
- DR2: Diffusion-based Robust Degradation Remover for Blind Face Restoration [66.01846902242355]
Blind face restoration usually synthesizes degraded low-quality data with a pre-defined degradation model for training.
It is expensive and infeasible to include every type of degradation to cover real-world cases in the training data.
We propose Robust Degradation Remover (DR2) to first transform the degraded image to a coarse but degradation-invariant prediction, then employ an enhancement module to restore the coarse prediction to a high-quality image.
arXiv Detail & Related papers (2023-03-13T06:05:18Z)
- SDM: Spatial Diffusion Model for Large Hole Image Inpainting [106.90795513361498]
We present a novel spatial diffusion model (SDM) that uses a few iterations to gradually deliver informative pixels to the entire image.
Also, thanks to the proposed decoupled probabilistic modeling and spatial diffusion scheme, our method achieves high-quality large-hole completion.
arXiv Detail & Related papers (2022-12-06T13:30:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.