DFU: scale-robust diffusion model for zero-shot super-resolution image generation
- URL: http://arxiv.org/abs/2401.06144v2
- Date: Mon, 22 Jan 2024 17:11:57 GMT
- Title: DFU: scale-robust diffusion model for zero-shot super-resolution image generation
- Authors: Alex Havrilla, Kevin Rojas, Wenjing Liao, Molei Tao
- Abstract summary: We present a novel deep-learning architecture, Dual-FNO UNet (DFU), which approximates the score operator by combining both spatial and spectral information at multiple resolutions.
We propose a fine-tuning strategy to further enhance the zero-shot super-resolution image-generation capability of our model, leading to a FID of 11.3 at 1.66 times the maximum training resolution on FFHQ.
- Score: 15.689418447376587
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion generative models have achieved remarkable success in generating
images with a fixed resolution. However, existing models have limited ability
to generalize to different resolutions when training data at those resolutions
are not available. Leveraging techniques from operator learning, we present a
novel deep-learning architecture, Dual-FNO UNet (DFU), which approximates the
score operator by combining both spatial and spectral information at multiple
resolutions. Comparisons of DFU to baselines demonstrate its scalability: 1)
simultaneously training on multiple resolutions improves FID over training at
any single fixed resolution; 2) DFU generalizes beyond its training
resolutions, allowing for coherent, high-fidelity generation at
higher resolutions with the same model, i.e., zero-shot super-resolution
image-generation; 3) we propose a fine-tuning strategy to further enhance the
zero-shot super-resolution image-generation capability of our model, leading to
a FID of 11.3 at 1.66 times the maximum training resolution on FFHQ, which no
other method can come close to achieving.
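The abstract describes DFU only at a high level: a UNet-style network that approximates the score operator by combining spatial and spectral information at multiple resolutions. The NumPy sketch below illustrates the general ingredient such a design builds on: a Fourier Neural Operator (FNO) style spectral convolution summed with an ordinary local convolution. It is not the authors' implementation. The block layout, the helper names (spectral_conv2d, spatial_conv2d, dual_block), the GELU nonlinearity, the number of retained modes and all shapes are assumptions chosen for illustration. What it demonstrates is that spectral weights defined on a fixed set of low Fourier modes can be applied unchanged to a larger grid, which is one reason operator-learning architectures can be evaluated above their training resolution.

```python
import numpy as np


def spectral_conv2d(x, weights, modes):
    """FNO-style spectral convolution (illustrative sketch, not DFU's code).

    x:       real array, shape (C_in, H, W)
    weights: complex array, shape (C_in, C_out, 2 * modes, modes)
    Only the lowest `modes` frequencies are kept, so the same weights
    apply to any grid with H >= 2 * modes and W // 2 + 1 >= modes.
    """
    c_in, h, w = x.shape
    c_out = weights.shape[1]
    assert h >= 2 * modes and w // 2 + 1 >= modes, "grid too small for modes"
    x_hat = np.fft.rfft2(x, axes=(-2, -1))  # (C_in, H, W // 2 + 1)
    out_hat = np.zeros((c_out, h, w // 2 + 1), dtype=complex)
    # Non-negative frequencies along H.
    out_hat[:, :modes, :modes] = np.einsum(
        "ixy,ioxy->oxy", x_hat[:, :modes, :modes], weights[:, :, :modes])
    # Negative frequencies along H (stored at the end of that FFT axis).
    out_hat[:, -modes:, :modes] = np.einsum(
        "ixy,ioxy->oxy", x_hat[:, -modes:, :modes], weights[:, :, modes:])
    # irfft2 divides by H * W, cancelling the H * W scaling of rfft2,
    # so output amplitudes do not depend on the grid size.
    return np.fft.irfft2(out_hat, s=(h, w), axes=(-2, -1))


def spatial_conv2d(x, kernel):
    """Local 'same'-size convolution (cross-correlation, as in DL libraries).

    kernel: shape (C_in, C_out, k, k) with k odd; reflect padding.
    """
    c_in, h, w = x.shape
    k = kernel.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)), mode="reflect")
    out = np.zeros((kernel.shape[1], h, w))
    for dy in range(k):
        for dx in range(k):
            out += np.einsum("ihw,io->ohw", xp[:, dy:dy + h, dx:dx + w],
                             kernel[:, :, dy, dx])
    return out


def gelu(z):
    # tanh approximation of GELU
    return 0.5 * z * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (z + 0.044715 * z ** 3)))


def dual_block(x, params, modes):
    """Hypothetical 'dual' block: global spectral path + local spatial path."""
    return gelu(spectral_conv2d(x, params["spectral"], modes)
                + spatial_conv2d(x, params["spatial"]))


rng = np.random.default_rng(0)
c, modes = 4, 8
params = {
    "spectral": (rng.standard_normal((c, c, 2 * modes, modes))
                 + 1j * rng.standard_normal((c, c, 2 * modes, modes))) / c,
    "spatial": rng.standard_normal((c, c, 3, 3)) / (9 * c),
}


def smooth_field(n):
    # Band-limited test input sampled on an n x n grid over [0, 1)^2.
    t = np.arange(n) / n
    f = np.sin(2 * np.pi * t)[:, None] + np.cos(4 * np.pi * t)[None, :]
    return np.repeat(f[None], c, axis=0)


# Same parameters, two resolutions (the second is about 1.66x the first).
y32 = dual_block(smooth_field(32), params, modes)
y53 = dual_block(smooth_field(53), params, modes)
print(y32.shape, y53.shape)  # (4, 32, 32) (4, 53, 53)

# The spectral path alone gives the same values at shared grid points.
s32 = spectral_conv2d(smooth_field(32), params["spectral"], modes)
s64 = spectral_conv2d(smooth_field(64), params["spectral"], modes)
print(np.allclose(s32, s64[:, ::2, ::2]))  # True
```

Design note: in this sketch only the spectral path is resolution-consistent. A 3x3 kernel always spans 3x3 pixels, so its physical receptive field shrinks as the grid grows. How DFU weights or normalizes its two paths is not described in the summary above.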
Related papers
- I-Max: Maximize the Resolution Potential of Pre-trained Rectified Flow Transformers with Projected Flow [50.55228067778858]
Rectified Flow Transformers (RFTs) offer superior training and inference efficiency.
We introduce the I-Max framework to maximize the resolution potential of Text-to-Image RFTs.
Date: 2024-10-10T02:08:23Z
- Inverse design with conditional cascaded diffusion models [0.0]
Adjoint-based design optimizations are usually computationally expensive and those costs scale with resolution.
We extend the use of diffusion models over traditional generative models by proposing the conditional cascaded diffusion model (cCDM).
Our study compares cCDM against a cGAN model with transfer learning.
While both models show decreased performance with reduced high-resolution training data, the cCDM loses its superiority to the cGAN model with transfer learning when training data is limited.
Date: 2024-08-16T04:54:09Z
- Pairwise Distance Distillation for Unsupervised Real-World Image Super-Resolution [38.79439380482431]
Real-world super-resolution (RWSR) faces unknown degradations in the low-resolution inputs, all the while lacking paired training data.
Existing methods approach this problem by learning blind general models through complex synthetic augmentations on training inputs.
We introduce a novel pairwise distance distillation framework to address the unsupervised RWSR for a targeted real-world degradation.
Date: 2024-07-10T01:46:40Z
- DiffuseHigh: Training-free Progressive High-Resolution Image Synthesis through Structure Guidance [11.44012694656102]
Large-scale generative models, such as text-to-image diffusion models, have garnered widespread attention across diverse domains.
Existing large-scale diffusion models are confined to generating images of up to 1K resolution.
We propose a novel progressive approach that fully utilizes generated low-resolution images to guide the generation of higher-resolution images.
Date: 2024-06-26T16:10:31Z
- FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis [48.9652334528436]
We introduce an innovative, training-free approach FouriScale from the perspective of frequency domain analysis.
We replace the original convolutional layers in pre-trained diffusion models by incorporating a dilation technique along with a low-pass operation.
Our method successfully balances the structural integrity and fidelity of generated images, achieving an astonishing capacity of arbitrary-size, high-resolution, and high-quality generation.
Date: 2024-03-19T17:59:33Z
- ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models [126.35334860896373]
We investigate the capability of generating images from pre-trained diffusion models at much higher resolutions than the training image sizes.
Existing works for higher-resolution generation, such as attention-based and joint-diffusion approaches, cannot address these issues well.
We propose a simple yet effective re-dilation that can dynamically adjust the convolutional perception field during inference.
Date: 2023-10-11T17:52:39Z
- Any-resolution Training for High-resolution Image Synthesis [55.19874755679901]
Generative models operate at fixed resolution, even though natural images come in a variety of sizes.
We argue that every pixel matters and create datasets with variable-size images, collected at their native resolutions.
We introduce continuous-scale training, a process that samples patches at random scales to train a new generator with variable output resolutions.
Date: 2022-04-14T17:59:31Z
- InfinityGAN: Towards Infinite-Resolution Image Synthesis [92.40782797030977]
We present InfinityGAN, a method to generate arbitrary-resolution images.
We show how it trains and infers patch-by-patch seamlessly with low computational resources.
Date: 2021-04-08T17:59:30Z
- Hierarchical Amortized Training for Memory-efficient High Resolution 3D GAN [52.851990439671475]
We propose a novel end-to-end GAN architecture that can generate high-resolution 3D images.
We achieve this goal by using different configurations between training and inference.
Experiments on 3D thorax CT and brain MRI demonstrate that our approach outperforms the state of the art in image generation.
Date: 2020-08-05T02:33:04Z
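FouriScale and ScaleCrafter both describe dilating the convolutions of a pre-trained diffusion model so that it can generate above its training resolution. As a generic illustration only (not code from either paper), the NumPy sketch below implements a single-channel dilated convolution: the same k x k weights, spaced `dilation` pixels apart, cover a window of (k - 1) * dilation + 1 pixels, so the kernel's receptive field grows without adding parameters. The function names and the zero padding are assumptions, and FouriScale's low-pass operation is not shown.

```python
import numpy as np


def dilated_conv2d(x, kernel, dilation=1):
    """Single-channel 'same'-size dilated cross-correlation, zero padded.

    x: (H, W); kernel: (k, k) with k odd; dilation: spacing between taps.
    """
    k = kernel.shape[0]
    reach = (k // 2) * dilation          # pixels covered on each side
    xp = np.pad(x, reach)                # constant (zero) padding
    h, w = x.shape
    out = np.zeros_like(x, dtype=float)
    for a in range(k):
        for b in range(k):
            dy, dx = a * dilation, b * dilation
            out += kernel[a, b] * xp[dy:dy + h, dx:dx + w]
    return out


def receptive_field(k, dilation):
    # Width of the window spanned by one dilated k x k kernel.
    return (k - 1) * dilation + 1


x = np.zeros((9, 9))
x[4, 4] = 1.0                            # unit impulse
kern = np.ones((3, 3))
for d in (1, 2):
    rows = np.nonzero(dilated_conv2d(x, kern, d))[0]
    print(d, rows.max() - rows.min() + 1, receptive_field(3, d))
# 1 3 3
# 2 5 5
```

With dilation 2 the nine taps are spread over a 5x5 window instead of a 3x3 one; whether and where a given method applies such dilation inside a pre-trained network is specific to that paper.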
This list is automatically generated from the titles and abstracts of the papers in this site.