FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis
- URL: http://arxiv.org/abs/2403.12963v1
- Date: Tue, 19 Mar 2024 17:59:33 GMT
- Title: FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis
- Authors: Linjiang Huang, Rongyao Fang, Aiping Zhang, Guanglu Song, Si Liu, Yu Liu, Hongsheng Li,
- Abstract summary: We introduce an innovative, training-free approach FouriScale from the perspective of frequency domain analysis.
We replace the original convolutional layers in pre-trained diffusion models by incorporating a dilation technique along with a low-pass operation.
Our method successfully balances the structural integrity and fidelity of generated images, achieving an astonishing capacity of arbitrary-size, high-resolution, and high-quality generation.
- Score: 48.9652334528436
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this study, we delve into the generation of high-resolution images from pre-trained diffusion models, addressing persistent challenges, such as repetitive patterns and structural distortions, that emerge when models are applied beyond their trained resolutions. To address this issue, we introduce an innovative, training-free approach FouriScale from the perspective of frequency domain analysis. We replace the original convolutional layers in pre-trained diffusion models by incorporating a dilation technique along with a low-pass operation, intending to achieve structural consistency and scale consistency across resolutions, respectively. Further enhanced by a padding-then-crop strategy, our method can flexibly handle text-to-image generation of various aspect ratios. By using the FouriScale as guidance, our method successfully balances the structural integrity and fidelity of generated images, achieving an astonishing capacity of arbitrary-size, high-resolution, and high-quality generation. With its simplicity and compatibility, our method can provide valuable insights for future explorations into the synthesis of ultra-high-resolution images. The code will be released at https://github.com/LeonHLJ/FouriScale.
Related papers
- Visual Autoregressive Modeling for Image Super-Resolution [14.935662351654601]
We propose a novel visual autoregressive modeling for ISR framework with the form of next-scale prediction.
We collect large-scale data and design a training process to obtain robust generative priors.
arXiv Detail & Related papers (2025-01-31T09:53:47Z) - FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion [50.43304425256732]
FreeScale is a tuning-free inference paradigm to enable higher-resolution visual generation via scale fusion.
We extend the capabilities of higher-resolution visual generation for both image and video models.
arXiv Detail & Related papers (2024-12-12T18:59:59Z) - Oscillation Inversion: Understand the structure of Large Flow Model through the Lens of Inversion Method [60.88467353578118]
We show that a fixed-point-inspired iterative approach to invert real-world images does not achieve convergence, instead oscillating between distinct clusters.
We introduce a simple and fast distribution transfer technique that facilitates image enhancement, stroke-based recoloring, as well as visual prompt-guided image editing.
arXiv Detail & Related papers (2024-11-17T17:45:37Z) - DiffuseHigh: Training-free Progressive High-Resolution Image Synthesis through Structure Guidance [11.44012694656102]
Large-scale generative models, such as text-to-image diffusion models, have garnered widespread attention across diverse domains.
Existing large-scale diffusion models are confined to generating images of up to 1K resolution.
We propose a novel progressive approach that fully utilizes generated low-resolution images to guide the generation of higher-resolution images.
arXiv Detail & Related papers (2024-06-26T16:10:31Z) - Make a Cheap Scaling: A Self-Cascade Diffusion Model for
Higher-Resolution Adaptation [112.08287900261898]
This paper proposes a novel self-cascade diffusion model for rapid adaptation to higher-resolution image and video generation.
Our approach achieves a 5X training speed-up and requires only an additional 0.002M tuning parameters.
Experiments demonstrate that our approach can quickly adapt to higher resolution image and video synthesis by fine-tuning for just 10k steps, with virtually no additional inference time.
arXiv Detail & Related papers (2024-02-16T07:48:35Z) - ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with
Diffusion Models [126.35334860896373]
We investigate the capability of generating images from pre-trained diffusion models at much higher resolutions than the training image sizes.
Existing works for higher-resolution generation, such as attention-based and joint-diffusion approaches, cannot well address these issues.
We propose a simple yet effective re-dilation that can dynamically adjust the convolutional perception field during inference.
arXiv Detail & Related papers (2023-10-11T17:52:39Z) - Exploiting Diffusion Prior for Real-World Image Super-Resolution [75.5898357277047]
We present a novel approach to leverage prior knowledge encapsulated in pre-trained text-to-image diffusion models for blind super-resolution.
By employing our time-aware encoder, we can achieve promising restoration results without altering the pre-trained synthesis model.
arXiv Detail & Related papers (2023-05-11T17:55:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.