Related papers: ResMaster: Mastering High-Resolution Image Generation via Structural and Fine-Grained Guidance


URL: http://arxiv.org/abs/2406.16476v1
Date: Mon, 24 Jun 2024 09:28:21 GMT
Title: ResMaster: Mastering High-Resolution Image Generation via Structural and Fine-Grained Guidance
Authors: Shuwei Shi, Wenbo Li, Yuechen Zhang, Jingwen He, Biao Gong, Yinqiang Zheng,
Abstract summary: ResMaster is a training-free method that empowers resolution-limited diffusion models to generate high-quality images beyond resolution restrictions. It provides structural and fine-grained guidance for crafting high-resolution images on a patch-by-patch basis. Experiments validate that ResMaster sets a new benchmark for high-resolution image generation and demonstrates promising efficiency.
Score: 46.64836025290448
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Diffusion models excel at producing high-quality images; however, scaling to higher resolutions, such as 4K, often results in over-smoothed content, structural distortions, and repetitive patterns. To this end, we introduce ResMaster, a novel, training-free method that empowers resolution-limited diffusion models to generate high-quality images beyond resolution restrictions. Specifically, ResMaster leverages a low-resolution reference image created by a pre-trained diffusion model to provide structural and fine-grained guidance for crafting high-resolution images on a patch-by-patch basis. To ensure a coherent global structure, ResMaster meticulously aligns the low-frequency components of high-resolution patches with the low-resolution reference at each denoising step. For fine-grained guidance, tailored image prompts based on the low-resolution reference and enriched textual prompts produced by a vision-language model are incorporated. This approach could significantly mitigate local pattern distortions and improve detail refinement. Extensive experiments validate that ResMaster sets a new benchmark for high-resolution image generation and demonstrates promising efficiency. The project page is https://shuweis.github.io/ResMaster .
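The low-frequency alignment described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes an ideal Fourier low-pass filter on a single-channel array, whereas the paper applies its guidance to patches at each denoising step. The names `low_pass`, `align_low_frequency`, and `cutoff` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def low_pass(img: np.ndarray, cutoff: int) -> np.ndarray:
    """Ideal low-pass filter: keep frequency indices with |fy|, |fx| <= cutoff."""
    h, w = img.shape
    spectrum = np.fft.fft2(img)
    fy = np.abs(np.fft.fftfreq(h) * h)  # integer frequency index per row
    fx = np.abs(np.fft.fftfreq(w) * w)  # integer frequency index per column
    mask = (fy[:, None] <= cutoff) & (fx[None, :] <= cutoff)
    # The mask is symmetric in frequency, so the inverse transform is real
    # up to floating-point error; np.real discards that residue.
    return np.real(np.fft.ifft2(spectrum * mask))


def align_low_frequency(patch: np.ndarray, reference_patch: np.ndarray,
                        cutoff: int = 4) -> np.ndarray:
    """Swap the patch's low-frequency band for the reference's (assumed scheme).

    The patch keeps its own high-frequency detail, while its coarse
    structure is taken from the reference patch.
    """
    return patch - low_pass(patch, cutoff) + low_pass(reference_patch, cutoff)


# Hypothetical usage: a random patch and a stand-in for the matching
# region of the upsampled low-resolution reference.
rng = np.random.default_rng(0)
patch = rng.standard_normal((64, 64))
reference_patch = rng.standard_normal((64, 64))
aligned = align_low_frequency(patch, reference_patch, cutoff=4)
```

Because the filter is a 0/1 mask in the frequency domain, the aligned patch has exactly the reference's low-frequency content and exactly the original patch's high-frequency content. A smoother filter (for example, a Gaussian) would blend the two bands instead of swapping them outright.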

Related papers

Directing Mamba to Complex Textures: An Efficient Texture-Aware State Space Model for Image Restoration [75.51789992466183]
TAMambaIR simultaneously perceives image textures and achieves a trade-off between performance and efficiency. Extensive experiments on benchmarks for image super-resolution, deraining, and low-light image enhancement demonstrate that TAMambaIR achieves state-of-the-art performance with significantly improved efficiency.
arXiv Detail & Related papers (2025-01-27T23:53:49Z)
FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion [50.43304425256732]
FreeScale is a tuning-free inference paradigm to enable higher-resolution visual generation via scale fusion. We extend the capabilities of higher-resolution visual generation for both image and video models.
arXiv Detail & Related papers (2024-12-12T18:59:59Z)
High-Resolution Be Aware! Improving the Self-Supervised Real-World Super-Resolution [37.546746047196486]
Self-supervised learning is crucial for super-resolution because ground-truth images are usually unavailable for real-world settings. Existing methods derive self-supervision from low-resolution images by creating pseudo-pairs or by enforcing a low-resolution reconstruction objective. This paper strengthens awareness of the high-resolution image to improve the self-supervised real-world super-resolution.
arXiv Detail & Related papers (2024-11-25T08:13:32Z)
Multi-Scale Diffusion: Enhancing Spatial Layout in High-Resolution Panoramic Image Generation [12.588962705218103]
We introduce the Multi-Scale Diffusion (MSD) framework, a plug-and-play module that extends the existing panoramic image generation framework to multiple resolution levels. By utilizing gradient descent techniques, our method effectively incorporates structural information from low-resolution images into high-resolution outputs.
arXiv Detail & Related papers (2024-10-24T15:18:51Z)
HiPrompt: Tuning-free Higher-Resolution Generation with Hierarchical MLLM Prompts [77.62320553269615]
HiPrompt is a tuning-free solution for higher-resolution image generation. Hierarchical prompts offer both global and local guidance. Generated images maintain coherent local and global semantics, structures, and textures with high definition.
arXiv Detail & Related papers (2024-09-04T17:58:08Z)
Diff-Restorer: Unleashing Visual Prompts for Diffusion-based Universal Image Restoration [19.87693298262894]
We propose Diff-Restorer, a universal image restoration method based on the diffusion model. We utilize the pre-trained visual language model to extract visual prompts from degraded images. We also design a Degradation-aware Decoder to perform structural correction and convert the latent code to the pixel domain.
arXiv Detail & Related papers (2024-07-04T05:01:10Z)
UltraPixel: Advancing Ultra-High-Resolution Image Synthesis to New Peaks [36.61645124563195]
We present UltraPixel, a novel architecture utilizing cascade diffusion models to generate high-quality images at multiple resolutions. We use semantics-rich representations of lower-resolution images in the later denoising stage to guide the whole generation of highly detailed high-resolution images. Our model achieves fast training with reduced data requirements, producing photo-realistic high-resolution images.
arXiv Detail & Related papers (2024-07-02T11:02:19Z)
FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis [48.9652334528436]
We introduce an innovative, training-free approach FouriScale from the perspective of frequency domain analysis. We replace the original convolutional layers in pre-trained diffusion models by incorporating a dilation technique along with a low-pass operation. Our method successfully balances the structural integrity and fidelity of generated images, achieving an astonishing capacity of arbitrary-size, high-resolution, and high-quality generation.
arXiv Detail & Related papers (2024-03-19T17:59:33Z)
ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models [126.35334860896373]
We investigate the capability of generating images from pre-trained diffusion models at much higher resolutions than the training image sizes. Existing works for higher-resolution generation, such as attention-based and joint-diffusion approaches, cannot address these issues well. We propose a simple yet effective re-dilation that can dynamically adjust the convolutional perception field during inference.
arXiv Detail & Related papers (2023-10-11T17:52:39Z)
Efficient texture-aware multi-GAN for image inpainting [5.33024001730262]
Recent inpainting methods based on generative adversarial networks (GANs) show remarkable improvements. We propose a multi-GAN architecture improving both the performance and rendering efficiency.
arXiv Detail & Related papers (2020-09-30T14:58:03Z)
Invertible Image Rescaling [118.2653765756915]
We develop an Invertible Rescaling Net (IRN) to produce visually-pleasing low-resolution images. We capture the distribution of the lost information using a latent variable following a specified distribution in the downscaling process.
arXiv Detail & Related papers (2020-05-12T09:55:53Z)
Gated Fusion Network for Degraded Image Super Resolution [78.67168802945069]
We propose a dual-branch convolutional neural network to extract base features and recovered features separately. By decomposing the feature extraction step into two task-independent streams, the dual-branch model can facilitate the training process.
arXiv Detail & Related papers (2020-03-02T13:28:32Z)

This list is automatically generated from the titles and abstracts of the papers on this site.