Multi-Scale Diffusion: Enhancing Spatial Layout in High-Resolution Panoramic Image Generation
- URL: http://arxiv.org/abs/2410.18830v2
- Date: Sun, 06 Apr 2025 16:44:55 GMT
- Title: Multi-Scale Diffusion: Enhancing Spatial Layout in High-Resolution Panoramic Image Generation
- Authors: Xiaoyu Zhang, Teng Zhou, Xinlong Zhang, Jia Wei, Yongchuan Tang,
- Abstract summary: This paper introduces the Multi-Scale Diffusion (MSD), an optimized framework that extends the panoramic image generation framework to multiple resolution levels.<n>Our method leverages gradient descent techniques to incorporate structural information from low-resolution images into high-resolution outputs.
- Score: 12.588962705218103
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models have recently gained recognition for generating diverse and high-quality content, especially in image synthesis. These models excel not only in creating fixed-size images but also in producing panoramic images. However, existing methods often struggle with spatial layout consistency when producing high-resolution panoramas due to the lack of guidance on the global image layout. This paper introduces the Multi-Scale Diffusion (MSD), an optimized framework that extends the panoramic image generation framework to multiple resolution levels. Our method leverages gradient descent techniques to incorporate structural information from low-resolution images into high-resolution outputs. Through comprehensive qualitative and quantitative evaluations against prior work, we demonstrate that our approach significantly improves the coherence of high-resolution panorama generation.
Related papers
- Mogao: An Omni Foundation Model for Interleaved Multi-Modal Generation [54.588082888166504]
We present Mogao, a unified framework that enables interleaved multi-modal generation through a causal approach.<n>Mogoo integrates a set of key technical improvements in architecture design, including a deep-fusion design, dual vision encoders, interleaved rotary position embeddings, and multi-modal classifier-free guidance.<n>Experiments show that Mogao achieves state-of-the-art performance in multi-modal understanding and text-to-image generation, but also excels in producing high-quality, coherent interleaved outputs.
arXiv Detail & Related papers (2025-05-08T17:58:57Z) - Global Semantic-Guided Sub-image Feature Weight Allocation in High-Resolution Large Vision-Language Models [50.98559225639266]
Sub-images with higher semantic relevance to the entire image encapsulate richer visual information for preserving the model's visual understanding ability.
Global Semantic-guided Weight Allocator (GSWA) module allocates weights to sub-images based on their relative information density.
SleighVL, a lightweight yet high-performing model, outperforms models with comparable parameters and remains competitive with larger models.
arXiv Detail & Related papers (2025-01-24T06:42:06Z) - Multi-Head Attention Residual Unfolded Network for Model-Based Pansharpening [2.874893537471256]
Unfolding fusion methods integrate the powerful representation capabilities of deep learning with the robustness of model-based approaches.
In this paper, we propose a model-based deep unfolded method for satellite image fusion.
Experimental results on PRISMA, Quickbird, and WorldView2 datasets demonstrate the superior performance of our method.
arXiv Detail & Related papers (2024-09-04T13:05:00Z) - Coherent and Multi-modality Image Inpainting via Latent Space Optimization [61.99406669027195]
PILOT (intextbfPainting vtextbfIa textbfLatent textbfOptextbfTimization) is an optimization approach grounded on a novel textitsemantic centralization and textitbackground preservation loss.
Our method searches latent spaces capable of generating inpainted regions that exhibit high fidelity to user-provided prompts while maintaining coherence with the background.
arXiv Detail & Related papers (2024-07-10T19:58:04Z) - UltraPixel: Advancing Ultra-High-Resolution Image Synthesis to New Peaks [36.61645124563195]
We present UltraPixel, a novel architecture utilizing cascade diffusion models to generate high-quality images at multiple resolutions.
We use semantics-rich representations of lower-resolution images in the later denoising stage to guide the whole generation of highly detailed high-resolution images.
Our model achieves fast training with reduced data requirements, producing photo-realistic high-resolution images.
arXiv Detail & Related papers (2024-07-02T11:02:19Z) - ResMaster: Mastering High-Resolution Image Generation via Structural and Fine-Grained Guidance [46.64836025290448]
ResMaster is a training-free method that empowers resolution-limited diffusion models to generate high-quality images beyond resolution restrictions.
It provides structural and fine-grained guidance for crafting high-resolution images on a patch-by-patch basis.
Experiments validate that ResMaster sets a new benchmark for high-resolution image generation and demonstrates promising efficiency.
arXiv Detail & Related papers (2024-06-24T09:28:21Z) - FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis [48.9652334528436]
We introduce an innovative, training-free approach FouriScale from the perspective of frequency domain analysis.
We replace the original convolutional layers in pre-trained diffusion models by incorporating a dilation technique along with a low-pass operation.
Our method successfully balances the structural integrity and fidelity of generated images, achieving an astonishing capacity of arbitrary-size, high-resolution, and high-quality generation.
arXiv Detail & Related papers (2024-03-19T17:59:33Z) - Generative Powers of Ten [60.6740997942711]
We present a method that uses a text-to-image model to generate consistent content across multiple image scales.
We achieve this through a joint multi-scale diffusion sampling approach.
Our method enables deeper levels of zoom than traditional super-resolution methods.
arXiv Detail & Related papers (2023-12-04T18:59:25Z) - Layered Rendering Diffusion Model for Zero-Shot Guided Image Synthesis [60.260724486834164]
This paper introduces innovative solutions to enhance spatial controllability in diffusion models reliant on text queries.
We present two key innovations: Vision Guidance and the Layered Rendering Diffusion framework.
We apply our method to three practical applications: bounding box-to-image, semantic mask-to-image and image editing.
arXiv Detail & Related papers (2023-11-30T10:36:19Z) - HORIZON: High-Resolution Semantically Controlled Panorama Synthesis [105.55531244750019]
Panorama synthesis endeavors to craft captivating 360-degree visual landscapes, immersing users in the heart of virtual worlds.
Recent breakthroughs in visual synthesis have unlocked the potential for semantic control in 2D flat images, but a direct application of these methods to panorama synthesis yields distorted content.
We unveil an innovative framework for generating high-resolution panoramas, adeptly addressing the issues of spherical distortion and edge discontinuity through sophisticated spherical modeling.
arXiv Detail & Related papers (2022-10-10T09:43:26Z) - Adaptive Single Image Deblurring [43.02281823557039]
We propose an efficient pixel adaptive and feature attentive design for handling large blur variations within and across different images.
We also propose an effective content-aware global-local filtering module that significantly improves the performance.
arXiv Detail & Related papers (2022-01-01T10:10:19Z) - InfinityGAN: Towards Infinite-Resolution Image Synthesis [92.40782797030977]
We present InfinityGAN, a method to generate arbitrary-resolution images.
We show how it trains and infers patch-by-patch seamlessly with low computational resources.
arXiv Detail & Related papers (2021-04-08T17:59:30Z) - A Generative Model for Hallucinating Diverse Versions of Super
Resolution Images [0.3222802562733786]
We are tackling in this work the problem of obtaining different high-resolution versions from the same low-resolution image using Generative Adversarial Models.
Our learning approach makes use of high frequencies available in the training high-resolution images for preserving and exploring in an unsupervised manner.
arXiv Detail & Related papers (2021-02-12T17:11:42Z) - Efficient texture-aware multi-GAN for image inpainting [5.33024001730262]
Recent GAN-based (Generative adversarial networks) inpainting methods show remarkable improvements.
We propose a multi-GAN architecture improving both the performance and rendering efficiency.
arXiv Detail & Related papers (2020-09-30T14:58:03Z) - Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for image restoration task.
We present a novel architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network.
Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
arXiv Detail & Related papers (2020-03-15T11:04:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.