Exploring Position Encoding in Diffusion U-Net for Training-free High-resolution Image Generation
- URL: http://arxiv.org/abs/2503.09830v1
- Date: Wed, 12 Mar 2025 20:51:17 GMT
- Title: Exploring Position Encoding in Diffusion U-Net for Training-free High-resolution Image Generation
- Authors: Feng Zhou, Pu Cao, Yiyang Ma, Lu Yang, Jianqin Yin,
- Abstract summary: Denoising higher-resolution latents via a pre-trained U-Net leads to repetitive and disordered image patterns.<n>We propose a novel training-free approach, introducing a Progressive Boundary Complement (PBC) method.<n>This method creates dynamic virtual image boundaries inside the feature map to enhance position information propagation.
- Score: 12.46680249926721
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Denoising higher-resolution latents via a pre-trained U-Net leads to repetitive and disordered image patterns. Although recent studies make efforts to improve generative quality by aligning denoising process across original and higher resolutions, the root cause of suboptimal generation is still lacking exploration. Through comprehensive analysis of position encoding in U-Net, we attribute it to inconsistent position encoding, sourced by the inadequate propagation of position information from zero-padding to latent features in convolution layers as resolution increases. To address this issue, we propose a novel training-free approach, introducing a Progressive Boundary Complement (PBC) method. This method creates dynamic virtual image boundaries inside the feature map to enhance position information propagation, enabling high-quality and rich-content high-resolution image synthesis. Extensive experiments demonstrate the superiority of our method.
Related papers
- HIIF: Hierarchical Encoding based Implicit Image Function for Continuous Super-resolution [16.652558917081954]
We propose textbfHIIF for continuous image super-resolution.<n>Our approach embeds a multi-head linear attention mechanism within the implicit attention network by taking additional non-local information into account.<n>Our experiments show that, when integrated with different backbone encoders, HIIF outperforms the state-of-the-art continuous image super-resolution methods by up to 0.17dB in PSNR.
arXiv Detail & Related papers (2024-12-04T22:35:20Z) - Pixel-Aligned Multi-View Generation with Depth Guided Decoder [86.1813201212539]
We propose a novel method for pixel-level image-to-multi-view generation.
Unlike prior work, we incorporate attention layers across multi-view images in the VAE decoder of a latent video diffusion model.
Our model enables better pixel alignment across multi-view images.
arXiv Detail & Related papers (2024-08-26T04:56:41Z) - FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis [48.9652334528436]
We introduce an innovative, training-free approach FouriScale from the perspective of frequency domain analysis.
We replace the original convolutional layers in pre-trained diffusion models by incorporating a dilation technique along with a low-pass operation.
Our method successfully balances the structural integrity and fidelity of generated images, achieving an astonishing capacity of arbitrary-size, high-resolution, and high-quality generation.
arXiv Detail & Related papers (2024-03-19T17:59:33Z) - High Perceptual Quality Wireless Image Delivery with Denoising Diffusion
Models [10.763194436114194]
We consider the image transmission problem over a noisy wireless channel via deep learning-based joint source-channel coding (DeepJSCC)
We introduce a novel scheme that utilizes the range-null space decomposition of the target image.
We demonstrate significant improvements in distortion and perceptual quality of reconstructed images compared to standard DeepJSCC and the state-of-the-art generative learning-based method.
arXiv Detail & Related papers (2023-09-27T16:30:59Z) - Soft-IntroVAE for Continuous Latent space Image Super-Resolution [12.344557879284219]
We propose a Soft-introVAE for continuous latent space image super-resolution (SVAE-SR)
Inspired by Variational AutoEncoder, we propose a Soft-introVAE for continuous latent space image super-resolution (SVAE-SR)
arXiv Detail & Related papers (2023-07-18T06:54:42Z) - Rank-Enhanced Low-Dimensional Convolution Set for Hyperspectral Image
Denoising [50.039949798156826]
This paper tackles the challenging problem of hyperspectral (HS) image denoising.
We propose rank-enhanced low-dimensional convolution set (Re-ConvSet)
We then incorporate Re-ConvSet into the widely-used U-Net architecture to construct an HS image denoising method.
arXiv Detail & Related papers (2022-07-09T13:35:12Z) - Image Harmonization with Region-wise Contrastive Learning [51.309905690367835]
We propose a novel image harmonization framework with external style fusion and region-wise contrastive learning scheme.
Our method attempts to bring together corresponding positive and negative samples by maximizing the mutual information between the foreground and background styles.
arXiv Detail & Related papers (2022-05-27T15:46:55Z) - Learning Enriched Features for Fast Image Restoration and Enhancement [166.17296369600774]
This paper presents a holistic goal of maintaining spatially-precise high-resolution representations through the entire network.
We learn an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
Our approach achieves state-of-the-art results for a variety of image processing tasks, including defocus deblurring, image denoising, super-resolution, and image enhancement.
arXiv Detail & Related papers (2022-04-19T17:59:45Z) - SDWNet: A Straight Dilated Network with Wavelet Transformation for Image
Deblurring [23.86692375792203]
Image deblurring is a computer vision problem that aims to recover a sharp image from a blurred image.
Our model uses dilated convolution to enable the obtainment of the large receptive field with high spatial resolution.
We propose a novel module using the wavelet transform, which effectively helps the network to recover clear high-frequency texture details.
arXiv Detail & Related papers (2021-10-12T07:58:10Z) - Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for image restoration task.
We present a novel architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network.
Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
arXiv Detail & Related papers (2020-03-15T11:04:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.