Related papers: Guided and Variance-Corrected Fusion with One-shot Style Alignment for Large-Content Image Generation

URL: http://arxiv.org/abs/2412.12771v2
Date: Mon, 10 Feb 2025 18:55:08 GMT
Title: Guided and Variance-Corrected Fusion with One-shot Style Alignment for Large-Content Image Generation
Authors: Shoukun Sun, Min Xian, Tiankai Yao, Fei Xu, Luca Capriotti
Abstract summary: A common approach involves jointly generating a series of overlapped image patches and obtaining large images by merging adjacent patches. Results from existing methods often exhibit noticeable artifacts, e.g., seams and inconsistent objects and styles. We propose Guided Fusion (GF), which mitigates the negative impact from distant image regions by applying a weighted average to the overlapping regions. We also propose Variance-Corrected Fusion (VCF), which corrects data variance at post-averaging, generating more accurate fusion for the Denoising Diffusion Probabilistic Model.
Score: 2.3141583665677503
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Producing large images using small diffusion models is gaining increasing popularity, as the cost of training large models could be prohibitive. A common approach involves jointly generating a series of overlapped image patches and obtaining large images by merging adjacent patches. However, results from existing methods often exhibit noticeable artifacts, e.g., seams and inconsistent objects and styles. To address the issues, we proposed Guided Fusion (GF), which mitigates the negative impact from distant image regions by applying a weighted average to the overlapping regions. Moreover, we proposed Variance-Corrected Fusion (VCF), which corrects data variance at post-averaging, generating more accurate fusion for the Denoising Diffusion Probabilistic Model. Furthermore, we proposed a one-shot Style Alignment (SA), which generates a coherent style for large images by adjusting the initial input noise without adding extra computational burden. Extensive experiments demonstrated that the proposed fusion methods improved the quality of the generated image significantly. The proposed method can be widely applied as a plug-and-play module to enhance other fusion-based methods for large image generation. Code: https://github.com/TitorX/GVCFDiffusion
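The abstract describes fusing overlapping patches with a weighted average and then correcting the variance of the result. The sketch below illustrates only the statistical point behind that idea and is not taken from the paper or its repository: the function name `fuse_overlap`, the linear ramp weights, and the correction factor are assumptions chosen for illustration. It relies on a standard fact: for independent zero-mean inputs of equal variance, a weighted average with normalized weights w_i has its variance scaled by the sum of w_i^2, so dividing by the square root of that sum restores the original variance.

```python
import numpy as np


def fuse_overlap(patches, weights, correct_variance=True):
    """Fuse co-located values from k overlapping patches (hypothetical helper).

    patches: array of shape (k, n), values of each patch in the overlap.
    weights: array of shape (k, n), non-negative and not all zero per column.
    """
    patches = np.asarray(patches, dtype=float)
    weights = np.asarray(weights, dtype=float)

    # Normalize so the weights at every position sum to one.
    w = weights / weights.sum(axis=0, keepdims=True)
    fused = (w * patches).sum(axis=0)

    if correct_variance:
        # Assumption: inputs are independent, zero-mean, equal-variance noise.
        # Averaging scales the variance by sum(w^2); undo that scaling.
        fused = fused / np.sqrt((w ** 2).sum(axis=0))
    return fused


# Two patches of pure Gaussian noise sharing one overlap region.
rng = np.random.default_rng(0)
noise = rng.standard_normal((2, 100_000))

# Illustrative linear ramp: each patch is trusted more near its own side.
ramp = np.linspace(0.0, 1.0, noise.shape[1])
weights = np.stack([1.0 - ramp, ramp])

plain = fuse_overlap(noise, weights, correct_variance=False)
corrected = fuse_overlap(noise, weights, correct_variance=True)

print("variance after plain weighted average:", plain.var())
print("variance after correction:", corrected.var())
```

With these ramp weights, plain averaging lowers the variance of unit-variance noise in the overlap, while the corrected version stays close to one. The correction as written applies to the noise component only; how the paper separates signal from noise inside a DDPM sampling step is not shown here.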

Related papers

Reversible Efficient Diffusion for Image Fusion [66.35113261837469]
Multi-modal image fusion aims to consolidate complementary information from diverse source images into a unified representation. While diffusion models have demonstrated impressive generative capabilities in image generation, they often suffer from detail loss when applied to image fusion tasks. This issue arises from the accumulation of noise errors inherent in the Markov process, leading to inconsistency and degradation in the fused results. We propose the Reversible Efficient Diffusion (RED) model - an explicitly supervised training framework that inherits the powerful generative capability of diffusion models while avoiding the distribution estimation.
arXiv Detail & Related papers (2026-01-28T05:14:55Z)
Efficient Rectified Flow for Image Fusion [48.330480065862474]
We propose RFfusion, an efficient one-step diffusion model for image fusion based on Rectified Flow. We also propose a task-specific variational autoencoder architecture tailored for image fusion. Our method outperforms other state-of-the-art methods in terms of both inference speed and fusion quality.
arXiv Detail & Related papers (2025-09-20T06:21:00Z)
Causality-Driven Infrared and Visible Image Fusion [7.454657847653563]
This paper re-examines the image fusion task from the causality perspective. It disentangles the model from the impact of bias by constructing a tailored causal graph. A Back-door Adjustment based Feature Fusion Module (BAFFM) is proposed to eliminate confounder interference.
arXiv Detail & Related papers (2025-05-27T07:48:52Z)
Modification Takes Courage: Seamless Image Stitching via Reference-Driven Inpainting [0.17975553762582286]
Current image stitching methods produce noticeable seams in challenging scenarios such as uneven hue and large parallax. We propose the Reference-Driven Inpainting Stitcher (RDIStitcher) to reformulate the image fusion and rectangling as a reference-based inpainting model. We present the Multimodal Large Language Models (MLLMs)-based metrics, offering a new perspective on evaluating stitched image quality.
arXiv Detail & Related papers (2024-11-15T16:05:01Z)
DiffHarmony: Latent Diffusion Model Meets Image Harmonization [11.500358677234939]
Diffusion models have promoted the rapid development of image-to-image translation tasks. Fine-tuning pre-trained latent diffusion models from scratch is computationally intensive. In this paper, we adapt a pre-trained latent diffusion model to the image harmonization task to generate harmonious but potentially blurry initial images.
arXiv Detail & Related papers (2024-04-09T09:05:23Z)
Denoising Diffusion Bridge Models [54.87947768074036]
Diffusion models are powerful generative models that map noise to data using stochastic processes. For many applications such as image editing, the model input comes from a distribution that is not random noise. In our work, we propose Denoising Diffusion Bridge Models (DDBMs).
arXiv Detail & Related papers (2023-09-29T03:24:24Z)
Generation and Recombination for Multifocus Image Fusion with Free Number of Inputs [17.32596568119519]
Multifocus image fusion is an effective way to overcome the limitation of optical lenses. Previous methods assume that the focused areas of the two source images are complementary, making it impossible to achieve simultaneous fusion of multiple images. In GRFusion, focus property detection of each source image can be implemented independently, enabling simultaneous fusion of multiple source images.
arXiv Detail & Related papers (2023-09-09T01:47:56Z)
Improving Misaligned Multi-modality Image Fusion with One-stage Progressive Dense Registration [67.23451452670282]
Misalignments between multi-modality images pose challenges in image fusion. We propose a Cross-modality Multi-scale Progressive Dense Registration scheme. This scheme accomplishes the coarse-to-fine registration exclusively using a one-stage optimization.
arXiv Detail & Related papers (2023-08-22T03:46:24Z)
Hierarchical Integration Diffusion Model for Realistic Image Deblurring [71.76410266003917]
Diffusion models (DMs) have been introduced in image deblurring and exhibited promising performance. We propose the Hierarchical Integration Diffusion Model (HI-Diff), for realistic image deblurring. Experiments on synthetic and real-world blur datasets demonstrate that our HI-Diff outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-05-22T12:18:20Z)
Searching a Compact Architecture for Robust Multi-Exposure Image Fusion [55.37210629454589]
Two major stumbling blocks hinder the development, including pixel misalignment and inefficient inference. This study introduces an architecture search-based paradigm incorporating self-alignment and detail repletion modules for robust multi-exposure image fusion. The proposed method outperforms various competitive schemes, achieving a noteworthy 3.19% improvement in PSNR for general scenarios and an impressive 23.5% enhancement in misaligned scenarios.
arXiv Detail & Related papers (2023-05-20T17:01:52Z)
DDFM: Denoising Diffusion Model for Multi-Modality Image Fusion [144.9653045465908]
We propose a novel fusion algorithm based on the denoising diffusion probabilistic model (DDPM). Our approach yields promising fusion results in infrared-visible image fusion and medical image fusion.
arXiv Detail & Related papers (2023-03-13T04:06:42Z)
Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models [60.63556257324894]
A key desired property of image generative models is the ability to disentangle different attributes. We propose a simple, light-weight image editing algorithm where the mixing weights of the two text embeddings are optimized for style matching and content preservation. Experiments show that the proposed method can modify a wide range of attributes and outperforms diffusion-model-based image-editing algorithms.
arXiv Detail & Related papers (2022-12-16T19:58:52Z)
The Power of Triply Complementary Priors for Image Compressive Sensing [89.14144796591685]
We propose a joint low-rank deep (LRD) image model, which contains a pair of triply complementary priors. We then propose a novel hybrid plug-and-play framework based on the LRD model for image CS. To make the optimization tractable, a simple yet effective algorithm is proposed to solve the proposed H-based image CS problem.
arXiv Detail & Related papers (2020-05-16T08:17:44Z)

This list is automatically generated from the titles and abstracts of the papers on this site.