SyncDiffusion: Coherent Montage via Synchronized Joint Diffusions
- URL: http://arxiv.org/abs/2306.05178v3
- Date: Sun, 29 Oct 2023 06:11:24 GMT
- Title: SyncDiffusion: Coherent Montage via Synchronized Joint Diffusions
- Authors: Yuseung Lee, Kunho Kim, Hyunjin Kim, Minhyuk Sung
- Abstract summary: Naive stitching of multiple images often results in visible seams.
Recent techniques have attempted to address this issue by performing joint diffusions in multiple windows.
We propose SyncDiffusion, a plug-and-play module that synchronizes multiple diffusions through gradient descent from a perceptual similarity loss.
- Score: 14.48564620768044
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The remarkable capabilities of pretrained image diffusion models have been
utilized not only for generating fixed-size images but also for creating
panoramas. However, naive stitching of multiple images often results in visible
seams. Recent techniques have attempted to address this issue by performing
joint diffusions in multiple windows and averaging latent features in
overlapping regions. However, these approaches, which focus on seamless montage
generation, often yield incoherent outputs by blending different scenes within
a single image. To overcome this limitation, we propose SyncDiffusion, a
plug-and-play module that synchronizes multiple diffusions through gradient
descent from a perceptual similarity loss. Specifically, we compute the
gradient of the perceptual loss using the predicted denoised images at each
denoising step, providing meaningful guidance for achieving coherent montages.
Our experimental results demonstrate that our method produces significantly
more coherent outputs compared to previous methods (66.35% vs. 33.65% in our
user study) while still maintaining fidelity (as assessed by GIQA) and
compatibility with the input prompt (as measured by CLIP score). We further
demonstrate the versatility of our method across three plug-and-play
applications: layout-guided image generation, conditional image generation and
360-degree panorama generation. Our project page is at
https://syncdiffusion.github.io.
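The guidance mechanism described in the abstract lends itself to a compact implementation. Below is a minimal sketch of one synchronized guidance step, assuming a latent-diffusion setup with an epsilon-prediction UNet, a VAE decoder producing images in [-1, 1], and LPIPS as the perceptual loss; the names and defaults (`eps_model`, `decode`, `grad_weight`, etc.) are hypothetical stand-ins, not the authors' API (the official code is linked from the project page).

```python
import torch
import lpips  # perceptual similarity loss (pip install lpips)

loss_fn = lpips.LPIPS(net="vgg")  # one common choice of perceptual loss

def sync_step(eps_model, decode, windows, t, alpha_bar_t, grad_weight=20.0):
    """One synchronized guidance step over a list of per-window latents.

    windows: list of latent tensors [B, C, H, W] at noise level t
    eps_model(x, t): predicts the noise epsilon (hypothetical signature)
    decode(x0): maps a clean latent to an RGB image in [-1, 1]
    """
    sqrt_ab = alpha_bar_t ** 0.5
    sqrt_1m_ab = (1.0 - alpha_bar_t) ** 0.5

    # Predicted denoised image of the anchor window (no gradient needed).
    with torch.no_grad():
        eps0 = eps_model(windows[0], t)
        anchor_img = decode((windows[0] - sqrt_1m_ab * eps0) / sqrt_ab)

    updated = [windows[0]]
    for x in windows[1:]:
        x = x.detach().requires_grad_(True)
        eps = eps_model(x, t)
        # "Predicted x0": the one-step estimate of the clean latent.
        x0 = (x - sqrt_1m_ab * eps) / sqrt_ab
        # Perceptual distance between this window's predicted image
        # and the anchor's predicted image.
        loss = loss_fn(decode(x0), anchor_img).mean()
        grad = torch.autograd.grad(loss, x)[0]
        # Descend the latent along the perceptual-loss gradient before
        # the usual denoising update.
        updated.append((x - grad_weight * grad).detach())
    return updated
```

After this guidance step, the per-window latents would be denoised as usual and averaged in overlapping regions, as in the prior joint-diffusion methods the abstract mentions; the gradient weight is a tunable hyperparameter.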
Related papers
- StochSync: Stochastic Diffusion Synchronization for Image Generation in Arbitrary Spaces [11.517082612850443]
We propose a method for generating images in arbitrary spaces using a pretrained image diffusion model.
This zero-shot method combines the strengths of both image-conditioning and 3D-mesh-based approaches.
arXiv Detail & Related papers (2025-01-26T08:22:44Z)
- VipDiff: Towards Coherent and Diverse Video Inpainting via Training-free Denoising Diffusion Models [21.584843961386888]
VipDiff is a training-free framework that conditions a diffusion model's reverse process to produce temporally coherent inpainting results.
It substantially outperforms state-of-the-art video inpainting methods in both spatio-temporal coherence and fidelity.
arXiv Detail & Related papers (2025-01-21T16:39:09Z)
- Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas [33.334956022229846]
We propose the Merge-Attend-Diffuse operator, which can be plugged into different types of pretrained diffusion models used in a joint diffusion setting.
Specifically, we merge the diffusion paths, reprogramming self- and cross-attention to operate on the aggregated latent space (see the sketch after this list).
Our method maintains compatibility with the input prompt and visual quality of the generated images while increasing their semantic coherence.
arXiv Detail & Related papers (2024-08-28T09:22:32Z)
- SpotDiffusion: A Fast Approach For Seamless Panorama Generation Over Time [7.532695984765271]
We present a novel approach to generating high-resolution images with generative models.
Our method shifts non-overlapping denoising windows over time, ensuring that seams in one timestep are corrected in the next (see the sketch after this list).
Our method offers several key benefits, including improved computational efficiency and faster inference times.
arXiv Detail & Related papers (2024-07-22T09:44:35Z)
- AdaDiff: Adaptive Step Selection for Fast Diffusion Models [82.78899138400435]
We introduce AdaDiff, a lightweight framework designed to learn instance-specific step usage policies.
AdaDiff is optimized with a policy gradient method to maximize a carefully designed reward function.
We conduct experiments on three image generation and two video generation benchmarks and show that our approach achieves visual quality similar to the baseline's.
arXiv Detail & Related papers (2023-11-24T11:20:38Z)
- Collaborative Score Distillation for Consistent Visual Synthesis [70.29294250371312]
Collaborative Score Distillation (CSD) is based on Stein Variational Gradient Descent (SVGD); a toy SVGD update is sketched after this list.
We show the effectiveness of CSD in a variety of tasks, encompassing the visual editing of panorama images, videos, and 3D scenes.
Our results underline the competency of CSD as a versatile method for enhancing inter-sample consistency, thereby broadening the applicability of text-to-image diffusion models.
arXiv Detail & Related papers (2023-07-04T17:31:50Z)
- Denoising Diffusion Models for Plug-and-Play Image Restoration [135.6359475784627]
This paper proposes DiffPIR, which integrates the traditional plug-and-play method into the diffusion sampling framework.
Compared to plug-and-play IR methods that rely on discriminative Gaussian denoisers, DiffPIR is expected to inherit the generative ability of diffusion models.
arXiv Detail & Related papers (2023-05-15T20:24:38Z)
- Exposure Fusion for Hand-held Camera Inputs with Optical Flow and PatchMatch [53.149395644547226]
We propose a hybrid synthesis method for fusing multi-exposure images taken by hand-held cameras.
Our method can deal with the motion inherent in hand-held capture and effectively maintain the exposure information of each input.
Experiment results demonstrate the effectiveness and robustness of our method.
arXiv Detail & Related papers (2023-04-10T09:06:37Z)
- VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation [88.49030739715701]
This work presents a decomposed diffusion process that resolves the per-frame noise into a base noise shared among all frames and a residual noise that varies along the time axis (see the sketch after this list).
Experiments on various datasets confirm that our approach, termed as VideoFusion, surpasses both GAN-based and diffusion-based alternatives in high-quality video generation.
arXiv Detail & Related papers (2023-03-15T02:16:39Z)
- Markup-to-Image Diffusion Models with Scheduled Sampling [111.30188533324954]
Building on recent advances in image generation, we present a data-driven approach to rendering markup into images.
The approach is based on diffusion models, which parameterize the distribution of data using a sequence of denoising operations.
We conduct experiments on four markup datasets: mathematical formulas (LaTeX), table layouts (HTML), sheet music (LilyPond), and molecular images (SMILES).
arXiv Detail & Related papers (2022-10-11T04:56:12Z)
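To make the Merge-Attend-Diffuse idea from "Merging and Splitting Diffusion Paths" concrete, here is a minimal sketch of attention operating on an aggregated latent space: token sequences from the individual diffusion paths are concatenated, attended jointly, and split back. The `merged_attention` helper and its signature are hypothetical illustrations, not the paper's implementation.

```python
import torch
import torch.nn as nn

def merged_attention(windows: list[torch.Tensor], attn: nn.MultiheadAttention):
    """Run one self-attention pass over the union of all windows' tokens.

    windows: per-path token sequences, each [batch, tokens, dim]
    attn: must be built with batch_first=True
    """
    merged = torch.cat(windows, dim=1)     # aggregate the latent spaces
    out, _ = attn(merged, merged, merged)  # joint self-attention over all paths
    return torch.split(out, [w.shape[1] for w in windows], dim=1)

# Toy usage: three diffusion paths sharing one attention layer.
attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
paths = [torch.randn(1, 256, 64) for _ in range(3)]
outs = merged_attention(paths, attn)
```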
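The shifted-window idea from SpotDiffusion can also be sketched in a few lines: at every timestep the panorama latent is tiled into non-overlapping windows at a different offset, so seams land in different places and are denoised away in later steps. `step_fn` (one denoising update for a single window) and the wrap-around shift are illustrative assumptions; the panorama width is assumed divisible by the window size.

```python
import torch

def shifted_window_step(step_fn, latent: torch.Tensor, t: int, win: int):
    """Denoise a panorama latent [B, C, H, W] with non-overlapping windows
    whose horizontal offset changes every timestep."""
    offset = int(torch.randint(0, win, (1,)))             # fresh offset each step
    rolled = torch.roll(latent, shifts=-offset, dims=-1)  # wrap-around shift
    for x0 in range(0, rolled.shape[-1], win):
        window = rolled[..., x0:x0 + win]
        rolled[..., x0:x0 + win] = step_fn(window, t)     # per-window denoise
    return torch.roll(rolled, shifts=offset, dims=-1)     # undo the shift
```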
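Collaborative Score Distillation builds on SVGD, whose particle update combines a kernel-weighted average of score directions with a repulsive term. A self-contained toy update with an RBF kernel is below; in CSD the score would come from a text-to-image diffusion model, but here `score_fn` is just an assumed callable.

```python
import torch

def svgd_step(x: torch.Tensor, score_fn, step: float = 0.1, h: float = 1.0):
    """One SVGD update for particles x of shape [n, d].

    phi(x_i) = (1/n) * sum_j [ k(x_j, x_i) * score(x_j) + grad_{x_j} k(x_j, x_i) ]
    with RBF kernel k(x, y) = exp(-||x - y||^2 / h).
    """
    diffs = x.unsqueeze(1) - x.unsqueeze(0)      # diffs[j, i] = x_j - x_i
    k = torch.exp(-(diffs ** 2).sum(-1) / h)     # kernel matrix [n, n]
    grad_k = -2.0 / h * diffs * k.unsqueeze(-1)  # grad wrt x_j, shape [n, n, d]
    phi = (k @ score_fn(x) + grad_k.sum(dim=0)) / x.shape[0]
    return x + step * phi                        # attraction plus repulsion
```

For a standard Gaussian target, `score_fn = lambda x: -x` makes the particles spread into approximate unit-variance samples, which illustrates the repulsive term's role in keeping samples diverse yet consistent.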
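Finally, VideoFusion's decomposition is simple enough to write down directly: each frame's noise is a shared base component plus a per-frame residual. The fixed mixing coefficient `lam` below is a simplified stand-in for the paper's scheduling.

```python
import torch

def decomposed_noise(b: int, f: int, c: int, h: int, w: int, lam: float = 0.5):
    """Per-frame noise = shared base noise + time-varying residual.

    The sqrt weights keep each frame's marginal noise unit-variance,
    while frames stay correlated through the shared base component.
    """
    base = torch.randn(b, 1, c, h, w)      # shared across all f frames
    residual = torch.randn(b, f, c, h, w)  # varies along the time axis
    return (lam ** 0.5) * base + ((1.0 - lam) ** 0.5) * residual
```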
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.