StochSync: Stochastic Diffusion Synchronization for Image Generation in Arbitrary Spaces
- URL: http://arxiv.org/abs/2501.15445v1
- Date: Sun, 26 Jan 2025 08:22:44 GMT
- Title: StochSync: Stochastic Diffusion Synchronization for Image Generation in Arbitrary Spaces
- Authors: Kyeongmin Yeo, Jaihoon Kim, Minhyuk Sung
- Abstract summary: We propose a method for generating images in arbitrary spaces using a pretrained image diffusion model.
The zero-shot method combines the strengths of both diffusion synchronization and score distillation sampling.
- Score: 11.517082612850443
- Abstract: We propose a zero-shot method for generating images in arbitrary spaces (e.g., a sphere for 360° panoramas and a mesh surface for texture) using a pretrained image diffusion model. Zero-shot generation of various visual content with a pretrained image diffusion model has been explored mainly in two directions. First, Diffusion Synchronization, which performs reverse diffusion processes jointly across different projected spaces while synchronizing them in the target space, generates high-quality outputs when enough conditioning is provided but struggles in its absence. Second, Score Distillation Sampling, which gradually updates the target-space data through gradient descent, yields better coherence but often lacks detail. In this paper, we reveal for the first time the interconnection between these two methods while highlighting their differences. To this end, we propose StochSync, a novel approach that combines the strengths of both, enabling effective performance under weak conditioning. Our experiments demonstrate that StochSync achieves the best performance in 360° panorama generation (where no image conditioning is given), outperforming previous finetuning-based methods, and delivers results comparable to previous methods in 3D mesh texturing (where depth conditioning is provided).
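For intuition, the two families the abstract contrasts look roughly like the loops below. This is a minimal toy sketch, not the paper's implementation: `project`, `unproject`, and `denoise_step` are hypothetical placeholders standing in for the space mappings and a pretrained diffusion model.

```python
# Toy sketch (NOT the paper's code) of the two zero-shot families.
import numpy as np

rng = np.random.default_rng(0)

def project(z, i):
    """Placeholder: map target-space data z into instance space i."""
    return z

def unproject(x, i):
    """Placeholder: map instance-space data x back to the target space."""
    return x

def denoise_step(x, t):
    """Placeholder for one reverse-diffusion step of a pretrained model."""
    return 0.98 * x + 0.02 * rng.standard_normal(x.shape)

# (1) Diffusion Synchronization: joint reverse diffusion in several projected
# spaces, synchronized in the target space after every step.
z = rng.standard_normal((64, 64))            # e.g. a spherical panorama grid
for t in reversed(range(50)):
    views = [denoise_step(project(z, i), t) for i in range(4)]
    z = np.mean([unproject(x, i) for i, x in enumerate(views)], axis=0)

# (2) Score Distillation Sampling: keep the target-space data as the variable
# being optimized and descend a score-based gradient instead of sampling.
z = rng.standard_normal((64, 64))
for step in range(500):
    t = int(rng.integers(1, 50))
    noise = rng.standard_normal(z.shape)
    x_t = z + 0.02 * t * noise               # toy forward noising of z
    eps_pred = x_t - denoise_step(x_t, t)    # toy stand-in for eps_theta
    z -= 0.01 * (eps_pred - noise)           # SDS-style gradient step
```

Per the abstract, StochSync combines elements of both loops so the process remains effective when conditioning is weak.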
Related papers
- ConsistentDreamer: View-Consistent Meshes Through Balanced Multi-View Gaussian Optimization [5.55656676725821]
We present ConsistentDreamer, where we first generate a set of fixed multi-view prior images and sample random views between them.
In this way, we limit the discrepancies between the views guided by the SDS loss and ensure a consistent rough shape.
In each iteration, we also use our generated multi-view prior images for fine-detail reconstruction.
arXiv Detail & Related papers (2025-02-13T12:49:25Z)
- SPAR3D: Stable Point-Aware Reconstruction of 3D Objects from Single Images [49.7344030427291]
We study the problem of single-image 3D object reconstruction.
Recent works have diverged into two directions: regression-based modeling and generative modeling.
We present SPAR3D, a novel two-stage approach aiming to take the best of both directions.
arXiv Detail & Related papers (2025-01-08T18:52:03Z)
- Synchronous Diffusion for Unsupervised Smooth Non-Rigid 3D Shape Matching [15.843208029973175]
We propose a synchronous diffusion process which we use as regularisation to achieve smoothness in non-rigid 3D shape matching problems.
We demonstrate that our novel regularisation can substantially improve the state-of-the-art in shape matching, especially in the presence of topological noise.
arXiv Detail & Related papers (2024-07-11T07:45:06Z)
- OrientDream: Streamlining Text-to-3D Generation with Explicit Orientation Control [66.03885917320189]
OrientDream is a camera-orientation-conditioned framework for efficient, multi-view-consistent 3D generation from textual prompts.
Our strategy centers on adding an explicit camera-orientation-conditioned feature during the pre-training of a 2D text-to-image diffusion module.
Our experiments reveal that our method not only produces high-quality NeRF models with consistent multi-view properties but also optimizes significantly faster than existing methods.
arXiv Detail & Related papers (2024-06-14T13:16:18Z)
- SyncTweedies: A General Generative Framework Based on Synchronized Diffusions [11.292617528150291]
We present an exhaustive investigation of all possible scenarios for synchronizing multiple diffusion processes through a canonical space.
We reveal a previously unexplored case: averaging the outputs of Tweedie's formula while conducting denoising in multiple instance spaces.
In our experiments generating the aforementioned visual content, we demonstrate the superior generation quality of SyncTweedies compared to other synchronization methods.
arXiv Detail & Related papers (2024-03-21T12:57:30Z)
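For intuition on the averaging case SyncTweedies highlights, here is a toy sketch, not the paper's code: each instance space computes Tweedie's denoised estimate, the estimates are averaged in the canonical space, and a DDIM-style transition continues the denoising. `eps_model`, `project`, and `unproject` are placeholders.

```python
# Toy sketch of Tweedie-averaged synchronization (placeholders throughout).
import numpy as np

rng = np.random.default_rng(0)
alpha_bar = np.linspace(0.999, 0.01, 50)      # toy noise schedule

def eps_model(x, t):
    """Placeholder for a pretrained noise predictor eps_theta(x_t, t)."""
    return 0.1 * rng.standard_normal(x.shape)

def project(z, i):   return z                 # identity placeholders for the
def unproject(x, i): return x                 # canonical <-> instance maps

z_t = rng.standard_normal((64, 64))           # noisy canonical-space data
for t in reversed(range(1, 50)):
    x0_list, eps_list = [], []
    for i in range(4):                        # each instance space
        x_t = project(z_t, i)
        eps = eps_model(x_t, t)
        # Tweedie's formula: x0_hat = (x_t - sqrt(1 - a) * eps) / sqrt(a)
        x0 = (x_t - np.sqrt(1 - alpha_bar[t]) * eps) / np.sqrt(alpha_bar[t])
        x0_list.append(unproject(x0, i))
        eps_list.append(unproject(eps, i))
    z0 = np.mean(x0_list, axis=0)             # synchronize the x0 estimates
    eps_bar = np.mean(eps_list, axis=0)
    # DDIM-style transition to the previous noise level from the averaged x0
    z_t = np.sqrt(alpha_bar[t - 1]) * z0 + np.sqrt(1 - alpha_bar[t - 1]) * eps_bar
```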
- AdaDiff: Adaptive Step Selection for Fast Diffusion Models [82.78899138400435]
We introduce AdaDiff, a lightweight framework designed to learn instance-specific step-usage policies.
AdaDiff is optimized using a policy gradient method to maximize a carefully designed reward function.
We conduct experiments on three image generation and two video generation benchmarks and demonstrate that our approach achieves visual quality similar to the baseline's.
arXiv Detail & Related papers (2023-11-24T11:20:38Z)
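The step-selection idea admits a tiny illustration. The sketch below is speculative and not AdaDiff's actual policy, architecture, or reward: a per-step Bernoulli policy over which denoising steps to keep, trained with REINFORCE against an invented reward that trades a quality proxy against step count.

```python
# Speculative toy sketch of learning a step-usage policy with REINFORCE.
# The policy, reward, and quality proxy are all illustrative inventions.
import numpy as np

rng = np.random.default_rng(0)
T = 50                                        # total denoising steps
theta = np.zeros(T)                           # per-step keep/skip logits

def quality_proxy(num_steps):
    """Made-up quality curve with diminishing returns in step count."""
    return 1.0 - np.exp(-num_steps / 10.0)

step_cost = 0.01                              # penalty per step actually used
for it in range(500):
    p = 1.0 / (1.0 + np.exp(-theta))          # keep-probabilities
    keep = rng.random(T) < p                  # sample a step-usage mask
    reward = quality_proxy(keep.sum()) - step_cost * keep.sum()
    # REINFORCE: d/dtheta log p(mask) = keep - p for a Bernoulli policy
    theta += 0.5 * (keep - p) * reward
```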
- Sparse3D: Distilling Multiview-Consistent Diffusion for Object Reconstruction from Sparse Views [47.215089338101066]
We present Sparse3D, a novel 3D reconstruction method tailored for sparse-view inputs.
Our approach distills robust priors from a multiview-consistent diffusion model to refine a neural radiance field.
By tapping into 2D priors from powerful image diffusion models, our integrated model consistently delivers high-quality results.
arXiv Detail & Related papers (2023-08-27T11:52:00Z)
- CCD-3DR: Consistent Conditioning in Diffusion for Single-Image 3D Reconstruction [81.98244738773766]
We present CCD-3DR, which exploits a novel centered diffusion probabilistic model for consistent local feature conditioning.
CCD-3DR outperforms all competitors by a large margin, with over 40% improvement.
arXiv Detail & Related papers (2023-08-15T15:27:42Z)
- SyncDiffusion: Coherent Montage via Synchronized Joint Diffusions [14.48564620768044]
Naive stitching of multiple images often results in visible seams.
Recent techniques have attempted to address this issue by performing joint diffusions in multiple windows.
We propose SyncDiffusion, a plug-and-play module that synchronizes multiple diffusions through gradient descent from a perceptual similarity loss.
arXiv Detail & Related papers (2023-06-08T13:18:23Z)
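To make the synchronization mechanism concrete, here is a toy sketch, not the released SyncDiffusion module: windows are denoised jointly, and at each step every non-anchor window is nudged by the gradient of a similarity loss between denoised estimates. Plain L2 stands in for the perceptual (LPIPS-style) loss, and `tweedie_x0` and `denoise_step` are placeholders.

```python
# Toy sketch of gradient-descent synchronization across joint diffusions.
# L2 replaces the perceptual loss; all model functions are placeholders.
import numpy as np

rng = np.random.default_rng(0)

def tweedie_x0(x_t, t):
    """Placeholder for the model's denoised estimate x0_hat at step t."""
    return (1.0 - t / 50.0) * x_t

def denoise_step(x_t, t):
    """Placeholder for one reverse-diffusion step in a single window."""
    return 0.98 * x_t + 0.02 * rng.standard_normal(x_t.shape)

windows = [rng.standard_normal((64, 64)) for _ in range(4)]
weight = 0.5                                  # synchronization strength
for t in reversed(range(1, 50)):
    anchor = tweedie_x0(windows[0], t)        # anchor window's x0 estimate
    for i in range(1, len(windows)):
        x0 = tweedie_x0(windows[i], t)
        # gradient of 0.5 * ||x0 - anchor||^2 w.r.t. x0 (L2 stands in for
        # LPIPS; the chain rule through the model is folded into `weight`)
        windows[i] = windows[i] - weight * (x0 - anchor)
    windows = [denoise_step(x, t) for x in windows]
```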
- $PC^2$: Projection-Conditioned Point Cloud Diffusion for Single-Image 3D Reconstruction [97.06927852165464]
Reconstructing the 3D shape of an object from a single RGB image is a long-standing and highly challenging problem in computer vision.
We propose a novel method for single-image 3D reconstruction which generates a sparse point cloud via a conditional denoising diffusion process.
arXiv Detail & Related papers (2023-02-21T13:37:07Z)
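As a rough illustration of projection conditioning, the sketch below is speculative and not $PC^2$'s model: at each reverse step, an image feature is sampled under every point's projected 2D location and passed to the denoiser as conditioning. `project_image_features` and `denoiser` are invented placeholders.

```python
# Speculative toy sketch of projection-conditioned point-cloud denoising.
import numpy as np

rng = np.random.default_rng(0)
N = 1024                                      # points in the sparse cloud

def project_image_features(points, image):
    """Invented placeholder: sample an image feature under each point's
    projected (x, y) location, assuming points roughly live in [-1, 1]."""
    h, w = image.shape
    xy = ((points[:, :2] + 1.0) / 2.0 * [w - 1, h - 1]).astype(int)
    xy = np.clip(xy, 0, [w - 1, h - 1])
    return image[xy[:, 1], xy[:, 0]][:, None]

def denoiser(noisy_points, feats, t):
    """Placeholder for the image-conditioned noise predictor."""
    return 0.1 * rng.standard_normal(noisy_points.shape) - 0.05 * feats

image = rng.random((32, 32))                  # stand-in for image features
points = rng.standard_normal((N, 3))          # start the cloud from noise
for t in reversed(range(1, 50)):
    feats = project_image_features(points, image)
    eps = denoiser(points, feats, t)
    points = points - 0.1 * eps               # toy reverse update
```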
- SinDiffusion: Learning a Diffusion Model from a Single Natural Image [159.4285444680301]
We present SinDiffusion, which leverages denoising diffusion models to capture the internal distribution of patches from a single natural image.
It is based on two core designs. First, SinDiffusion is trained with a single model at a single scale instead of multiple models with progressive growing of scales.
Second, we identify that a patch-level receptive field of the diffusion network is crucial and effective for capturing the image's patch statistics.
arXiv Detail & Related papers (2022-11-22T18:00:03Z)