SphereDiffusion: Spherical Geometry-Aware Distortion Resilient Diffusion Model
- URL: http://arxiv.org/abs/2403.10044v1
- Date: Fri, 15 Mar 2024 06:26:46 GMT
- Title: SphereDiffusion: Spherical Geometry-Aware Distortion Resilient Diffusion Model
- Authors: Tao Wu, Xuewei Li, Zhongang Qi, Di Hu, Xintao Wang, Ying Shan, Xi Li
- Abstract summary: Controllable spherical panoramic image generation holds substantial applicative potential across a variety of domains.
In this paper, we introduce a novel framework of SphereDiffusion to address these unique challenges.
Experiments on the Structured3D dataset show that SphereDiffusion significantly improves the quality of controllable spherical image generation, reducing FID by around 35% on average in relative terms.
- Score: 63.685132323224124
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Controllable spherical panoramic image generation holds substantial applicative potential across a variety of domains. However, it remains a challenging task due to the inherent spherical distortion and geometry characteristics, resulting in low-quality content generation. In this paper, we introduce a novel framework of SphereDiffusion to address these unique challenges, for better generating high-quality and precisely controllable spherical panoramic images. For the spherical distortion characteristic, we embed the semantics of the distorted object with text encoding, then explicitly construct the relationship with text-object correspondence to better use the pre-trained knowledge of the planar images. Meanwhile, we employ a deformable technique to mitigate the semantic deviation in latent space caused by spherical distortion. For the spherical geometry characteristic, in virtue of spherical rotation invariance, we improve the data diversity and optimization objectives in the training process, enabling the model to better learn the spherical geometry characteristic. Furthermore, we enhance the denoising process of the diffusion model, enabling it to effectively use the learned geometric characteristic to ensure the boundary continuity of the generated images. With these specific techniques, experiments on Structured3D dataset show that SphereDiffusion significantly improves the quality of controllable spherical image generation and relatively reduces around 35% FID on average.
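The abstract's appeal to spherical rotation invariance can be illustrated with a minimal sketch. It assumes the panorama is stored in equirectangular layout, where the image width spans 360 degrees of longitude, so a rotation about the vertical axis is a circular shift of pixel columns. The function name `yaw_rotate_equirect` and the shift direction are illustrative assumptions; this is not the authors' implementation, and the paper may use more general rotations.

```python
from typing import List, Sequence, TypeVar

Pixel = TypeVar("Pixel")


def yaw_rotate_equirect(image: Sequence[Sequence[Pixel]], yaw_degrees: float) -> List[List[Pixel]]:
    """Rotate an equirectangular panorama about the vertical axis.

    Assumption: each row covers 360 degrees of longitude, so a yaw rotation
    is a circular shift of columns. The sign convention (positive yaw moves
    content to the right) is a hypothetical choice for this sketch.
    """
    if not image:
        return []
    width = len(image[0])
    # Convert the angle to a whole number of columns, wrapped into [0, width).
    shift = round(yaw_degrees / 360.0 * width) % width
    cut = width - shift
    # Columns that fall off the right edge re-enter on the left, which is
    # exactly the left/right boundary continuity of a spherical image.
    return [list(row[cut:]) + list(row[:cut]) for row in image]


if __name__ == "__main__":
    pano = [[0, 1, 2, 3],
            [4, 5, 6, 7]]
    print(yaw_rotate_equirect(pano, 90))
```

Because the shifted image depicts the same sphere, such rotations can serve as data augmentation, and the former left/right seam ends up in the image interior, where any discontinuity becomes visible.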
Related papers
- AniSDF: Fused-Granularity Neural Surfaces with Anisotropic Encoding for High-Fidelity 3D Reconstruction [55.69271635843385]
We present AniSDF, a novel approach that learns fused-granularity neural surfaces with physics-based encoding for high-fidelity 3D reconstruction.
Our method boosts the quality of SDF-based methods by a large margin in both geometry reconstruction and novel-view synthesis.
arXiv Detail & Related papers (2024-10-02T03:10:38Z) - Curved Diffusion: A Generative Model With Optical Geometry Control [56.24220665691974]
The influence of different optical systems on the final scene appearance is frequently overlooked.
This study introduces a framework that intimately integrates a text-to-image diffusion model with the particular lens used in image rendering.
arXiv Detail & Related papers (2023-11-29T13:06:48Z) - Controlling Text-to-Image Diffusion by Orthogonal Finetuning [74.21549380288631]
We introduce a principled finetuning method -- Orthogonal Finetuning (OFT) for adapting text-to-image diffusion models to downstream tasks.
Unlike existing methods, OFT can provably preserve hyperspherical energy which characterizes the pairwise neuron relationship on the unit hypersphere.
We empirically show that our OFT framework outperforms existing methods in generation quality and convergence speed.
arXiv Detail & Related papers (2023-06-12T17:59:23Z) - Pixelated Reconstruction of Foreground Density and Background Surface
Brightness in Gravitational Lensing Systems using Recurrent Inference
Machines [116.33694183176617]
We use a neural network based on the Recurrent Inference Machine to reconstruct an undistorted image of the background source and the lens mass density distribution as pixelated maps.
When compared to more traditional parametric models, the proposed method is significantly more expressive and can reconstruct complex mass distributions.
arXiv Detail & Related papers (2023-01-10T19:00:12Z) - Generative Deformable Radiance Fields for Disentangled Image Synthesis
of Topology-Varying Objects [52.46838926521572]
3D-aware generative models have demonstrated superb performance in generating 3D neural radiance fields (NeRF) from a collection of monocular 2D images.
We propose a generative model for synthesizing radiance fields of topology-varying objects with disentangled shape and appearance variations.
arXiv Detail & Related papers (2022-09-09T08:44:06Z) - NeurInt : Learning to Interpolate through Neural ODEs [18.104328632453676]
We propose a novel generative model that learns a distribution of trajectories between two images.
We demonstrate our approach's effectiveness in generating images of improved quality as well as its ability to learn a diverse distribution over smooth trajectories for any pair of real source and target images.
arXiv Detail & Related papers (2021-11-07T16:31:18Z) - Disentangling Geometric Deformation Spaces in Generative Latent Shape Models [5.582957809895198]
A complete representation of 3D objects requires characterizing the space of deformations in an interpretable manner.
We improve on a prior generative model of disentanglement for 3D shapes, wherein the space of object geometry is factorized into rigid orientation, non-rigid pose, and intrinsic shape.
The resulting model can be trained from raw 3D shapes, without correspondences, labels, or even rigid alignment.
arXiv Detail & Related papers (2021-02-27T06:54:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.