CamFreeDiff: Camera-free Image to Panorama Generation with Diffusion Model
- URL: http://arxiv.org/abs/2407.07174v1
- Date: Tue, 9 Jul 2024 18:41:12 GMT
- Title: CamFreeDiff: Camera-free Image to Panorama Generation with Diffusion Model
- Authors: Xiaoding Yuan, Shitao Tang, Kejie Li, Alan Yuille, Peng Wang
- Abstract summary: This paper introduces the Camera-free Diffusion model for 360-degree image outpainting from a single camera-free image and a text description.
Our model incorporates a mechanism for predicting homography directly within the multi-view diffusion framework.
- Score: 12.38275663977654
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper introduces the Camera-free Diffusion (CamFreeDiff) model for 360-degree image outpainting from a single camera-free image and a text description. The method distinguishes itself from existing strategies, such as MVDiffusion, by eliminating the requirement for predefined camera poses. Instead, our model incorporates a mechanism for predicting homography directly within the multi-view diffusion framework. The core of our approach is to formulate camera estimation as predicting the homography transformation from the input view to a predefined canonical view. The homography provides point-level correspondences between the input image and the target panoramic image, allowing correspondence-aware attention to enforce these connections in a fully differentiable manner. Qualitative and quantitative experimental results demonstrate our model's strong robustness and generalization ability for 360-degree image outpainting in the challenging setting of camera-free inputs.
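To make the correspondence mechanism concrete, here is a minimal sketch of how a 3x3 homography yields point-level correspondences between the input view and a canonical view. The function name and numeric values are illustrative only; in CamFreeDiff the homography is predicted inside the multi-view diffusion framework rather than supplied by hand.

```python
import numpy as np

def warp_points(H, pts):
    """Map pixel coordinates through a 3x3 homography H.
    Illustrative helper, not from the paper's code: CamFreeDiff
    predicts H within the multi-view diffusion framework."""
    pts_h = np.concatenate([pts, np.ones((len(pts), 1))], axis=1)  # homogeneous coords
    mapped = pts_h @ H.T                                           # apply H row-wise
    return mapped[:, :2] / mapped[:, 2:3]                          # dehomogenize

# Hypothetical predicted homography and three input-view pixels.
H = np.array([[1.02, 0.01,  5.0],
              [0.00, 0.98, -3.0],
              [1e-5, 0.00,  1.0]])
src = np.array([[100.0, 200.0], [320.0, 240.0], [50.0, 400.0]])
print(warp_points(H, src))  # corresponding pixel locations in the canonical view
```

Because the warp is a differentiable function of H, the correspondence-aware attention that consumes these matched locations can backpropagate into the homography prediction, which is what the abstract means by "fully differentiable".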
Related papers
- Boost 3D Reconstruction using Diffusion-based Monocular Camera Calibration [34.18403601269181]
DM-Calib is a diffusion-based approach for estimating pinhole camera intrinsic parameters from a single input image.
We introduce a new image-based representation, termed Camera Image, which losslessly encodes the numerical camera intrinsics.
By fine-tuning a Stable Diffusion model to generate a Camera Image from a single RGB input, we can extract camera intrinsics via a RANSAC operation.
arXiv Detail & Related papers (2024-11-26T09:04:37Z)
- Cameras as Rays: Pose Estimation via Ray Diffusion [54.098613859015856]
Estimating camera poses is a fundamental task for 3D reconstruction and remains challenging given sparsely sampled views.
We propose a distributed representation of camera pose that treats a camera as a bundle of rays (a minimal ray-bundle sketch follows the list below).
Our proposed methods, both regression- and diffusion-based, demonstrate state-of-the-art performance on camera pose estimation on CO3D.
arXiv Detail & Related papers (2024-02-22T18:59:56Z)
- DiffPortrait3D: Controllable Diffusion for Zero-Shot Portrait View Synthesis [18.64688172651478]
We present DiffPortrait3D, a conditional diffusion model capable of synthesizing 3D-consistent photo-realistic novel views.
Given a single RGB input, we aim to synthesize plausible yet consistent facial details rendered from novel camera views.
We demonstrate state-of-the-art results both qualitatively and quantitatively on our challenging in-the-wild and multi-view benchmarks.
arXiv Detail & Related papers (2023-12-20T13:31:11Z)
- UpFusion: Novel View Diffusion from Unposed Sparse View Observations [66.36092764694502]
UpFusion can perform novel view synthesis and infer 3D representations for an object given a sparse set of reference images.
We show that this mechanism allows generating high-fidelity novel views while improving the synthesis quality given additional (unposed) images.
arXiv Detail & Related papers (2023-12-11T18:59:55Z)
- Curved Diffusion: A Generative Model With Optical Geometry Control [56.24220665691974]
The influence of different optical systems on the final scene appearance is frequently overlooked.
This study introduces a framework that tightly integrates a text-to-image diffusion model with the particular lens used in image rendering.
arXiv Detail & Related papers (2023-11-29T13:06:48Z)
- Learning Robust Multi-Scale Representation for Neural Radiance Fields from Unposed Images [65.41966114373373]
We present an improved solution to the neural image-based rendering problem in computer vision.
The proposed approach can synthesize a realistic image of the scene from a novel viewpoint at test time.
arXiv Detail & Related papers (2023-11-08T08:18:23Z)
- Inverting the Imaging Process by Learning an Implicit Camera Model [73.81635386829846]
This paper proposes a novel implicit camera model which represents the physical imaging process of a camera as a deep neural network.
We demonstrate the power of this new implicit camera model on two inverse imaging tasks.
arXiv Detail & Related papers (2023-04-25T11:55:03Z)
- Long-Term Photometric Consistent Novel View Synthesis with Diffusion Models [24.301334966272297]
We propose a novel generative model capable of producing a sequence of photorealistic images consistent with a specified camera trajectory.
To measure the consistency over a sequence of generated views, we introduce a new metric, the thresholded symmetric epipolar distance (TSED); a sketch of this metric follows the list below.
arXiv Detail & Related papers (2023-04-21T02:01:02Z)
- RelPose: Predicting Probabilistic Relative Rotation for Single Objects in the Wild [73.1276968007689]
We describe a data-driven method for inferring the camera viewpoints given multiple images of an arbitrary object.
We show that our approach outperforms state-of-the-art SfM and SLAM methods given sparse images on both seen and unseen categories.
arXiv Detail & Related papers (2022-08-11T17:59:59Z)
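The "Cameras as Rays" entry above represents a camera as a bundle of per-pixel rays. Below is a minimal sketch of one common way to realize this, Plücker coordinates for rays back-projected from a pinhole camera; the helper and its conventions are an illustration under stated assumptions, not that paper's implementation.

```python
import numpy as np

def camera_to_rays(K, R, t, uv):
    """Convert a pinhole camera (intrinsics K, world-to-camera rotation R,
    translation t) and pixel coordinates uv (N, 2) into Plücker rays (d, m).
    A sketch of the 'camera as a bundle of rays' idea, not the paper's code."""
    cam_center = -R.T @ t                        # camera origin in world coordinates
    uv_h = np.concatenate([uv, np.ones((len(uv), 1))], axis=1)
    dirs = (R.T @ np.linalg.inv(K) @ uv_h.T).T   # back-project pixels to world directions
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    moments = np.cross(cam_center, dirs)         # Plücker moment m = o x d
    return dirs, moments

# Hypothetical intrinsics and an identity pose, queried at two pixels.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0,   0.0,   1.0]])
R, t = np.eye(3), np.zeros(3)
uv = np.array([[320.0, 240.0], [0.0, 0.0]])
dirs, moments = camera_to_rays(K, R, t, uv)
```

Estimating such ray bundles per pixel, rather than a single global pose, is what makes the representation "distributed".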
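The long-term consistency entry introduces the thresholded symmetric epipolar distance (TSED). A hedged sketch of the underlying quantity: the symmetric epipolar distance of matched keypoints under a fundamental matrix F, with a threshold deciding whether a generated view pair counts as consistent. The median-based decision rule and the default threshold are assumptions inferred from the metric's name, not that paper's exact definition.

```python
import numpy as np

def symmetric_epipolar_distance(F, x1, x2):
    """Point-to-epipolar-line distances, summed symmetrically.
    x1, x2: (N, 2) matched pixels in views 1 and 2; F: 3x3 fundamental matrix."""
    x1h = np.concatenate([x1, np.ones((len(x1), 1))], axis=1)
    x2h = np.concatenate([x2, np.ones((len(x2), 1))], axis=1)
    l2 = x1h @ F.T   # epipolar lines in image 2 (rows are F @ x1_i)
    l1 = x2h @ F     # epipolar lines in image 1 (rows are F.T @ x2_i)
    d2 = np.abs(np.sum(l2 * x2h, axis=1)) / np.linalg.norm(l2[:, :2], axis=1)
    d1 = np.abs(np.sum(l1 * x1h, axis=1)) / np.linalg.norm(l1[:, :2], axis=1)
    return d1 + d2

def tsed_consistent(F, x1, x2, tau=2.0):
    """Declare a generated view pair geometrically consistent if the median
    symmetric epipolar distance of its matches falls below tau pixels.
    The aggregation rule and threshold are assumptions for illustration."""
    return np.median(symmetric_epipolar_distance(F, x1, x2)) < tau
```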
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.