iFusion: Inverting Diffusion for Pose-Free Reconstruction from Sparse Views
- URL: http://arxiv.org/abs/2312.17250v1
- Date: Thu, 28 Dec 2023 18:59:57 GMT
- Title: iFusion: Inverting Diffusion for Pose-Free Reconstruction from Sparse Views
- Authors: Chin-Hsuan Wu, Yen-Chun Chen, Bolivar Solarte, Lu Yuan, Min Sun
- Abstract summary: iFusion is a novel 3D object reconstruction framework that requires only two views with unknown camera poses.
We harness a pre-trained novel view synthesis diffusion model, which embeds implicit knowledge about the geometry and appearance of diverse objects.
Experiments demonstrate strong performance in both pose estimation and novel view synthesis.
- Score: 61.707755434165335
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present iFusion, a novel 3D object reconstruction framework that requires
only two views with unknown camera poses. While single-view reconstruction
yields visually appealing results, it can deviate significantly from the actual
object, especially on unseen sides. Additional views improve reconstruction
fidelity but necessitate known camera poses. However, assuming the availability
of pose may be unrealistic, and existing pose estimators fail in sparse view
scenarios. To address this, we harness a pre-trained novel view synthesis
diffusion model, which embeds implicit knowledge about the geometry and
appearance of diverse objects. Our strategy unfolds in three steps: (1) We
invert the diffusion model for camera pose estimation instead of synthesizing
novel views. (2) The diffusion model is fine-tuned using provided views and
estimated poses, turned into a novel view synthesizer tailored for the target
object. (3) Leveraging registered views and the fine-tuned diffusion model, we
reconstruct the 3D object. Experiments demonstrate strong performance in both
pose estimation and novel view synthesis. Moreover, iFusion seamlessly
integrates with various reconstruction methods and enhances them.
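The pose estimation in step (1) can be read as an optimization problem: with the diffusion model frozen, search for the relative camera pose that minimizes the standard denoising loss E_{t,eps} || eps - eps_theta(x_t; view, pose) ||^2 when the model is conditioned on one view and asked to denoise the other. Below is a minimal, self-contained sketch of that idea. The TinyDenoiser stub, the 4-dimensional pose parameterization, and the simplified noise schedule are illustrative assumptions, not the authors' implementation; a real system would plug in a pretrained view-conditioned model such as Zero123 in place of the stub.

```python
# Sketch of iFusion step (1): recover a relative camera pose by inverting a
# view-conditioned diffusion model. Everything below is a hypothetical
# stand-in for the paper's actual components.
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    """Stand-in for a pretrained pose-conditioned noise predictor eps_theta."""
    def __init__(self, img_dim=64 * 64 * 3, pose_dim=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(img_dim + img_dim + pose_dim + 1, 256),
            nn.ReLU(),
            nn.Linear(256, img_dim),
        )

    def forward(self, x_t, cond_img, pose, t):
        # Condition on the noisy target, the source view, the candidate pose,
        # and the diffusion timestep.
        inp = torch.cat([x_t, cond_img, pose, t], dim=-1)
        return self.net(inp)

def pose_inversion(model, src, tgt, steps=200, lr=1e-2):
    """Optimize the relative pose so the frozen model best denoises `tgt`
    when conditioned on `src` and that pose (diffusion loss as objective)."""
    # Assumed pose parameterization, e.g. (elevation, azimuth, radius, roll).
    pose = torch.zeros(1, 4, requires_grad=True)
    opt = torch.optim.Adam([pose], lr=lr)
    for _ in range(steps):
        t = torch.rand(1, 1)                    # random timestep in [0, 1)
        noise = torch.randn_like(tgt)
        # Simplified variance-preserving forward process:
        # x_t = sqrt(alpha) * x0 + sqrt(1 - alpha) * eps
        alpha = (1.0 - t).clamp(min=1e-3)
        x_t = alpha.sqrt() * tgt + (1 - alpha).sqrt() * noise
        eps_hat = model(x_t, src, pose, t)
        loss = ((eps_hat - noise) ** 2).mean()  # standard denoising objective
        opt.zero_grad()
        loss.backward()
        opt.step()
    return pose.detach()

if __name__ == "__main__":
    model = TinyDenoiser()
    for p in model.parameters():                # model stays frozen; only the pose moves
        p.requires_grad_(False)
    src = torch.randn(1, 64 * 64 * 3)           # two input views, flattened for the stub
    tgt = torch.randn(1, 64 * 64 * 3)
    print("estimated pose:", pose_inversion(model, src, tgt))
```

Steps (2) and (3) then follow the same machinery with the roles swapped: freeze the estimated poses, unfreeze the model weights for per-object fine-tuning on the registered views, and finally use the fine-tuned synthesizer as the prior for 3D reconstruction.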
Related papers
- SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views [36.02533658048349]
We propose a novel method, SpaRP, to reconstruct a 3D textured mesh and estimate the relative camera poses for sparse-view images.
SpaRP distills knowledge from 2D diffusion models and finetunes them to implicitly deduce the 3D spatial relationships between the sparse views.
It requires only about 20 seconds to produce a textured mesh and camera poses for the input views.
arXiv Detail & Related papers (2024-08-19T17:53:10Z)
- FSViewFusion: Few-Shots View Generation of Novel Objects [75.81872204650807]
We introduce a pretrained stable diffusion model for view synthesis without explicit 3D priors.
Specifically, we base our method on a personalized text-to-image model, DreamBooth, given its strong ability to adapt to specific novel objects with a few shots.
We establish that the concept of a view can be disentangled and transferred to a novel object irrespective of the identity of the original object from which the views were learned.
arXiv Detail & Related papers (2024-03-11T02:59:30Z)
- Extreme Two-View Geometry From Object Poses with Diffusion Models [21.16779160086591]
We harness the power of object priors to accurately determine two-view geometry in the face of extreme viewpoint changes.
In experiments, our method has demonstrated extraordinary robustness and resilience to large viewpoint changes.
arXiv Detail & Related papers (2024-02-05T08:18:47Z)
- UpFusion: Novel View Diffusion from Unposed Sparse View Observations [66.36092764694502]
UpFusion can perform novel view synthesis and infer 3D representations for an object given a sparse set of reference images.
We show that this mechanism allows generating high-fidelity novel views while improving the synthesis quality given additional (unposed) images.
arXiv Detail & Related papers (2023-12-11T18:59:55Z)
- ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models [33.760292331843104]
Generating novel views of an object from a single image is a challenging task.
Recent diffusion-based methods for view synthesis have shown great progress.
We demonstrate a simple method that utilizes a pre-trained video diffusion model.
arXiv Detail & Related papers (2023-12-03T06:50:15Z)
- Few-View Object Reconstruction with Unknown Categories and Camera Poses [80.0820650171476]
This work explores reconstructing general real-world objects from a few images without known camera poses or object categories.
The crux of our work is solving two fundamental 3D vision problems -- shape reconstruction and pose estimation.
Our method FORGE predicts 3D features from each view and leverages them in conjunction with the input images to establish cross-view correspondence.
arXiv Detail & Related papers (2022-12-08T18:59:02Z)
- State of the Art in Dense Monocular Non-Rigid 3D Reconstruction [100.9586977875698]
3D reconstruction of deformable (or non-rigid) scenes from a set of monocular 2D image observations is a long-standing and actively researched area of computer vision and graphics.
This survey focuses on state-of-the-art methods for dense non-rigid 3D reconstruction of various deformable objects and composite scenes from monocular videos or sets of monocular views.
arXiv Detail & Related papers (2022-10-27T17:59:53Z)
- Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis [72.34794624243281]
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames.
Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
arXiv Detail & Related papers (2020-04-09T07:55:01Z)