Pragmatist: Multiview Conditional Diffusion Models for High-Fidelity 3D Reconstruction from Unposed Sparse Views
- URL: http://arxiv.org/abs/2412.08412v2
- Date: Thu, 12 Dec 2024 05:36:51 GMT
- Title: Pragmatist: Multiview Conditional Diffusion Models for High-Fidelity 3D Reconstruction from Unposed Sparse Views
- Authors: Songchun Zhang, Chunhui Zhao,
- Abstract summary: Inferring 3D structures from sparse, unposed observations is challenging due to its unconstrained nature.
Recent methods propose to predict implicit representations directly from unposed inputs in a data-driven manner, achieving promising results.
We propose conditional novel view synthesis, aiming to generate complete observations from limited input views to facilitate reconstruction.
- Score: 23.94629999419033
- License:
- Abstract: Inferring 3D structures from sparse, unposed observations is challenging due to its unconstrained nature. Recent methods propose to predict implicit representations directly from unposed inputs in a data-driven manner, achieving promising results. However, these methods do not utilize geometric priors and cannot hallucinate the appearance of unseen regions, thus making it challenging to reconstruct fine geometric and textural details. To tackle this challenge, our key idea is to reformulate this ill-posed problem as conditional novel view synthesis, aiming to generate complete observations from limited input views to facilitate reconstruction. With complete observations, the poses of the input views can be easily recovered and further used to optimize the reconstructed object. To this end, we propose a novel pipeline Pragmatist. First, we generate a complete observation of the object via a multiview conditional diffusion model. Then, we use a feed-forward large reconstruction model to obtain the reconstructed mesh. To further improve the reconstruction quality, we recover the poses of input views by inverting the obtained 3D representations and further optimize the texture using detailed input views. Unlike previous approaches, our pipeline improves reconstruction by efficiently leveraging unposed inputs and generative priors, circumventing the direct resolution of highly ill-posed problems. Extensive experiments show that our approach achieves promising performance in several benchmarks.
Related papers
- Hyperbolic-constraint Point Cloud Reconstruction from Single RGB-D Images [19.23499128175523]
We introduce hyperbolic space to 3D point cloud reconstruction, enabling the model to represent and understand complex hierarchical structures in point clouds with low distortion.
Our model outperforms most existing models, and ablation studies demonstrate the significance of our model and its components.
arXiv Detail & Related papers (2024-12-12T08:27:39Z) - PVP-Recon: Progressive View Planning via Warping Consistency for Sparse-View Surface Reconstruction [49.7580491592023]
We propose PVP-Recon, a novel and effective sparse-view surface reconstruction method.
PVP-Recon starts initial surface reconstruction with as few as 3 views and progressively adds new views.
This progressive view planning progress is interleaved with a neural SDF-based reconstruction module.
arXiv Detail & Related papers (2024-09-09T10:06:34Z) - Black-Box Test-Time Shape REFINEment for Single View 3D Reconstruction [57.805334118057665]
We propose REFINE, a postprocessing mesh refinement step that can be easily integrated into the pipeline of any black-box method in the literature.
At test time, REFINE optimize a network per mesh instance, to encourage consistency between the mesh and the given object view.
arXiv Detail & Related papers (2021-08-23T03:28:47Z) - Unsupervised Learning of 3D Object Categories from Videos in the Wild [75.09720013151247]
We focus on learning a model from multiple views of a large collection of object instances.
We propose a new neural network design, called warp-conditioned ray embedding (WCR), which significantly improves reconstruction.
Our evaluation demonstrates performance improvements over several deep monocular reconstruction baselines on existing benchmarks.
arXiv Detail & Related papers (2021-03-30T17:57:01Z) - Next-best-view Regression using a 3D Convolutional Neural Network [0.9449650062296823]
We propose a data-driven approach to address the next-best-view problem.
The proposed approach trains a 3D convolutional neural network with previous reconstructions in order to regress the btxtposition of the next-best-view.
We have validated the proposed approach making use of two groups of experiments.
arXiv Detail & Related papers (2021-01-23T01:50:26Z) - Secrets of 3D Implicit Object Shape Reconstruction in the Wild [92.5554695397653]
Reconstructing high-fidelity 3D objects from sparse, partial observation is crucial for various applications in computer vision, robotics, and graphics.
Recent neural implicit modeling methods show promising results on synthetic or dense datasets.
But, they perform poorly on real-world data that is sparse and noisy.
This paper analyzes the root cause of such deficient performance of a popular neural implicit model.
arXiv Detail & Related papers (2021-01-18T03:24:48Z) - Reconstruct, Rasterize and Backprop: Dense shape and pose estimation
from a single image [14.9851111159799]
This paper presents a new system to obtain dense object reconstructions along with 6-DoF poses from a single image.
We leverage recent advances in differentiable rendering (in particular, robotics) to close the loop with 3D reconstruction in camera frame.
arXiv Detail & Related papers (2020-04-25T20:53:43Z) - Monocular Human Pose and Shape Reconstruction using Part Differentiable
Rendering [53.16864661460889]
Recent works succeed in regression-based methods which estimate parametric models directly through a deep neural network supervised by 3D ground truth.
In this paper, we introduce body segmentation as critical supervision.
To improve the reconstruction with part segmentation, we propose a part-level differentiable part that enables part-based models to be supervised by part segmentation.
arXiv Detail & Related papers (2020-03-24T14:25:46Z) - Convolutional Occupancy Networks [88.48287716452002]
We propose Convolutional Occupancy Networks, a more flexible implicit representation for detailed reconstruction of objects and 3D scenes.
By combining convolutional encoders with implicit occupancy decoders, our model incorporates inductive biases, enabling structured reasoning in 3D space.
We empirically find that our method enables the fine-grained implicit 3D reconstruction of single objects, scales to large indoor scenes, and generalizes well from synthetic to real data.
arXiv Detail & Related papers (2020-03-10T10:17:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.