Amodal3R: Amodal 3D Reconstruction from Occluded 2D Images
- URL: http://arxiv.org/abs/2503.13439v1
- Date: Mon, 17 Mar 2025 17:59:01 GMT
- Title: Amodal3R: Amodal 3D Reconstruction from Occluded 2D Images
- Authors: Tianhao Wu, Chuanxia Zheng, Frank Guan, Andrea Vedaldi, Tat-Jen Cham
- Abstract summary: Amodal3R is a conditional 3D generative model designed to reconstruct 3D objects from partial observations. It learns to recover full 3D objects even in the presence of occlusions in real scenes. It substantially outperforms existing methods that independently perform 2D amodal completion followed by 3D reconstruction.
- Score: 66.77399370856462
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most image-based 3D object reconstructors assume that objects are fully visible, ignoring occlusions that commonly occur in real-world scenarios. In this paper, we introduce Amodal3R, a conditional 3D generative model designed to reconstruct 3D objects from partial observations. We start from a "foundation" 3D generative model and extend it to recover plausible 3D geometry and appearance from occluded objects. We introduce a mask-weighted multi-head cross-attention mechanism followed by an occlusion-aware attention layer that explicitly leverages occlusion priors to guide the reconstruction process. We demonstrate that, by training solely on synthetic data, Amodal3R learns to recover full 3D objects even in the presence of occlusions in real scenes. It substantially outperforms existing methods that independently perform 2D amodal completion followed by 3D reconstruction, thereby establishing a new benchmark for occlusion-aware 3D reconstruction.
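The paper itself is not accompanied here by code, but the abstract's central mechanism can be illustrated. Below is a minimal PyTorch sketch of what a mask-weighted cross-attention layer could look like: attention weights over image tokens are re-weighted by per-token visibility so that occluded evidence is suppressed. The module name, the multiplicative re-weighting scheme, and all tensor shapes are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MaskWeightedCrossAttention(nn.Module):
    """Hypothetical sketch: multi-head cross-attention whose weights are
    scaled by per-token visibility, so occluded pixels contribute little."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.to_q = nn.Linear(dim, dim)
        self.to_kv = nn.Linear(dim, dim * 2)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x, context, visibility):
        # x:          (B, N, C) 3D latent tokens being generated
        # context:    (B, M, C) image-feature tokens conditioning them
        # visibility: (B, M)    1.0 = visible pixel, 0.0 = occluded
        B, N, C = x.shape
        H = self.num_heads
        q = self.to_q(x).view(B, N, H, C // H).transpose(1, 2)       # (B,H,N,d)
        k, v = self.to_kv(context).chunk(2, dim=-1)
        k = k.view(B, -1, H, C // H).transpose(1, 2)                 # (B,H,M,d)
        v = v.view(B, -1, H, C // H).transpose(1, 2)

        attn = ((q @ k.transpose(-2, -1)) * self.scale).softmax(dim=-1)
        # Mask weighting: scale attention by visibility, then renormalize,
        # so the reconstruction is driven by visible evidence.
        attn = attn * visibility[:, None, None, :]
        attn = attn / attn.sum(dim=-1, keepdim=True).clamp(min=1e-6)

        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)
```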
Related papers
- Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models [65.90387371072413]
We introduce Difix3D+, a novel pipeline designed to enhance 3D reconstruction and novel-view synthesis. At the core of our approach is Difix, a single-step image diffusion model trained to enhance and remove artifacts in rendered novel views.
arXiv Detail & Related papers (2025-03-03T17:58:33Z)
- Chirpy3D: Creative Fine-grained 3D Object Fabrication via Part Sampling [128.23917788822948]
Chirpy3D is a novel approach for fine-grained 3D object generation in a zero-shot setting.
The model must infer plausible 3D structures, capture fine-grained details, and generalize to novel objects.
Our experiments demonstrate that Chirpy3D surpasses existing methods in generating creative 3D objects with higher quality and fine-grained details.
arXiv Detail & Related papers (2025-01-07T21:14:11Z)
- Denoising Diffusion via Image-Based Rendering [54.20828696348574]
We introduce the first diffusion model able to perform fast, detailed reconstruction and generation of real-world 3D scenes.
First, we introduce a new neural scene representation, IB-planes, that can efficiently and accurately represent large 3D scenes.
Second, we propose a denoising-diffusion framework to learn a prior over this novel 3D scene representation, using only 2D images.
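As a rough illustration of that second point, here is a generic DDPM-style training step for learning a diffusion prior over some latent scene representation with only 2D supervision: the latent is corrupted with Gaussian noise, a denoiser predicts the noise, and the denoised scene is supervised by rendering it and comparing against a photograph. The `denoiser` and `render` callables, the linear noise schedule, and the loss are placeholders, not the paper's actual IB-planes pipeline.

```python
import torch
import torch.nn.functional as F

def diffusion_training_step(denoiser, render, scene_latent, target_image,
                            timesteps: int = 1000):
    """Hypothetical training step for a diffusion prior over a latent
    scene representation, supervised purely in 2D image space."""
    t = torch.randint(0, timesteps, (scene_latent.shape[0],),
                      device=scene_latent.device)
    noise = torch.randn_like(scene_latent)
    # Simple linear alpha schedule, for illustration only.
    alpha = 1.0 - t.float() / timesteps
    alpha = alpha.view(-1, *([1] * (scene_latent.dim() - 1)))
    noisy = alpha.sqrt() * scene_latent + (1 - alpha).sqrt() * noise

    pred_noise = denoiser(noisy, t)
    denoised = (noisy - (1 - alpha).sqrt() * pred_noise) / alpha.sqrt()

    # 2D-only supervision: render the denoised scene and compare it
    # against a ground-truth image instead of any 3D ground truth.
    return F.mse_loss(render(denoised), target_image)
```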
arXiv Detail & Related papers (2024-02-05T19:00:45Z)
- In-Hand 3D Object Reconstruction from a Monocular RGB Video [17.31419675163019]
Our work aims to reconstruct a 3D object that is held and rotated by a hand in front of a static RGB camera.
Previous methods that use implicit neural representations to recover the geometry of a generic hand-held object from multi-view images achieve compelling results for the visible parts of the object.
arXiv Detail & Related papers (2023-12-27T06:19:25Z)
- Scan2LoD3: Reconstructing semantic 3D building models at LoD3 using ray casting and Bayesian networks [40.7734793392562]
Reconstructing semantic 3D building models at the level of detail (LoD) 3 is a long-standing challenge.
We present a novel method, called Scan2LoD3, that accurately reconstructs semantic LoD3 building models.
We believe our method can foster the development of probability-driven semantic 3D reconstruction at LoD3.
arXiv Detail & Related papers (2023-05-10T17:01:18Z)
- 3D Reconstruction of Objects in Hands without Real World 3D Supervision [12.70221786947807]
We propose modules to leverage 3D supervision to scale up the learning of models for reconstructing hand-held objects.
Specifically, we extract multiview 2D mask supervision from videos and 3D shape priors from shape collections.
We use these indirect 3D cues to train occupancy networks that predict the 3D shape of objects from a single RGB image.
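A minimal sketch of the occupancy-network idea referenced here: an MLP maps an image feature plus a 3D query point to an inside/outside probability. The class name, feature dimensions, and conditioning scheme are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class OccupancyNet(nn.Module):
    """Hypothetical occupancy network: for each 3D query point, predict
    the probability that it lies inside the object."""

    def __init__(self, feat_dim: int = 256, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, image_feat, points):
        # image_feat: (B, F)    global feature from a single RGB image
        # points:     (B, P, 3) 3D query points in object coordinates
        feat = image_feat[:, None, :].expand(-1, points.shape[1], -1)
        logits = self.mlp(torch.cat([feat, points], dim=-1))
        return torch.sigmoid(logits).squeeze(-1)  # (B, P) occupancy
```

The indirect mask supervision described above would then, for instance, project sampled points into training views and penalize points predicted as occupied that fall outside the 2D masks.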
arXiv Detail & Related papers (2023-05-04T17:56:48Z)
- Unsupervised Style-based Explicit 3D Face Reconstruction from Single Image [10.1205208477163]
In this work, we propose a general adversarial learning framework for solving Unsupervised 2D to Explicit 3D Style Transfer.
Specifically, we merge two architectures: the unsupervised explicit 3D reconstruction network of Wu et al. and the Generative Adversarial Network (GAN) named StarGAN-v2.
We show that our solution outperforms well-established methods such as DepthNet in 3D reconstruction and Pix2NeRF in conditional style transfer.
arXiv Detail & Related papers (2023-04-24T21:25:06Z)
- Farm3D: Learning Articulated 3D Animals by Distilling 2D Diffusion [67.71624118802411]
We present Farm3D, a method for learning category-specific 3D reconstructors for articulated objects.
We propose a framework that uses an image generator, such as Stable Diffusion, to generate synthetic training data.
Our network can be used for analysis, including monocular reconstruction, or for synthesis, generating articulated assets for real-time applications such as video games.
arXiv Detail & Related papers (2023-04-20T17:59:34Z)
- 3D Surface Reconstruction in the Wild by Deforming Shape Priors from Synthetic Data [24.97027425606138]
Reconstructing the underlying 3D surface of an object from a single image is a challenging problem.
We present a new method for joint category-specific 3D reconstruction and object pose estimation from a single image.
Our approach achieves state-of-the-art reconstruction performance across several real-world datasets.
arXiv Detail & Related papers (2023-02-24T20:37:27Z)
- Monocular 3D Object Reconstruction with GAN Inversion [122.96094885939146]
MeshInversion is a novel framework to improve the reconstruction of textured 3D meshes.
It exploits the generative prior of a 3D GAN pre-trained for 3D textured mesh synthesis.
Our framework obtains faithful 3D reconstructions with consistent geometry and texture across both observed and unobserved parts.
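A minimal sketch of the general GAN-inversion loop behind this idea: optimize a latent code so that a pretrained 3D GAN's output, once rendered, matches the observed image, then read the textured mesh off the generator. The `generator` and `renderer` callables, the latent initialization, and the purely photometric loss are assumptions, not MeshInversion's actual objective.

```python
import torch
import torch.nn.functional as F

def invert(generator, renderer, target_image,
           latent_dim: int = 256, steps: int = 500, lr: float = 0.05):
    """Hypothetical GAN inversion: find the latent whose generated
    textured mesh, once rasterized, best matches the input photo."""
    z = torch.randn(1, latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        mesh = generator(z)                   # textured mesh from the 3D GAN
        rendered = renderer(mesh)             # (1, 3, H, W) image
        loss = F.mse_loss(rendered, target_image)  # photometric fit
        opt.zero_grad()
        loss.backward()
        opt.step()
    return generator(z.detach())              # full 3D reconstruction
```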
arXiv Detail & Related papers (2022-07-20T17:47:22Z)