AmodalGen3D: Generative Amodal 3D Object Reconstruction from Sparse Unposed Views
- URL: http://arxiv.org/abs/2511.21945v1
- Date: Wed, 26 Nov 2025 22:11:56 GMT
- Title: AmodalGen3D: Generative Amodal 3D Object Reconstruction from Sparse Unposed Views
- Authors: Junwei Zhou, Yu-Wing Tai
- Abstract summary: Reconstructing 3D objects from a few unposed and partially occluded views is a common yet challenging problem in real-world scenarios. We introduce AmodalGen3D, a generative framework for amodal 3D object reconstruction. By jointly modeling visible and hidden regions, AmodalGen3D faithfully reconstructs 3D objects consistent with sparse-view constraints.
- Score: 37.60004902691764
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reconstructing 3D objects from a few unposed and partially occluded views is a common yet challenging problem in real-world scenarios, where many object surfaces are never directly observed. Traditional multi-view or inpainting-based approaches struggle under such conditions, often yielding incomplete or geometrically inconsistent reconstructions. We introduce AmodalGen3D, a generative framework for amodal 3D object reconstruction that infers complete, occlusion-free geometry and appearance from arbitrary sparse inputs. The model integrates 2D amodal completion priors with multi-view stereo geometry conditioning, supported by a View-Wise Cross Attention mechanism for sparse-view feature fusion and a Stereo-Conditioned Cross Attention module for unobserved structure inference. By jointly modeling visible and hidden regions, AmodalGen3D faithfully reconstructs 3D objects that are consistent with sparse-view constraints while plausibly hallucinating unseen parts. Experiments on both synthetic and real-world datasets demonstrate that AmodalGen3D achieves superior fidelity and completeness under occlusion-heavy sparse-view settings, addressing a pressing need for object-level 3D scene reconstruction in robotics, AR/VR, and embodied AI applications.
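The abstract names two architectural components, a View-Wise Cross Attention mechanism for fusing features from a variable number of sparse views and a Stereo-Conditioned Cross Attention module for inferring unobserved structure, but no implementation accompanies this listing. The snippet below is a minimal PyTorch sketch of what view-wise cross attention for sparse-view feature fusion could look like; the class name, tensor shapes, and the per-view averaging scheme are illustrative assumptions rather than the authors' design.

```python
# Hedged sketch only: shapes, names, and the fusion rule are assumptions,
# not the AmodalGen3D implementation (no code is provided with the abstract).
import torch
import torch.nn as nn


class ViewWiseCrossAttention(nn.Module):
    """Fuse per-view 2D features into a shared set of 3D object latents.

    Each latent token attends to the feature tokens of every input view
    separately ("view-wise"); the per-view results are then averaged so the
    block accepts an arbitrary number of unposed views.
    """

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, latents: torch.Tensor, view_feats: torch.Tensor) -> torch.Tensor:
        # latents:    (B, N_latent, dim)   3D object tokens being refined
        # view_feats: (B, V, N_patch, dim) 2D features from V sparse views
        _, V, _, _ = view_feats.shape
        fused = torch.zeros_like(latents)
        for v in range(V):  # attend to each view independently, then average
            out, _ = self.attn(latents, view_feats[:, v], view_feats[:, v])
            fused = fused + out
        return self.norm(latents + fused / V)


if __name__ == "__main__":
    # Toy usage: 2 unposed views, 196 patch tokens per view, 512 object latents.
    block = ViewWiseCrossAttention(dim=256, num_heads=8)
    latents = torch.randn(1, 512, 256)
    view_feats = torch.randn(1, 2, 196, 256)
    print(block(latents, view_feats).shape)  # torch.Size([1, 512, 256])
```

Averaging the per-view attention outputs keeps the block agnostic to the number and order of input views, which is one plausible way to realize the paper's claim of handling arbitrary sparse, unposed inputs.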
Related papers
- RnG: A Unified Transformer for Complete 3D Modeling from Partial Observations [70.83499963694238]
RnG (Reconstruction and Generation) is a novel feed-forward Transformer that unifies reconstruction and generation. It reconstructs visible geometry and generates plausible, coherent unseen geometry and appearance. Our method achieves state-of-the-art performance in both generalizable 3D reconstruction and novel view generation.
arXiv Detail & Related papers (2026-03-01T17:25:32Z) - ReconViaGen: Towards Accurate Multi-view 3D Object Reconstruction via Generation [28.308731720451053]
We propose ReconViaGen to integrate reconstruction priors into the generative framework. Our experiments demonstrate that ReconViaGen can reconstruct complete and accurate 3D models consistent with input views in both global structure and local details.
arXiv Detail & Related papers (2025-10-27T13:15:06Z) - OracleGS: Grounding Generative Priors for Sparse-View Gaussian Splatting [78.70702961852119]
OracleGS reconciles generative completeness with regressive fidelity for sparse-view Gaussian Splatting. Our approach conditions the powerful generative prior on multi-view geometric evidence, filtering hallucinatory artifacts while preserving plausible completions in under-constrained regions.
arXiv Detail & Related papers (2025-09-27T11:19:32Z) - SCORP: Scene-Consistent Object Refinement via Proxy Generation and Tuning [46.441761732998536]
We introduce Scene-Consistent Object Refinement via Proxy Generation and Tuning (SCORP), a novel 3D enhancement framework that leverages 3D generative priors to recover fine-grained object geometry and appearance under missing views. It achieves consistent gains over recent state-of-the-art baselines on both novel view synthesis and geometry completion tasks.
arXiv Detail & Related papers (2025-06-30T13:26:21Z) - DeOcc-1-to-3: 3D De-Occlusion from a Single Image via Self-Supervised Multi-View Diffusion [50.90541069907167]
We propose DeOcc-1-to-3, an end-to-end framework for occlusion-aware multi-view generation. Our self-supervised training pipeline leverages occluded-unoccluded image pairs and pseudo-ground-truth views to teach the model structure-aware completion and view consistency.
arXiv Detail & Related papers (2025-06-26T17:58:26Z) - Amodal3R: Amodal 3D Reconstruction from Occluded 2D Images [66.77399370856462]
Amodal3R is a conditional 3D generative model designed to reconstruct 3D objects from partial observations. It learns to recover full 3D objects even in the presence of occlusions in real scenes. It substantially outperforms existing methods that independently perform 2D amodal completion followed by 3D reconstruction.
arXiv Detail & Related papers (2025-03-17T17:59:01Z) - Zero-Shot Multi-Object Scene Completion [59.325611678171974]
We present a 3D scene completion method that recovers the complete geometry of multiple unseen objects in complex scenes from a single RGB-D image.
Our method outperforms the current state-of-the-art on both synthetic and real-world datasets.
arXiv Detail & Related papers (2024-03-21T17:59:59Z) - In-Hand 3D Object Reconstruction from a Monocular RGB Video [17.31419675163019]
Our work aims to reconstruct a 3D object that is held and rotated by a hand in front of a static RGB camera.
Previous methods that use implicit neural representations to recover the geometry of a generic hand-held object from multi-view images achieve compelling results on the visible part of the object.
arXiv Detail & Related papers (2023-12-27T06:19:25Z) - MOHO: Learning Single-view Hand-held Object Reconstruction with Multi-view Occlusion-Aware Supervision [75.38953287579616]
We present a novel framework to exploit Multi-view Occlusion-aware supervision from hand-object videos for Hand-held Object reconstruction.
We tackle two predominant challenges in such a setting: hand-induced occlusion and the object's self-occlusion.
Experiments on the HO3D and DexYCB datasets demonstrate that 2D-supervised MOHO achieves superior results over 3D-supervised methods by a large margin.
arXiv Detail & Related papers (2023-10-18T03:57:06Z) - Anything-3D: Towards Single-view Anything Reconstruction in the Wild [61.090129285205805]
We introduce Anything-3D, a methodical framework that ingeniously combines a series of visual-language models and the Segment-Anything object segmentation model.
Our approach employs a BLIP model to generate textual descriptions, utilizes the Segment-Anything model to extract the object of interest, and leverages a text-to-image diffusion model to lift the object into a neural radiance field (a hedged sketch of this pipeline appears after this list).
arXiv Detail & Related papers (2023-04-19T16:39:51Z)
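The Anything-3D entry above describes a three-stage pipeline: BLIP captioning, Segment-Anything object extraction, and diffusion-guided lifting into a radiance field. The sketch below wires up the first two stages with their public APIs and leaves the lifting stage as an explicit stub; the checkpoint names, the click-point prompt, and the `lift_to_nerf` placeholder are illustrative assumptions, not code released with that paper.

```python
# Hedged orientation sketch of an Anything-3D-style pipeline; not the authors' code.
import numpy as np
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration
from segment_anything import sam_model_registry, SamPredictor


def describe(image: Image.Image) -> str:
    """Stage 1: BLIP generates a textual description of the image."""
    proc = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
    blip = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
    out = blip.generate(**proc(image, return_tensors="pt"), max_new_tokens=30)
    return proc.decode(out[0], skip_special_tokens=True)


def segment(image: Image.Image, point_xy: tuple) -> np.ndarray:
    """Stage 2: Segment-Anything extracts the object of interest from one click."""
    sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")  # assumed local checkpoint
    predictor = SamPredictor(sam)
    predictor.set_image(np.array(image))
    masks, scores, _ = predictor.predict(
        point_coords=np.array([point_xy]), point_labels=np.array([1]))
    return masks[np.argmax(scores)]  # keep the highest-scoring mask


def lift_to_nerf(masked_rgb: np.ndarray, caption: str):
    """Stage 3 (stub): optimize a radiance field under a text-to-image diffusion
    prior, score-distillation style. Hypothetical placeholder, not a real API."""
    raise NotImplementedError


if __name__ == "__main__":
    img = Image.open("object.jpg").convert("RGB")
    caption = describe(img)
    mask = segment(img, point_xy=(256, 256))
    obj = np.array(img) * mask[..., None]  # zero out everything outside the object
    lift_to_nerf(obj, caption)
```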