AREA3D: Active Reconstruction Agent with Unified Feed-Forward 3D Perception and Vision-Language Guidance
- URL: http://arxiv.org/abs/2512.05131v1
- Date: Fri, 28 Nov 2025 06:17:02 GMT
- Title: AREA3D: Active Reconstruction Agent with Unified Feed-Forward 3D Perception and Vision-Language Guidance
- Authors: Tianling Xu, Shengzhe Gan, Leslie Gu, Yuelei Li, Fangneng Zhan, Hanspeter Pfister
- Abstract summary: Active 3D reconstruction enables an agent to autonomously select viewpoints to obtain accurate and complete scene geometry. We propose AREA3D, an active reconstruction agent that leverages feed-forward 3D reconstruction models and vision-language guidance. Our framework decouples view-uncertainty modeling from the underlying feed-forward reconstructor, enabling precise uncertainty estimation without expensive online optimization.
- Score: 36.125573065910594
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Active 3D reconstruction enables an agent to autonomously select viewpoints to efficiently obtain accurate and complete scene geometry, rather than passively reconstructing scenes from pre-collected images. However, existing active reconstruction methods often rely on hand-crafted geometric heuristics, which can lead to redundant observations without substantially improving reconstruction quality. To address this limitation, we propose AREA3D, an active reconstruction agent that leverages feed-forward 3D reconstruction models and vision-language guidance. Our framework decouples view-uncertainty modeling from the underlying feed-forward reconstructor, enabling precise uncertainty estimation without expensive online optimization. In addition, an integrated vision-language model provides high-level semantic guidance, encouraging informative and diverse viewpoints beyond purely geometric cues. Extensive experiments on both scene-level and object-level benchmarks demonstrate that AREA3D achieves state-of-the-art reconstruction accuracy, particularly in the sparse-view regime. Code will be made available at: https://github.com/TianlingXu/AREA3D .
Related papers
- AMB3R: Accurate Feed-forward Metric-scale 3D Reconstruction with Backend [18.645700170943975]
AMB3R is a feed-forward model for dense 3D reconstruction on a metric-scale. We show that AMB3R can be seamlessly extended to uncalibrated visual odometry (online) or large-scale structure from motion.
arXiv Detail & Related papers (2025-11-25T14:23:04Z)
- LARM: A Large Articulated-Object Reconstruction Model [29.66486888001511]
LARM is a unified feedforward framework that reconstructs 3D articulated objects from sparse-view images. LARM generates auxiliary outputs such as depth maps and part masks to facilitate explicit 3D mesh extraction and joint estimation. Our pipeline eliminates the need for dense supervision and supports high-fidelity reconstruction across diverse object categories.
arXiv Detail & Related papers (2025-11-14T18:55:27Z)
- EA3D: Online Open-World 3D Object Extraction from Streaming Videos [55.48835711373918]
We present ExtractAnything3D (EA3D), a unified online framework for open-world 3D object extraction. Given a streaming video, EA3D dynamically interprets each frame using vision-language and 2D vision foundation encoders to extract object-level knowledge. A recurrent joint optimization module directs the model's attention to regions of interest, simultaneously enhancing both geometric reconstruction and semantic understanding.
arXiv Detail & Related papers (2025-10-29T03:56:41Z)
- ReconViaGen: Towards Accurate Multi-view 3D Object Reconstruction via Generation [28.308731720451053]
We propose ReconViaGen to integrate reconstruction priors into the generative framework. Our experiments demonstrate that our ReconViaGen can reconstruct complete and accurate 3D models consistent with input views in both global structure and local details.
arXiv Detail & Related papers (2025-10-27T13:15:06Z)
- Hi3DEval: Advancing 3D Generation Evaluation with Hierarchical Validity [78.7107376451476]
Hi3DEval is a hierarchical evaluation framework tailored for 3D generative content. We extend texture evaluation beyond aesthetic appearance by explicitly assessing material realism. We propose a 3D-aware automated scoring system based on hybrid 3D representations.
arXiv Detail & Related papers (2025-08-07T17:50:13Z)
- GTR: Gaussian Splatting Tracking and Reconstruction of Unknown Objects Based on Appearance and Geometric Complexity [49.31257173003408]
We present a novel method for 6-DoF object tracking and high-quality 3D reconstruction from monocular RGBD video. Our approach demonstrates strong capabilities in recovering high-fidelity object meshes, setting a new standard for single-sensor 3D reconstruction in open-world environments.
arXiv Detail & Related papers (2025-05-17T08:46:29Z)
- Regist3R: Incremental Registration with Stereo Foundation Model [22.636140424781455]
Multi-view 3D reconstruction has remained an essential yet challenging problem in the field of computer vision. We propose Regist3R, a novel stereo foundation model tailored for efficient and scalable incremental reconstruction. We evaluate Regist3R on public datasets for camera pose estimation and 3D reconstruction.
arXiv Detail & Related papers (2025-04-16T02:46:53Z)
- PE3R: Perception-Efficient 3D Reconstruction [54.730257992806116]
Perception-Efficient 3D Reconstruction (PE3R) is a novel framework designed to enhance both accuracy and efficiency. The framework achieves a minimum 9-fold speedup in 3D semantic field reconstruction, along with substantial gains in perception accuracy and reconstruction precision.
arXiv Detail & Related papers (2025-03-10T16:29:10Z)
- GEOcc: Geometrically Enhanced 3D Occupancy Network with Implicit-Explicit Depth Fusion and Contextual Self-Supervision [49.839374549646884]
This paper presents GEOcc, a Geometric-Enhanced Occupancy network tailored for vision-only surround-view perception. Our approach achieves State-Of-The-Art performance on the Occ3D-nuScenes dataset with the least image resolution needed and the most weightless image backbone.
arXiv Detail & Related papers (2024-05-17T07:31:20Z)
- R3D3: Dense 3D Reconstruction of Dynamic Scenes from Multiple Cameras [106.52409577316389]
R3D3 is a multi-camera system for dense 3D reconstruction and ego-motion estimation.
Our approach exploits spatial-temporal information from multiple cameras, and monocular depth refinement.
We show that this design enables a dense, consistent 3D reconstruction of challenging, dynamic outdoor environments.
arXiv Detail & Related papers (2023-08-28T17:13:49Z)
- FineRecon: Depth-aware Feed-forward Network for Detailed 3D Reconstruction [13.157400338544177]
Recent works on 3D reconstruction from posed images have demonstrated that direct inference of scene-level 3D geometry is feasible using deep neural networks.
We propose three effective solutions for improving the fidelity of inference-based 3D reconstructions.
Our method, FineRecon, produces smooth and highly accurate reconstructions, showing significant improvements across multiple depth and 3D reconstruction metrics.
arXiv Detail & Related papers (2023-04-04T02:50:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.