Related papers: AREA3D: Active Reconstruction Agent with Unified Feed-Forward 3D Perception and Vision-Language Guidance

AREA3D: Active Reconstruction Agent with Unified Feed-Forward 3D Perception and Vision-Language Guidance

URL: http://arxiv.org/abs/2512.05131v1
Date: Fri, 28 Nov 2025 06:17:02 GMT
Title: AREA3D: Active Reconstruction Agent with Unified Feed-Forward 3D Perception and Vision-Language Guidance
Authors: Tianling Xu, Shengzhe Gan, Leslie Gu, Yuelei Li, Fangneng Zhan, Hanspeter Pfister,
Abstract summary: Active 3D reconstruction enables an agent to autonomously select viewpoints to obtain accurate and complete scene geometry.<n>We propose AREA3D, an active reconstruction agent that leverages feed-forward 3D reconstruction models and vision-language guidance.<n>Our framework decouples view-uncertainty modeling from the underlying feed-forward reconstructor, enabling precise uncertainty estimation without expensive online optimization.
Score: 36.125573065910594
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Active 3D reconstruction enables an agent to autonomously select viewpoints to efficiently obtain accurate and complete scene geometry, rather than passively reconstructing scenes from pre-collected images. However, existing active reconstruction methods often rely on hand-crafted geometric heuristics, which can lead to redundant observations without substantially improving reconstruction quality. To address this limitation, we propose AREA3D, an active reconstruction agent that leverages feed-forward 3D reconstruction models and vision-language guidance. Our framework decouples view-uncertainty modeling from the underlying feed-forward reconstructor, enabling precise uncertainty estimation without expensive online optimization. In addition, an integrated vision-language model provides high-level semantic guidance, encouraging informative and diverse viewpoints beyond purely geometric cues. Extensive experiments on both scene-level and object-level benchmarks demonstrate that AREA3D achieves state-of-the-art reconstruction accuracy, particularly in the sparse-view regime. Code will be made available at: https://github.com/TianlingXu/AREA3D .

Related papers

AMB3R: Accurate Feed-forward Metric-scale 3D Reconstruction with Backend [18.645700170943975]
AMB3R is a feed-forward model for dense 3D reconstruction on a metric-scale.<n>We show that AMB3R can be seamlessly extended to uncalibrated visual odometry (online) or large-scale structure from motion.
arXiv Detail & Related papers (2025-11-25T14:23:04Z)
LARM: A Large Articulated-Object Reconstruction Model [29.66486888001511]
LARM is a unified feedforward framework that reconstructs 3D articulated objects from sparse-view images.<n>LARM generates auxiliary outputs such as depth maps and part masks to facilitate explicit 3D mesh extraction and joint estimation.<n>Our pipeline eliminates the need for dense supervision and supports high-fidelity reconstruction across diverse object categories.
arXiv Detail & Related papers (2025-11-14T18:55:27Z)
EA3D: Online Open-World 3D Object Extraction from Streaming Videos [55.48835711373918]
We present ExtractAnything3D (EA3D), a unified online framework for open-world 3D object extraction.<n>Given a streaming video, EA3D dynamically interprets each frame using vision-language and 2D vision foundation encoders to extract object-level knowledge.<n>A recurrent joint optimization module directs the model's attention to regions of interest, simultaneously enhancing both geometric reconstruction and semantic understanding.
arXiv Detail & Related papers (2025-10-29T03:56:41Z)
ReconViaGen: Towards Accurate Multi-view 3D Object Reconstruction via Generation [28.308731720451053]
We propose ReconViaGen to integrate reconstruction priors into the generative framework.<n>Our experiments demonstrate that our ReconViaGen can reconstruct complete and accurate 3D models consistent with input views in both global structure and local details.
arXiv Detail & Related papers (2025-10-27T13:15:06Z)
Hi3DEval: Advancing 3D Generation Evaluation with Hierarchical Validity [78.7107376451476]
Hi3DEval is a hierarchical evaluation framework tailored for 3D generative content.<n>We extend texture evaluation beyond aesthetic appearance by explicitly assessing material realism.<n>We propose a 3D-aware automated scoring system based on hybrid 3D representations.
arXiv Detail & Related papers (2025-08-07T17:50:13Z)
GTR: Gaussian Splatting Tracking and Reconstruction of Unknown Objects Based on Appearance and Geometric Complexity [49.31257173003408]
We present a novel method for 6-DoF object tracking and high-quality 3D reconstruction from monocular RGBD video.<n>Our approach demonstrates strong capabilities in recovering high-fidelity object meshes, setting a new standard for single-sensor 3D reconstruction in open-world environments.
arXiv Detail & Related papers (2025-05-17T08:46:29Z)
Regist3R: Incremental Registration with Stereo Foundation Model [22.636140424781455]
Multi-view 3D reconstruction has remained an essential yet challenging problem in the field of computer vision.<n>We propose Regist3R, a novel stereo foundation model tailored for efficient and scalable incremental reconstruction.<n>We evaluate Regist3R on public datasets for camera pose estimation and 3D reconstruction.
arXiv Detail & Related papers (2025-04-16T02:46:53Z)
PE3R: Perception-Efficient 3D Reconstruction [54.730257992806116]
Perception-Efficient 3D Reconstruction (PE3R) is a novel framework designed to enhance both accuracy and efficiency.<n>The framework achieves a minimum 9-fold speedup in 3D semantic field reconstruction, along with substantial gains in perception accuracy and reconstruction precision.
arXiv Detail & Related papers (2025-03-10T16:29:10Z)
GEOcc: Geometrically Enhanced 3D Occupancy Network with Implicit-Explicit Depth Fusion and Contextual Self-Supervision [49.839374549646884]
This paper presents GEOcc, a Geometric-Enhanced Occupancy network tailored for vision-only surround-view perception.<n>Our approach achieves State-Of-The-Art performance on the Occ3D-nuScenes dataset with the least image resolution needed and the most weightless image backbone.
arXiv Detail & Related papers (2024-05-17T07:31:20Z)
R3D3: Dense 3D Reconstruction of Dynamic Scenes from Multiple Cameras [106.52409577316389]
R3D3 is a multi-camera system for dense 3D reconstruction and ego-motion estimation. Our approach exploits spatial-temporal information from multiple cameras, and monocular depth refinement. We show that this design enables a dense, consistent 3D reconstruction of challenging, dynamic outdoor environments.
arXiv Detail & Related papers (2023-08-28T17:13:49Z)
FineRecon: Depth-aware Feed-forward Network for Detailed 3D Reconstruction [13.157400338544177]
Recent works on 3D reconstruction from posed images have demonstrated that direct inference of scene-level 3D geometry is feasible using deep neural networks. We propose three effective solutions for improving the fidelity of inference-based 3D reconstructions. Our method, FineRecon, produces smooth and highly accurate reconstructions, showing significant improvements across multiple depth and 3D reconstruction metrics.
arXiv Detail & Related papers (2023-04-04T02:50:29Z)

This list is automatically generated from the titles and abstracts of the papers in this site.