SAM 3D for 3D Object Reconstruction from Remote Sensing Images
- URL: http://arxiv.org/abs/2512.22452v1
- Date: Sat, 27 Dec 2025 03:47:39 GMT
- Title: SAM 3D for 3D Object Reconstruction from Remote Sensing Images
- Authors: Junsheng Yao, Lichao Mou, Qingyu Li
- Abstract summary: This paper presents the first systematic evaluation of SAM 3D, a general-purpose image-to-3D foundation model. Experimental results demonstrate that SAM 3D produces more coherent roof geometry and sharper boundaries compared to TRELLIS.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monocular 3D building reconstruction from remote sensing imagery is essential for scalable urban modeling, yet existing methods often require task-specific architectures and intensive supervision. This paper presents the first systematic evaluation of SAM 3D, a general-purpose image-to-3D foundation model, for monocular remote sensing building reconstruction. We benchmark SAM 3D against TRELLIS on samples from the NYC Urban Dataset, employing Frechet Inception Distance (FID) and CLIP-based Maximum Mean Discrepancy (CMMD) as evaluation metrics. Experimental results demonstrate that SAM 3D produces more coherent roof geometry and sharper boundaries compared to TRELLIS. We further extend SAM 3D to urban scene reconstruction through a segment-reconstruct-compose pipeline, demonstrating its potential for urban scene modeling. We also analyze practical limitations and discuss future research directions. These findings provide practical guidance for deploying foundation models in urban 3D reconstruction and motivate future integration of scene-level structural priors.
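Both evaluation metrics named in the abstract reduce to simple computations over extracted image features. The sketch below shows the Frechet distance (the core of FID) and a kernel MMD (the core of CMMD) in plain NumPy; it is illustrative only — the embedding networks the paper's metrics rely on (Inception-v3 for FID, CLIP for CMMD) are replaced by synthetic features, and the RBF kernel with bandwidth `sigma=1.0` is a stand-in choice, not the paper's configuration.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets.

    feats_*: (N, D) arrays of image embeddings. FID applies this formula
    to Inception-v3 features; the formula itself is embedding-agnostic.
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # tr(sqrtm(Ca @ Cb)) equals the sum of square roots of the eigenvalues,
    # which are real and non-negative for a product of two PSD matrices.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_covmean = np.sqrt(np.maximum(eigvals.real, 0.0)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b)
                 - 2.0 * tr_covmean)

def mmd_rbf(feats_a, feats_b, sigma=1.0):
    """Biased MMD^2 estimate with an RBF kernel. CMMD applies MMD to
    CLIP embeddings; the kernel and bandwidth here are illustrative."""
    def gram(x, y):
        d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    n, m = len(feats_a), len(feats_b)
    return (gram(feats_a, feats_a).sum() / (n * n)
            + gram(feats_b, feats_b).sum() / (m * m)
            - 2.0 * gram(feats_a, feats_b).sum() / (n * m))

# Stand-in features: identical vs. mean-shifted distributions.
rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(500, 16))
fake = rng.normal(0.5, 1.0, size=(500, 16))
print(frechet_distance(real, real))  # ~0 for identical feature sets
print(frechet_distance(real, fake))  # grows with the distribution shift
print(mmd_rbf(real, fake))
```

Lower is better for both metrics; each measures the distance between the feature distributions of rendered reconstructions and reference images rather than per-sample error.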
Related papers
- S-MUSt3R: Sliding Multi-view 3D Reconstruction [17.018626984951823]
This work proposes S-MUSt3R, a simple and efficient pipeline that extends the limits of foundation models for monocular 3D reconstruction. We show that S-MUSt3R runs successfully on long RGB sequences and produces accurate and consistent 3D reconstruction.
arXiv Detail & Related papers (2026-02-04T13:07:14Z) - AREA3D: Active Reconstruction Agent with Unified Feed-Forward 3D Perception and Vision-Language Guidance [36.125573065910594]
Active 3D reconstruction enables an agent to autonomously select viewpoints to obtain accurate and complete scene geometry. We propose AREA3D, an active reconstruction agent that leverages feed-forward 3D reconstruction models and vision-language guidance. Our framework decouples view-uncertainty modeling from the underlying feed-forward reconstructor, enabling precise uncertainty estimation without expensive online optimization.
arXiv Detail & Related papers (2025-11-28T06:17:02Z) - AMB3R: Accurate Feed-forward Metric-scale 3D Reconstruction with Backend [18.645700170943975]
AMB3R is a feed-forward model for dense 3D reconstruction on a metric-scale. We show that AMB3R can be seamlessly extended to uncalibrated visual odometry (online) or large-scale structure from motion.
arXiv Detail & Related papers (2025-11-25T14:23:04Z) - Ref-SAM3D: Bridging SAM3D with Text for Reference 3D Reconstruction [45.27825308128629]
Ref-SAM3D is a simple yet effective extension to SAM3D that incorporates textual descriptions as a high-level prior. We show that Ref-SAM3D, guided only by natural language and a single 2D view, delivers competitive and high-fidelity zero-shot reconstruction performance.
arXiv Detail & Related papers (2025-11-24T18:58:22Z) - SAM 3D: 3Dfy Anything in Images [99.1053358868456]
We present SAM 3D, a generative model for visually grounded 3D object reconstruction, predicting geometry, texture, and layout from a single image. We achieve this with a human- and model-in-the-loop pipeline for annotating object shape, texture, and pose. We will release our code and model weights, an online demo, and a new challenging benchmark for in-the-wild 3D object reconstruction.
arXiv Detail & Related papers (2025-11-20T18:31:46Z) - MapAnything: Universal Feed-Forward Metric 3D Reconstruction [63.79151976126576]
MapAnything ingests one or more images along with optional geometric inputs such as camera intrinsics, poses, depth, or partial reconstructions. It then directly regresses the metric 3D scene geometry and cameras. MapAnything addresses a broad range of 3D vision tasks in a single feed-forward pass.
arXiv Detail & Related papers (2025-09-16T18:00:14Z) - From Flight to Insight: Semantic 3D Reconstruction for Aerial Inspection via Gaussian Splatting and Language-Guided Segmentation [3.0477617036157136]
High-fidelity 3D reconstruction is critical for aerial inspection tasks such as infrastructure monitoring, structural assessment, and environmental surveying. While traditional photogrammetry techniques enable geometric modeling, they lack semantic interpretability, limiting their effectiveness for automated inspection. Recent advances in neural rendering and 3D Gaussian Splatting (3DGS) offer efficient, photorealistic reconstructions but similarly lack scene-level understanding. We present a UAV-based pipeline that extends Feature-3DGS for language-guided 3D segmentation.
arXiv Detail & Related papers (2025-05-23T02:35:46Z) - SAM3D: Zero-Shot 3D Object Detection via Segment Anything Model [59.04877271899894]
This paper explores adapting the zero-shot ability of SAM to 3D object detection.
We propose a SAM-powered BEV processing pipeline to detect objects and achieve promising results on a large-scale open dataset.
arXiv Detail & Related papers (2023-06-04T03:09:21Z) - gSDF: Geometry-Driven Signed Distance Functions for 3D Hand-Object Reconstruction [94.46581592405066]
We exploit the hand structure and use it as guidance for SDF-based shape reconstruction.
We predict kinematic chains of pose transformations and align SDFs with highly-articulated hand poses.
arXiv Detail & Related papers (2023-04-24T10:05:48Z) - MobileBrick: Building LEGO for 3D Reconstruction on Mobile Devices [78.20154723650333]
High-quality 3D ground-truth shapes are critical for 3D object reconstruction evaluation.
We introduce a novel multi-view RGBD dataset captured using a mobile device.
We obtain precise 3D ground-truth shape without relying on high-end 3D scanners.
arXiv Detail & Related papers (2023-03-03T14:02:50Z) - Elevation Estimation-Driven Building 3D Reconstruction from Single-View Remote Sensing Imagery [20.001807614214922]
Building 3D reconstruction from remote sensing images has a wide range of applications in smart cities, photogrammetry and other fields.
We propose an efficient DSM estimation-driven reconstruction framework (Building3D) to reconstruct 3D building models from the input single-view remote sensing image.
Our Building3D is built on the SFFDE network for building elevation prediction, synchronized with a building extraction network for building masks, and then sequentially performs point cloud reconstruction and surface reconstruction (or CityGML model reconstruction).
arXiv Detail & Related papers (2023-01-11T17:20:30Z) - Learning Reconstructability for Drone Aerial Path Planning [51.736344549907265]
We introduce the first learning-based reconstructability predictor to improve view and path planning for large-scale 3D urban scene acquisition using unmanned drones.
In contrast to previous approaches, our method learns a model that explicitly predicts how well a 3D urban scene will be reconstructed from a set of viewpoints.
arXiv Detail & Related papers (2022-09-21T08:10:26Z) - Single-view 3D Mesh Reconstruction for Seen and Unseen Categories [69.29406107513621]
Single-view 3D Mesh Reconstruction is a fundamental computer vision task that aims at recovering 3D shapes from single-view RGB images.
This paper tackles Single-view 3D Mesh Reconstruction, to study the model generalization on unseen categories.
We propose an end-to-end two-stage network, GenMesh, to break the category boundaries in reconstruction.
arXiv Detail & Related papers (2022-08-04T14:13:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.