Regist3R: Incremental Registration with Stereo Foundation Model
- URL: http://arxiv.org/abs/2504.12356v1
- Date: Wed, 16 Apr 2025 02:46:53 GMT
- Title: Regist3R: Incremental Registration with Stereo Foundation Model
- Authors: Sidun Liu, Wenyu Li, Peng Qiao, Yong Dou,
- Abstract summary: Multi-view 3D reconstruction has remained an essential yet challenging problem in the field of computer vision.<n>We propose Regist3R, a novel stereo foundation model tailored for efficient and scalable incremental reconstruction.<n>We evaluate Regist3R on public datasets for camera pose estimation and 3D reconstruction.
- Score: 11.220655907305515
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-view 3D reconstruction has remained an essential yet challenging problem in the field of computer vision. While DUSt3R and its successors have achieved breakthroughs in 3D reconstruction from unposed images, these methods exhibit significant limitations when scaling to multi-view scenarios, including high computational cost and cumulative error induced by global alignment. To address these challenges, we propose Regist3R, a novel stereo foundation model tailored for efficient and scalable incremental reconstruction. Regist3R leverages an incremental reconstruction paradigm, enabling large-scale 3D reconstructions from unordered and many-view image collections. We evaluate Regist3R on public datasets for camera pose estimation and 3D reconstruction. Our experiments demonstrate that Regist3R achieves comparable performance with optimization-based methods while significantly improving computational efficiency, and outperforms existing multi-view reconstruction models. Furthermore, to assess its performance in real-world applications, we introduce a challenging oblique aerial dataset which has long spatial spans and hundreds of views. The results highlight the effectiveness of Regist3R. We also demonstrate the first attempt to reconstruct large-scale scenes encompassing over thousands of views through pointmap-based foundation models, showcasing its potential for practical applications in large-scale 3D reconstruction tasks, including urban modeling, aerial mapping, and beyond.
Related papers
- Mono3R: Exploiting Monocular Cues for Geometric 3D Reconstruction [11.220655907305515]
We introduce a monocular-guided refinement module that integrates monocular geometric priors into multi-view reconstruction frameworks.
Our method achieves substantial improvements in both mutli-view camera pose estimation and point cloud accuracy.
arXiv Detail & Related papers (2025-04-18T02:33:12Z) - MUSt3R: Multi-view Network for Stereo 3D Reconstruction [11.61182864709518]
We propose an extension of DUSt3R from pairs to multiple views, that addresses all aforementioned concerns.<n>We entail the model with a multi-layer memory mechanism which allows to reduce the computational complexity.<n>The framework is designed to perform 3D reconstruction both offline and online, and hence can be seamlessly applied to SfM and visual SLAM scenarios.
arXiv Detail & Related papers (2025-03-03T15:36:07Z) - Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass [68.78222900840132]
We propose Fast 3D Reconstruction (Fast3R), a novel multi-view generalization to DUSt3R that achieves efficient and scalable 3D reconstruction by processing many views in parallel.<n>Fast3R demonstrates state-of-the-art performance, with significant improvements in inference speed and reduced error accumulation.
arXiv Detail & Related papers (2025-01-23T18:59:55Z) - LucidFusion: Reconstructing 3D Gaussians with Arbitrary Unposed Images [23.96972213606037]
We reformulate 3D reconstruction as image-to-image translation and introduce the Relative Coordinate Map (RCM)<n>RCM aligns multiple unposed images to a main view without pose estimation.<n>While RCM simplifies the process, its lack of global 3D supervision can yield noisy outputs.<n>Our LucidFusion framework handles an arbitrary number of unposed inputs, producing robust 3D reconstructions within seconds and paving the way for more flexible, pose-free 3D pipelines.
arXiv Detail & Related papers (2024-10-21T04:47:01Z) - UW-SDF: Exploiting Hybrid Geometric Priors for Neural SDF Reconstruction from Underwater Multi-view Monocular Images [63.32490897641344]
We propose a framework for reconstructing target objects from multi-view underwater images based on neural SDF.
We introduce hybrid geometric priors to optimize the reconstruction process, markedly enhancing the quality and efficiency of neural SDF reconstruction.
arXiv Detail & Related papers (2024-10-10T16:33:56Z) - MVGamba: Unify 3D Content Generation as State Space Sequence Modeling [150.80564081817786]
We introduce MVGamba, a general and lightweight Gaussian reconstruction model featuring a multi-view Gaussian reconstructor.<n>With off-the-detail multi-view diffusion models integrated, MVGamba unifies 3D generation tasks from a single image, sparse images, or text prompts.<n>Experiments demonstrate that MVGamba outperforms state-of-the-art baselines in all 3D content generation scenarios with approximately only $0.1times$ of the model size.
arXiv Detail & Related papers (2024-06-10T15:26:48Z) - GTR: Improving Large 3D Reconstruction Models through Geometry and Texture Refinement [51.97726804507328]
We propose a novel approach for 3D mesh reconstruction from multi-view images.
Our method takes inspiration from large reconstruction models that use a transformer-based triplane generator and a Neural Radiance Field (NeRF) model trained on multi-view images.
arXiv Detail & Related papers (2024-06-09T05:19:24Z) - Reconstructing Satellites in 3D from Amateur Telescope Images [44.20773507571372]
We propose a novel computational imaging framework that overcomes obstacles by integrating a hybrid image pre-processing pipeline.<n>We validate our approach on both synthetic satellite datasets and on-sky observations of China's Tiangong Space Station and the International Space Station.<n>Our framework enables high-fidelity 3D satellite monitoring from Earth, offering a cost-effective alternative for space situational awareness.
arXiv Detail & Related papers (2024-04-29T03:13:09Z) - ReconFusion: 3D Reconstruction with Diffusion Priors [104.73604630145847]
We present ReconFusion to reconstruct real-world scenes using only a few photos.
Our approach leverages a diffusion prior for novel view synthesis, trained on synthetic and multiview datasets.
Our method synthesizes realistic geometry and texture in underconstrained regions while preserving the appearance of observed regions.
arXiv Detail & Related papers (2023-12-05T18:59:58Z) - Elevation Estimation-Driven Building 3D Reconstruction from Single-View
Remote Sensing Imagery [20.001807614214922]
Building 3D reconstruction from remote sensing images has a wide range of applications in smart cities, photogrammetry and other fields.
We propose an efficient DSM estimation-driven reconstruction framework (Building3D) to reconstruct 3D building models from the input single-view remote sensing image.
Our Building3D is rooted in the SFFDE network for building elevation prediction, synchronized with a building extraction network for building masks, and then sequentially performs point cloud reconstruction, surface reconstruction (or CityGML model reconstruction)
arXiv Detail & Related papers (2023-01-11T17:20:30Z) - Neural 3D Reconstruction in the Wild [86.6264706256377]
We introduce a new method that enables efficient and accurate surface reconstruction from Internet photo collections.
We present a new benchmark and protocol for evaluating reconstruction performance on such in-the-wild scenes.
arXiv Detail & Related papers (2022-05-25T17:59:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.