Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass
- URL: http://arxiv.org/abs/2501.13928v1
- Date: Thu, 23 Jan 2025 18:59:55 GMT
- Title: Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass
- Authors: Jianing Yang, Alexander Sax, Kevin J. Liang, Mikael Henaff, Hao Tang, Ang Cao, Joyce Chai, Franziska Meier, Matt Feiszli
- Abstract summary: We propose Fast 3D Reconstruction (Fast3R), a novel multi-view generalization to DUSt3R that achieves efficient and scalable 3D reconstruction by processing many views in parallel. Fast3R demonstrates state-of-the-art performance, with significant improvements in inference speed and reduced error accumulation.
- Score: 68.78222900840132
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-view 3D reconstruction remains a core challenge in computer vision, particularly in applications requiring accurate and scalable representations across diverse perspectives. Current leading methods such as DUSt3R employ a fundamentally pairwise approach, processing images in pairs and necessitating costly global alignment procedures to reconstruct from multiple views. In this work, we propose Fast 3D Reconstruction (Fast3R), a novel multi-view generalization to DUSt3R that achieves efficient and scalable 3D reconstruction by processing many views in parallel. Fast3R's Transformer-based architecture forwards N images in a single forward pass, bypassing the need for iterative alignment. Through extensive experiments on camera pose estimation and 3D reconstruction, Fast3R demonstrates state-of-the-art performance, with significant improvements in inference speed and reduced error accumulation. These results establish Fast3R as a robust alternative for multi-view applications, offering enhanced scalability without compromising reconstruction accuracy.
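To make the single-forward-pass idea concrete, here is a minimal sketch, assuming a generic patch-token transformer: tokens from all N views are tagged with a view embedding, concatenated into one joint sequence, fused by global self-attention, and decoded into per-view pointmaps in a shared frame. The class and parameter names (MultiViewReconstructor, max_views, and so on) are illustrative assumptions, not Fast3R's actual architecture or API.

```python
import torch
import torch.nn as nn

class MultiViewReconstructor(nn.Module):
    """Toy many-view fusion transformer: all views attend to each other in one pass."""
    def __init__(self, patch=16, dim=256, depth=4, heads=8, max_views=1024):
        super().__init__()
        self.patch = patch
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.view_embed = nn.Embedding(max_views, dim)      # tags each token with its source view
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, depth)   # global all-pairs attention across views
        self.head = nn.Linear(dim, 3 * patch * patch)       # per-patch 3D points in one shared frame

    def forward(self, images):                  # images: (N, 3, H, W), all N views at once
        n = images.shape[0]
        tok = self.patch_embed(images).flatten(2).transpose(1, 2)           # (N, P, dim)
        tok = tok + self.view_embed(torch.arange(n, device=images.device))[:, None, :]
        fused = self.fusion(tok.reshape(1, -1, tok.shape[-1]))              # one joint sequence, no pairwise loop
        return self.head(fused).reshape(n, -1, 3)                           # (N, points_per_view, 3)

views = torch.randn(8, 3, 224, 224)             # e.g. 8 views; the paper scales this to 1000+
points = MultiViewReconstructor()(views)        # single forward pass, no iterative global alignment
print(points.shape)                             # torch.Size([8, 50176, 3])
```

The contrast with DUSt3R's pairwise pipeline is the single fused sequence: there is no per-pair inference and no post-hoc global alignment step, since every view's tokens see every other view's tokens directly.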
Related papers
- Regist3R: Incremental Registration with Stereo Foundation Model [11.220655907305515]
Multi-view 3D reconstruction has remained an essential yet challenging problem in the field of computer vision.
We propose Regist3R, a novel stereo foundation model tailored for efficient and scalable incremental reconstruction.
We evaluate Regist3R on public datasets for camera pose estimation and 3D reconstruction.
arXiv Detail & Related papers (2025-04-16T02:46:53Z)
- PE3R: Perception-Efficient 3D Reconstruction [54.730257992806116]
Perception-Efficient 3D Reconstruction (PE3R) is a novel framework designed to enhance both accuracy and efficiency.
The framework achieves a minimum 9-fold speedup in 3D semantic field reconstruction, along with substantial gains in perception accuracy and reconstruction precision.
arXiv Detail & Related papers (2025-03-10T16:29:10Z)
- MUSt3R: Multi-view Network for Stereo 3D Reconstruction [11.61182864709518]
We propose an extension of DUSt3R from pairs to multiple views that addresses the limitations of pairwise processing.
We equip the model with a multi-layer memory mechanism that reduces the computational complexity.
The framework is designed to perform 3D reconstruction both offline and online, and hence can be seamlessly applied to SfM and visual SLAM scenarios.
arXiv Detail & Related papers (2025-03-03T15:36:07Z)
- VideoLifter: Lifting Videos to 3D with Fast Hierarchical Stereo Alignment [63.21396416244634]
VideoLifter is a novel video-to-3D pipeline that leverages a local-to-global strategy on a fragment basis.
It significantly accelerates the reconstruction process, reducing training time by over 82% while achieving better visual quality than current SOTA methods.
arXiv Detail & Related papers (2025-01-03T18:52:36Z)
- Multi-View Large Reconstruction Model via Geometry-Aware Positional Encoding and Attention [54.66152436050373]
We propose a Multi-view Large Reconstruction Model (M-LRM) to reconstruct high-quality 3D shapes from multi-view images in a 3D-aware manner. Specifically, we introduce a multi-view consistent cross-attention scheme that enables M-LRM to accurately query information from the input images. Compared to previous methods, the proposed M-LRM can generate 3D shapes of high fidelity.
arXiv Detail & Related papers (2024-06-11T18:29:13Z)
- GTR: Improving Large 3D Reconstruction Models through Geometry and Texture Refinement [51.97726804507328]
We propose a novel approach for 3D mesh reconstruction from multi-view images.
Our method takes inspiration from large reconstruction models that use a transformer-based triplane generator and a Neural Radiance Field (NeRF) model trained on multi-view images.
arXiv Detail & Related papers (2024-06-09T05:19:24Z)
- DUSt3R: Geometric 3D Vision Made Easy [8.471330244002564]
We introduce DUSt3R, a novel paradigm for Dense and Unconstrained Stereo 3D Reconstruction of arbitrary image collections. We show that this formulation smoothly unifies the monocular and binocular reconstruction cases. Our formulation directly provides a 3D model of the scene as well as depth information, and, interestingly, we can seamlessly recover from it pixel matches as well as relative and absolute camera poses.
arXiv Detail & Related papers (2023-12-21T18:52:14Z)
- Implicit Shape and Appearance Priors for Few-Shot Full Head Reconstruction [17.254539604491303]
In this paper, we address the problem of few-shot full 3D head reconstruction.
We accomplish this by incorporating a probabilistic shape and appearance prior into coordinate-based representations.
We extend the H3DS dataset, which now comprises 60 high-resolution 3D full head scans and their corresponding posed images and masks.
arXiv Detail & Related papers (2023-10-12T07:35:30Z)
- High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization [51.878078860524795]
We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views.
Our approach enables high-fidelity 3D rendering from a single image, which is promising for various applications of AI-generated 3D content.
arXiv Detail & Related papers (2022-11-28T18:59:52Z)
- FIRe: Fast Inverse Rendering using Directional and Signed Distance Functions [97.5540646069663]
We introduce a novel neural scene representation that we call the directional distance function (DDF).
Our DDF is defined on the unit sphere and predicts the distance to the surface along any given direction (a toy closed-form analogue is sketched after this list).
Based on our DDF, we present a novel fast algorithm (FIRe) to reconstruct 3D shapes given a posed depth map.
arXiv Detail & Related papers (2022-03-30T13:24:04Z)
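As a toy illustration of the directional distance function described in the FIRe entry above: for a sphere, the distance to the surface along a given direction has a closed form (ray-sphere intersection), which is the kind of target a neural DDF would be trained to regress for general shapes. The sketch below is a hypothetical analytic stand-in, not FIRe's implementation.

```python
import numpy as np

def sphere_ddf(origin, direction, radius=0.5):
    """Closed-form directional distance to a sphere of given radius at the origin.

    origin: (3,) query point (e.g. on the unit bounding sphere)
    direction: (3,) view direction (normalized internally)
    Returns the distance to the first surface hit, or np.inf if the ray misses.
    """
    d = direction / np.linalg.norm(direction)
    b = np.dot(origin, d)                      # ray-sphere quadratic: t^2 + 2*b*t + c = 0
    c = np.dot(origin, origin) - radius ** 2
    disc = b * b - c
    if disc < 0.0:
        return np.inf                          # ray misses the sphere entirely
    t = -b - np.sqrt(disc)                     # nearest intersection along the ray
    return t if t >= 0.0 else np.inf           # surface lies behind the query point

# Query from a point on the unit sphere, looking at the center: distance = 1 - radius.
print(sphere_ddf(np.array([1.0, 0.0, 0.0]), np.array([-1.0, 0.0, 0.0])))  # 0.5
```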
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.