Light3R-SfM: Towards Feed-forward Structure-from-Motion
- URL: http://arxiv.org/abs/2501.14914v1
- Date: Fri, 24 Jan 2025 20:46:04 GMT
- Title: Light3R-SfM: Towards Feed-forward Structure-from-Motion
- Authors: Sven Elflein, Qunjie Zhou, Sérgio Agostinho, Laura Leal-Taixé
- Abstract summary: Light3R-SfM is a feed-forward, end-to-end learnable framework for efficient large-scale Structure-from-Motion.
This work pioneers a data-driven, feed-forward SfM approach, paving the way toward scalable, accurate, and efficient 3D reconstruction in the wild.
- Score: 34.47706116389972
- Abstract: We present Light3R-SfM, a feed-forward, end-to-end learnable framework for efficient large-scale Structure-from-Motion (SfM) from unconstrained image collections. Unlike existing SfM solutions that rely on costly matching and global optimization to achieve accurate 3D reconstructions, Light3R-SfM addresses this limitation through a novel latent global alignment module. This module replaces traditional global optimization with a learnable attention mechanism, effectively capturing multi-view constraints across images for robust and precise camera pose estimation. Light3R-SfM constructs a sparse scene graph via retrieval-score-guided shortest path tree to dramatically reduce memory usage and computational overhead compared to the naive approach. Extensive experiments demonstrate that Light3R-SfM achieves competitive accuracy while significantly reducing runtime, making it ideal for 3D reconstruction tasks in real-world applications with a runtime constraint. This work pioneers a data-driven, feed-forward SfM approach, paving the way toward scalable, accurate, and efficient 3D reconstruction in the wild.
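The most concrete algorithmic detail above is the retrieval-score-guided shortest path tree used to sparsify the scene graph. No code accompanies this summary, so the following is only a minimal sketch of that idea: it assumes pairwise retrieval similarities arrive as an (N, N) matrix, converts similarity to an edge cost of one minus similarity, and roots the tree at the best-connected image; none of these choices are confirmed by the paper.

```python
import heapq

import numpy as np


def shortest_path_tree(scores, root=None):
    """Sketch: spanning tree over images via Dijkstra on retrieval costs.

    scores: (N, N) matrix of pairwise retrieval similarities in [0, 1].
    Edge cost is taken as 1 - similarity (an assumption, not the paper's
    exact formulation). Returns the (parent, child) image pairs of the tree.
    """
    n = scores.shape[0]
    if root is None:
        root = int(np.argmax(scores.sum(axis=1)))  # best-connected image
    cost = 1.0 - scores
    dist = np.full(n, np.inf)
    parent = np.full(n, -1, dtype=int)
    dist[root] = 0.0
    heap = [(0.0, root)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v in range(n):
            if v == u or v in done:
                continue
            nd = d + cost[u, v]
            if nd < dist[v]:
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    # The N - 1 tree edges are the image pairs actually processed,
    # instead of all N * (N - 1) / 2 possible pairs.
    return [(int(parent[v]), v) for v in range(n) if v != root]
```

Whatever the exact formulation, a spanning tree keeps only N - 1 image pairs instead of all N(N - 1)/2, which is consistent with the claimed reduction in memory usage and computational overhead.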
Related papers
- PreF3R: Pose-Free Feed-Forward 3D Gaussian Splatting from Variable-length Image Sequence [3.61512056914095]
We present PreF3R, Pose-Free Feed-forward 3D Reconstruction from an image sequence of variable length.
PreF3R removes the need for camera calibration and reconstructs the 3D Gaussian field within a canonical coordinate frame directly from a sequence of unposed images.
arXiv Detail & Related papers (2024-11-25T19:16:29Z)
- Large Spatial Model: End-to-end Unposed Images to Semantic 3D [79.94479633598102]
Large Spatial Model (LSM) processes unposed RGB images directly into semantic radiance fields.
LSM simultaneously estimates geometry, appearance, and semantics in a single feed-forward operation.
It can generate versatile label maps by interacting with language at novel viewpoints.
arXiv Detail & Related papers (2024-10-24T17:54:42Z)
- MASt3R-SfM: a Fully-Integrated Solution for Unconstrained Structure-from-Motion [12.602510002753815]
We build upon a recently released foundation model for 3D vision that can robustly produce local 3D reconstructions and accurate matches.
We introduce a low-memory approach to accurately align these local reconstructions in a global coordinate system (a generic alignment sketch follows this entry).
Our novel SfM pipeline is simple, scalable, fast and truly unconstrained, i.e. it can handle any collection of images, ordered or not.
arXiv Detail & Related papers (2024-09-27T21:29:58Z)
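The MASt3R-SfM summary above mentions a low-memory approach for aligning local reconstructions in a global coordinate system without spelling it out. A standard, generic building block for such alignment (not the paper's actual method) is the closed-form Umeyama similarity transform, sketched here under the assumption that corresponding 3D points between two local reconstructions are available:

```python
import numpy as np


def umeyama_similarity(src, dst):
    """Estimate scale s, rotation R, translation t with dst ~ s * R @ src + t.

    src, dst: (N, 3) arrays of corresponding 3D points from two local
    reconstructions. Classic Umeyama (1991) closed-form solution, shown as
    a generic building block, not MASt3R-SfM's actual alignment code.
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / len(src)            # 3x3 cross-covariance
    U, S, Vt = np.linalg.svd(cov)
    d = np.sign(np.linalg.det(U @ Vt))          # fix reflection if needed
    signs = np.array([1.0, 1.0, d])
    R = U @ np.diag(signs) @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    s = (S * signs).sum() / var_src             # optimal scale
    t = mu_d - s * (R @ mu_s)
    return s, R, t
```

Chaining such pairwise transforms along a scene graph brings every local reconstruction into a single global frame; how MASt3R-SfM keeps this memory-efficient is not described in the summary.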
- Correspondence-Guided SfM-Free 3D Gaussian Splatting for NVS [52.3215552448623]
Novel View Synthesis (NVS) without camera poses pre-computed by Structure-from-Motion (SfM) is crucial for rapid response capabilities and robustness to variable operating conditions.
Recent SfM-free methods have integrated pose optimization, designing end-to-end frameworks for joint camera pose estimation and NVS.
Most existing works rely on per-pixel image loss functions, such as the L2 loss (a minimal example follows this entry).
In this study, we propose a correspondence-guided SfM-free 3D Gaussian splatting for NVS.
arXiv Detail & Related papers (2024-08-16T13:11:22Z)
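The entry above contrasts correspondence-guided supervision with the per-pixel losses most SfM-free methods use. For reference, here is a minimal sketch of both loss styles; the correspondence format is purely hypothetical and not taken from the paper:

```python
import numpy as np


def l2_photometric_loss(rendered, target):
    """Plain per-pixel L2 (MSE) loss over (H, W, 3) float images."""
    return float(np.mean((rendered - target) ** 2))


def correspondence_loss(img_a, img_b, matches):
    """Hypothetical correspondence-guided loss (illustration only).

    matches: (M, 4) int array of (row_a, col_a, row_b, col_b) matched pixel
    coordinates across the two views; the paper's formulation may differ.
    """
    pa = img_a[matches[:, 0], matches[:, 1]]
    pb = img_b[matches[:, 2], matches[:, 3]]
    return float(np.mean((pa - pb) ** 2))
```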
- OrientDream: Streamlining Text-to-3D Generation with Explicit Orientation Control [66.03885917320189]
OrientDream is a camera-orientation-conditioned framework for efficient and multi-view-consistent 3D generation from textual prompts.
Our strategy incorporates explicit camera-orientation conditioning into the pre-training of a 2D text-to-image diffusion module.
Our experiments reveal that our method not only produces high-quality NeRF models with consistent multi-view properties but also optimizes significantly faster than existing methods.
arXiv Detail & Related papers (2024-06-14T13:16:18Z)
- GGRt: Towards Pose-free Generalizable 3D Gaussian Splatting in Real-time [112.32349668385635]
GGRt is a novel approach to generalizable novel view synthesis that alleviates the need for real camera poses.
As the first pose-free generalizable 3D-GS framework, GGRt achieves inference at ≥ 5 FPS and real-time rendering at ≥ 100 FPS.
arXiv Detail & Related papers (2024-03-15T09:47:35Z)
- Enhancing Low-light Light Field Images with A Deep Compensation Unfolding Network [52.77569396659629]
This paper presents the deep compensation unfolding network (DCUNet) for restoring light field (LF) images captured under low-light conditions.
The framework uses the intermediate enhanced result to estimate the illumination map, which is then employed in the unfolding process to produce a new enhanced result (a toy sketch of this loop follows the entry).
To properly leverage the unique characteristics of LF images, this paper proposes a pseudo-explicit feature interaction module.
arXiv Detail & Related papers (2023-08-10T07:53:06Z)
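The DCUNet summary above describes an iterative loop: estimate an illumination map from the intermediate enhanced result, then reuse it in the unfolding process to produce a new enhanced result. The toy sketch below mirrors that loop with a hand-written Retinex-style update; the actual method uses learned network stages, so every operation here is an assumption:

```python
import numpy as np


def unfolding_enhance(low_light, stages=3, eps=1e-3):
    """Hypothetical Retinex-style unfolding loop (illustration only).

    low_light: (H, W, 3) float image in [0, 1]. Each stage estimates an
    illumination map from the current enhanced result and re-compensates
    the low-light input with it, mirroring the iterative structure
    described for DCUNet.
    """
    enhanced = low_light.copy()
    for _ in range(stages):
        # Estimate illumination from the intermediate enhanced result
        # (here: per-pixel max over channels, a common Retinex prior).
        illumination = enhanced.max(axis=-1, keepdims=True)
        # Compensate the input with the estimated illumination map.
        enhanced = low_light / np.clip(illumination, eps, 1.0)
        enhanced = np.clip(enhanced, 0.0, 1.0)
    return enhanced
```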
- Variable Radiance Field for Real-World Category-Specific Reconstruction from Single Image [25.44715538841181]
Reconstructing category-specific objects using Neural Radiance Field (NeRF) from a single image is a promising yet challenging task.
We propose Variable Radiance Field (VRF), a novel framework capable of efficiently reconstructing category-specific objects.
VRF achieves state-of-the-art performance in both reconstruction quality and computational efficiency.
arXiv Detail & Related papers (2023-06-08T12:12:02Z)
- AdaSfM: From Coarse Global to Fine Incremental Adaptive Structure from Motion [48.835456049755166]
AdaSfM is a coarse-to-fine adaptive SfM approach that is scalable to large-scale and challenging datasets.
Our approach first performs a coarse global SfM, which improves the reliability of the view graph by leveraging measurements from low-cost sensors.
Our approach uses a threshold-adaptive strategy to align all local reconstructions to the coordinate frame of global SfM.
arXiv Detail & Related papers (2023-01-28T09:06:50Z)