MASt3R-SfM: a Fully-Integrated Solution for Unconstrained Structure-from-Motion
- URL: http://arxiv.org/abs/2409.19152v1
- Date: Fri, 27 Sep 2024 21:29:58 GMT
- Title: MASt3R-SfM: a Fully-Integrated Solution for Unconstrained Structure-from-Motion
- Authors: Bardienus Duisterhof, Lojze Zust, Philippe Weinzaepfel, Vincent Leroy, Yohann Cabon, Jerome Revaud
- Abstract summary: We build upon a recently released foundation model for 3D vision that can robustly produce local 3D reconstructions and accurate matches.
We introduce a low-memory approach to accurately align these local reconstructions in a global coordinate system.
Our novel SfM pipeline is simple, scalable, fast and truly unconstrained, i.e. it can handle any collection of images, ordered or not.
- Score: 12.602510002753815
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Structure-from-Motion (SfM), a task aiming at jointly recovering camera poses and 3D geometry of a scene given a set of images, remains a hard problem with still many open challenges despite decades of significant progress. The traditional solution for SfM consists of a complex pipeline of minimal solvers which tends to propagate errors and fails when images do not sufficiently overlap, have too little motion, etc. Recent methods have attempted to revisit this paradigm, but we empirically show that they fall short of fixing these core issues. In this paper, we propose instead to build upon a recently released foundation model for 3D vision that can robustly produce local 3D reconstructions and accurate matches. We introduce a low-memory approach to accurately align these local reconstructions in a global coordinate system. We further show that such foundation models can serve as efficient image retrievers without any overhead, reducing the overall complexity from quadratic to linear. Overall, our novel SfM pipeline is simple, scalable, fast and truly unconstrained, i.e. it can handle any collection of images, ordered or not. Extensive experiments on multiple benchmarks show that our method provides steady performance across diverse settings, especially outperforming existing methods in small- and medium-scale settings.
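The quadratic-to-linear claim in the abstract comes from using the model's own features for image retrieval: instead of matching all O(n^2) image pairs, each image is paired only with its top-k nearest neighbors in descriptor space. Below is a minimal sketch of such retrieval-based pair selection; the random descriptors stand in for the foundation model's encoder features, which the sketch does not reproduce.

```python
import numpy as np

def build_pair_graph(descriptors: np.ndarray, k: int = 10) -> set:
    """Select roughly n*k image pairs via descriptor similarity
    instead of all n*(n-1)/2; descriptors are (n, d), L2-normalized."""
    sim = descriptors @ descriptors.T            # cosine similarities
    np.fill_diagonal(sim, -np.inf)               # never pair an image with itself
    pairs = set()
    for i in range(sim.shape[0]):
        for j in np.argsort(-sim[i])[:k]:        # k most similar images to i
            pairs.add((min(i, int(j)), max(i, int(j))))  # canonical (i, j) order
    return pairs

# 100 images with 256-d stand-in descriptors: ~1000 pairs instead of 4950.
feats = np.random.randn(100, 256)
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
print(len(build_pair_graph(feats, k=10)))
```

The descriptor comparison itself is still quadratic, but it is a single cheap matrix product; the expensive network inference and matching then run on only a linear number of retained pairs.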
Related papers
- EasyHOI: Unleashing the Power of Large Models for Reconstructing Hand-Object Interactions in the Wild [79.71523320368388]
Our work aims to reconstruct hand-object interactions from a single-view image.
We first design a novel pipeline to estimate the underlying hand pose and object shape.
With the initial reconstruction, we employ a prior-guided optimization scheme.
arXiv Detail & Related papers (2024-11-21T16:33:35Z)
- Generalizable 3D Scene Reconstruction via Divide and Conquer from a Single View [5.222115919729418]
Single-view 3D reconstruction is currently approached from two dominant perspectives.
We propose a hybrid method following a divide-and-conquer strategy.
We first process the scene holistically, extracting depth and semantic information.
We then leverage a single-shot object-level method for the detailed reconstruction of individual components.
arXiv Detail & Related papers (2024-04-04T12:58:46Z)
- Total-Decom: Decomposed 3D Scene Reconstruction with Minimal Interaction [51.3632308129838]
We present Total-Decom, a novel method for decomposed 3D reconstruction with minimal human interaction.
Our approach seamlessly integrates the Segment Anything Model (SAM) with hybrid implicit-explicit neural surface representations and a mesh-based region-growing technique for accurate 3D object decomposition.
We extensively evaluate our method on benchmark datasets and demonstrate its potential for downstream applications, such as animation and scene editing.
arXiv Detail & Related papers (2024-03-28T11:12:33Z)
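The mesh-based region-growing technique mentioned in the Total-Decom summary can be illustrated generically as a breadth-first traversal over mesh faces that expands a region while a smoothness criterion holds. The sketch below is a textbook version with a normal-angle threshold, not the paper's actual criterion; the face adjacency, normals, and threshold are all illustrative inputs.

```python
from collections import deque
import numpy as np

def grow_region(seed: int, adjacency: dict, normals: np.ndarray,
                max_angle_deg: float = 20.0) -> set:
    """Grow a face region from `seed`, adding neighbors whose normals stay
    within `max_angle_deg` of the seed face (a simple smoothness criterion)."""
    cos_thresh = np.cos(np.deg2rad(max_angle_deg))
    region, queue = {seed}, deque([seed])
    while queue:
        face = queue.popleft()
        for nb in adjacency[face]:
            if nb not in region and normals[nb] @ normals[seed] >= cos_thresh:
                region.add(nb)
                queue.append(nb)
    return region

# Toy mesh: four faces in a strip; face 3 is tilted far away from face 0.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
n = np.array([[0, 0, 1], [0, 0, 1], [0, 0.05, 1], [0, 1, 0]], dtype=float)
n /= np.linalg.norm(n, axis=1, keepdims=True)
print(grow_region(0, adj, n))  # {0, 1, 2}; face 3 exceeds the angle threshold
```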
- DUSt3R: Geometric 3D Vision Made Easy [8.471330244002564]
We introduce DUSt3R, a novel paradigm for Dense and Unconstrained Stereo 3D Reconstruction of arbitrary image collections.
We show that this formulation smoothly unifies the monocular and binocular reconstruction cases.
Our formulation directly provides a 3D model of the scene as well as depth information; interestingly, we can also seamlessly recover pixel matches and relative and absolute camera parameters from it.
arXiv Detail & Related papers (2023-12-21T18:52:14Z)
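The DUSt3R summary notes that camera parameters can be recovered from the predicted 3D model. One concrete instance: with a centered principal point, the focal length is the single unknown relating pixel coordinates to the rays (X/Z, Y/Z) of a per-pixel pointmap, so a least-squares fit recovers it. DUSt3R itself uses a robust iterative solver for this step; the closed-form sketch below is a simplified stand-in.

```python
import numpy as np

def focal_from_pointmap(pointmap: np.ndarray) -> float:
    """Estimate a pinhole focal length from a per-pixel 3D pointmap (H, W, 3)
    in the camera frame, assuming the principal point is the image center.
    Least-squares fit of (u - cx, v - cy) ~ f * (X/Z, Y/Z)."""
    H, W, _ = pointmap.shape
    v, u = np.mgrid[:H, :W]
    x, y, z = pointmap[..., 0], pointmap[..., 1], pointmap[..., 2]
    px = (u - (W - 1) / 2).ravel()
    py = (v - (H - 1) / 2).ravel()
    rx, ry = (x / z).ravel(), (y / z).ravel()
    # Closed-form 1-parameter least squares: f = <pixel, ray> / <ray, ray>
    return float((px @ rx + py @ ry) / (rx @ rx + ry @ ry))

# Sanity check: synthesize a pointmap with known focal length f = 500.
H, W, f = 48, 64, 500.0
v, u = np.mgrid[:H, :W]
z = np.random.uniform(1.0, 5.0, (H, W))
x = (u - (W - 1) / 2) / f * z
y = (v - (H - 1) / 2) / f * z
print(focal_from_pointmap(np.stack([x, y, z], axis=-1)))  # ~500.0
```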
- FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models [67.96827539201071]
We propose a novel test-time optimization approach for 3D scene reconstruction.
Our method achieves state-of-the-art cross-dataset reconstruction on five zero-shot testing datasets.
arXiv Detail & Related papers (2023-08-10T17:55:02Z)
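FrozenRecon's test-time optimization keeps the depth model frozen and optimizes lightweight parameters, including a per-frame scale and shift of the predicted depth, jointly with camera poses. The sketch below isolates just the affine depth alignment against a sparse reference, which here is a hypothetical input; the paper optimizes these parameters under multi-view geometric consistency instead.

```python
import numpy as np

def align_scale_shift(pred_depth: np.ndarray, ref_depth: np.ndarray,
                      mask: np.ndarray):
    """Fit scale a and shift b so that a * pred + b ~ ref on valid pixels.
    pred_depth comes from a frozen (affine-invariant) depth model; ref_depth
    is any sparse metric reference (a hypothetical input here)."""
    d = pred_depth[mask]
    A = np.stack([d, np.ones_like(d)], axis=1)   # columns: depth, constant
    (a, b), *_ = np.linalg.lstsq(A, ref_depth[mask], rcond=None)
    return float(a), float(b)

# Toy example with ground-truth scale 2.0 and shift 0.3.
pred = np.random.uniform(0.1, 1.0, (32, 32))
ref = 2.0 * pred + 0.3
m = np.random.rand(32, 32) < 0.05                # sparse supervision mask
print(align_scale_shift(pred, ref, m))           # ~(2.0, 0.3)
```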
- Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image [85.91935485902708]
We show that the key to a zero-shot single-view metric depth model lies in the combination of large-scale data training and resolving the metric ambiguity from various camera models.
We propose a canonical camera space transformation module, which explicitly addresses the ambiguity problems and can be effortlessly plugged into existing monocular models.
Our method enables the accurate recovery of metric 3D structures on randomly collected internet images.
arXiv Detail & Related papers (2023-07-20T16:14:23Z)
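The canonical camera space transformation in Metric3D removes the focal-length ambiguity by training as if every image were taken with one fixed canonical focal length, then mapping the prediction back using the real focal length at test time. Assuming depth scales linearly with focal length at fixed image resolution, the de-canonicalization reduces to a single multiplication, as in this sketch (the canonical focal value below is an arbitrary illustrative choice, not the paper's):

```python
def canonical_to_metric_depth(canonical_depth, focal_px: float,
                              canonical_focal_px: float = 1000.0):
    """De-canonicalize depth predicted in a canonical camera space.

    A model trained with a fixed canonical focal length f_c sees no metric
    ambiguity; for a real camera with focal length f (in pixels), metric
    depth is recovered by rescaling with f / f_c.
    """
    return canonical_depth * (focal_px / canonical_focal_px)

# A camera with a 1500 px focal length: canonical depths scale by 1.5.
print(canonical_to_metric_depth(2.0, focal_px=1500.0))  # 3.0
```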
- Towards Non-Line-of-Sight Photography [48.491977359971855]
Non-line-of-sight (NLOS) imaging is based on capturing the multi-bounce indirect reflections from the hidden objects.
Active NLOS imaging systems rely on the capture of the time of flight of light through the scene.
We propose a new problem formulation, called NLOS photography, to specifically address this deficiency.
arXiv Detail & Related papers (2021-09-16T08:07:13Z)
- Dense Non-Rigid Structure from Motion: A Manifold Viewpoint [162.88686222340962]
The Non-Rigid Structure-from-Motion (NRSfM) problem aims to recover the 3D geometry of a deforming object from its 2D feature correspondences across multiple frames.
We show that our approach significantly improves accuracy, scalability, and robustness against noise.
arXiv Detail & Related papers (2020-06-15T09:15:54Z)
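The classical model underlying dense NRSfM, on which the manifold viewpoint above builds, stacks the 2D tracks of P points over F frames into a 2F x P measurement matrix W; if per-frame shapes lie in a K-dimensional linear basis, W factors into camera and shape terms and has rank at most 3K. A synthetic sketch verifying that rank bound (all shapes and cameras below are randomly generated for illustration):

```python
import numpy as np

# Low-rank NRSfM model: W = R @ S with per-frame shapes S_f = sum_k c_fk B_k
# drawn from a K-dimensional basis, hence rank(W) <= 3K.
F, P, K = 20, 60, 3
basis = np.random.randn(K, 3, P)                    # K basis shapes
coeffs = np.random.randn(F, K)                      # per-frame coefficients
rows = []
for f in range(F):
    shape = np.tensordot(coeffs[f], basis, axes=1)  # (3, P) deforming shape
    R = np.linalg.qr(np.random.randn(3, 3))[0][:2]  # 2x3 orthographic camera
    rows.append(R @ shape)
W = np.vstack(rows)                                 # (2F, P) 2D measurements
print(np.linalg.matrix_rank(W), "<= 3K =", 3 * K)   # rank at most 9
```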