SfM on-the-fly: Get better 3D from What You Capture
- URL: http://arxiv.org/abs/2407.03939v3
- Date: Mon, 15 Jul 2024 01:56:32 GMT
- Title: SfM on-the-fly: Get better 3D from What You Capture
- Authors: Zongqian Zhan, Yifei Yu, Rui Xia, Wentian Gan, Hong Xie, Giulio Perda, Luca Morelli, Fabio Remondino, Xin Wang,
- Abstract summary: Structure from Motion (SfM) has been a constant research hotspot in the fields of photogrammetry, computer vision, robotics etc.
This work builds upon the original on-the-fly SfM and presents an updated version with three new advancements to get better 3D from what you capture.
- Score: 24.141351494527303
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the last twenty years, Structure from Motion (SfM) has been a constant research hotspot in the fields of photogrammetry, computer vision, robotics etc., whereas real-time performance is just a recent topic of growing interest. This work builds upon the original on-the-fly SfM (Zhan et al., 2024) and presents an updated version with three new advancements to get better 3D from what you capture: (i) real-time image matching is further boosted by employing the Hierarchical Navigable Small World (HNSW) graphs, thus more true positive overlapping image candidates are faster identified; (ii) a self-adaptive weighting strategy is proposed for robust hierarchical local bundle adjustment to improve the SfM results; (iii) multiple agents are included for supporting collaborative SfM and seamlessly merge multiple 3D reconstructions into a complete 3D scene when commonly registered images appear. Various comprehensive experiments demonstrate that the proposed SfM method (named on-the-fly SfMv2) can generate more complete and robust 3D reconstructions in a high time-efficient way. Code is available at http://yifeiyu225.github.io/on-the-flySfMv2.github.io/.
Related papers
- Light3R-SfM: Towards Feed-forward Structure-from-Motion [34.47706116389972]
Light3R-SfM is a feed-forward, end-to-end learnable framework for efficient large-scale Structure-from-Motion.
This work pioneers a data-driven, feed-forward SfM approach, paving the way toward scalable, accurate, and efficient 3D reconstruction in the wild.
arXiv Detail & Related papers (2025-01-24T20:46:04Z) - Multi-View Large Reconstruction Model via Geometry-Aware Positional Encoding and Attention [54.66152436050373]
We propose a Multi-view Large Reconstruction Model (M-LRM) to reconstruct high-quality 3D shapes from multi-views in a 3D-aware manner.
Specifically, we introduce a multi-view consistent cross-attention scheme to enable M-LRM to accurately query information from the input images.
Compared to previous methods, the proposed M-LRM can generate 3D shapes of high fidelity.
arXiv Detail & Related papers (2024-06-11T18:29:13Z) - Magic-Boost: Boost 3D Generation with Multi-View Conditioned Diffusion [101.15628083270224]
We propose a novel multi-view conditioned diffusion model to synthesize high-fidelity novel view images.
We then introduce a novel iterative-update strategy to adopt it to provide precise guidance to refine the coarse generated results.
Experiments show Magic-Boost greatly enhances the coarse generated inputs, generates high-quality 3D assets with rich geometric and textural details.
arXiv Detail & Related papers (2024-04-09T16:20:03Z) - InstantSplat: Sparse-view SfM-free Gaussian Splatting in Seconds [91.77050739918037]
We introduce InstantSplat, a novel and lightning-fast neural reconstruction system that builds accurate 3D representations from as few as 2-3 images.
InstantSplat integrates dense stereo priors and co-visibility relationships between frames to initialize pixel-aligned by progressively expanding the scene.
It achieves an acceleration of over 20 times in reconstruction, improves visual quality (SSIM) from 0.3755 to 0.7624 than COLMAP with 3D-GS, and is compatible with multiple 3D representations.
arXiv Detail & Related papers (2024-03-29T17:29:58Z) - Splatter Image: Ultra-Fast Single-View 3D Reconstruction [67.96212093828179]
Splatter Image is based on Gaussian Splatting, which allows fast and high-quality reconstruction of 3D scenes from multiple images.
We learn a neural network that, at test time, performs reconstruction in a feed-forward manner, at 38 FPS.
On several synthetic, real, multi-category and large-scale benchmark datasets, we achieve better results in terms of PSNR, LPIPS, and other metrics while training and evaluating much faster than prior works.
arXiv Detail & Related papers (2023-12-20T16:14:58Z) - Instant3D: Fast Text-to-3D with Sparse-View Generation and Large
Reconstruction Model [68.98311213582949]
We propose Instant3D, a novel method that generates high-quality and diverse 3D assets from text prompts in a feed-forward manner.
Our method can generate diverse 3D assets of high visual quality within 20 seconds, two orders of magnitude faster than previous optimization-based methods.
arXiv Detail & Related papers (2023-11-10T18:03:44Z) - EC-SfM: Efficient Covisibility-based Structure-from-Motion for Both
Sequential and Unordered Images [24.6736600856999]
This paper presents an efficient covisibility-based incremental SfM for unordered Internet images.
We propose a unified framework to efficiently reconstruct sequential images, unordered images, and the mixture of these two.
The proposed method is three times faster than the state of the art on feature matching, and an order of magnitude faster on reconstruction without sacrificing the accuracy.
arXiv Detail & Related papers (2023-02-21T09:18:57Z) - Hi-LASSIE: High-Fidelity Articulated Shape and Skeleton Discovery from
Sparse Image Ensemble [72.3681707384754]
Hi-LASSIE performs 3D articulated reconstruction from only 20-30 online images in the wild without any user-defined shape or skeleton templates.
First, instead of relying on a manually annotated 3D skeleton, we automatically estimate a class-specific skeleton from the selected reference image.
Second, we improve the shape reconstructions with novel instance-specific optimization strategies that allow reconstructions to faithful fit on each instance.
arXiv Detail & Related papers (2022-12-21T14:31:33Z) - MobRecon: Mobile-Friendly Hand Mesh Reconstruction from Monocular Image [18.68544438724187]
We propose a framework for single-view hand mesh reconstruction, which can simultaneously achieve high reconstruction accuracy, fast inference speed, and temporal coherence.
Our framework, called MobRecon, comprises affordable computational costs and miniature model size, which reaches a high inference speed of 83FPS on Apple A14 CPU.
arXiv Detail & Related papers (2021-12-06T03:01:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.