SfM on-the-fly: Get better 3D from What You Capture
- URL: http://arxiv.org/abs/2407.03939v3
- Date: Mon, 15 Jul 2024 01:56:32 GMT
- Title: SfM on-the-fly: Get better 3D from What You Capture
- Authors: Zongqian Zhan, Yifei Yu, Rui Xia, Wentian Gan, Hong Xie, Giulio Perda, Luca Morelli, Fabio Remondino, Xin Wang
- Abstract summary: Structure from Motion (SfM) has been a constant research hotspot in photogrammetry, computer vision, robotics, and related fields.
This work builds upon the original on-the-fly SfM and presents an updated version with three new advancements to get better 3D from what you capture.
- Score: 24.141351494527303
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the last twenty years, Structure from Motion (SfM) has been a constant research hotspot in photogrammetry, computer vision, robotics, and related fields, whereas real-time performance has only recently become a topic of growing interest. This work builds upon the original on-the-fly SfM (Zhan et al., 2024) and presents an updated version with three new advancements to get better 3D from what you capture: (i) real-time image matching is further boosted by employing Hierarchical Navigable Small World (HNSW) graphs, so that more true-positive overlapping image candidates are identified faster; (ii) a self-adaptive weighting strategy is proposed for robust hierarchical local bundle adjustment to improve the SfM results; (iii) multiple agents are included to support collaborative SfM and seamlessly merge multiple 3D reconstructions into a complete 3D scene when commonly registered images appear. Various comprehensive experiments demonstrate that the proposed SfM method (named on-the-fly SfMv2) can generate more complete and robust 3D reconstructions in a highly time-efficient way. Code is available at http://yifeiyu225.github.io/on-the-flySfMv2.github.io/.
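As a concrete illustration of advancement (i), the sketch below shows how overlapping image candidates could be retrieved with an HNSW index, assuming the hnswlib library and pre-extracted, L2-normalized global image descriptors; all names and parameter values are illustrative, not the authors' implementation.

```python
# A hedged sketch of HNSW-based overlapping-image retrieval, assuming the
# hnswlib library and pre-extracted global descriptors; parameter values
# (M, ef_construction, ef, k) are illustrative, not the paper's settings.
import numpy as np
import hnswlib

dim, k = 256, 20                        # descriptor size, candidates per query
descs = np.random.rand(1000, dim).astype(np.float32)  # stand-in global features
descs /= np.linalg.norm(descs, axis=1, keepdims=True)

index = hnswlib.Index(space='cosine', dim=dim)
index.init_index(max_elements=5000, ef_construction=200, M=16)
index.add_items(descs, np.arange(len(descs)))
index.set_ef(64)                        # query-time accuracy/speed trade-off

def overlap_candidates(new_desc):
    """Return ids of the k most similar indexed images, i.e. likely
    overlapping candidates for matching the newly captured image."""
    labels, dists = index.knn_query(new_desc[None, :], k=k)
    return labels[0], 1.0 - dists[0]    # cosine similarity = 1 - distance

cands, sims = overlap_candidates(descs[0])
```

Because HNSW supports incremental insertion, each newly captured image can be queried against the index first and then added to it, which fits the on-the-fly setting.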
Related papers
- Large Spatial Model: End-to-end Unposed Images to Semantic 3D [79.94479633598102]
Large Spatial Model (LSM) processes unposed RGB images directly into semantic radiance fields.
LSM simultaneously estimates geometry, appearance, and semantics in a single feed-forward operation.
It can generate versatile label maps by interacting with language at novel viewpoints.
arXiv Detail & Related papers (2024-10-24T17:54:42Z)
- Magic-Boost: Boost 3D Generation with Multi-View Conditioned Diffusion [88.02512124661884]
We propose Magic-Boost, a multi-view conditioned diffusion model that significantly refines coarse generative results.
Compared to the previous text or single image based diffusion models, Magic-Boost exhibits a robust capability to generate images with high consistency.
It provides precise SDS guidance that well aligns with the identity of the input images, enriching the local detail in both geometry and texture of the initial generative results.
arXiv Detail & Related papers (2024-04-09T16:20:03Z)
- The More You See in 2D, the More You Perceive in 3D [32.578628729549145]
SAP3D is a system for 3D reconstruction and novel view synthesis from an arbitrary number of unposed images.
We show that as the number of input images increases, the performance of our approach improves.
arXiv Detail & Related papers (2024-04-04T17:59:40Z)
- Splatter Image: Ultra-Fast Single-View 3D Reconstruction [67.96212093828179]
Splatter Image is based on Gaussian Splatting, which allows fast and high-quality reconstruction of 3D scenes from multiple images.
We learn a neural network that, at test time, performs reconstruction in a feed-forward manner, at 38 FPS.
On several synthetic, real, multi-category and large-scale benchmark datasets, we achieve better results in terms of PSNR, LPIPS, and other metrics while training and evaluating much faster than prior works.
arXiv Detail & Related papers (2023-12-20T16:14:58Z)
- Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model [68.98311213582949]
We propose Instant3D, a novel method that generates high-quality and diverse 3D assets from text prompts in a feed-forward manner.
Our method can generate diverse 3D assets of high visual quality within 20 seconds, two orders of magnitude faster than previous optimization-based methods.
arXiv Detail & Related papers (2023-11-10T18:03:44Z)
- On-the-Fly SfM: What you capture is What you get [26.08032193296505]
We present an on-the-fly SfM that runs online while images are being captured: each newly taken image is estimated online with its corresponding pose and points.
Specifically, our approach employs a vocabulary tree that is trained in an unsupervised manner on learning-based global features.
A robust feature matching mechanism based on least squares matching (LSM) is presented to improve image registration performance.
arXiv Detail & Related papers (2023-09-21T08:34:01Z)
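For illustration, a minimal translation-only least squares matching step could look like the following Gauss-Newton sketch; the paper's mechanism may estimate a richer (e.g., affine) patch transformation and add further robustness logic, so treat every name and detail here as an assumption.

```python
# A minimal translation-only sketch of least squares matching (LSM): refine a
# patch correspondence by Gauss-Newton on intensity residuals. The paper's
# LSM may use a fuller (e.g., affine) model; all names are illustrative.
import numpy as np

def lsm_refine(template, search, x0, y0, iters=10, tol=1e-3):
    """template: (2h+1, 2h+1) patch; search: full image; (x0, y0): initial
    center of the corresponding patch in `search` (assumed well inside the
    image borders). Returns the refined sub-pixel center (x, y)."""
    h = template.shape[0] // 2
    ys, xs = np.mgrid[-h:h + 1, -h:h + 1].astype(np.float64)
    t = template.astype(np.float64)
    s = search.astype(np.float64)
    x, y = float(x0), float(y0)
    for _ in range(iters):
        # bilinear sampling of the search image at the current patch position
        px, py = xs + x, ys + y
        ix, iy = np.floor(px).astype(int), np.floor(py).astype(int)
        fx, fy = px - ix, py - iy
        patch = ((1 - fx) * (1 - fy) * s[iy, ix] + fx * (1 - fy) * s[iy, ix + 1]
                 + (1 - fx) * fy * s[iy + 1, ix] + fx * fy * s[iy + 1, ix + 1])
        # linearize: patch(x + d) ~ patch + J d, solve J d = template - patch
        gy, gx = np.gradient(patch)
        J = np.stack([gx.ravel(), gy.ravel()], axis=1)
        r = (t - patch).ravel()
        dx, dy = np.linalg.lstsq(J, r, rcond=None)[0]
        x, y = x + dx, y + dy
        if max(abs(dx), abs(dy)) < tol:
            break
    return x, y
```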
- EC-SfM: Efficient Covisibility-based Structure-from-Motion for Both Sequential and Unordered Images [24.6736600856999]
This paper presents an efficient covisibility-based incremental SfM for unordered Internet images.
We propose a unified framework to efficiently reconstruct sequential images, unordered images, and the mixture of these two.
The proposed method is three times faster than the state of the art on feature matching, and an order of magnitude faster on reconstruction without sacrificing accuracy.
arXiv Detail & Related papers (2023-02-21T09:18:57Z)
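As a generic illustration of the covisibility idea (not EC-SfM's actual algorithm), a greedy registration order over a precomputed covisibility graph might be selected as follows; `covis`, mapping each image id to `{neighbor id: shared match count}`, is an assumed input.

```python
# A minimal sketch of covisibility-driven image selection for incremental SfM.
# `covis[i][j]` = number of feature matches shared by images i and j (assumed
# precomputed); a real covisibility-based pipeline is considerably richer.
from collections import defaultdict

def next_best_image(covis, registered):
    """Pick the unregistered image that shares the most matches with the
    currently registered set, i.e. the strongest covisibility link."""
    scores = defaultdict(int)
    for r in registered:
        for j, n_matches in covis[r].items():
            if j not in registered:
                scores[j] += n_matches
    return max(scores, key=scores.get) if scores else None

def incremental_order(covis, seed_pair):
    """Greedy registration order starting from an initial stereo pair."""
    registered = set(seed_pair)
    order = list(seed_pair)
    while True:
        nxt = next_best_image(covis, registered)
        if nxt is None:
            break
        registered.add(nxt)
        order.append(nxt)
    return order

# Example: covis = {0: {1: 350, 2: 120}, 1: {0: 350, 2: 90}, 2: {0: 120, 1: 90}}
# incremental_order(covis, (0, 1)) -> [0, 1, 2]
```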
- Hi-LASSIE: High-Fidelity Articulated Shape and Skeleton Discovery from Sparse Image Ensemble [72.3681707384754]
Hi-LASSIE performs 3D articulated reconstruction from only 20-30 online images in the wild without any user-defined shape or skeleton templates.
First, instead of relying on a manually annotated 3D skeleton, we automatically estimate a class-specific skeleton from the selected reference image.
Second, we improve the shape reconstructions with novel instance-specific optimization strategies that allow the reconstructions to faithfully fit each instance.
arXiv Detail & Related papers (2022-12-21T14:31:33Z)
- FOF: Learning Fourier Occupancy Field for Monocular Real-time Human Reconstruction [73.85709132666626]
Existing representations, such as parametric models, voxel grids, meshes and implicit neural representations, have difficulties achieving high-quality results and real-time speed at the same time.
We propose Fourier Occupancy Field (FOF), a novel, powerful, efficient, and flexible 3D representation, for monocular real-time and accurate human reconstruction.
A FOF can be stored as a multi-channel image, which is compatible with 2D convolutional neural networks and can bridge the gap between 3D and 2D images.
arXiv Detail & Related papers (2022-06-05T14:45:02Z)
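A rough sketch of that idea, assuming the (2N+1) channels store a DC term followed by N cosine and N sine coefficients of the occupancy function along the viewing (z) axis; the paper's exact basis and channel order may differ.

```python
# A hedged sketch of decoding a Fourier Occupancy Field (FOF) coefficient
# image into an occupancy volume; the basis and channel layout assumed here
# are illustrative, not necessarily the paper's exact convention.
import numpy as np

def decode_fof(coeffs, depth_res=128):
    """coeffs: (2N+1, H, W) coefficient image -> (depth_res, H, W) occupancy."""
    n = (coeffs.shape[0] - 1) // 2
    z = np.linspace(0.0, 1.0, depth_res, endpoint=False)  # normalized depth
    k = np.arange(1, n + 1)
    cos_b = np.cos(2 * np.pi * k[None, :] * z[:, None])   # (depth_res, N)
    sin_b = np.sin(2 * np.pi * k[None, :] * z[:, None])
    dc = coeffs[0]                                         # (H, W)
    a, b = coeffs[1:n + 1], coeffs[n + 1:]                 # (N, H, W) each
    occ = (dc[None] / 2.0
           + np.einsum('dk,khw->dhw', cos_b, a)
           + np.einsum('dk,khw->dhw', sin_b, b))
    return occ  # e.g., threshold at 0.5 to extract the occupied region
```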
- MobRecon: Mobile-Friendly Hand Mesh Reconstruction from Monocular Image [18.68544438724187]
We propose a framework for single-view hand mesh reconstruction, which can simultaneously achieve high reconstruction accuracy, fast inference speed, and temporal coherence.
Our framework, called MobRecon, has affordable computational cost and a miniature model size, reaching a high inference speed of 83 FPS on an Apple A14 CPU.
arXiv Detail & Related papers (2021-12-06T03:01:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.