On-the-Fly SfM: What you capture is What you get
- URL: http://arxiv.org/abs/2309.11883v2
- Date: Wed, 14 Feb 2024 02:21:19 GMT
- Title: On-the-Fly SfM: What you capture is What you get
- Authors: Zongqian Zhan, Rui Xia, Yifei Yu, Yibo Xu, Xin Wang
- Abstract summary: We present an on-the-fly SfM: running online SfM while images are being captured, each newly taken on-the-fly image is estimated online with its corresponding pose and points.
Specifically, our approach employs a vocabulary tree trained in an unsupervised manner on learning-based global features.
A robust feature matching mechanism with least squares matching (LSM) is presented to improve image registration performance.
- Score: 26.08032193296505
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Over the last decades, ample achievements have been made in Structure
from Motion (SfM). However, the vast majority of them basically work in an
offline manner, i.e., images are first captured and then fed together into an
SfM pipeline to obtain poses and a sparse point cloud. In this work, on the
contrary, we present an on-the-fly SfM: running online SfM while images are
being captured, each newly taken on-the-fly image is estimated online with its
corresponding pose and points, i.e., what you capture is what you get.
Specifically, our approach first employs a vocabulary tree, trained in an
unsupervised manner on learning-based global features, for fast image retrieval
of the newly fly-in image. Then, a robust feature matching mechanism with
least squares matching (LSM) is presented to improve image registration
performance. Finally, by investigating the influence of the newly fly-in
image's connected neighboring images, an efficient hierarchical weighted local
bundle adjustment (BA) is used for optimization. Extensive experimental results
demonstrate that on-the-fly SfM can meet the goal of robustly registering
images while capturing in an online way.
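The retrieval step described above — matching a newly captured image against previously registered ones via learned global features — can be sketched as a nearest-neighbour lookup. This is an illustrative approximation only: the paper uses a vocabulary tree trained without supervision, whereas the brute-force cosine search below, and all names in it, are assumptions.

```python
import numpy as np

def retrieve_neighbors(db_features: np.ndarray, query: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k database images most similar to the query.

    db_features: (N, D) array of global descriptors for the images registered
    so far; query: (D,) descriptor of the newly fly-in image. A real system
    would replace this O(N) scan with a vocabulary-tree lookup.
    """
    db = db_features / np.linalg.norm(db_features, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    sims = db @ q                 # cosine similarity to every registered image
    return np.argsort(-sims)[:k]  # indices of the top-k most similar images

# Toy usage: three 4-D descriptors; the query is closest to image 0.
db = np.array([[1.0, 0.0, 0.0, 0.0],
               [0.0, 1.0, 0.0, 0.0],
               [0.0, 0.0, 1.0, 0.0]])
query = np.array([0.9, 0.1, 0.0, 0.0])
print(retrieve_neighbors(db, query, k=2))  # -> [0 1]
```

The retrieved neighbors are the candidate images against which feature matching and registration would then be run.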
Related papers
- SfM on-the-fly: Get better 3D from What You Capture [24.141351494527303]
Structure from Motion (SfM) has been a constant research hotspot in the fields of photogrammetry, computer vision, robotics etc.
This work builds upon the original on-the-fly SfM and presents an updated version with three new advancements to get better 3D from what you capture.
arXiv Detail & Related papers (2024-07-04T13:52:37Z)
- Improving Human-Object Interaction Detection via Virtual Image Learning [68.56682347374422]
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects.
In this paper, we propose to alleviate the impact of such an unbalanced distribution via Virtual Image Learning (VIL).
A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images.
arXiv Detail & Related papers (2023-08-04T10:28:48Z)
- MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments [72.6405488990753]
Self-supervised learning can be used to mitigate the data-hungry training needs of Vision Transformer networks.
We propose a single-stage and standalone method, MOCA, which unifies both desired properties.
We achieve new state-of-the-art results on low-shot settings and strong experimental results in various evaluation protocols.
arXiv Detail & Related papers (2023-07-18T15:46:20Z)
- Linear Alignment of Vision-language Models for Image Captioning [9.746397419479447]
We propose a more efficient training protocol that fits a linear mapping between image and text embeddings of CLIP.
This bypasses the need for gradient computation and results in a lightweight captioning method called ReCap.
We evaluate ReCap on MS-COCO, Flickr30k, VizWiz, and MSRVTT.
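A linear mapping between paired embeddings can be fitted in closed form by least squares, which is what makes a gradient-free protocol like this possible. The sketch below is a hedged illustration of that idea with random stand-in embeddings, not the ReCap implementation (its exact projection and CLIP specifics are not given here):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for paired CLIP embeddings: N image embeddings (d_img) and their
# matching text embeddings (d_txt). A real setup would use CLIP outputs.
N, d_img, d_txt = 100, 8, 6
X = rng.standard_normal((N, d_img))           # image embeddings
W_true = rng.standard_normal((d_img, d_txt))  # hidden ground-truth map
Y = X @ W_true                                # text embeddings (noise-free toy)

# Closed-form least-squares fit: argmin_W ||X W - Y||_F, no gradients needed.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(np.allclose(X @ W, Y, atol=1e-8))  # -> True
```

With noise-free data and more samples than dimensions, the fit recovers the mapping exactly; real embeddings would yield an approximate alignment.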
arXiv Detail & Related papers (2023-07-10T17:59:21Z)
- PixMIM: Rethinking Pixel Reconstruction in Masked Image Modeling [83.67628239775878]
Masked Image Modeling (MIM) has achieved promising progress with the advent of Masked Autoencoders (MAE) and BEiT.
This paper undertakes a fundamental analysis of MIM from the perspective of pixel reconstruction.
We propose a remarkably simple and effective method, PixMIM, that entails two strategies.
arXiv Detail & Related papers (2023-03-04T13:38:51Z)
- EC-SfM: Efficient Covisibility-based Structure-from-Motion for Both Sequential and Unordered Images [24.6736600856999]
This paper presents an efficient covisibility-based incremental SfM for unordered Internet images.
We propose a unified framework to efficiently reconstruct sequential images, unordered images, and the mixture of these two.
The proposed method is three times faster than the state of the art on feature matching, and an order of magnitude faster on reconstruction, without sacrificing accuracy.
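A covisibility relation of this kind can be derived from feature tracks: two images are covisible when they observe the same 3D points. The sketch below (an illustrative assumption, not the EC-SfM code) counts shared tracks per image pair:

```python
import itertools
from collections import Counter

def covisibility(tracks):
    """Count, for every image pair, how many feature tracks they share.

    `tracks` is a list of sets of image ids that observe the same 3D point.
    Pairs with higher counts are more strongly covisible and would be
    prioritised by a covisibility-driven incremental pipeline.
    """
    counts = Counter()
    for track in tracks:
        for i, j in itertools.combinations(sorted(track), 2):
            counts[(i, j)] += 1
    return counts

# Images 0 and 1 co-observe two points; 1 and 2 also co-observe two.
tracks = [{0, 1}, {0, 1, 2}, {1, 2}]
print(covisibility(tracks))
```

The resulting counts define an edge-weighted covisibility graph over the images, which works identically for sequential and unordered inputs.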
arXiv Detail & Related papers (2023-02-21T09:18:57Z)
- SfM-TTR: Using Structure from Motion for Test-Time Refinement of Single-View Depth Networks [13.249453757295086]
We propose a novel test-time refinement (TTR) method, denoted as SfM-TTR, to boost the performance of single-view depth networks at test time.
Specifically, and differently from the state of the art, we use sparse SfM point clouds as test-time self-supervisory signal.
Our results show that adding SfM-TTR to several state-of-the-art self-supervised and supervised networks significantly improves their performance.
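As a hedged illustration of using sparse SfM points as a test-time supervisory signal (the actual SfM-TTR refinement procedure is more involved), one common ingredient is comparing network depth to SfM depth after median-scale alignment; the function name and setup below are assumptions:

```python
import numpy as np

def sfm_depth_loss(pred_depth: np.ndarray, sfm_depth: np.ndarray) -> float:
    """Mean absolute error between predicted and sparse SfM depths after
    aligning the scale-ambiguous prediction by the median depth ratio."""
    scale = np.median(sfm_depth / pred_depth)  # resolve monocular scale ambiguity
    return float(np.mean(np.abs(scale * pred_depth - sfm_depth)))

# If the prediction is correct up to a global scale, the loss is zero.
sfm = np.array([2.0, 4.0, 6.0])
pred = sfm / 3.0                  # same structure, wrong scale
print(sfm_depth_loss(pred, sfm))  # -> 0.0
```

A test-time refinement would backpropagate a signal like this through the depth network; the toy here only evaluates it.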
arXiv Detail & Related papers (2022-11-24T12:02:13Z)
- Single Image Brightening via Multi-Scale Exposure Fusion with Hybrid Learning [48.890709236564945]
A small ISO and a short exposure time are usually used to capture an image in back-lit or low-light conditions.
In this paper, a single image brightening algorithm is introduced to brighten such an image.
The proposed algorithm includes a unique hybrid learning framework to generate two virtual images with large exposure times.
arXiv Detail & Related papers (2020-07-04T08:23:07Z)
- DeepEMD: Differentiable Earth Mover's Distance for Few-Shot Learning [122.51237307910878]
We develop methods for few-shot image classification from a new perspective of optimal matching between image regions.
We employ the Earth Mover's Distance (EMD) as a metric to compute a structural distance between dense image representations.
To generate the important weights of elements in the formulation, we design a cross-reference mechanism.
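The Earth Mover's Distance used here is the optimal-transport cost between two weighted sets of region features. A minimal, non-differentiable sketch that solves the transport problem as a linear program follows; DeepEMD itself makes this step differentiable and learns the element weights via its cross-reference mechanism, which this toy does not:

```python
import numpy as np
from scipy.optimize import linprog

def emd(cost: np.ndarray, a: np.ndarray, b: np.ndarray) -> float:
    """Optimal-transport cost between weight vectors a (length n) and
    b (length m) under an n x m ground-cost matrix."""
    n, m = cost.shape
    # Equality constraints: mass leaving source i equals a[i];
    # mass arriving at sink j equals b[j].
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0
    for j in range(m):
        A_eq[n + j, j::m] = 1.0
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=np.concatenate([a, b]),
                  bounds=(0, None), method="highs")
    return float(res.fun)

# Moving all mass a unit distance costs 1; identical distributions cost 0.
cost = np.array([[0.0, 1.0], [1.0, 0.0]])
print(emd(cost, np.array([1.0, 0.0]), np.array([0.0, 1.0])))  # -> 1.0
print(emd(cost, np.array([0.5, 0.5]), np.array([0.5, 0.5])))  # -> 0.0
```

In the few-shot setting, the cost matrix would hold pairwise distances between dense region embeddings of the two images being compared.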
arXiv Detail & Related papers (2020-03-15T08:13:16Z)
- Image Matching across Wide Baselines: From Paper to Practice [80.9424750998559]
We introduce a comprehensive benchmark for local features and robust estimation algorithms.
Our pipeline's modular structure allows easy integration, configuration, and combination of different methods.
We show that with proper settings, classical solutions may still outperform the perceived state of the art.
arXiv Detail & Related papers (2020-03-03T15:20:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.