Related papers: Distributed Global Structure-from-Motion with a Deep Front-End

URL: http://arxiv.org/abs/2311.18801v1
Date: Thu, 30 Nov 2023 18:47:18 GMT
Title: Distributed Global Structure-from-Motion with a Deep Front-End
Authors: Ayush Baid, John Lambert, Travis Driver, Akshay Krishnan, Hayk Stepanyan, and Frank Dellaert
Abstract summary: We investigate whether leveraging the developments in feature extraction and matching helps global SfM perform on par with the SOTA incremental SfM approach (COLMAP). Our SfM system is designed from the ground up to leverage distributed computation, enabling us to parallelize computation on multiple machines and scale to large scenes.
Score: 11.2064188838227
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While initial approaches to Structure-from-Motion (SfM) revolved around both global and incremental methods, most recent applications rely on incremental systems to estimate camera poses due to their superior robustness. Though there has been tremendous progress in SfM 'front-ends' powered by deep models learned from data, the state-of-the-art (incremental) SfM pipelines still rely on classical SIFT features, developed in 2004. In this work, we investigate whether leveraging the developments in feature extraction and matching helps global SfM perform on par with the SOTA incremental SfM approach (COLMAP). To do so, we design a modular SfM framework that allows us to easily combine developments in different stages of the SfM pipeline. Our experiments show that while developments in deep-learning based two-view correspondence estimation do translate to improvements in point density for scenes reconstructed with global SfM, none of them outperform SIFT when comparing with incremental SfM results on a range of datasets. Our SfM system is designed from the ground up to leverage distributed computation, enabling us to parallelize computation on multiple machines and scale to large scenes.
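The abstract describes a modular pipeline whose stages can be swapped (for example, SIFT versus a learned front-end) and whose independent work can be parallelized. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation: the class name `ModularFrontEnd`, the toy string "images", and both matchers are hypothetical, and a local `ThreadPoolExecutor` stands in for a multi-machine scheduler. Later global SfM stages (pose averaging, bundle adjustment) are omitted.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Tuple

# Hypothetical stand-in for a per-image feature set (keypoints + descriptors).
Features = FrozenSet[str]
Pair = Tuple[int, int]


@dataclass
class ModularFrontEnd:
    """Hypothetical front-end whose stages are plain callables, so each can be swapped."""

    extract: Callable[[str], Features]          # detector/descriptor stage
    match: Callable[[Features, Features], int]  # two-view correspondence stage

    def run(self, images: List[str], max_workers: int = 4) -> Dict[Pair, int]:
        # Every unordered image pair is a candidate two-view problem.
        pairs: List[Pair] = list(combinations(range(len(images)), 2))
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # Per-image extraction is independent, so it fans out to the pool.
            features = list(pool.map(self.extract, images))
            # Per-pair matching is also independent once features exist.
            counts = list(
                pool.map(lambda p: self.match(features[p[0]], features[p[1]]), pairs)
            )
        return dict(zip(pairs, counts))


# Toy stages (hypothetical): an "image" is a string, its "features" are its characters.
def char_features(image: str) -> Features:
    return frozenset(image)


def shared_count(a: Features, b: Features) -> int:
    # Number of shared "features" between two images.
    return len(a & b)


def strict_count(a: Features, b: Features) -> int:
    # A stricter hypothetical matcher: discard pairs with fewer than 2 matches.
    n = len(a & b)
    return n if n >= 2 else 0


images = ["abc", "bcd", "cde"]
baseline = ModularFrontEnd(extract=char_features, match=shared_count).run(images)
swapped = ModularFrontEnd(extract=char_features, match=strict_count).run(images)
print(baseline)  # {(0, 1): 2, (0, 2): 1, (1, 2): 2}
print(swapped)   # {(0, 1): 2, (0, 2): 0, (1, 2): 2}
```

Only the `match` argument changes between the two runs; the rest of the pipeline is untouched, which is what makes comparing front-ends cheap. Scaling out to several machines would mean replacing the thread pool with a distributed task scheduler, which this sketch does not attempt.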

Related papers

FoundationSLAM: Unleashing the Power of Depth Foundation Models for End-to-End Dense Visual SLAM [50.9765003472032]
FoundationSLAM is a learning-based monocular dense SLAM system for accurate and robust tracking and mapping. Our core idea is to bridge flow estimation with reasoning by leveraging the guidance from foundation depth models.
arXiv Detail & Related papers (2025-12-31T17:57:45Z)
Rethinking Infrared Small Target Detection: A Foundation-Driven Efficient Paradigm [17.63632082331749]
Large-scale visual foundation models (VFMs) exhibit strong generalization across diverse visual domains, but their potential for single-frame infrared small target (SIRST) detection remains largely unexplored. We propose a Foundation-Driven Efficient Paradigm (FDEP) which can seamlessly adapt to existing encoder-decoder-based methods and significantly improve accuracy without additional inference overhead.
arXiv Detail & Related papers (2025-12-05T08:12:35Z)
InstantSfM: Fully Sparse and Parallel Structure-from-Motion [18.540622250926624]
Structure-from-Motion (SfM) is a method that recovers camera poses and scene geometry from uncalibrated images. In existing systems such as GLOMAP, naive CPU-specialized implementations of bundle adjustment (BA) or global positioning (GP) introduce significant computational overhead when handling large-scale scenarios. In this paper, we unleash the full potential of GPU parallel computation to accelerate each critical stage of the standard SfM pipeline.
arXiv Detail & Related papers (2025-10-15T08:58:05Z)
MEGA: xLSTM with Multihead Exponential Gated Fusion for Precise Aspect-based Sentiment Analysis [2.9045498954705886]
Aspect-based Sentiment Analysis (ABSA) is a critical Natural Language Processing (NLP) task that extracts aspects from text and determines their associated sentiments. Existing ABSA methods struggle to balance computational efficiency with high performance. We propose xLSTM with Multihead Exponential Gated Fusion (MEGA), a novel framework integrating a bi-directional mLSTM architecture with forward and partially flipped backward streams.
arXiv Detail & Related papers (2025-07-01T22:21:33Z)
Improving Progressive Generation with Decomposable Flow Matching [50.63174319509629]
Decomposable Flow Matching (DFM) is a simple and effective framework for the progressive generation of visual media. On ImageNet-1K 512px, DFM achieves a 35.2% improvement in FDD score over the base architecture and 26.4% over the best-performing baseline.
arXiv Detail & Related papers (2025-06-24T17:58:02Z)
DiffusionSfM: Predicting Structure and Motion via Ray Origin and Endpoint Diffusion [53.70278210626701]
We propose a data-driven multi-view reasoning approach that directly infers 3D scene geometry and camera poses from multi-view images. Our framework, DiffusionSfM, parameterizes scene geometry and cameras as pixel-wise ray origins and endpoints in a global frame. We empirically validate DiffusionSfM on both synthetic and real datasets, demonstrating that it outperforms classical and learning-based approaches.
arXiv Detail & Related papers (2025-05-08T17:59:47Z)
SEKI: Self-Evolution and Knowledge Inspiration based Neural Architecture Search via Large Language Models [11.670056503731905]
We introduce SEKI, a novel large language model (LLM)-based neural architecture search (NAS) method. Inspired by the chain-of-thought (CoT) paradigm in modern LLMs, SEKI operates in two key stages: self-evolution and knowledge distillation.
arXiv Detail & Related papers (2025-02-27T09:17:49Z)
Dense-SfM: Structure from Motion with Dense Consistent Matching [10.24418219366936]
We present Dense-SfM, a novel framework for dense and accurate 3D reconstruction from multi-view images. Dense-SfM integrates dense matching with a Gaussian Splatting (GS)-based track extension, which gives more consistent, longer feature tracks. Dense-SfM offers significant improvements in accuracy and density over state-of-the-art methods.
arXiv Detail & Related papers (2025-01-24T06:45:12Z)
EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference [49.94169109038806]
This paper introduces EPS-MoE, a novel expert pipeline scheduler for MoE. Our results demonstrate an average 21% improvement in prefill throughput over existing parallel inference methods.
arXiv Detail & Related papers (2024-10-16T05:17:49Z)
Robust Incremental Structure-from-Motion with Hybrid Features [73.55745864762703]
We introduce an incremental Structure-from-Motion (SfM) system that leverages lines and their structured geometric relations. Our system is consistently more robust and accurate compared to the widely used point-based state of the art in SfM.
arXiv Detail & Related papers (2024-09-29T22:20:32Z)
Global Structure-from-Motion Revisited [57.30100303979393]
We propose GLOMAP as a new general-purpose system that outperforms the state of the art in global SfM. In terms of accuracy and robustness, we achieve results on-par or superior to COLMAP, the most widely used incremental SfM. We share our system as an open-source implementation.
arXiv Detail & Related papers (2024-07-29T17:54:24Z)
Towards Scale-Aware Full Surround Monodepth with Transformers [46.100897032607335]
Full surround monodepth (FSM) methods can learn from multiple camera views simultaneously to predict scale-aware depth. In this work, we focus on enhancing the scale-awareness of FSM methods for depth estimation.
arXiv Detail & Related papers (2024-07-15T02:54:46Z)
FederatedScope-LLM: A Comprehensive Package for Fine-tuning Large Language Models in Federated Learning [70.38817963253034]
This paper first discusses the challenges of federated fine-tuning of LLMs and introduces our package FS-LLM as its main contribution. We provide comprehensive federated parameter-efficient fine-tuning algorithm implementations and versatile programming interfaces for future extension in FL scenarios. We conduct extensive experiments to validate the effectiveness of FS-LLM and benchmark advanced LLMs with state-of-the-art parameter-efficient fine-tuning algorithms in FL settings.
arXiv Detail & Related papers (2023-09-01T09:40:36Z)
AdaSfM: From Coarse Global to Fine Incremental Adaptive Structure from Motion [48.835456049755166]
AdaSfM is a coarse-to-fine adaptive SfM approach that is scalable to large-scale and challenging datasets. Our approach first performs a coarse global SfM, which improves the reliability of the view graph by leveraging measurements from low-cost sensors. Our approach uses a threshold-adaptive strategy to align all local reconstructions to the coordinate frame of global SfM.
arXiv Detail & Related papers (2023-01-28T09:06:50Z)
DeepMLE: A Robust Deep Maximum Likelihood Estimator for Two-view Structure from Motion [9.294501649791016]
Two-view structure from motion (SfM) is the cornerstone of 3D reconstruction and visual SLAM (vSLAM). We formulate the two-view SfM problem as a maximum likelihood estimation (MLE) and solve it with the proposed framework, denoted as DeepMLE. Our method significantly outperforms the state-of-the-art end-to-end two-view SfM approaches in accuracy and generalization capability.
arXiv Detail & Related papers (2022-10-11T15:07:25Z)
Transformer-based Context Condensation for Boosting Feature Pyramids in Object Detection [77.50110439560152]
Current object detectors typically have a feature pyramid (FP) module for multi-level feature fusion (MFF). We propose a novel and efficient context modeling mechanism that can help existing FPs deliver better MFF results. In particular, we introduce a novel insight that comprehensive contexts can be decomposed and condensed into two types of representations for higher efficiency.
arXiv Detail & Related papers (2022-07-14T01:45:03Z)
DeMFI: Deep Joint Deblurring and Multi-Frame Interpolation with Flow-Guided Attentive Correlation and Recursive Boosting [50.17500790309477]
DeMFI-Net is a joint deblurring and multi-frame interpolation framework. It converts blurry lower-frame-rate videos into sharp higher-frame-rate videos. It achieves state-of-the-art (SOTA) performance on diverse datasets.
arXiv Detail & Related papers (2021-11-19T00:00:15Z)

This list is automatically generated from the titles and abstracts of the papers on this site.