DBA-Fusion: Tightly Integrating Deep Dense Visual Bundle Adjustment with Multiple Sensors for Large-Scale Localization and Mapping
- URL: http://arxiv.org/abs/2403.13714v1
- Date: Wed, 20 Mar 2024 16:20:54 GMT
- Title: DBA-Fusion: Tightly Integrating Deep Dense Visual Bundle Adjustment with Multiple Sensors for Large-Scale Localization and Mapping
- Authors: Yuxuan Zhou, Xingxing Li, Shengyu Li, Xuanbin Wang, Shaoquan Feng, Yuxuan Tan, et al.
- Abstract summary: We tightly integrate the trainable deep dense bundle adjustment (DBA) with multi-sensor information through a factor graph.
A pipeline for visual-inertial integration is first developed, providing the basic capability of metric-scale localization and mapping.
The results validate the superior localization performance of our approach, which enables real-time dense mapping in large-scale environments.
- Score: 3.5047603107971397
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual simultaneous localization and mapping (VSLAM) has broad applications, with state-of-the-art methods leveraging deep neural networks for better robustness and applicability. However, there is a lack of research on fusing these learning-based methods with multi-sensor information, which can be indispensable for pushing related applications to large-scale and complex scenarios. In this paper, we tightly integrate the trainable deep dense bundle adjustment (DBA) with multi-sensor information through a factor graph. In the framework, recurrent optical flow and DBA are performed among sequential images. The Hessian information derived from DBA is fed into a generic factor graph for multi-sensor fusion, which employs a sliding window and supports probabilistic marginalization. A pipeline for visual-inertial integration is first developed, providing the basic capability of metric-scale localization and mapping. Furthermore, other sensors (e.g., global navigation satellite system) are integrated for drift-free localization and geo-referencing. Extensive tests are conducted on both public datasets and self-collected datasets. The results validate the superior localization performance of our approach, which enables real-time dense mapping in large-scale environments. The code has been made open-source (https://github.com/GREAT-WHU/DBA-Fusion).
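To make the fusion scheme concrete, here is a minimal sketch of the idea, using GTSAM as a stand-in for the paper's generic factor-graph backend (the authors' own implementation is at the GitHub link above). The DBA-derived Hessian enters as the information matrix of a visual pose prior, a preintegrated IMU factor links consecutive states, and a GNSS factor geo-references the result. The Hessian, noise levels, and measurements below are placeholders, and the batch solve stands in for the paper's sliding-window marginalization.

```python
import numpy as np
import gtsam
from gtsam.symbol_shorthand import B, V, X  # IMU-bias, velocity, pose keys

graph = gtsam.NonlinearFactorGraph()

# Visual factor: the DBA-derived Hessian enters as the information matrix
# of a pose prior (placeholder identity here; DBA-Fusion extracts it from
# the dense bundle adjustment layer).
dba_hessian = 1e2 * np.eye(6)
visual_noise = gtsam.noiseModel.Gaussian.Information(dba_hessian)
graph.add(gtsam.PriorFactorPose3(X(0), gtsam.Pose3(), visual_noise))

# Inertial factor: preintegrated IMU measurements between states 0 and 1.
params = gtsam.PreintegrationParams.MakeSharedU(9.81)  # Z-up gravity
params.setAccelerometerCovariance(1e-3 * np.eye(3))
params.setGyroscopeCovariance(1e-4 * np.eye(3))
params.setIntegrationCovariance(1e-8 * np.eye(3))
pim = gtsam.PreintegratedImuMeasurements(params, gtsam.imuBias.ConstantBias())
for _ in range(10):  # fake 100 Hz stationary samples over 0.1 s
    pim.integrateMeasurement(np.array([0.0, 0.0, 9.81]), np.zeros(3), 0.01)
graph.add(gtsam.ImuFactor(X(0), V(0), X(1), V(1), B(0), pim))
graph.add(gtsam.PriorFactorVector(V(0), np.zeros(3),
                                  gtsam.noiseModel.Isotropic.Sigma(3, 1e-2)))
graph.add(gtsam.PriorFactorConstantBias(B(0), gtsam.imuBias.ConstantBias(),
                                        gtsam.noiseModel.Isotropic.Sigma(6, 1e-2)))

# GNSS factor: a geo-referenced position measurement on pose 1.
graph.add(gtsam.GPSFactor(X(1), gtsam.Point3(0.0, 0.0, 0.0),
                          gtsam.noiseModel.Isotropic.Sigma(3, 0.5)))

# Initial guesses and a batch solve; DBA-Fusion instead keeps a sliding
# window and marginalizes old states probabilistically.
values = gtsam.Values()
values.insert(X(0), gtsam.Pose3())
values.insert(X(1), gtsam.Pose3())
values.insert(V(0), np.zeros(3))
values.insert(V(1), np.zeros(3))
values.insert(B(0), gtsam.imuBias.ConstantBias())
result = gtsam.LevenbergMarquardtOptimizer(graph, values).optimize()
print(result.atPose3(X(1)))
```

The key point the sketch mirrors is architectural: the dense visual front end communicates with the fusion back end only through Hessian (information) terms, so any sensor with a factor-graph model can be attached without touching the network.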
Related papers
- GLCONet: Learning Multi-source Perception Representation for Camouflaged Object Detection [23.872633359324098]
We propose a novel Global-Local Collaborative Optimization Network, called GLCONet.
In this paper, we first design a collaborative optimization strategy to simultaneously model local details and global long-range relationships.
Experiments demonstrate that the proposed GLCONet method with different backbones can effectively activate potentially significant pixels in an image.
arXiv Detail & Related papers (2024-09-15T02:26:17Z)
- PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
The integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection.
We propose a novel two-stage 3D object detector, called the Point-Voxel Attention Fusion Network (PVAFN).
PVAFN uses a multi-pooling strategy to integrate both multi-scale and region-specific information effectively.
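As a rough illustration of what a multi-pooling strategy can look like (an assumed generic reading of the summary above, not PVAFN's actual module), the sketch below pools one feature set with average and max operators at several group sizes and concatenates the results:

```python
# Generic multi-pooling sketch: average and max pooling at several group
# sizes, concatenated into one descriptor. Group sizes and dimensions are
# arbitrary placeholders.
import numpy as np

def multi_pool(features: np.ndarray, group_sizes=(4, 8, 16)) -> np.ndarray:
    pooled = []
    for g in group_sizes:
        groups = features.reshape(-1, g, features.shape[-1])
        pooled.append(groups.mean(axis=1))  # average pooling per group
        pooled.append(groups.max(axis=1))   # max pooling per group
    return np.concatenate([p.reshape(-1) for p in pooled])

points = np.random.default_rng(3).standard_normal((16, 32))  # 16 points, 32-dim
print(multi_pool(points).shape)  # (448,)
```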
arXiv Detail & Related papers (2024-08-26T19:43:01Z) - PGNeXt: High-Resolution Salient Object Detection via Pyramid Grafting Network [24.54269823691119]
We present an advanced study on more challenging high-resolution salient object detection (HRSOD) from both dataset and network framework perspectives.
To compensate for the lack of HRSOD datasets, we thoughtfully collect a large-scale high-resolution salient object detection dataset, called UHRSD.
All images are finely annotated at the pixel level, far exceeding previous low-resolution SOD datasets.
arXiv Detail & Related papers (2024-08-02T09:31:21Z)
- RS-DFM: A Remote Sensing Distributed Foundation Model for Diverse Downstream Tasks [11.681342476516267]
We propose a Remote Sensing Distributed Foundation Model (RS-DFM) based on generalized information mapping and interaction.
This model can realize online collaborative perception across multiple platforms and various downstream tasks.
We present a dual-branch information compression module to decouple high-frequency and low-frequency feature information.
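A toy version of such a high/low-frequency split (an assumed reading of "dual-branch information compression", not RS-DFM's actual module) keeps a blurred copy as the low-frequency branch and the residual as the high-frequency branch:

```python
# Lossless frequency decoupling via a box blur: low + high == input.
import numpy as np

def split_frequencies(feature_map: np.ndarray, k: int = 3):
    pad = k // 2
    padded = np.pad(feature_map, pad, mode="edge")
    low = np.zeros_like(feature_map)
    h, w = feature_map.shape
    for i in range(h):
        for j in range(w):
            low[i, j] = padded[i:i + k, j:j + k].mean()  # local average
    high = feature_map - low  # residual = high-frequency detail
    return low, high

fmap = np.random.default_rng(1).standard_normal((16, 16))
low, high = split_frequencies(fmap)
print(np.allclose(low + high, fmap))  # True: a lossless decomposition
```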
arXiv Detail & Related papers (2024-06-11T07:46:47Z)
- Scalable Multi-agent Covering Option Discovery based on Kronecker Graphs [49.71319907864573]
In this paper, we propose a multi-agent skill discovery method that enables tractable decomposition of the joint state space.
Our key idea is to approximate the joint state space as a Kronecker graph, based on which we can directly estimate its Fiedler vector.
Considering that directly computing the Laplacian spectrum is intractable for tasks with infinite-scale state spaces, we further propose a deep learning extension of our method.
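As a toy illustration of the Kronecker idea (a brute-force check on tiny graphs, not the paper's scalable estimator, which avoids materializing the joint graph), the sketch below forms a Kronecker product of two small per-agent graphs and reads off the joint graph's Fiedler vector:

```python
import numpy as np

def fiedler_vector(adjacency: np.ndarray) -> np.ndarray:
    """Eigenvector of the second-smallest graph-Laplacian eigenvalue."""
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    _, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues come back ascending
    return eigvecs[:, 1]

# Two small graphs standing in for per-agent state spaces. The triangle is
# non-bipartite, which keeps the Kronecker (tensor) product connected.
triangle = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
path3 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)

joint = np.kron(triangle, path3)  # 9-node Kronecker product graph
print(fiedler_vector(joint))
```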
arXiv Detail & Related papers (2023-07-21T14:53:12Z)
- Feature Aggregation and Propagation Network for Camouflaged Object Detection [42.33180748293329]
Camouflaged object detection (COD) aims to detect/segment camouflaged objects embedded in the environment.
Several COD methods have been developed, but they still suffer from unsatisfactory performance due to intrinsic similarities between foreground objects and background surroundings.
We propose a novel Feature Aggregation and Propagation Network (FAP-Net) for camouflaged object detection.
arXiv Detail & Related papers (2022-12-02T05:54:28Z)
- Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient Object Detection [67.33924278729903]
In this work, we propose the Dual Swin-Transformer based Mutual Interactive Network (DTMINet).
We adopt Swin-Transformer as the feature extractor for both RGB and depth modality to model the long-range dependencies in visual inputs.
Comprehensive experiments on five standard RGB-D SOD benchmark datasets demonstrate the superiority of the proposed DTMINet.
arXiv Detail & Related papers (2022-06-07T08:35:41Z)
- RGB-D Saliency Detection via Cascaded Mutual Information Minimization [122.8879596830581]
Existing RGB-D saliency detection models do not explicitly encourage RGB and depth to achieve effective multi-modal learning.
We introduce a novel multi-stage cascaded learning framework via mutual information minimization to "explicitly" model the multi-modal information between RGB image and depth data.
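For intuition, here is a toy mutual-information proxy (assuming jointly Gaussian features; not the paper's estimator): for correlation coefficient rho, I(x; y) = -0.5 * log(1 - rho^2), so minimizing it pushes the two modality features toward decorrelation.

```python
import numpy as np

def gaussian_mi(x: np.ndarray, y: np.ndarray) -> float:
    """Mutual information of two jointly Gaussian 1-D variables."""
    rho = np.corrcoef(x, y)[0, 1]
    return -0.5 * np.log(1.0 - rho ** 2)

rng = np.random.default_rng(2)
rgb_feat = rng.standard_normal(1000)
depth_feat = 0.8 * rgb_feat + 0.2 * rng.standard_normal(1000)  # correlated
print(gaussian_mi(rgb_feat, depth_feat))                # high: redundant
print(gaussian_mi(rgb_feat, rng.standard_normal(1000))) # near zero
```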
arXiv Detail & Related papers (2021-09-15T12:31:27Z)
- Adaptive Context-Aware Multi-Modal Network for Depth Completion [107.15344488719322]
We propose to adopt graph propagation to capture the observed spatial contexts.
We then apply an attention mechanism to the propagation, which encourages the network to model contextual information adaptively.
Finally, we introduce a symmetric gated fusion strategy to exploit the extracted multi-modal features effectively (sketched below).
Our model, named Adaptive Context-Aware Multi-Modal Network (ACMNet), achieves the state-of-the-art performance on two benchmarks.
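A generic symmetric gated fusion sketch (an assumed reading of the mechanism the summary names, not ACMNet's exact layer): each modality's contribution is weighted by a gate predicted from the concatenated features, with a single linear map per modality standing in for the learned convolutions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def symmetric_gated_fusion(feat_a, feat_b, w_a, w_b):
    joint = np.concatenate([feat_a, feat_b], axis=-1)
    gate_a = sigmoid(joint @ w_a)  # how much of modality A to pass
    gate_b = sigmoid(joint @ w_b)  # how much of modality B to pass
    return gate_a * feat_a + gate_b * feat_b

rng = np.random.default_rng(0)
f_rgb, f_depth = rng.standard_normal((2, 8, 16))  # two 16-dim feature sets
w1, w2 = rng.standard_normal((2, 32, 16)) * 0.1
fused = symmetric_gated_fusion(f_rgb, f_depth, w1, w2)
print(fused.shape)  # (8, 16)
```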
arXiv Detail & Related papers (2020-08-25T06:00:06Z)
- AutoPose: Searching Multi-Scale Branch Aggregation for Pose Estimation [96.29533512606078]
We present AutoPose, a novel neural architecture search (NAS) framework.
It is capable of automatically discovering multiple parallel branches of cross-scale connections towards accurate and high-resolution 2D human pose estimation.
arXiv Detail & Related papers (2020-08-16T22:27:43Z)
- Robust Image Retrieval-based Visual Localization using Kapture [10.249293519246478]
We present a versatile pipeline for visual localization that facilitates the use of different local and global features.
We evaluate our methods on eight public datasets, where they rank among the top on all of them and first on many.
To foster future research, we release code, models, and all datasets used in this paper in the kapture format open source under a permissive BSD license.
arXiv Detail & Related papers (2020-07-27T21:10:35Z)