AFT-VO: Asynchronous Fusion Transformers for Multi-View Visual Odometry Estimation
- URL: http://arxiv.org/abs/2206.12946v1
- Date: Sun, 26 Jun 2022 19:29:08 GMT
- Title: AFT-VO: Asynchronous Fusion Transformers for Multi-View Visual Odometry Estimation
- Authors: Nimet Kaygusuz, Oscar Mendez, Richard Bowden
- Abstract summary: We propose AFT-VO, a novel transformer-based sensor fusion architecture to estimate VO from multiple sensors.
Our framework combines predictions from asynchronous multi-view cameras and accounts for the time discrepancies of measurements coming from different sources.
Our experiments demonstrate that multi-view fusion for VO estimation provides robust and accurate trajectories, outperforming the state of the art in both challenging weather and lighting conditions.
- Score: 39.351088248776435
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motion estimation approaches typically employ sensor fusion techniques, such
as the Kalman Filter, to handle individual sensor failures. More recently, deep
learning-based fusion approaches have been proposed, improving performance and
requiring fewer model-specific implementations. However, current deep fusion
approaches often assume that sensors are synchronised, which is not always
practical, especially for low-cost hardware. To address this limitation, in
this work, we propose AFT-VO, a novel transformer-based sensor fusion
architecture to estimate VO from multiple sensors. Our framework combines
predictions from asynchronous multi-view cameras and accounts for the time
discrepancies of measurements coming from different sources.
Our approach first employs a Mixture Density Network (MDN) to estimate the
probability distributions of the 6-DoF poses for every camera in the system.
Then a novel transformer-based fusion module, AFT-VO, is introduced, which
combines these asynchronous pose estimates together with their confidences.
More specifically, we introduce Discretiser and Source Encoding techniques
which enable the fusion of multi-source asynchronous signals.
We evaluate our approach on the popular nuScenes and KITTI datasets. Our
experiments demonstrate that multi-view fusion for VO estimation provides
robust and accurate trajectories, outperforming the state of the art in both
challenging weather and lighting conditions.
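
The abstract describes the pipeline only at a high level, so the following PyTorch snippet is a minimal, hypothetical sketch of that description rather than the authors' implementation: an MDN head turns per-camera features into 6-DoF pose distributions, and a transformer encoder fuses the resulting measurements after adding a discretised-timestamp embedding (standing in for the Discretiser) and a camera-identity embedding (standing in for Source Encoding). All class names, dimensions, and the mean-pooled output are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class MDNPoseHead(nn.Module):
    """Mixture Density Network head: maps one camera's VO features to a
    Gaussian mixture over 6-DoF relative poses (means, scales, weights)."""
    def __init__(self, feat_dim=256, n_components=3, pose_dim=6):
        super().__init__()
        self.n, self.d = n_components, pose_dim
        self.mu = nn.Linear(feat_dim, n_components * pose_dim)
        self.log_sigma = nn.Linear(feat_dim, n_components * pose_dim)
        self.logit_pi = nn.Linear(feat_dim, n_components)

    def forward(self, feat):                              # feat: (B, feat_dim)
        mu = self.mu(feat).view(-1, self.n, self.d)       # component means
        sigma = self.log_sigma(feat).view(-1, self.n, self.d).exp()  # scales
        pi = self.logit_pi(feat).softmax(dim=-1)          # mixture weights
        return mu, sigma, pi


class AsyncFusionTransformer(nn.Module):
    """Fuses asynchronous per-camera pose measurements. Each measurement is a
    token built from its pose mean and scale, plus a discretised-timestamp
    embedding ("Discretiser") and a camera-identity embedding ("Source
    Encoding"), so attention can run across cameras and time."""
    def __init__(self, pose_dim=6, n_cameras=6, n_time_bins=100, d_model=128):
        super().__init__()
        self.meas_proj = nn.Linear(2 * pose_dim, d_model)     # mean + scale per token
        self.time_embed = nn.Embedding(n_time_bins, d_model)  # discretised time offset
        self.src_embed = nn.Embedding(n_cameras, d_model)     # which camera produced it
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.out = nn.Linear(d_model, pose_dim)               # fused 6-DoF pose

    def forward(self, mu, sigma, time_bin, cam_id):
        # mu, sigma: (B, T, 6); time_bin, cam_id: (B, T) integer indices
        tokens = self.meas_proj(torch.cat([mu, sigma], dim=-1))
        tokens = tokens + self.time_embed(time_bin) + self.src_embed(cam_id)
        fused = self.encoder(tokens)                # attention across measurements
        return self.out(fused.mean(dim=1))          # pooled fused pose estimate
```

For example, a window containing five measurements from three cameras would feed (B, 5, 6) pose means and scales (e.g. taken from each measurement's most likely mixture component), plus the corresponding time-bin and camera indices, and the module returns one fused 6-DoF pose for that window.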
Related papers
- Condition-Aware Multimodal Fusion for Robust Semantic Perception of Driving Scenes [56.52618054240197]
We propose a novel, condition-aware multimodal fusion approach for robust semantic perception of driving scenes.
Our method, CAFuser, uses an RGB camera input to classify environmental conditions and generate a Condition Token that guides the fusion of multiple sensor modalities.
We set the new state of the art with CAFuser on the MUSES dataset with 59.7 PQ for multimodal panoptic segmentation and 78.2 mIoU for semantic segmentation, ranking first on the public benchmarks.
arXiv Detail & Related papers (2024-10-14T17:56:20Z) - Virtual Fusion with Contrastive Learning for Single Sensor-based Activity Recognition [5.225544155289783]
Various types of sensors can be used for Human Activity Recognition (HAR).
Sometimes a single sensor cannot fully observe the user's motions from its viewpoint, which leads to incorrect predictions.
We propose Virtual Fusion - a new method that takes advantage of unlabeled data from multiple time-synchronized sensors during training, but only needs one sensor for inference.
arXiv Detail & Related papers (2023-12-01T17:03:27Z) - Learning Online Multi-Sensor Depth Fusion [100.84519175539378]
SenFuNet is a depth fusion approach that learns sensor-specific noise and outlier statistics.
We conduct experiments with various sensor combinations on the real-world CoRBS and Scene3D datasets.
arXiv Detail & Related papers (2022-04-07T10:45:32Z) - Continuous-Time vs. Discrete-Time Vision-based SLAM: A Comparative Study [46.89180519082908]
This work systematically compares the advantages and limitations of the two formulations in vision-based SLAM.
We develop, and open source, a modular and efficient software architecture containing state-of-the-art algorithms to solve the SLAM problem in discrete and continuous time.
arXiv Detail & Related papers (2022-02-17T20:42:06Z) - Multi-Camera Sensor Fusion for Visual Odometry using Deep Uncertainty Estimation [34.8860186009308]
We propose a deep sensor fusion framework which estimates vehicle motion using both pose and uncertainty estimations from multiple on-board cameras.
We evaluate our approach on the publicly available, large scale autonomous vehicle dataset, nuScenes.
arXiv Detail & Related papers (2021-12-23T19:44:45Z) - EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation [62.210091681352914]
We study multi-sensor fusion for 3D semantic segmentation in applications such as autonomous driving and robotics.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF).
We propose a two-stream network to extract features from the two modalities separately. The extracted features are fused by effective residual-based fusion modules.
arXiv Detail & Related papers (2021-06-21T10:47:26Z) - Multimodal Object Detection via Bayesian Fusion [59.31437166291557]
We study multimodal object detection with RGB and thermal cameras, since the latter can provide much stronger object signatures under poor illumination.
Our key contribution is a non-learned late-fusion method that fuses together bounding box detections from different modalities.
We apply our approach to benchmarks containing both aligned (KAIST) and unaligned (FLIR) multimodal sensor data.
arXiv Detail & Related papers (2021-04-07T04:03:20Z) - MIMC-VINS: A Versatile and Resilient Multi-IMU Multi-Camera Visual-Inertial Navigation System [44.76768683036822]
We propose a real-time consistent multi-IMU multi-camera (MIMC)-VINS estimator for visual-inertial navigation systems.
Within an efficient multi-state constraint filter, the proposed MIMC-VINS algorithm optimally fuses asynchronous measurements from all sensors.
The proposed MIMC-VINS is validated in both Monte-Carlo simulations and real-world experiments.
arXiv Detail & Related papers (2020-06-28T20:16:08Z) - Learning Selective Sensor Fusion for States Estimation [47.76590539558037]
We propose SelectFusion, an end-to-end selective sensor fusion module.
During prediction, the network is able to assess the reliability of the latent features from different sensor modalities.
We extensively evaluate all fusion strategies in both public datasets and on progressively degraded datasets.
arXiv Detail & Related papers (2019-12-30T20:25:16Z)