Multi-Camera Sensor Fusion for Visual Odometry using Deep Uncertainty Estimation
- URL: http://arxiv.org/abs/2112.12818v1
- Date: Thu, 23 Dec 2021 19:44:45 GMT
- Title: Multi-Camera Sensor Fusion for Visual Odometry using Deep Uncertainty Estimation
- Authors: Nimet Kaygusuz, Oscar Mendez, Richard Bowden
- Abstract summary: We propose a deep sensor fusion framework which estimates vehicle motion using both pose and uncertainty estimations from multiple on-board cameras.
We evaluate our approach on the publicly available, large scale autonomous vehicle dataset, nuScenes.
- Score: 34.8860186009308
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual Odometry (VO) estimation is an important source of information for
vehicle state estimation and autonomous driving. Recently, deep learning based
approaches have begun to appear in the literature. However, in the context of
driving, single sensor based approaches are often prone to failure because of
degraded image quality due to environmental factors, camera placement, etc. To
address this issue, we propose a deep sensor fusion framework which estimates
vehicle motion using both pose and uncertainty estimations from multiple
on-board cameras. We extract spatio-temporal feature representations from a set
of consecutive images using a hybrid CNN-RNN model. We then utilise a Mixture
Density Network (MDN) to estimate the 6-DoF pose as a mixture of distributions,
and a fusion module to estimate the final pose using the MDN outputs from
multiple cameras. We evaluate our approach on the publicly available, large-scale
autonomous vehicle dataset nuScenes. The results show that the proposed fusion
approach surpasses the state of the art, providing more robust estimates and
more accurate trajectories than estimates from individual cameras.
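The pipeline described in the abstract, per-camera spatio-temporal features feeding an MDN head that outputs a mixture over 6-DoF poses, followed by uncertainty-aware fusion, can be sketched in a few lines of PyTorch. This is a minimal illustration under stated assumptions, not the authors' implementation: the diagonal-Gaussian mixture, the dominant-component selection, and the inverse-variance fusion rule are all simplifications introduced for the example.

```python
import torch
import torch.nn as nn

class MDNPoseHead(nn.Module):
    """Maps a per-camera spatio-temporal feature vector to a K-component
    diagonal-Gaussian mixture over the 6-DoF relative pose (illustrative)."""

    def __init__(self, feat_dim: int, n_components: int = 3, pose_dim: int = 6):
        super().__init__()
        self.k, self.d = n_components, pose_dim
        self.pi = nn.Linear(feat_dim, n_components)              # mixture weights
        self.mu = nn.Linear(feat_dim, n_components * pose_dim)   # component means
        self.log_sigma = nn.Linear(feat_dim, n_components * pose_dim)

    def forward(self, feats):
        pi = torch.softmax(self.pi(feats), dim=-1)               # (B, K)
        mu = self.mu(feats).view(-1, self.k, self.d)             # (B, K, 6)
        sigma = torch.exp(self.log_sigma(feats)).view(-1, self.k, self.d)
        return pi, mu, sigma

def fuse_poses(mus, sigmas):
    """Inverse-variance weighted fusion of per-camera pose estimates.
    mus, sigmas: (num_cams, B, 6) means and standard deviations."""
    w = 1.0 / (sigmas ** 2 + 1e-8)          # confidence = inverse variance
    return (w * mus).sum(0) / w.sum(0)      # (B, 6) fused pose

# Usage sketch: per-camera features -> dominant mixture component -> fusion.
head = MDNPoseHead(feat_dim=512)
feats = torch.randn(6, 4, 512)              # 6 cameras, batch of 4 clips
outs = [head(f) for f in feats]
pick = lambda t, b: t[torch.arange(t.size(0)), b]   # component per sample
best = [pi.argmax(-1) for pi, _, _ in outs]
mus = torch.stack([pick(mu, b) for (_, mu, _), b in zip(outs, best)])
sigmas = torch.stack([pick(sg, b) for (_, _, sg), b in zip(outs, best)])
fused = fuse_poses(mus, sigmas)             # (4, 6) fused 6-DoF pose
```

Inverse-variance weighting is one standard way to let predicted uncertainty down-weight degraded views; the paper's actual fusion module may differ.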
Related papers
- A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding [76.44979557843367]
We propose a novel multi-view stereo (MVS) framework that removes the need for a depth-range prior.
We introduce a Multi-view Disparity Attention (MDA) module to aggregate long-range context information.
We explicitly estimate, for each pixel, the quality of its correspondence to sampled points on the epipolar line of the source image.
arXiv Detail & Related papers (2024-11-04T08:50:16Z)
- Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving [22.58849429006898]
Current multi-view depth estimation methods, as well as single-view and multi-view fusion methods, fail when given noisy pose settings.
We propose a single-view and multi-view fused depth estimation system which adaptively integrates high-confidence multi-view and single-view results.
Our method outperforms state-of-the-art multi-view and fusion methods under robustness testing.
arXiv Detail & Related papers (2024-03-12T11:18:35Z)
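As a rough illustration of the adaptive single-view/multi-view fusion idea in the entry above: a per-pixel confidence map gates how much the multi-view depth is trusted over the single-view fallback. The confidence map, threshold, and linear blend are assumptions made for this sketch, not the paper's method.

```python
import torch

def fuse_depth(d_single, d_multi, conf_multi, thresh=0.5):
    """Per-pixel blend of single-view and multi-view depth maps.
    conf_multi in [0, 1] is an assumed multi-view confidence map; where the
    multi-view estimate looks unreliable (e.g. under noisy poses), the blend
    falls back to the single-view prediction."""
    w = torch.where(conf_multi > thresh, conf_multi, torch.zeros_like(conf_multi))
    return w * d_multi + (1.0 - w) * d_single

# Example: trust multi-view depth only in high-confidence regions.
d_sv = torch.rand(1, 1, 64, 64) * 10.0   # single-view depth (metres)
d_mv = torch.rand(1, 1, 64, 64) * 10.0   # multi-view depth (metres)
conf = torch.rand(1, 1, 64, 64)          # assumed confidence map
fused = fuse_depth(d_sv, d_mv, conf)
```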
- Single Image Depth Prediction Made Better: A Multivariate Gaussian Take [163.14849753700682]
We introduce an approach that performs continuous modeling of per-pixel depth.
Our method (named MG) ranks among the top entries on the KITTI depth-prediction benchmark leaderboard.
arXiv Detail & Related papers (2023-03-31T16:01:03Z)
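The multivariate-Gaussian depth entry above is naturally trained with a Gaussian negative log-likelihood. The sketch below uses the simpler per-pixel (diagonal) form to show the principle; the paper's contribution is precisely the full multivariate covariance, which is not reproduced here.

```python
import torch

def gaussian_depth_nll(mu, log_var, target):
    """Per-pixel Gaussian negative log-likelihood for depth regression.
    This diagonal (pixel-independent) form is a simplification of the
    paper's full multivariate Gaussian with inter-pixel covariance."""
    inv_var = torch.exp(-log_var)
    return 0.5 * (log_var + (target - mu) ** 2 * inv_var).mean()

# Example: the network predicts a mean depth map and a log-variance map.
mu = torch.rand(2, 1, 32, 32, requires_grad=True)
log_var = torch.zeros(2, 1, 32, 32, requires_grad=True)
target = torch.rand(2, 1, 32, 32)
loss = gaussian_depth_nll(mu, log_var, target)
loss.backward()
```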
- RelPose: Predicting Probabilistic Relative Rotation for Single Objects in the Wild [73.1276968007689]
We describe a data-driven method for inferring the camera viewpoints given multiple images of an arbitrary object.
We show that our approach outperforms state-of-the-art SfM and SLAM methods given sparse images on both seen and unseen categories.
arXiv Detail & Related papers (2022-08-11T17:59:59Z)
- AFT-VO: Asynchronous Fusion Transformers for Multi-View Visual Odometry Estimation [39.351088248776435]
We propose AFT-VO, a novel transformer-based sensor fusion architecture to estimate VO from multiple sensors.
Our framework combines predictions from asynchronous multi-view cameras and accounts for the time discrepancies of measurements coming from different sources.
Our experiments demonstrate that multi-view fusion for VO estimation provides robust and accurate trajectories, outperforming the state of the art in both challenging weather and lighting conditions.
arXiv Detail & Related papers (2022-06-26T19:29:08Z)
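One way to picture the asynchronous fusion in the AFT-VO entry above: treat each camera's pose estimate as a token, add an embedding of its capture-time offset, and let a transformer encoder mix the tokens. Everything concrete below (the learned linear time embedding, mean pooling, layer sizes) is an assumption for illustration, not the published architecture.

```python
import torch
import torch.nn as nn

class AsyncFusion(nn.Module):
    """Fuses per-camera pose estimates whose timestamps differ: each camera's
    estimate becomes a token augmented with an embedding of its time offset
    (illustrative sketch, not the AFT-VO architecture)."""

    def __init__(self, pose_dim=6, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(pose_dim, d_model)
        self.time_embed = nn.Linear(1, d_model)  # assumed learned time encoding
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.out = nn.Linear(d_model, pose_dim)

    def forward(self, poses, t_offsets):
        # poses: (B, num_cams, 6); t_offsets: (B, num_cams, 1) seconds from a
        # common reference time. Tokens are mean-pooled into one fused pose.
        tokens = self.embed(poses) + self.time_embed(t_offsets)
        fused = self.encoder(tokens).mean(dim=1)
        return self.out(fused)                   # (B, 6)

# Example: six cameras with slightly different capture times.
model = AsyncFusion()
poses = torch.randn(4, 6, 6)
t = torch.rand(4, 6, 1) * 0.05                   # offsets up to 50 ms
fused_pose = model(poses, t)                     # (4, 6)
```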
- SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation [101.55622133406446]
We propose SurroundDepth, a method that incorporates information from multiple surrounding views to predict depth maps across cameras.
Specifically, we employ a joint network to process all the surrounding views and propose a cross-view transformer to effectively fuse the information from multiple views.
In experiments, our method achieves state-of-the-art performance on challenging multi-camera depth estimation datasets.
arXiv Detail & Related papers (2022-04-07T17:58:47Z)
- A Quality Index Metric and Method for Online Self-Assessment of Autonomous Vehicles Sensory Perception [164.93739293097605]
We propose a novel evaluation metric, named the detection quality index (DQI), which assesses the performance of camera-based object detection algorithms.
We have developed a superpixel-based attention network (SPA-NET) that utilizes raw image pixels and superpixels as input to predict the proposed DQI evaluation metric.
arXiv Detail & Related papers (2022-03-04T22:16:50Z)
- MDN-VO: Estimating Visual Odometry with Confidence [34.8860186009308]
Visual Odometry (VO) is used in many applications including robotics and autonomous systems.
We propose a deep learning-based VO model to estimate 6-DoF poses, as well as a confidence model for these estimates.
Our experiments show that the proposed model exceeds state-of-the-art performance in addition to detecting failure cases.
arXiv Detail & Related papers (2021-12-23T19:26:04Z)
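For the MDN-VO entry above, a mixture's parameters can be collapsed into a scalar confidence, e.g. via the law of total variance, and thresholded to flag likely failure cases. The scoring heuristic and threshold below are assumptions for illustration, not the paper's confidence model.

```python
import torch

def mixture_total_variance(pi, mu, sigma):
    """Total variance of a diagonal-Gaussian mixture via the law of total
    variance, summed over pose dimensions (illustrative heuristic: higher
    variance means lower confidence). pi: (B, K); mu, sigma: (B, K, D)."""
    mean = (pi.unsqueeze(-1) * mu).sum(1)                           # (B, D)
    var = (pi.unsqueeze(-1) * (sigma ** 2 + mu ** 2)).sum(1) - mean ** 2
    return var.sum(-1)                                              # (B,)

# Example: flag frames whose pose uncertainty exceeds a chosen threshold.
pi = torch.softmax(torch.randn(8, 3), dim=-1)
mu = torch.randn(8, 3, 6)
sigma = torch.rand(8, 3, 6)
failure = mixture_total_variance(pi, mu, sigma) > 5.0   # assumed threshold
```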
- View Invariant Human Body Detection and Pose Estimation from Multiple Depth Sensors [0.7080990243618376]
We propose an end-to-end multi-person 3D pose estimation network, Point R-CNN, using multiple point cloud sources.
We conduct extensive experiments to simulate challenging real world cases, such as individual camera failures, various target appearances, and complex cluttered scenes.
We also show that our end-to-end network greatly outperforms cascaded state-of-the-art models.
arXiv Detail & Related papers (2020-05-08T19:06:28Z)
- 6D Camera Relocalization in Ambiguous Scenes via Continuous Multimodal Inference [67.70859730448473]
We present a multimodal camera relocalization framework that captures ambiguities and uncertainties.
We predict multiple camera pose hypotheses as well as the respective uncertainty for each prediction.
We introduce a new dataset specifically designed to foster camera localization research in ambiguous environments.
arXiv Detail & Related papers (2020-04-09T20:55:06Z)
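For the relocalization entry above, multiple pose hypotheses with per-hypothesis uncertainty admit a simple downstream rule: pick the most confident hypothesis and report ambiguity when a runner-up is nearly as confident. The scoring and ratio test below are illustrative assumptions; the paper performs continuous multimodal inference rather than discrete selection.

```python
import torch

def select_pose(hypotheses, uncertainties, ambiguity_ratio=1.5):
    """Pick the lowest-uncertainty pose hypothesis and report ambiguity when
    the runner-up is nearly as confident (illustrative heuristic).
    hypotheses: (H, 6) candidate poses; uncertainties: (H,) scalar scores."""
    order = torch.argsort(uncertainties)
    best, second = order[0], order[1]
    ambiguous = uncertainties[second] < ambiguity_ratio * uncertainties[best]
    return hypotheses[best], bool(ambiguous)

# Example: three hypotheses, two nearly tied -> the scene is ambiguous.
hyps = torch.randn(3, 6)
unc = torch.tensor([0.2, 0.25, 1.0])
pose, is_ambiguous = select_pose(hyps, unc)   # is_ambiguous == True
```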
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.