Benchmarking Visual-Inertial Deep Multimodal Fusion for Relative Pose
Regression and Odometry-aided Absolute Pose Regression
- URL: http://arxiv.org/abs/2208.00919v3
- Date: Fri, 4 Aug 2023 08:36:02 GMT
- Title: Benchmarking Visual-Inertial Deep Multimodal Fusion for Relative Pose
Regression and Odometry-aided Absolute Pose Regression
- Authors: Felix Ott, Nisha Lakshmana Raichur, David Rügamer, Tobias Feigl, Heiko Neumann, Bernd Bischl, and Christopher Mutschler
- Abstract summary: Visual-inertial localization is a key problem in computer vision and robotics applications such as virtual reality, self-driving cars, and aerial vehicles.
In this work, we conduct a benchmark to evaluate deep multimodal fusion based on pose graph optimization and attention networks.
We show improvements for the APR-RPR task and for the RPR-RPR task for aerial vehicles and handheld devices.
- Score: 6.557612703872671
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual-inertial localization is a key problem in computer vision and robotics
applications such as virtual reality, self-driving cars, and aerial vehicles.
The goal is to estimate an accurate pose of an object when either the
environment or the dynamics are known. Absolute pose regression (APR)
techniques directly regress the absolute pose from an image input in a known
scene using convolutional and spatio-temporal networks. Odometry methods
perform relative pose regression (RPR), predicting the relative pose from
known object dynamics (visual or inertial inputs). The localization task can be
improved by combining information from both data sources in a cross-modal
setup, which is challenging because the two tasks have partly contradictory objectives. In this work,
we conduct a benchmark to evaluate deep multimodal fusion based on pose graph
optimization and attention networks. Auxiliary and Bayesian learning are
utilized for the APR task. We show accuracy improvements for the APR-RPR task
and for the RPR-RPR task for aerial vehicles and handheld devices. We conduct
experiments on the EuRoC MAV and PennCOSYVIO datasets and record and evaluate a
novel industry dataset.
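As a rough illustration of the attention-based fusion evaluated in the benchmark, the sketch below fuses visual and inertial feature sequences with cross-attention and regresses a relative pose. It is a generic example; all module names, dimensions, and the pooling scheme are assumptions, not the authors' architecture.
```python
# Hypothetical sketch of attention-based visual-inertial fusion for relative
# pose regression; module names, dimensions, and heads are illustrative only.
import torch
import torch.nn as nn

class VisualInertialFusionRPR(nn.Module):
    def __init__(self, d_vis=512, d_imu=128, d_model=256, n_heads=4):
        super().__init__()
        self.vis_proj = nn.Linear(d_vis, d_model)   # project visual features
        self.imu_proj = nn.Linear(d_imu, d_model)   # project inertial features
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head_t = nn.Linear(d_model, 3)         # relative translation
        self.head_q = nn.Linear(d_model, 4)         # relative rotation (quaternion)

    def forward(self, vis_feat, imu_feat):
        # vis_feat: (B, T_v, d_vis) visual embeddings, imu_feat: (B, T_i, d_imu)
        q = self.vis_proj(vis_feat)
        kv = self.imu_proj(imu_feat)
        fused, _ = self.cross_attn(q, kv, kv)        # visual queries attend to IMU
        z = fused.mean(dim=1)                        # pool over time
        t = self.head_t(z)
        quat = nn.functional.normalize(self.head_q(z), dim=-1)
        return t, quat

# Example: batch of 2, 8 visual frames, 64 IMU samples
model = VisualInertialFusionRPR()
t, q = model(torch.randn(2, 8, 512), torch.randn(2, 64, 128))
print(t.shape, q.shape)  # torch.Size([2, 3]) torch.Size([2, 4])
```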
Related papers
- Toward Relative Positional Encoding in Spiking Transformers [52.62008099390541]
Spiking neural networks (SNNs) are bio-inspired networks that model how neurons in the brain communicate through discrete spikes.
In this paper, we introduce an approximate method for relative positional encoding (RPE) in Spiking Transformers.
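For context, the snippet below shows a plain additive relative-position bias in standard (non-spiking) self-attention; the paper's spike-compatible approximation is not reproduced, and all shapes are illustrative.
```python
# Generic additive relative positional bias for self-attention (illustrative
# only; the spike-based approximation from the paper is not reproduced).
import torch

def attention_with_relative_bias(q, k, v, rel_bias):
    # q, k, v: (B, T, D); rel_bias: (T, T) bias indexed by offset i - j
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    scores = scores + rel_bias                 # inject relative-position term
    return torch.softmax(scores, dim=-1) @ v

B, T, D = 2, 16, 32
rel = torch.nn.Parameter(torch.zeros(2 * T - 1))   # one learnable bias per offset
idx = torch.arange(T)[:, None] - torch.arange(T)[None, :] + T - 1
out = attention_with_relative_bias(torch.randn(B, T, D), torch.randn(B, T, D),
                                   torch.randn(B, T, D), rel[idx])
print(out.shape)  # torch.Size([2, 16, 32])
```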
arXiv Detail & Related papers (2025-01-28T06:42:37Z)
- RoMeO: Robust Metric Visual Odometry [11.381243799745729]
Visual odometry (VO) aims to estimate camera poses from visual inputs -- a fundamental building block for many applications such as VR/AR and robotics.
Existing approaches lack robustness under such challenging conditions and fail to generalize to unseen data (especially outdoors).
We propose Robust Metric Visual Odometry (RoMeO), a novel method that resolves these issues by leveraging priors from pre-trained depth models.
arXiv Detail & Related papers (2024-12-16T08:08:35Z)
- Bench2Drive-R: Turning Real World Data into Reactive Closed-Loop Autonomous Driving Benchmark by Generative Model [63.336123527432136]
We introduce Bench2Drive-R, a generative framework that enables reactive closed-loop evaluation.
Unlike existing video generative models for autonomous driving, the proposed designs are tailored for interactive simulation.
We compare the generation quality of Bench2Drive-R with existing generative models and achieve state-of-the-art performance.
arXiv Detail & Related papers (2024-12-11T06:35:18Z)
- KS-APR: Keyframe Selection for Robust Absolute Pose Regression [2.541264438930729]
Markerless Mobile Augmented Reality (AR) aims to anchor digital content in the physical world without using specific 2D or 3D objects.
End-to-end machine learning solutions infer the device's pose from a single monocular image.
APR methods tend to yield significant inaccuracies for input images that are too distant from the training set.
This paper introduces KS-APR, a pipeline that assesses the reliability of an estimated pose with minimal overhead.
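A toy illustration of the keyframe-selection idea is sketched below: reject queries whose global image descriptor is far from every training keyframe. The similarity measure and threshold are assumptions, not KS-APR's actual criteria.
```python
# Toy keyframe-based reliability check: reject a query whose global image
# descriptor is too far from every training keyframe (threshold is assumed).
import numpy as np

def is_pose_reliable(query_desc, keyframe_descs, max_dist=0.8):
    dists = np.linalg.norm(keyframe_descs - query_desc, axis=1)
    return dists.min() <= max_dist   # close enough to a known keyframe

keyframes = np.random.rand(100, 128)          # descriptors of training keyframes
query = keyframes[3] + 0.01 * np.random.rand(128)
print(is_pose_reliable(query, keyframes))     # True: near a training keyframe
```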
arXiv Detail & Related papers (2023-08-10T09:32:20Z)
- Fusing Structure from Motion and Simulation-Augmented Pose Regression from Optical Flow for Challenging Indoor Environments [13.654208446015824]
The localization of objects is a crucial task in various applications such as robotics, virtual and augmented reality, and the transportation of goods in warehouses.
Recent advances in deep learning have enabled localization using monocular cameras.
This study aims to address these challenges by incorporating additional information and regularizing the absolute pose using relative pose regression (RPR) methods.
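The sketch below shows one generic way such a regularization can look: an absolute-pose loss plus a consistency term tying consecutive absolute poses to a predicted relative pose. The weighting and the translation-only form are assumptions, not the paper's exact formulation.
```python
# Generic APR + RPR consistency loss: the predicted absolute positions of two
# consecutive frames should agree with the predicted relative translation.
# (Rotation consistency is omitted for brevity; the weight is an assumption.)
import torch

def apr_rpr_loss(p_abs_t, p_abs_t1, p_rel, p_gt_t, p_gt_t1, w_rel=0.5):
    abs_loss = (p_abs_t - p_gt_t).norm(dim=-1) + (p_abs_t1 - p_gt_t1).norm(dim=-1)
    rel_consistency = ((p_abs_t1 - p_abs_t) - p_rel).norm(dim=-1)
    return (abs_loss + w_rel * rel_consistency).mean()

loss = apr_rpr_loss(torch.randn(4, 3), torch.randn(4, 3), torch.randn(4, 3),
                    torch.randn(4, 3), torch.randn(4, 3))
print(loss.item())
```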
arXiv Detail & Related papers (2023-04-14T16:58:23Z)
- Learning to Localize in Unseen Scenes with Relative Pose Regressors [5.672132510411465]
Relative pose regressors (RPRs) localize a camera by estimating its relative translation and rotation to a pose-labelled reference.
In practice, however, the performance of RPRs is significantly degraded in unseen scenes.
We implement aggregation with concatenation, projection, and attention operations (Transformers) and learn to regress the relative pose parameters from the resulting latent codes.
Compared to state-of-the-art RPRs, our model is shown to localize significantly better in unseen environments, across both indoor and outdoor benchmarks, while maintaining competitive performance in seen scenes.
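A rough sketch of that aggregate-then-regress pattern follows; layer sizes, pooling, and the output parameterization are assumptions.
```python
# Sketch: aggregate query/reference image features (concatenation + projection
# + self-attention) and regress relative translation/rotation from the latent.
import torch
import torch.nn as nn

class PairAggregatorRPR(nn.Module):
    def __init__(self, d_feat=256, d_model=256):
        super().__init__()
        self.proj = nn.Linear(2 * d_feat, d_model)           # concat then project
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2)
        self.head = nn.Linear(d_model, 3 + 4)                 # translation + quaternion

    def forward(self, query_feats, ref_feats):
        # query_feats, ref_feats: (B, N, d_feat) per-location features
        z = self.proj(torch.cat([query_feats, ref_feats], dim=-1))
        z = self.encoder(z).mean(dim=1)                       # attention + pooling
        out = self.head(z)
        return out[:, :3], nn.functional.normalize(out[:, 3:], dim=-1)

t, q = PairAggregatorRPR()(torch.randn(2, 49, 256), torch.randn(2, 49, 256))
print(t.shape, q.shape)  # torch.Size([2, 3]) torch.Size([2, 4])
```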
arXiv Detail & Related papers (2023-03-05T17:12:50Z)
- DeepRM: Deep Recurrent Matching for 6D Pose Refinement [77.34726150561087]
DeepRM is a novel recurrent network architecture for 6D pose refinement.
The architecture incorporates LSTM units to propagate information through each refinement step.
DeepRM achieves state-of-the-art performance on two widely accepted challenging datasets.
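The sketch below illustrates the general pattern of LSTM-driven iterative refinement (a small residual pose update per step); it is not DeepRM's actual architecture.
```python
# Sketch of recurrent pose refinement: an LSTM cell carries state across
# refinement steps and emits a small pose update each iteration (names assumed).
import torch
import torch.nn as nn

class RecurrentPoseRefiner(nn.Module):
    def __init__(self, d_feat=128, d_hidden=128):
        super().__init__()
        self.d_hidden = d_hidden
        self.cell = nn.LSTMCell(d_feat + 7, d_hidden)
        self.delta = nn.Linear(d_hidden, 7)      # update for [t(3), quat(4)]

    def forward(self, feat, pose_init, steps=3):
        h = torch.zeros(feat.size(0), self.d_hidden)
        c = torch.zeros_like(h)
        pose = pose_init
        for _ in range(steps):                   # iterative refinement
            h, c = self.cell(torch.cat([feat, pose], dim=-1), (h, c))
            pose = pose + 0.1 * self.delta(h)    # small residual update
        return pose

pose = RecurrentPoseRefiner()(torch.randn(2, 128), torch.zeros(2, 7))
print(pose.shape)  # torch.Size([2, 7])
```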
arXiv Detail & Related papers (2022-05-28T16:18:08Z)
- Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system.
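A toy example of the scale-recovery step is sketched below, aligning up-to-scale VO depths to disparity-derived metric depths with a median ratio; the alignment rule and constants are assumptions, not VRVO's actual integration scheme.
```python
# Toy metric-scale recovery: align up-to-scale VO depths to depths derived
# from a scale-consistent disparity network via a median ratio.
import numpy as np

def recover_scale(vo_depth, disparity, focal_baseline=100.0):
    net_depth = focal_baseline / np.maximum(disparity, 1e-6)  # depth from disparity
    return np.median(net_depth) / np.median(vo_depth)         # global scale factor

vo_depth = np.random.uniform(0.5, 2.0, size=1000)             # arbitrary VO units
true_scale = 3.7
disparity = 100.0 / (true_scale * vo_depth)                   # consistent disparities
print(recover_scale(vo_depth, disparity))                     # approximately 3.7
```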
arXiv Detail & Related papers (2022-03-11T01:51:54Z)
- Towards Optimal Strategies for Training Self-Driving Perception Models in Simulation [98.51313127382937]
We focus on the use of labels in the synthetic domain alone.
Our approach introduces both a way to learn neural-invariant representations and a theoretically inspired view on how to sample the data from the simulator.
We showcase our approach on the bird's-eye-view vehicle segmentation task with multi-sensor data.
arXiv Detail & Related papers (2021-11-15T18:37:43Z)
- Risk-Averse MPC via Visual-Inertial Input and Recurrent Networks for Online Collision Avoidance [95.86944752753564]
We propose an online path planning architecture that extends the model predictive control (MPC) formulation to consider future location uncertainties.
Our algorithm combines an object detection pipeline with a recurrent neural network (RNN) which infers the covariance of state estimates.
The robustness of our method is validated on complex quadruped robot dynamics, and the approach can be applied to most robotic platforms.
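The snippet below sketches the general idea of an RNN that emits a state estimate together with a diagonal covariance (via log-variances) trained with a Gaussian negative log-likelihood; the architecture and dimensions are assumptions.
```python
# Sketch: a GRU maps a window of detection/state inputs to a predicted state
# and a diagonal covariance (log-variances), trained with Gaussian NLL.
import torch
import torch.nn as nn

class StateUncertaintyRNN(nn.Module):
    def __init__(self, d_in=16, d_hidden=64, d_state=4):
        super().__init__()
        self.rnn = nn.GRU(d_in, d_hidden, batch_first=True)
        self.mean = nn.Linear(d_hidden, d_state)
        self.logvar = nn.Linear(d_hidden, d_state)   # diagonal covariance

    def forward(self, x):
        h, _ = self.rnn(x)
        last = h[:, -1]                              # last time step
        return self.mean(last), self.logvar(last)

model = StateUncertaintyRNN()
mu, logvar = model(torch.randn(8, 10, 16))
target = torch.randn(8, 4)
nll = 0.5 * (logvar + (target - mu) ** 2 / logvar.exp()).mean()
print(mu.shape, logvar.shape, nll.item())
```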
arXiv Detail & Related papers (2020-07-28T07:34:30Z)
- Denoising IMU Gyroscopes with Deep Learning for Open-Loop Attitude Estimation [0.0]
This paper proposes a learning method for denoising gyroscopes of Inertial Measurement Units (IMUs) using ground truth data.
The obtained algorithm outperforms the state-of-the-art on the (unseen) test sequences.
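As a generic illustration of learned gyroscope denoising, the sketch below uses a small temporal convolution that predicts a correction to raw angular rates and is supervised by ground-truth angular velocity; the architecture is an assumption, not the paper's network.
```python
# Sketch: a small temporal convolution predicts a correction to raw gyro
# readings, trained against ground-truth angular velocity (architecture assumed).
import torch
import torch.nn as nn

class GyroDenoiser(nn.Module):
    def __init__(self, channels=32, kernel=5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(3, channels, kernel, padding=kernel // 2), nn.GELU(),
            nn.Conv1d(channels, 3, kernel, padding=kernel // 2))

    def forward(self, gyro):
        # gyro: (B, 3, T) raw angular rates; output adds a learned correction
        return gyro + self.net(gyro)

model = GyroDenoiser()
raw = torch.randn(4, 3, 200)
gt = torch.randn(4, 3, 200)                      # ground-truth angular velocity
loss = nn.functional.mse_loss(model(raw), gt)    # supervised denoising loss
print(loss.item())
```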
arXiv Detail & Related papers (2020-02-25T08:04:31Z)