Modality-invariant Visual Odometry for Embodied Vision
- URL: http://arxiv.org/abs/2305.00348v1
- Date: Sat, 29 Apr 2023 21:47:12 GMT
- Title: Modality-invariant Visual Odometry for Embodied Vision
- Authors: Marius Memmel, Roman Bachmann, Amir Zamir
- Abstract summary: Visual Odometry (VO) is a practical substitute for unreliable GPS and compass sensors.
Recent deep VO models limit themselves to a fixed set of input modalities, e.g., RGB and depth, while training on millions of samples.
We propose a Transformer-based modality-invariant VO approach that can deal with diverse or changing sensor suites of navigation agents.
- Score: 1.7188280334580197
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Effectively localizing an agent in a realistic, noisy setting is crucial for
many embodied vision tasks. Visual Odometry (VO) is a practical substitute for
unreliable GPS and compass sensors, especially in indoor environments. While
SLAM-based methods show solid performance without large data requirements,
they are less flexible and robust w.r.t. noise and changes in the sensor
suite compared to learning-based approaches. Recent deep VO models, however,
limit themselves to a fixed set of input modalities, e.g., RGB and depth, while
training on millions of samples. When sensors fail, sensor suites change, or
modalities are intentionally looped out due to available resources, e.g., power
consumption, the models fail catastrophically. Furthermore, training these
models from scratch is even more expensive without simulator access or suitable
existing models that can be fine-tuned. While such scenarios are mostly ignored
in simulation, they commonly hinder a model's reusability in real-world
applications. We propose a Transformer-based modality-invariant VO approach
that can deal with diverse or changing sensor suites of navigation agents. Our
model outperforms previous methods while training on only a fraction of the
data. We hope this method opens the door to a broader range of real-world
applications that can benefit from flexible and learned VO models.
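The abstract's core idea, a single VO model that tolerates diverse or changing sensor suites, can be illustrated with a toy sketch. The names, dimensions, and simple attention pooling below are illustrative assumptions, not the paper's architecture: each available modality is tokenized into a shared embedding space, and an attention readout pools whatever tokens happen to be present, so dropping a modality changes the input set but not the interface.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 32  # shared token width (illustrative)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical per-modality tokenizers: a linear map from a flattened
# observation to 4 tokens, plus a per-modality embedding.
tokenizers = {
    "rgb":   (0.05 * rng.standard_normal((48, 4 * D)), 0.05 * rng.standard_normal((1, D))),
    "depth": (0.05 * rng.standard_normal((16, 4 * D)), 0.05 * rng.standard_normal((1, D))),
}
query = rng.standard_normal(D)             # stand-in for a learned pose query
head = 0.05 * rng.standard_normal((D, 3))  # pooled token -> (dx, dy, dtheta)

def estimate_pose(observations):
    """Pose estimate from whichever modalities are present."""
    tokens = []
    for name, obs in observations.items():
        W, emb = tokenizers[name]
        tokens.append((obs @ W).reshape(4, D) + emb)
    T = np.concatenate(tokens)                 # variable-length token set
    attn = softmax(query @ T.T / np.sqrt(D))   # attention over all tokens
    return (attn @ T) @ head                   # shape (3,)

full = estimate_pose({"rgb": rng.standard_normal(48),
                      "depth": rng.standard_normal(16)})
rgb_only = estimate_pose({"rgb": rng.standard_normal(48)})
```

Training with random modality dropout would push the two calls toward consistent estimates; here the point is only that the interface survives a changed sensor suite.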
Related papers
- Self-Supervised Geometry-Guided Initialization for Robust Monocular Visual Odometry [9.79428015716139]
In this paper, we analyze major failure cases on outdoor benchmarks and expose shortcomings of a learning-based SLAM model (DROID-SLAM).
We propose the use of self-supervised priors leveraging a frozen large-scale pre-trained monocular depth estimation to initialize the dense bundle adjustment process.
Despite its simplicity, our proposed method demonstrates significant improvements on KITTI odometry, as well as the challenging DDAD benchmark.
arXiv Detail & Related papers (2024-06-03T01:59:29Z) - Model-aware reinforcement learning for high-performance Bayesian experimental design in quantum metrology [0.5461938536945721]
Quantum sensors offer control flexibility during estimation by allowing the experimenter to manipulate various parameters.
We introduce a versatile procedure capable of optimizing a wide range of problems in quantum metrology, estimation, and hypothesis testing.
We combine model-aware reinforcement learning (RL) with Bayesian estimation based on particle filtering.
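The combination of Bayesian estimation and particle filtering that this entry mentions can be sketched in a few lines. The toy problem below (estimating a frequency-like parameter from noisy sinusoidal measurements) and all constants are illustrative assumptions; the paper's actual setting is quantum metrology, with model-aware RL choosing the controls.

```python
import numpy as np

rng = np.random.default_rng(1)

true_omega = 1.3
sigma = 0.1
t = np.linspace(0.1, 4.0, 40)  # measurement times
y = np.sin(true_omega * t) + sigma * rng.standard_normal(t.size)

# Particle filter: particles are hypotheses about omega, weights are the
# (unnormalized) posterior probability of each hypothesis under the data.
particles = rng.uniform(0.0, 2.0, 2000)
log_w = np.zeros(particles.size)
for ti, yi in zip(t, y):
    log_w += -0.5 * ((yi - np.sin(particles * ti)) / sigma) ** 2
log_w -= log_w.max()  # numerical stability before exponentiating
w = np.exp(log_w)
w /= w.sum()

omega_est = np.sum(w * particles)  # posterior mean
```

A full implementation would resample particles when the effective sample size collapses and, in the model-aware RL setting, let a policy choose the next measurement based on the current particle cloud.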
arXiv Detail & Related papers (2023-12-28T12:04:15Z) - Value function estimation using conditional diffusion models for control [62.27184818047923]
We propose a simple algorithm called Diffused Value Function (DVF)
It learns a joint multi-step model of the environment-robot interaction dynamics using a diffusion model.
We show how DVF can be used to efficiently capture the state visitation measure for multiple controllers.
arXiv Detail & Related papers (2023-06-09T18:40:55Z) - eP-ALM: Efficient Perceptual Augmentation of Language Models [70.47962271121389]
We propose directing effort toward efficient adaptation of existing models, and augment Language Models with perception.
Existing approaches for adapting pretrained models for vision-language tasks still rely on several key components that hinder their efficiency.
We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning.
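The efficiency claim, training only a linear projection and a single prepended token, is easy to make concrete with a back-of-the-envelope sketch. The parameter counts and dimensions below are illustrative stand-ins, not eP-ALM's exact sizes:

```python
import numpy as np

rng = np.random.default_rng(2)

VIS_DIM, LM_DIM = 768, 1024                # illustrative encoder / LM widths
FROZEN_PARAMS = 350_000_000 + 86_000_000   # e.g. an LM plus a ViT, both frozen

# The only trainable pieces: one linear projection and one soft token.
proj_W = 0.02 * rng.standard_normal((VIS_DIM, LM_DIM))
proj_b = np.zeros(LM_DIM)
soft_token = 0.02 * rng.standard_normal(LM_DIM)

trainable = proj_W.size + proj_b.size + soft_token.size
fraction = trainable / (trainable + FROZEN_PARAMS)  # well under 1%

def augment(text_embeds, visual_feature):
    """Prepend the trainable token and the projected visual feature
    to the (frozen) LM's input embedding sequence."""
    vis = visual_feature @ proj_W + proj_b
    return np.vstack([soft_token, vis, text_embeds])

seq = augment(rng.standard_normal((5, LM_DIM)), rng.standard_normal(VIS_DIM))
```

With these stand-in sizes, under 0.2% of all parameters are trainable, consistent with the "freezing more than 99%" claim in the summary.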
arXiv Detail & Related papers (2023-03-20T19:20:34Z) - Inference from Real-World Sparse Measurements [21.194357028394226]
Real-world problems often involve complex and unstructured sets of measurements, which occur when sensors are sparsely placed in either space or time.
Designing deep learning architectures that can process sets of measurements whose positions vary from set to set, and extract readouts at arbitrary locations, is methodologically difficult.
We propose an attention-based model focused on applicability and practical robustness, with two key design contributions.
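The "extract readouts anywhere" idea can be illustrated with the simplest attention-style mechanism: a kernel-weighted readout over a variable-size measurement set (Nadaraya-Watson regression, i.e. attention with a fixed distance kernel). The paper's model learns its attention; the kernel and data below are assumptions for illustration only.

```python
import numpy as np

# Sparse measurements: arbitrary positions, one value each.
positions = np.array([[0.1, 0.1], [0.9, 0.2], [0.5, 0.8], [0.2, 0.9]])
values = np.array([1.0, 2.0, 3.0, 0.5])

def readout(query_pos, tau=0.01):
    """Attention-style readout at an arbitrary query position:
    weights decay with squared distance to each measurement."""
    d2 = np.sum((positions - query_pos) ** 2, axis=1)
    w = np.exp(-d2 / tau)
    w /= w.sum()
    return w @ values

at_sensor = readout(np.array([0.1, 0.1]))         # on top of a measurement
between = readout(np.array([0.5, 0.5]), tau=0.5)  # smooth interpolation
```

Because the readout is a normalized weighted sum, it works for any number of measurements at any positions, which is the property the summary highlights.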
arXiv Detail & Related papers (2022-10-20T13:42:20Z) - EMA-VIO: Deep Visual-Inertial Odometry with External Memory Attention [5.144653418944836]
Visual-inertial odometry (VIO) algorithms exploit the information from camera and inertial sensors to estimate position and orientation.
Recent deep learning based VIO models attract attention as they provide pose information in a data-driven way.
We propose a novel learning based VIO framework with external memory attention that effectively and efficiently combines visual and inertial features for state estimation.
arXiv Detail & Related papers (2022-09-18T07:05:36Z) - Incremental Online Learning Algorithms Comparison for Gesture and Visual Smart Sensors [68.8204255655161]
This paper compares four state-of-the-art algorithms in two real applications: gesture recognition based on accelerometer data and image classification.
Our results confirm these systems' reliability and the feasibility of deploying them in tiny-memory MCUs.
arXiv Detail & Related papers (2022-09-01T17:05:20Z) - Can Deep Learning be Applied to Model-Based Multi-Object Tracking? [25.464269324261636]
Multi-object tracking (MOT) is the problem of tracking the state of an unknown and time-varying number of objects using noisy measurements.
Deep learning (DL) has been increasingly used in MOT for improving tracking performance.
In this paper, we propose a Transformer-based DL tracker and evaluate its performance in the model-based setting.
arXiv Detail & Related papers (2022-02-16T07:43:08Z) - VISTA 2.0: An Open, Data-driven Simulator for Multimodal Sensing and Policy Learning for Autonomous Vehicles [131.2240621036954]
We present VISTA, an open source, data-driven simulator that integrates multiple types of sensors for autonomous vehicles.
Using high fidelity, real-world datasets, VISTA represents and simulates RGB cameras, 3D LiDAR, and event-based cameras.
We demonstrate the ability to train and test perception-to-control policies across each of the sensor types and showcase the power of this approach via deployment on a full scale autonomous vehicle.
arXiv Detail & Related papers (2021-11-23T18:58:10Z) - Towards Optimal Strategies for Training Self-Driving Perception Models in Simulation [98.51313127382937]
We focus on the use of labels in the synthetic domain alone.
Our approach introduces both a way to learn neural-invariant representations and a theoretically inspired view on how to sample the data from the simulator.
We showcase our approach on the bird's-eye-view vehicle segmentation task with multi-sensor data.
arXiv Detail & Related papers (2021-11-15T18:37:43Z) - Deep Soft Procrustes for Markerless Volumetric Sensor Alignment [81.13055566952221]
In this work, we improve markerless data-driven correspondence estimation to achieve more robust multi-sensor spatial alignment.
We incorporate geometric constraints in an end-to-end manner into a typical segmentation based model and bridge the intermediate dense classification task with the targeted pose estimation one.
Our model is experimentally shown to achieve similar results with marker-based methods and outperform the markerless ones, while also being robust to the pose variations of the calibration structure.
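The classical building block behind this line of work, rigid alignment of two point sets by orthogonal Procrustes analysis (the Kabsch algorithm), can be written in a few lines of SVD. This is the textbook closed-form solution, not the paper's learned "soft" variant:

```python
import numpy as np

def kabsch(A, B):
    """Rigid transform (R, t) minimizing sum ||R @ a_i + t - b_i||^2
    over corresponding rows of A and B (both n x d)."""
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    H = (A - ca).T @ (B - cb)          # d x d cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:           # correct an improper (reflecting) solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cb - R @ ca

# Recover a known rotation + translation from noiseless correspondences.
rng = np.random.default_rng(3)
theta = np.pi / 6
R_true = np.array([[np.cos(theta), -np.sin(theta), 0],
                   [np.sin(theta),  np.cos(theta), 0],
                   [0, 0, 1]])
t_true = np.array([0.5, -1.0, 2.0])
A = rng.standard_normal((20, 3))
B = A @ R_true.T + t_true              # rows: b_i = R_true @ a_i + t_true
R_est, t_est = kabsch(A, B)
```

Roughly speaking, learned approaches like the one above replace hand-picked correspondences with dense predicted ones; once correspondences are fixed, alignment reduces to this closed-form solve.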
arXiv Detail & Related papers (2020-03-23T10:51:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.