Modality-invariant Visual Odometry for Embodied Vision
- URL: http://arxiv.org/abs/2305.00348v1
- Date: Sat, 29 Apr 2023 21:47:12 GMT
- Title: Modality-invariant Visual Odometry for Embodied Vision
- Authors: Marius Memmel, Roman Bachmann, Amir Zamir
- Abstract summary: Visual Odometry (VO) is a practical substitute for unreliable GPS and compass sensors.
Recent deep VO models limit themselves to a fixed set of input modalities, e.g., RGB and depth, while training on millions of samples.
We propose a Transformer-based modality-invariant VO approach that can deal with diverse or changing sensor suites of navigation agents.
- Score: 1.7188280334580197
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Effectively localizing an agent in a realistic, noisy setting is crucial for
many embodied vision tasks. Visual Odometry (VO) is a practical substitute for
unreliable GPS and compass sensors, especially in indoor environments. While
SLAM-based methods show solid performance without large data requirements,
they are less flexible and robust to noise and to changes in the sensor
suite than learning-based approaches. Recent deep VO models, however,
limit themselves to a fixed set of input modalities, e.g., RGB and depth, while
training on millions of samples. When sensors fail, sensor suites change, or
modalities are deliberately switched off to conserve resources, e.g., power
consumption, the models fail catastrophically. Furthermore, training these
models from scratch is even more expensive without simulator access or suitable
existing models that can be fine-tuned. While such scenarios are mostly ignored
in simulation, they commonly hinder a model's reusability in real-world
applications. We propose a Transformer-based modality-invariant VO approach
that can deal with diverse or changing sensor suites of navigation agents. Our
model outperforms previous methods while training on only a fraction of the
data. We hope this method opens the door to a broader range of real-world
applications that can benefit from flexible and learned VO models.
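The paper's code is not reproduced here, but the core idea lends itself to a short sketch: tokenize whichever modalities are currently available with per-modality patch embeddings, feed the combined token set through a shared Transformer, and regress the relative pose. Everything below (names, dimensions, the planar pose head, the idea of training with random modality dropout) is our own illustrative assumption, not the authors' implementation.

```python
# Hypothetical sketch (our assumptions, not the authors' code): each available
# modality is patch-embedded into tokens, tokens are concatenated, and a shared
# Transformer regresses the relative pose between consecutive observations.
import torch
import torch.nn as nn

class ModalityInvariantVO(nn.Module):
    def __init__(self, dim=384, depth=6, heads=6, patch=16, img=128):
        super().__init__()
        n_patches = (img // patch) ** 2
        # One patch embedding per modality: 6 channels for a stacked RGB pair,
        # 2 for a stacked depth pair. Unavailable modalities are simply omitted.
        self.embed = nn.ModuleDict({
            "rgb":   nn.Conv2d(6, dim, patch, stride=patch),
            "depth": nn.Conv2d(2, dim, patch, stride=patch),
        })
        self.mod_emb = nn.ParameterDict({
            m: nn.Parameter(torch.zeros(1, 1, dim)) for m in self.embed
        })
        self.pos_emb = nn.Parameter(torch.zeros(1, n_patches, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, 4 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, 3)  # assumed planar pose: (dx, dy, dtheta)

    def forward(self, inputs):  # inputs: dict of modality -> (B, C, H, W)
        tokens = []
        for m, x in inputs.items():  # any subset of the known modalities
            t = self.embed[m](x).flatten(2).transpose(1, 2)
            tokens.append(t + self.pos_emb + self.mod_emb[m])
        z = self.encoder(torch.cat(tokens, dim=1))
        return self.head(z.mean(dim=1))  # pool tokens, regress relative pose

model = ModalityInvariantVO()
# Both sensors, or RGB alone (e.g., the depth sensor failed or was disabled):
pose = model({"rgb": torch.randn(1, 6, 128, 128)})
```

Because the input is just a concatenated token set, dropping a sensor only shrinks the sequence instead of breaking the model, which is what makes a changing sensor suite tolerable.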
Related papers
- MPVO: Motion-Prior based Visual Odometry for PointGoal Navigation [3.9974562667271507]
Visual odometry (VO) is essential for enabling accurate point-goal navigation of embodied agents in indoor environments.
Recent deep-learned VO methods show robust performance but suffer from sample inefficiency during training.
We propose a robust and sample-efficient VO pipeline based on motion priors available while an agent is navigating an environment.
arXiv Detail & Related papers (2024-11-07T15:36:49Z)
- SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
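As a loose illustration of how such a training-free construction might look (the SVD-truncated deltas and the subspace-projection router below are our own simplifications, not the paper's exact procedure):

```python
# Illustrative sketch: build a sparse mixture of low-rank experts from
# fine-tuned checkpoints with no extra data or training. Each expert is an
# SVD truncation of what fine-tuning changed; routing scores inputs by how
# strongly they lie in each expert's input subspace. All details assumed.
import torch
import torch.nn as nn

class LowRankExpertLinear(nn.Module):
    def __init__(self, base: nn.Linear, finetuned: list[nn.Linear],
                 rank=8, top_k=1):
        super().__init__()
        self.base, self.top_k = base, top_k
        us, vs = [], []
        for ft in finetuned:                      # one expert per checkpoint
            delta = ft.weight - base.weight       # what fine-tuning changed
            U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
            us.append(U[:, :rank] * S[:rank])     # (out, rank)
            vs.append(Vh[:rank])                  # (rank, in)
        self.U = nn.Parameter(torch.stack(us), requires_grad=False)
        self.V = nn.Parameter(torch.stack(vs), requires_grad=False)

    def forward(self, x):                         # x: (B, in)
        # Route by the squared projection of x onto each expert's subspace.
        scores = (torch.einsum("eri,bi->ber", self.V, x) ** 2).sum(-1)
        weights = torch.zeros_like(scores)        # (B, num_experts)
        top = scores.topk(self.top_k, dim=-1)
        weights.scatter_(1, top.indices, torch.softmax(top.values, -1))
        expert_out = torch.einsum("eor,eri,bi->beo", self.U, self.V, x)
        return self.base(x) + (weights.unsqueeze(-1) * expert_out).sum(1)
```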
arXiv Detail & Related papers (2024-08-19T17:32:15Z)
- Self-Supervised Geometry-Guided Initialization for Robust Monocular Visual Odometry [9.79428015716139]
In this paper, we analyze major failure cases on outdoor benchmarks and expose the shortcomings of a learning-based SLAM model (DROID-SLAM).
We propose the use of self-supervised priors, leveraging a frozen large-scale pre-trained monocular depth estimation network to initialize the dense bundle adjustment process.
Despite its simplicity, our proposed method demonstrates significant improvements on KITTI odometry, as well as the challenging DDAD benchmark.
arXiv Detail & Related papers (2024-06-03T01:59:29Z)
- Model-aware reinforcement learning for high-performance Bayesian experimental design in quantum metrology [0.5461938536945721]
Quantum sensors offer control flexibility during estimation, allowing the experimenter to manipulate various parameters.
We introduce a versatile procedure capable of optimizing a wide range of problems in quantum metrology, estimation, and hypothesis testing.
We combine model-aware reinforcement learning (RL) with Bayesian estimation based on particle filtering.
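The particle-filtering half of that combination is standard enough to sketch. Below, a particle filter tracks a posterior over an unknown frequency under a textbook Ramsey-style outcome model; the control t is a placeholder for what an RL policy would choose. The outcome model and all constants are illustrative assumptions, not the paper's sensor model.

```python
# Minimal particle-filter sketch for Bayesian parameter estimation.
import numpy as np

rng = np.random.default_rng(0)
true_omega = 0.7
particles = rng.uniform(0.0, 2.0, size=5000)   # prior samples over omega
weights = np.full(5000, 1 / 5000)

def likelihood(outcome, omega, t):
    # Assumed Ramsey-style model: p(1 | omega, t) = (1 + cos(omega * t)) / 2
    p1 = 0.5 * (1 + np.cos(omega * t))
    return p1 if outcome == 1 else 1 - p1

for step in range(30):
    t = 1.0 + 0.5 * step                        # placeholder control choice
    p1 = 0.5 * (1 + np.cos(true_omega * t))
    outcome = int(rng.random() < p1)            # simulated measurement
    weights *= likelihood(outcome, particles, t)  # Bayes update per particle
    weights /= weights.sum()
    if 1 / (weights ** 2).sum() < 2500:         # resample when ESS degenerates
        idx = rng.choice(5000, size=5000, p=weights)
        particles, weights = particles[idx], np.full(5000, 1 / 5000)

print("posterior mean:", (weights * particles).sum())
```

A model-aware RL agent would replace the placeholder schedule for t, using the particle posterior as its state when picking the next measurement setting.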
arXiv Detail & Related papers (2023-12-28T12:04:15Z)
- Value function estimation using conditional diffusion models for control [62.27184818047923]
We propose a simple algorithm called Diffused Value Function (DVF).
It learns a joint multi-step model of the environment-robot interaction dynamics using a diffusion model.
We show how DVF can be used to efficiently capture the state visitation measure for multiple controllers.
arXiv Detail & Related papers (2023-06-09T18:40:55Z)
- eP-ALM: Efficient Perceptual Augmentation of Language Models [70.47962271121389]
Existing approaches for adapting pretrained models to vision-language tasks still rely on several key components that hinder their efficiency.
We instead direct effort toward efficient adaptation of existing models, augmenting Language Models with perception.
We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning.
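A rough sketch of that recipe is below. The pooling, the injection point, and the module names are our assumptions; the language model is assumed to accept precomputed input embeddings, as HuggingFace-style causal LMs do via inputs_embeds.

```python
# Sketch of the eP-ALM-style setup described above: freeze the LM and the
# visual encoder, train only one linear projection plus one soft token.
import torch
import torch.nn as nn

class PerceptuallyAugmentedLM(nn.Module):
    def __init__(self, lm, vision, d_vis, d_lm):
        super().__init__()
        self.lm, self.vision = lm, vision
        for p in self.lm.parameters():
            p.requires_grad = False              # >99% of weights stay frozen
        for p in self.vision.parameters():
            p.requires_grad = False
        self.proj = nn.Linear(d_vis, d_lm)       # the only trained layer ...
        self.soft_token = nn.Parameter(torch.zeros(1, 1, d_lm))  # ... plus one token

    def forward(self, image, text_embeds):       # text_embeds: (B, T, d_lm)
        with torch.no_grad():
            vis = self.vision(image)             # (B, N, d_vis) patch features
        prefix = self.proj(vis.mean(dim=1, keepdim=True))   # (B, 1, d_lm)
        soft = self.soft_token.expand(text_embeds.size(0), -1, -1)
        # Assumes an LM that accepts embeddings directly (HF-style API).
        return self.lm(inputs_embeds=torch.cat([soft, prefix, text_embeds], dim=1))
```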
arXiv Detail & Related papers (2023-03-20T19:20:34Z)
- EMA-VIO: Deep Visual-Inertial Odometry with External Memory Attention [5.144653418944836]
Visual-inertial odometry (VIO) algorithms exploit information from cameras and inertial sensors to estimate position and orientation.
Recent deep learning based VIO models have attracted attention as they provide pose information in a data-driven way.
We propose a novel learning-based VIO framework with external memory attention that effectively and efficiently combines visual and inertial features for state estimation.
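As a loose sketch of what attention-based visual-inertial fusion with an external memory can look like (the actual EMA-VIO memory update and fusion differ; this only shows the general shape of such a module):

```python
# Illustrative fusion module: vision attends to IMU features, then reads a
# learned external memory of slots before regressing the relative pose.
import torch
import torch.nn as nn

class MemoryAttentionFusion(nn.Module):
    def __init__(self, d=256, heads=4, mem_slots=32):
        super().__init__()
        self.memory = nn.Parameter(torch.randn(1, mem_slots, d) * 0.02)
        self.fuse = nn.MultiheadAttention(d, heads, batch_first=True)
        self.read = nn.MultiheadAttention(d, heads, batch_first=True)
        self.head = nn.Linear(d, 7)   # translation (3) + quaternion (4)

    def forward(self, visual, inertial):
        # visual: (B, Nv, d) frame features; inertial: (B, Ni, d) IMU features
        fused, _ = self.fuse(visual, inertial, inertial)  # vision attends to IMU
        mem = self.memory.expand(visual.size(0), -1, -1)
        state, _ = self.read(fused, mem, mem)             # read external memory
        return self.head(state.mean(dim=1))               # relative pose
```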
arXiv Detail & Related papers (2022-09-18T07:05:36Z)
- Incremental Online Learning Algorithms Comparison for Gesture and Visual Smart Sensors [68.8204255655161]
This paper compares four state-of-the-art algorithms in two real applications: gesture recognition based on accelerometer data and image classification.
Our results confirm these systems' reliability and the feasibility of deploying them on tiny-memory MCUs.
arXiv Detail & Related papers (2022-09-01T17:05:20Z)
- Can Deep Learning be Applied to Model-Based Multi-Object Tracking? [25.464269324261636]
Multi-object tracking (MOT) is the problem of tracking the state of an unknown and time-varying number of objects using noisy measurements.
Deep learning (DL) has been increasingly used in MOT for improving tracking performance.
In this paper, we propose a Transformer-based DL tracker and evaluate its performance in the model-based setting.
arXiv Detail & Related papers (2022-02-16T07:43:08Z)
- VISTA 2.0: An Open, Data-driven Simulator for Multimodal Sensing and Policy Learning for Autonomous Vehicles [131.2240621036954]
We present VISTA, an open source, data-driven simulator that integrates multiple types of sensors for autonomous vehicles.
Using high fidelity, real-world datasets, VISTA represents and simulates RGB cameras, 3D LiDAR, and event-based cameras.
We demonstrate the ability to train and test perception-to-control policies across each of the sensor types and showcase the power of this approach via deployment on a full scale autonomous vehicle.
arXiv Detail & Related papers (2021-11-23T18:58:10Z)
- Towards Optimal Strategies for Training Self-Driving Perception Models in Simulation [98.51313127382937]
We focus on the use of labels in the synthetic domain alone.
Our approach introduces both a way to learn neural-invariant representations and a theoretically inspired view on how to sample the data from the simulator.
We showcase our approach on the bird's-eye-view vehicle segmentation task with multi-sensor data.
arXiv Detail & Related papers (2021-11-15T18:37:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.