Efficient Deep Visual and Inertial Odometry with Adaptive Visual
Modality Selection
- URL: http://arxiv.org/abs/2205.06187v1
- Date: Thu, 12 May 2022 16:17:49 GMT
- Title: Efficient Deep Visual and Inertial Odometry with Adaptive Visual
Modality Selection
- Authors: Mingyu Yang, Yu Chen, Hun-Seok Kim
- Abstract summary: We propose an adaptive deep learning-based VIO method that reduces computational redundancy by opportunistically disabling the visual modality.
A Gumbel-Softmax trick is adopted to train the policy network to make the decision process differentiable for end-to-end system training.
Experimental results show that our method achieves similar or even better performance than the full-modality baseline.
- Score: 12.754974372231647
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, deep learning-based approaches for visual-inertial odometry
(VIO) have shown remarkable performance, outperforming traditional geometric
methods. Yet, all existing methods use both visual and inertial measurements
for every pose estimate, incurring potential computational redundancy. While
visual data processing is much more expensive than that of the inertial
measurement unit (IMU), it may not always contribute to improving pose
estimation accuracy. In this paper, we propose an adaptive deep learning-based
VIO method that reduces computational redundancy by
opportunistically disabling the visual modality. Specifically, we train a
policy network that learns to deactivate the visual feature extractor on the
fly based on the current motion state and IMU readings. A Gumbel-Softmax trick
is adopted to train the policy network to make the decision process
differentiable for end-to-end system training. The learned strategy is
interpretable, and it shows scenario-dependent decision patterns for adaptive
complexity reduction. Experimental results show that our method achieves
similar or even better performance than the full-modality baseline with up to
a 78.8% reduction in computational complexity on the KITTI dataset. Our code
will be shared at https://github.com/mingyuyng/Visual-Selective-VIO
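As a rough illustration of the adaptive gating described in the abstract, the sketch below shows how a small policy network could decide, at each time step, whether to run the visual encoder, with a straight-through Gumbel-Softmax sample keeping the discrete decision differentiable for end-to-end training. This is not the authors' released implementation: the names PolicyNet and fuse_step, the feature dimensions, and the zero-filled visual slot when vision is skipped are illustrative assumptions.

```python
# Minimal sketch of the adaptive visual-modality gating described in the abstract,
# written against PyTorch. PolicyNet, fuse_step, VIS_DIM, and the zero-filled
# visual slot are illustrative assumptions, not the authors' released code.
import torch
import torch.nn as nn
import torch.nn.functional as F

VIS_DIM = 512  # assumed width of the visual feature vector


class PolicyNet(nn.Module):
    """Predicts logits for the binary decision: skip (index 0) or use (index 1) vision."""

    def __init__(self, in_dim, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2),
        )

    def forward(self, state):
        return self.mlp(state)


def fuse_step(visual_encoder, image, imu_feat, motion_state, policy,
              tau=1.0, training=True):
    """One pose-estimation step. The straight-through Gumbel-Softmax sample keeps
    the discrete use/skip decision differentiable for end-to-end training; at
    inference (batch size 1) the visual encoder call is skipped entirely when the
    policy decides against it."""
    logits = policy(torch.cat([motion_state, imu_feat], dim=-1))
    use_vision = F.gumbel_softmax(logits, tau=tau, hard=True)[..., 1:]  # (B, 1) in {0, 1}
    if training or use_vision.item() > 0.5:
        vis_feat = visual_encoder(image) * use_vision  # masked so gradients reach the policy
    else:
        vis_feat = torch.zeros(image.size(0), VIS_DIM, device=image.device)
    return torch.cat([vis_feat, imu_feat], dim=-1)  # fused feature for the pose regressor


if __name__ == "__main__":
    B, IMU_DIM, STATE_DIM = 4, 256, 6
    policy = PolicyNet(in_dim=IMU_DIM + STATE_DIM)
    visual_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, VIS_DIM))  # stand-in for a CNN
    fused = fuse_step(visual_encoder,
                      torch.randn(B, 3, 64, 64),   # image
                      torch.randn(B, IMU_DIM),     # IMU feature
                      torch.randn(B, STATE_DIM),   # current motion state
                      policy)
    print(fused.shape)  # torch.Size([4, 768])
```

At inference, the branch that skips the encoder is where the computational savings would come from; during training, the masked multiplication keeps gradients flowing to the policy network.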
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses the computational demands of IoVT systems by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - When Does Visual Prompting Outperform Linear Probing for Vision-Language Models? A Likelihood Perspective [57.05315507519704]
We propose a log-likelihood ratio (LLR) approach to analyze the comparative benefits of visual prompting and linear probing.
Our measure attains up to a 100-fold reduction in run time compared to full training, while achieving prediction accuracies up to 91%.
arXiv Detail & Related papers (2024-09-03T12:03:45Z) - Enhancing Digital Hologram Reconstruction Using Reverse-Attention Loss for Untrained Physics-Driven Deep Learning Models with Uncertain Distance [10.788482076164314]
We present a pioneering approach to addressing the Autofocusing challenge in untrained deep-learning methods.
Our method delivers a significant reconstruction performance improvement over rival methods.
For example, the difference is less than 1 dB in PSNR and 0.002 in SSIM for the target sample.
arXiv Detail & Related papers (2024-01-11T01:30:46Z) - A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical
Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z) - Online Network Source Optimization with Graph-Kernel MAB [62.6067511147939]
We propose Grab-UCB, a graph-kernel multi-armed bandit algorithm that learns the optimal source placement in large-scale networks online.
We describe the network processes with an adaptive graph dictionary model, which typically leads to sparse spectral representations.
We derive performance guarantees that depend on network parameters, which in turn influence the learning curve of the sequential decision strategy.
arXiv Detail & Related papers (2023-07-07T15:03:42Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z) - An Adaptive Framework for Learning Unsupervised Depth Completion [59.17364202590475]
We present a method to infer a dense depth map from a color image and associated sparse depth measurements.
We show that regularization and co-visibility are related via the fitness of the model to data and can be unified into a single framework.
arXiv Detail & Related papers (2021-06-06T02:27:55Z) - Exploiting Invariance in Training Deep Neural Networks [4.169130102668252]
Inspired by two basic mechanisms in animal visual systems, we introduce a feature transform technique that imposes invariance properties in the training of deep neural networks.
The resulting algorithm requires less parameter tuning, trains well with an initial learning rate of 1.0, and easily generalizes to different tasks.
Tested on the ImageNet, MS COCO, and Cityscapes datasets, our proposed technique requires fewer iterations to train, surpasses all baselines by a large margin, works seamlessly with both small and large batch sizes, and applies to different computer vision tasks: image classification, object detection, and semantic segmentation.
arXiv Detail & Related papers (2021-03-30T19:18:31Z) - Towards Better Generalization: Joint Depth-Pose Learning without PoseNet [36.414471128890284]
We tackle the essential problem of scale inconsistency for self-supervised joint depth-pose learning.
Most existing methods assume that a consistent scale of depth and pose can be learned across all input samples.
We propose a novel system that explicitly disentangles scale from the network estimation.
arXiv Detail & Related papers (2020-04-03T00:28:09Z) - Denoising IMU Gyroscopes with Deep Learning for Open-Loop Attitude
Estimation [0.0]
This paper proposes a learning method for denoising gyroscopes of Inertial Measurement Units (IMUs) using ground truth data.
The obtained algorithm outperforms the state-of-the-art on the (unseen) test sequences.
arXiv Detail & Related papers (2020-02-25T08:04:31Z)