DeepAVO: Efficient Pose Refining with Feature Distilling for Deep Visual
Odometry
- URL: http://arxiv.org/abs/2105.09899v1
- Date: Thu, 20 May 2021 17:05:31 GMT
- Title: DeepAVO: Efficient Pose Refining with Feature Distilling for Deep Visual
Odometry
- Authors: Ran Zhu, Mingkun Yang, Wang Liu, Rujun Song, Bo Yan, Zhuoling Xiao
- Abstract summary: This paper studies monocular Visual Odometry (VO) from the perspective of Deep Learning (DL).
We present a novel four-branch network to learn the rotation and translation by leveraging Convolutional Neural Networks (CNNs) to focus on different quadrants of optical flow input.
Experiments on various datasets involving outdoor driving and indoor walking scenarios show that the proposed DeepAVO outperforms the state-of-the-art monocular methods by a large margin.
- Score: 8.114855695727003
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual Odometry (VO), which estimates the position and
orientation of a moving object by analyzing the image sequences captured
by on-board cameras, has been well investigated with the rising interest in
autonomous driving. This paper studies monocular VO from the perspective of
Deep Learning (DL). Unlike most current learning-based methods, our approach,
called DeepAVO, is established on the intuition that features contribute
discriminately to different motion patterns. Specifically, we present a novel
four-branch network to learn the rotation and translation by leveraging
Convolutional Neural Networks (CNNs) to focus on different quadrants of optical
flow input. To enhance the ability of feature selection, we further introduce
an effective channel-spatial attention mechanism to force each branch to
explicitly distill related information for specific Frame to Frame (F2F) motion
estimation. Experiments on various datasets involving outdoor driving and
indoor walking scenarios show that the proposed DeepAVO outperforms the
state-of-the-art monocular methods by a large margin, demonstrating competitive
performance to the stereo VO algorithm and verifying promising potential for
generalization.
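The quadrant-wise, attention-gated input handling described in the abstract can be pictured with a small NumPy sketch. This is only an illustration under stated assumptions, not the authors' implementation: the function names `split_quadrants` and `channel_spatial_attention` are hypothetical, the flow field is assumed to be a (channels, height, width) array, and the attention gates use plain average pooling with a sigmoid instead of learned layers.

```python
import numpy as np


def split_quadrants(flow):
    """Split a (C, H, W) optical-flow array into four spatial quadrants.

    Assumption: one quadrant feeds each of the four branches. For odd H or W
    the bottom/right quadrants are one pixel larger.
    """
    _, h, w = flow.shape
    hh, hw = h // 2, w // 2
    return {
        "top_left": flow[:, :hh, :hw],
        "top_right": flow[:, :hh, hw:],
        "bottom_left": flow[:, hh:, :hw],
        "bottom_right": flow[:, hh:, hw:],
    }


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_spatial_attention(feat):
    """Gate a (C, H, W) feature map per channel, then per pixel.

    Illustrative stand-in for a channel-spatial attention block: the gates
    come from average pooling only and have no learned parameters.
    """
    # Channel gate: one weight in (0, 1) per channel, shape (C, 1, 1).
    channel_gate = sigmoid(feat.mean(axis=(1, 2)))[:, None, None]
    feat = feat * channel_gate
    # Spatial gate: one weight in (0, 1) per pixel, shape (1, H, W).
    spatial_gate = sigmoid(feat.mean(axis=0))[None, :, :]
    return feat * spatial_gate


# Example: a random 2-channel flow field (horizontal and vertical components).
flow = np.random.default_rng(0).standard_normal((2, 8, 12))
branch_inputs = {
    name: channel_spatial_attention(quadrant)
    for name, quadrant in split_quadrants(flow).items()
}
```

In a full model each entry of `branch_inputs` would pass through its own CNN branch before the pose regression; that part is omitted here because the abstract gives no layer details.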
Related papers
- LEAP-VO: Long-term Effective Any Point Tracking for Visual Odometry [52.131996528655094]
We present the Long-term Effective Any Point Tracking (LEAP) module.
LEAP innovatively combines visual, inter-track, and temporal cues with mindfully selected anchors for dynamic track estimation.
Based on these traits, we develop LEAP-VO, a robust visual odometry system adept at handling occlusions and dynamic scenes.
arXiv Detail & Related papers (2024-01-03T18:57:27Z)
- Lightweight Monocular Depth Estimation with an Edge Guided Network [34.03711454383413]
We present a novel lightweight Edge Guided Depth Estimation Network (EGD-Net).
In particular, we start out with a lightweight encoder-decoder architecture and embed an edge guidance branch.
In order to aggregate the context information and edge attention features, we design a transformer-based feature aggregation module.
arXiv Detail & Related papers (2022-09-29T14:45:47Z)
- USegScene: Unsupervised Learning of Depth, Optical Flow and Ego-Motion with Semantic Guidance and Coupled Networks [31.600708674008384]
USegScene is a framework for semantically guided unsupervised learning of depth, optical flow and ego-motion estimation for stereo camera images.
We present results on the popular KITTI dataset and show that our approach outperforms other methods by a large margin.
arXiv Detail & Related papers (2022-07-15T13:25:47Z)
- SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation [101.55622133406446]
We propose a SurroundDepth method to incorporate the information from multiple surrounding views to predict depth maps across cameras.
Specifically, we employ a joint network to process all the surrounding views and propose a cross-view transformer to effectively fuse the information from multiple views.
In experiments, our method achieves state-of-the-art performance on challenging multi-camera depth estimation datasets.
arXiv Detail & Related papers (2022-04-07T17:58:47Z)
- Variational Structured Attention Networks for Deep Visual Representation Learning [49.80498066480928]
We propose a unified deep framework to jointly learn both spatial attention maps and channel attention in a principled manner.
Specifically, we integrate the estimation and the interaction of the attentions within a probabilistic representation learning framework.
We implement the inference rules within the neural network, thus allowing for end-to-end learning of the probabilistic and the CNN front-end parameters.
arXiv Detail & Related papers (2021-03-05T07:37:24Z)
- Optical Flow Estimation from a Single Motion-blurred Image [66.2061278123057]
Motion blur in an image may be of practical interest in fundamental computer vision problems.
We propose a novel framework to estimate optical flow from a single motion-blurred image in an end-to-end manner.
arXiv Detail & Related papers (2021-03-04T12:45:18Z)
- On Deep Learning Techniques to Boost Monocular Depth Estimation for Autonomous Navigation [1.9007546108571112]
Inferring the depth of images is a fundamental inverse problem within the field of Computer Vision.
We propose a new lightweight and fast supervised CNN architecture combined with novel feature extraction models.
We also introduce an efficient surface normals module, jointly with a simple geometric 2.5D loss function, to solve SIDE problems.
arXiv Detail & Related papers (2020-10-13T18:37:38Z)
- Self-Supervised Joint Learning Framework of Depth Estimation via Implicit Cues [24.743099160992937]
We propose a novel self-supervised joint learning framework for depth estimation.
The proposed framework outperforms the state of the art (SOTA) on the KITTI and Make3D datasets.
arXiv Detail & Related papers (2020-06-17T13:56:59Z)
- End-to-end Learning for Inter-Vehicle Distance and Relative Velocity Estimation in ADAS with a Monocular Camera [81.66569124029313]
We propose a camera-based inter-vehicle distance and relative velocity estimation method based on end-to-end training of a deep neural network.
The key novelty of our method is the integration of multiple visual clues provided by any two time-consecutive monocular frames.
We also propose a vehicle-centric sampling mechanism to alleviate the effect of perspective distortion in the motion field.
arXiv Detail & Related papers (2020-06-07T08:18:31Z)
- Ventral-Dorsal Neural Networks: Object Detection via Selective Attention [51.79577908317031]
We propose a new framework called Ventral-Dorsal Networks (VDNets).
Inspired by the structure of the human visual system, we propose the integration of a "Ventral Network" and a "Dorsal Network".
Our experimental results reveal that the proposed method outperforms state-of-the-art object detection approaches.
arXiv Detail & Related papers (2020-05-15T23:57:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.