Deep Patch Visual Odometry
- URL: http://arxiv.org/abs/2208.04726v2
- Date: Tue, 23 May 2023 17:59:29 GMT
- Title: Deep Patch Visual Odometry
- Authors: Zachary Teed, Lahav Lipson and Jia Deng
- Abstract summary: Deep Patch Visual Odometry (DPVO) is a new deep learning system for monocular Visual Odometry (VO).
DPVO uses a novel recurrent network architecture designed for tracking image patches across time.
On standard benchmarks, DPVO outperforms all prior work, including the learning-based state-of-the-art VO system.
- Score: 66.8086971254714
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose Deep Patch Visual Odometry (DPVO), a new deep learning system for
monocular Visual Odometry (VO). DPVO uses a novel recurrent network
architecture designed for tracking image patches across time. Recent approaches
to VO have significantly improved the state-of-the-art accuracy by using deep
networks to predict dense flow between video frames. However, using dense flow
incurs a large computational cost, making these previous methods impractical
for many use cases. Despite this, it has been assumed that dense flow is
important as it provides additional redundancy against incorrect matches. DPVO
disproves this assumption, showing that it is possible to get the best accuracy
and efficiency by exploiting the advantages of sparse patch-based matching over
dense flow. DPVO introduces a novel recurrent update operator for patch-based
correspondence coupled with differentiable bundle adjustment. On standard
benchmarks, DPVO outperforms all prior work, including the learning-based
state-of-the-art VO system (DROID), using a third of the memory while running 3x
faster on average. Code is available at https://github.com/princeton-vl/DPVO
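The core idea the abstract describes is tracking a sparse set of image patches across frames instead of predicting dense flow. A minimal sketch of that idea, using classical brute-force SSD search in place of DPVO's learned recurrent update operator (the function names and patch/search sizes here are illustrative assumptions, not DPVO's actual implementation):

```python
import numpy as np

def track_patches(frame0, frame1, centers, patch=3, search=5):
    """Track small square patches from frame0 to frame1 by brute-force
    SSD search in a local window. Cost is O(num_patches * window^2)
    rather than one correspondence per pixel, which is the sparse-vs-dense
    trade-off the abstract highlights."""
    r = patch // 2
    h, w = frame1.shape
    tracked = []
    for (y, x) in centers:
        tmpl = frame0[y - r:y + r + 1, x - r:x + r + 1]
        best, best_cost = (y, x), np.inf
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                yy, xx = y + dy, x + dx
                # Skip candidates whose patch would fall outside the frame.
                if yy - r < 0 or xx - r < 0 or yy + r >= h or xx + r >= w:
                    continue
                cand = frame1[yy - r:yy + r + 1, xx - r:xx + r + 1]
                cost = float(np.sum((tmpl - cand) ** 2))
                if cost < best_cost:
                    best, best_cost = (yy, xx), cost
        tracked.append(best)
    return tracked

# Synthetic demo: frame1 is frame0 shifted right by 2 pixels,
# so each tracked center's x-coordinate should shift by +2.
rng = np.random.default_rng(0)
frame0 = rng.random((32, 32))
frame1 = np.roll(frame0, 2, axis=1)
centers = [(10, 10), (20, 15), (8, 24)]
print(track_patches(frame0, frame1, centers))
```

In DPVO, these patch correspondences are instead produced by a learned recurrent network and then fed into differentiable bundle adjustment to jointly refine camera poses and patch depths; this sketch only illustrates why a handful of patches is far cheaper than a dense flow field.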
Related papers
- Real-Time 3D Occupancy Prediction via Geometric-Semantic Disentanglement [8.592248643229675]
Occupancy prediction plays a pivotal role in autonomous driving (AD).
Existing methods often incur high computational costs, which contradicts the real-time demands of AD.
We propose a Geometric-Semantic Dual-Branch Network (GSDBN) with a hybrid BEV-Voxel representation.
arXiv Detail & Related papers (2024-07-18T04:46:13Z)
- PaPr: Training-Free One-Step Patch Pruning with Lightweight ConvNets for Faster Inference [11.112356346406365]
PaPr is a method for substantially pruning redundant patches with minimal accuracy loss using lightweight ConvNets.
It achieves significantly higher accuracy over state-of-the-art patch reduction methods with similar FLOP count reduction.
arXiv Detail & Related papers (2024-03-24T05:50:00Z)
- a novel attention-based network for fast salient object detection [14.246237737452105]
In current salient object detection networks, the most popular approach uses a U-shaped structure.
We propose a new deep convolution network architecture with three contributions.
Results demonstrate that the proposed method can compress the model to roughly 1/3 of its original size with almost no loss in accuracy.
arXiv Detail & Related papers (2021-12-20T12:30:20Z)
- Design and Scaffolded Training of an Efficient DNN Operator for Computer Vision on the Edge [3.3767251810292955]
FuSeConv is a drop-in replacement for depthwise separable convolutions.
FuSeConv fully factorizes convolutions along their spatial and depth dimensions.
Neural Operator Scaffolding scaffolds the training of FuSeConv by distilling knowledge from depthwise separable convolutions.
arXiv Detail & Related papers (2021-08-25T19:22:25Z)
- FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation [81.76975488010213]
Dense optical flow estimation plays a key role in many robotic vision tasks.
Current networks often have large numbers of parameters and heavy computational costs.
Our proposed FastFlowNet works in the well-known coarse-to-fine manner with the following innovations.
arXiv Detail & Related papers (2021-03-08T03:09:37Z)
- Exploring Data Augmentation for Multi-Modality 3D Object Detection [82.9988604088494]
It is counter-intuitive that multi-modality methods based on point cloud and images perform only marginally better or sometimes worse than approaches that solely use point cloud.
We propose a pipeline, named transformation flow, to bridge the gap between single and multi-modality data augmentation with transformation reversing and replaying.
Our method also wins the best PKL award in the 3rd nuScenes detection challenge.
arXiv Detail & Related papers (2020-12-23T15:23:16Z)
- CodeVIO: Visual-Inertial Odometry with Learned Optimizable Dense Depth [83.77839773394106]
We present a lightweight, tightly-coupled deep depth network and visual-inertial odometry system.
We provide the network with previously marginalized sparse features from VIO to increase the accuracy of initial depth prediction.
We show that it can run in real-time with single-thread execution while utilizing GPU acceleration only for the network and code Jacobian.
arXiv Detail & Related papers (2020-12-18T09:42:54Z)
- Regularized Densely-connected Pyramid Network for Salient Instance Segmentation [73.17802158095813]
We propose a new pipeline for end-to-end salient instance segmentation (SIS).
To better use the rich feature hierarchies in deep networks, we propose the regularized dense connections.
A novel multi-level RoIAlign based decoder is introduced to adaptively aggregate multi-level features for better mask predictions.
arXiv Detail & Related papers (2020-08-28T00:13:30Z)
- Deep Isometric Learning for Visual Recognition [67.94199891354157]
We show that deep vanilla ConvNets can be trained to achieve surprisingly good performance on standard image recognition benchmarks.
Our code is available at https://github.com/HaozhiQi/ISONet.
arXiv Detail & Related papers (2020-06-30T17:53:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.