Deep Patch Visual Odometry
- URL: http://arxiv.org/abs/2208.04726v2
- Date: Tue, 23 May 2023 17:59:29 GMT
- Title: Deep Patch Visual Odometry
- Authors: Zachary Teed, Lahav Lipson and Jia Deng
- Abstract summary: Deep Patch Visual Odometry (DPVO) is a new deep learning system for monocular Visual Odometry (VO).
DPVO uses a novel recurrent network architecture designed for tracking image patches across time.
On standard benchmarks, DPVO outperforms all prior work, including the learning-based state-of-the-art VO system.
- Score: 66.8086971254714
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose Deep Patch Visual Odometry (DPVO), a new deep learning system for
monocular Visual Odometry (VO). DPVO uses a novel recurrent network
architecture designed for tracking image patches across time. Recent approaches
to VO have significantly improved the state-of-the-art accuracy by using deep
networks to predict dense flow between video frames. However, using dense flow
incurs a large computational cost, making these previous methods impractical
for many use cases. Despite this, it has been assumed that dense flow is
important as it provides additional redundancy against incorrect matches. DPVO
disproves this assumption, showing that it is possible to get the best accuracy
and efficiency by exploiting the advantages of sparse patch-based matching over
dense flow. DPVO introduces a novel recurrent update operator for patch-based
correspondence coupled with differentiable bundle adjustment. On standard
benchmarks, DPVO outperforms all prior work, including the learning-based
state-of-the-art VO system (DROID), using a third of the memory while running 3x
faster on average. Code is available at https://github.com/princeton-vl/DPVO
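The core idea the abstract describes is tracking a sparse set of image patches across frames instead of predicting dense flow. A minimal sketch of that idea, using classical brute-force SSD search in place of DPVO's learned recurrent update operator (the function names and patch/search sizes here are illustrative assumptions, not DPVO's actual implementation):

```python
import numpy as np

def track_patches(frame0, frame1, centers, patch=3, search=5):
    """Track small square patches from frame0 to frame1 by brute-force
    SSD search in a local window. Cost is O(num_patches * window^2)
    rather than one correspondence per pixel, which is the sparse-vs-dense
    trade-off the abstract highlights."""
    r = patch // 2
    h, w = frame1.shape
    tracked = []
    for (y, x) in centers:
        tmpl = frame0[y - r:y + r + 1, x - r:x + r + 1]
        best, best_cost = (y, x), np.inf
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                yy, xx = y + dy, x + dx
                # Skip candidates whose patch would fall outside the frame.
                if yy - r < 0 or xx - r < 0 or yy + r >= h or xx + r >= w:
                    continue
                cand = frame1[yy - r:yy + r + 1, xx - r:xx + r + 1]
                cost = float(np.sum((tmpl - cand) ** 2))
                if cost < best_cost:
                    best, best_cost = (yy, xx), cost
        tracked.append(best)
    return tracked

# Synthetic demo: frame1 is frame0 shifted right by 2 pixels,
# so each tracked center's x-coordinate should shift by +2.
rng = np.random.default_rng(0)
frame0 = rng.random((32, 32))
frame1 = np.roll(frame0, 2, axis=1)
centers = [(10, 10), (20, 15), (8, 24)]
print(track_patches(frame0, frame1, centers))
```

In DPVO, these patch correspondences are instead produced by a learned recurrent network and then fed into differentiable bundle adjustment to jointly refine camera poses and patch depths; this sketch only illustrates why a handful of patches is far cheaper than a dense flow field.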
Related papers
- Real-Time 3D Occupancy Prediction via Geometric-Semantic Disentanglement [8.592248643229675]
Occupancy prediction plays a pivotal role in autonomous driving (AD).
Existing methods often incur high computational costs, which contradicts the real-time demands of AD.
We propose a Geometric-Semantic Dual-Branch Network (GSDBN) with a hybrid BEV-Voxel representation.
arXiv Detail & Related papers (2024-07-18T04:46:13Z)
- PaPr: Training-Free One-Step Patch Pruning with Lightweight ConvNets for Faster Inference [11.112356346406365]
PaPr is a method for substantially pruning redundant patches with minimal accuracy loss using lightweight ConvNets.
It achieves significantly higher accuracy over state-of-the-art patch reduction methods with similar FLOP count reduction.
arXiv Detail & Related papers (2024-03-24T05:50:00Z)
- a novel attention-based network for fast salient object detection [14.246237737452105]
In current salient object detection networks, the most popular approach uses a U-shaped structure.
We propose a new deep convolution network architecture with three contributions.
Results demonstrate that the proposed method can compress the model to roughly 1/3 of its original size with almost no loss in accuracy.
arXiv Detail & Related papers (2021-12-20T12:30:20Z)
- Design and Scaffolded Training of an Efficient DNN Operator for Computer Vision on the Edge [3.3767251810292955]
FuSeConv is a drop-in replacement for depthwise separable convolutions.
FuSeConv fully factorizes convolutions along their spatial and depth dimensions.
Neural Operator Scaffolding scaffolds the training of FuSeConv by distilling knowledge from depthwise separable convolutions.
arXiv Detail & Related papers (2021-08-25T19:22:25Z)
- FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation [81.76975488010213]
Dense optical flow estimation plays a key role in many robotic vision tasks.
Current networks often have large numbers of parameters and heavy computational costs.
Our proposed FastFlowNet works in the well-known coarse-to-fine manner with the following innovations.
arXiv Detail & Related papers (2021-03-08T03:09:37Z)
- Exploring Data Augmentation for Multi-Modality 3D Object Detection [82.9988604088494]
It is counter-intuitive that multi-modality methods based on point cloud and images perform only marginally better or sometimes worse than approaches that solely use point cloud.
We propose a pipeline, named transformation flow, to bridge the gap between single and multi-modality data augmentation with transformation reversing and replaying.
Our method also wins the best PKL award in the 3rd nuScenes detection challenge.
arXiv Detail & Related papers (2020-12-23T15:23:16Z)
- CodeVIO: Visual-Inertial Odometry with Learned Optimizable Dense Depth [83.77839773394106]
We present a lightweight, tightly-coupled deep depth network and visual-inertial odometry system.
We provide the network with previously marginalized sparse features from VIO to increase the accuracy of initial depth prediction.
We show that it can run in real-time with single-thread execution while utilizing GPU acceleration only for the network and code Jacobian.
arXiv Detail & Related papers (2020-12-18T09:42:54Z)
- Regularized Densely-connected Pyramid Network for Salient Instance Segmentation [73.17802158095813]
We propose a new pipeline for end-to-end salient instance segmentation (SIS).
To better use the rich feature hierarchies in deep networks, we propose the regularized dense connections.
A novel multi-level RoIAlign based decoder is introduced to adaptively aggregate multi-level features for better mask predictions.
arXiv Detail & Related papers (2020-08-28T00:13:30Z)
- Deep Isometric Learning for Visual Recognition [67.94199891354157]
We show that deep vanilla ConvNets can be trained to achieve surprisingly good performance on standard image recognition benchmarks.
Our code is available at https://github.com/HaozhiQi/ISONet.
arXiv Detail & Related papers (2020-06-30T17:53:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.