Displacement-Invariant Matching Cost Learning for Accurate Optical Flow Estimation
- URL: http://arxiv.org/abs/2010.14851v1
- Date: Wed, 28 Oct 2020 09:57:00 GMT
- Title: Displacement-Invariant Matching Cost Learning for Accurate Optical Flow Estimation
- Authors: Jianyuan Wang, Yiran Zhong, Yuchao Dai, Kaihao Zhang, Pan Ji, Hongdong Li
- Abstract summary: Learning matching costs has been shown to be critical to the success of the state-of-the-art deep stereo matching methods.
This paper proposes a novel solution that is able to bypass the requirement of building a 5D feature volume.
Our approach achieves state-of-the-art accuracy on various datasets, and outperforms all published optical flow methods on the Sintel benchmark.
- Score: 109.64756528516631
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning matching costs has been shown to be critical to the success of the
state-of-the-art deep stereo matching methods, in which 3D convolutions are
applied on a 4D feature volume to learn a 3D cost volume. However, this
mechanism has never been employed for the optical flow task. This is mainly due
to the significantly increased search dimension in the case of optical flow
computation, i.e., a straightforward extension would require dense 4D
convolutions in order to process a 5D feature volume, which is computationally
prohibitive. This paper proposes a novel solution that is able to bypass the
requirement of building a 5D feature volume while still allowing the network to
learn suitable matching costs from data. Our key innovation is to decouple the
connection between 2D displacements and learn the matching costs at each 2D
displacement hypothesis independently, i.e., displacement-invariant cost
learning. Specifically, we apply the same 2D convolution-based matching net
independently on each 2D displacement hypothesis to learn a 4D cost volume.
Moreover, we propose a displacement-aware projection layer to scale the learned
cost volume, which reconsiders the correlation between different displacement
candidates and mitigates the multi-modal problem in the learned cost volume.
The cost volume is then projected to optical flow estimation through a 2D
soft-argmin layer. Extensive experiments show that our approach achieves
state-of-the-art accuracy on various datasets, and outperforms all published
optical flow methods on the Sintel benchmark.
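The pipeline described in the abstract (a shared 2D convolution-based matching net applied independently to each 2D displacement hypothesis to build a 4D cost volume, followed by a 2D soft-argmin that projects the volume to flow) can be illustrated with a minimal NumPy sketch. Everything below is an illustrative assumption, not the authors' implementation: the features are single-channel, a fixed averaging kernel stands in for the learned matching net, and the displacement-aware projection layer is omitted.

```python
import numpy as np

def conv2d_same(x, k):
    # Naive single-channel "same" 2D convolution with zero padding;
    # a stand-in for the learned multi-layer matching net.
    H, W = x.shape
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def displacement_invariant_cost(f1, f2, max_disp, kernel):
    # f1, f2: (H, W) feature maps of the two frames.
    # For every 2D displacement (du, dv), shift f2, form a raw
    # correlation-style cost, and apply the SAME 2D conv-based net,
    # yielding a 4D cost volume of shape (D, D, H, W), D = 2*max_disp+1.
    # np.roll wraps around at the borders; a real implementation would pad.
    H, W = f1.shape
    D = 2 * max_disp + 1
    cost = np.zeros((D, D, H, W))
    for du in range(-max_disp, max_disp + 1):
        for dv in range(-max_disp, max_disp + 1):
            f2s = np.roll(np.roll(f2, du, axis=0), dv, axis=1)
            raw = f1 * f2s
            cost[du + max_disp, dv + max_disp] = conv2d_same(raw, kernel)
    return cost

def soft_argmin_flow(cost, max_disp, temperature=1.0):
    # 2D soft-argmin: softmax over the (D, D) displacement axes, then
    # take the expectation over the displacement grid -> (2, H, W) flow.
    D = 2 * max_disp + 1
    logits = cost.reshape(D * D, *cost.shape[2:]) / temperature
    p = np.exp(logits - logits.max(axis=0, keepdims=True))
    p /= p.sum(axis=0, keepdims=True)
    us, vs = np.meshgrid(np.arange(-max_disp, max_disp + 1),
                         np.arange(-max_disp, max_disp + 1), indexing='ij')
    disp = np.stack([us.ravel(), vs.ravel()], axis=1)  # (D*D, 2)
    return np.tensordot(disp.T.astype(float), p, axes=([1], [0]))
```

Because the same 2D matching net is reused for every displacement hypothesis, the 4D cost volume is learned without any 4D convolutions over a 5D feature volume, which is the computational saving the abstract refers to.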
Related papers
- 4D Contrastive Superflows are Dense 3D Representation Learners [62.433137130087445]
We introduce SuperFlow, a novel framework designed to harness consecutive LiDAR-camera pairs for establishing pretraining objectives.
To further boost learning efficiency, we incorporate a plug-and-play view consistency module that enhances alignment of the knowledge distilled from camera views.
arXiv Detail & Related papers (2024-07-08T17:59:54Z)
- Memory-Efficient Optical Flow via Radius-Distribution Orthogonal Cost Volume [6.122542233250026]
We present MeFlow, a novel memory-efficient method for high-resolution optical flow estimation.
Our method achieves competitive performance on both Sintel and KITTI benchmarks, while maintaining the highest memory efficiency on high-resolution inputs.
arXiv Detail & Related papers (2023-12-06T12:43:11Z)
- OCTraN: 3D Occupancy Convolutional Transformer Network in Unstructured Traffic Scenarios [0.0]
We propose OCTraN, a transformer architecture that uses iterative-attention to convert 2D image features into 3D occupancy features.
We also develop a self-supervised training pipeline to generalize the model to any scene by eliminating the need for LiDAR ground truth.
arXiv Detail & Related papers (2023-07-20T15:06:44Z)
- Image-Coupled Volume Propagation for Stereo Matching [0.24366811507669117]
We propose a new way to process the 4D cost volume where we merge two different concepts in one framework to achieve a symbiotic relationship.
A feature-matching part identifies matching pixel pairs along the baseline, while a concurrent image-volume part is inspired by depth-from-mono CNNs.
Our end-to-end trained CNN is ranked 2nd on KITTI2012 and ETH3D benchmarks while being significantly faster than the 1st-ranked method.
arXiv Detail & Related papers (2022-12-30T13:23:25Z)
- High-Resolution Optical Flow from 1D Attention and Correlation [89.61824964952949]
We propose a new method for high-resolution optical flow estimation with significantly less computation.
We first perform a 1D attention operation in the vertical direction of the target image, and then a simple 1D correlation in the horizontal direction of the attended image.
Experiments on Sintel, KITTI and real-world 4K resolution images demonstrated the effectiveness and superiority of our proposed method.
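The two-step scheme described above (a 1D attention operation along the vertical direction of the target image, then a 1D correlation along the horizontal direction) can be sketched in NumPy as follows. The function names, shapes, and scaling choices are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def vertical_1d_attention(q, k, v):
    # q: (H, W, C) source features; k, v: (H, W, C) target features.
    # For each column, attend over the vertical (H) dimension of the
    # target, producing a vertically aggregated target feature map.
    H, W, C = q.shape
    out = np.zeros_like(v)
    for w in range(W):
        scores = q[:, w, :] @ k[:, w, :].T / np.sqrt(C)   # (H, H)
        a = np.exp(scores - scores.max(axis=1, keepdims=True))
        a /= a.sum(axis=1, keepdims=True)                  # row-wise softmax
        out[:, w, :] = a @ v[:, w, :]
    return out

def horizontal_1d_correlation(f1, f2_att, max_disp):
    # Correlate each source pixel with horizontally displaced pixels of
    # the attended target, giving an (H, W, 2*max_disp+1) cost volume
    # instead of a full 4D one. np.roll wraps at borders for brevity.
    H, W, C = f1.shape
    D = 2 * max_disp + 1
    corr = np.zeros((H, W, D))
    for d in range(-max_disp, max_disp + 1):
        f2s = np.roll(f2_att, d, axis=1)
        corr[:, :, d + max_disp] = np.sum(f1 * f2s, axis=2) / np.sqrt(C)
    return corr
```

Factorizing the 2D search into two 1D passes is what reduces the cost volume from 4D to a pair of thin 3D volumes, which is where the memory and compute savings come from.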
arXiv Detail & Related papers (2021-04-28T17:56:34Z)
- Stereo Object Matching Network [78.35697025102334]
This paper presents a stereo object matching method that exploits both 2D contextual information from images and 3D object-level information.
We present two novel strategies to handle 3D objectness in the cost volume space: selective sampling (RoISelect) and 2D-3D fusion.
arXiv Detail & Related papers (2021-03-23T12:54:43Z)
- Displacement-Invariant Cost Computation for Efficient Stereo Matching [122.94051630000934]
Deep learning methods have dominated stereo matching leaderboards by yielding unprecedented disparity accuracy.
But their inference time is typically slow, on the order of seconds for a pair of 540p images.
We propose a displacement-invariant cost module to compute the matching costs without needing a 4D feature volume.
arXiv Detail & Related papers (2020-12-01T23:58:16Z)
- Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of effective samples is relatively small in the 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3D parameter changed in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z)
- Content-Aware Inter-Scale Cost Aggregation for Stereo Matching [42.02981855948903]
Our method achieves reliable detail recovery when upsampling through the aggregation of information across different scales.
A novel decomposition strategy is proposed to efficiently construct the 3D filter weights and aggregate the 3D cost volume.
Experiment results on Scene Flow dataset, KITTI2015 and Middlebury demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2020-06-05T02:38:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.