Displacement-Invariant Cost Computation for Efficient Stereo Matching
- URL: http://arxiv.org/abs/2012.00899v1
- Date: Tue, 1 Dec 2020 23:58:16 GMT
- Title: Displacement-Invariant Cost Computation for Efficient Stereo Matching
- Authors: Yiran Zhong, Charles Loop, Wonmin Byeon, Stan Birchfield, Yuchao Dai,
Kaihao Zhang, Alexey Kamenev, Thomas Breuel, Hongdong Li, Jan Kautz
- Abstract summary: Deep learning methods have dominated stereo matching leaderboards by yielding unprecedented disparity accuracy.
But their inference time is typically slow, on the order of seconds for a pair of 540p images.
We propose a displacement-invariant cost module to compute the matching costs without needing a 4D feature volume.
- Score: 122.94051630000934
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although deep learning-based methods have dominated stereo matching
leaderboards by yielding unprecedented disparity accuracy, their inference time
is typically slow, on the order of seconds for a pair of 540p images. The main
reason is that the leading methods employ time-consuming 3D convolutions
applied to a 4D feature volume. A common way to speed up the computation is to
downsample the feature volume, but this loses high-frequency details. To
overcome these challenges, we propose a displacement-invariant cost
computation module to compute the matching costs without needing a 4D feature
volume. Rather, costs are computed by applying the same 2D convolution network
on each disparity-shifted feature map pair independently. Unlike previous 2D
convolution-based methods that simply perform context mapping between inputs
and disparity maps, our proposed approach learns to match features between the
two images. We also propose an entropy-based refinement strategy to refine the
computed disparity map, which further improves speed by avoiding the need to
compute a second disparity map on the right image. Extensive experiments on
standard datasets (SceneFlow, KITTI, ETH3D, and Middlebury) demonstrate that
our method achieves competitive accuracy with much less inference time. On
typical image sizes, our method runs at over 100 FPS on a desktop GPU, making
it suitable for time-critical applications such as autonomous driving.
We also show that our approach generalizes well to unseen datasets,
outperforming 4D-volumetric methods.
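A minimal PyTorch sketch of the two ideas in the abstract: a single shared 2D network scores each disparity-shifted feature-map pair (so no 4D volume or 3D convolutions are needed), and the entropy of the resulting matching distribution serves as a per-pixel confidence cue. The layer sizes, the shift/padding scheme, and the entropy computation are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CostNet2D(nn.Module):
    """Shared 2D network mapping a concatenated feature pair to a cost map."""
    def __init__(self, feat_channels: int):
        super().__init__()
        # Hypothetical layer sizes; the paper's network may differ.
        self.net = nn.Sequential(
            nn.Conv2d(2 * feat_channels, 64, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 3, padding=1),  # one matching cost per pixel
        )

    def forward(self, left_feat, right_feat_shifted):
        return self.net(torch.cat([left_feat, right_feat_shifted], dim=1))

def displacement_invariant_costs(left_feat, right_feat, cost_net, max_disp):
    """Apply the same 2D cost network to every disparity-shifted pair.

    left_feat, right_feat: (B, C, H, W) feature maps from a siamese encoder.
    Returns a (B, max_disp, H, W) cost volume built with 2D convolutions only.
    """
    costs = []
    for d in range(max_disp):
        if d == 0:
            shifted = right_feat
        else:
            # Shift right features d pixels to the right; zero-pad the gap,
            # so left pixel x is compared against right pixel x - d.
            shifted = F.pad(right_feat[:, :, :, :-d], (d, 0, 0, 0))
        costs.append(cost_net(left_feat, shifted))  # (B, 1, H, W)
    return torch.cat(costs, dim=1)

def disparity_and_entropy(cost_volume):
    """Soft-argmin disparity plus a per-pixel entropy map.

    An assumption of how an entropy-based confidence cue could be derived;
    the paper's refinement strategy may differ in detail.
    """
    prob = F.softmax(-cost_volume, dim=1)  # low cost -> high probability
    disps = torch.arange(cost_volume.shape[1],
                         device=cost_volume.device).view(1, -1, 1, 1).float()
    disparity = (prob * disps).sum(dim=1)                  # (B, H, W)
    entropy = -(prob * torch.log(prob + 1e-8)).sum(dim=1)  # (B, H, W)
    return disparity, entropy
```

Because identical weights score every shift, the per-disparity passes in the loop can also be folded into the batch dimension for speed; the displacement invariance is exactly what makes that sharing possible.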
Related papers
- Occupancy-Based Dual Contouring [12.944046673902415]
We introduce a dual contouring method that provides state-of-the-art performance for occupancy functions.
Our method is learning-free and carefully designed to maximize the use of GPU parallelization.
arXiv Detail & Related papers (2024-09-20T11:32:21Z) - Image-Coupled Volume Propagation for Stereo Matching [0.24366811507669117]
We propose a new way to process the 4D cost volume where we merge two different concepts in one framework to achieve a symbiotic relationship.
A feature matching part is responsible for identifying matching pixel pairs along the baseline, while a concurrent image volume part is inspired by depth-from-mono CNNs.
Our end-to-end trained CNN is ranked 2nd on KITTI2012 and ETH3D benchmarks while being significantly faster than the 1st-ranked method.
arXiv Detail & Related papers (2022-12-30T13:23:25Z) - Differentiable Point-Based Radiance Fields for Efficient View Synthesis [57.56579501055479]
We propose a differentiable rendering algorithm for efficient novel view synthesis.
Our method is up to 300x faster than NeRF in both training and inference.
For dynamic scenes, our method trains two orders of magnitude faster than STNeRF and renders at a near-interactive rate.
arXiv Detail & Related papers (2022-05-28T04:36:13Z) - Nesterov Accelerated ADMM for Fast Diffeomorphic Image Registration [63.15453821022452]
Recent deep learning-based approaches have achieved sub-second runtimes for diffeomorphic image registration (DiffIR).
We propose a simple iterative scheme that functionally composes intermediate non-stationary velocity fields.
We then propose a convex optimisation model that uses a regularisation term of arbitrary order to impose smoothness on these velocity fields.
arXiv Detail & Related papers (2021-09-26T19:56:45Z) - Displacement-Invariant Matching Cost Learning for Accurate Optical Flow
Estimation [109.64756528516631]
Learned matching costs have been shown to be critical to the success of state-of-the-art deep stereo matching methods.
This paper proposes a novel solution that bypasses the need to build a 5D feature volume.
Our approach achieves state-of-the-art accuracy on various datasets, and outperforms all published optical flow methods on the Sintel benchmark.
arXiv Detail & Related papers (2020-10-28T09:57:00Z) - Human Body Model Fitting by Learned Gradient Descent [48.79414884222403]
We propose a novel algorithm for the fitting of 3D human shape to images.
We show that this algorithm is fast (120 ms average convergence), robust across datasets, and achieves state-of-the-art results on public evaluation datasets.
arXiv Detail & Related papers (2020-08-19T14:26:47Z) - Real-time Dense Reconstruction of Tissue Surface from Stereo Optical
Video [10.181846237133167]
We propose an approach to reconstruct dense three-dimensional (3D) model of tissue surface from stereo optical videos in real-time.
The basic idea is to first extract 3D information from video frames by using stereo matching, and then to mosaic the reconstructed 3D models.
Experimental results on ex vivo and in vivo data showed that the reconstructed 3D models have high-resolution texture with an accuracy error of less than 2 mm.
arXiv Detail & Related papers (2020-07-16T19:14:05Z) - A Real-time Action Representation with Temporal Encoding and Deep
Compression [115.3739774920845]
We propose a new real-time convolutional architecture, called Temporal Convolutional 3D Network (T-C3D), for action representation.
T-C3D learns video action representations in a hierarchical multi-granularity manner while maintaining high processing speed.
Our method improves on state-of-the-art real-time methods on the UCF101 action recognition benchmark by 5.4% in accuracy, runs 2 times faster at inference, and requires less than 5 MB of model storage.
arXiv Detail & Related papers (2020-06-17T06:30:43Z) - Light3DPose: Real-time Multi-Person 3D PoseEstimation from Multiple
Views [5.510992382274774]
We present an approach to perform 3D pose estimation of multiple people from a few calibrated camera views.
Our architecture aggregates feature-maps from a 2D pose estimator backbone into a comprehensive representation of the 3D scene.
The proposed method is inherently efficient: as a pure bottom-up approach, it is computationally independent of the number of people in the scene.
arXiv Detail & Related papers (2020-04-06T14:12:19Z)