Displacement-Invariant Cost Computation for Efficient Stereo Matching
- URL: http://arxiv.org/abs/2012.00899v1
- Date: Tue, 1 Dec 2020 23:58:16 GMT
- Title: Displacement-Invariant Cost Computation for Efficient Stereo Matching
- Authors: Yiran Zhong, Charles Loop, Wonmin Byeon, Stan Birchfield, Yuchao Dai,
Kaihao Zhang, Alexey Kamenev, Thomas Breuel, Hongdong Li, Jan Kautz
- Abstract summary: Deep learning methods have dominated stereo matching leaderboards by yielding unprecedented disparity accuracy.
But their inference time is typically slow, on the order of seconds for a pair of 540p images.
We propose a displacement-invariant cost module to compute the matching costs without needing a 4D feature volume.
- Score: 122.94051630000934
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although deep learning-based methods have dominated stereo matching
leaderboards by yielding unprecedented disparity accuracy, their inference time
is typically slow, on the order of seconds for a pair of 540p images. The main
reason is that the leading methods employ time-consuming 3D convolutions
applied to a 4D feature volume. A common way to speed up the computation is to
downsample the feature volume, but this loses high-frequency details. To
overcome these challenges, we propose a displacement-invariant cost
computation module to compute the matching costs without needing a 4D feature
volume. Rather, costs are computed by applying the same 2D convolution network
on each disparity-shifted feature map pair independently. Unlike previous 2D
convolution-based methods that simply perform context mapping between inputs
and disparity maps, our proposed approach learns to match features between the
two images. We also propose an entropy-based refinement strategy to refine the
computed disparity map, which further improves speed by avoiding the need to
compute a second disparity map on the right image. Extensive experiments on
standard datasets (SceneFlow, KITTI, ETH3D, and Middlebury) demonstrate that
our method achieves competitive accuracy with much less inference time. On
typical image sizes, our method runs at over 100 FPS on a desktop GPU, making
it suitable for time-critical applications such as autonomous driving.
We also show that our approach generalizes well to unseen datasets,
outperforming 4D-volumetric methods.
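A minimal PyTorch sketch of the two ideas in the abstract: a single shared 2D network scores each disparity-shifted feature-map pair (so no 4D volume or 3D convolutions are needed), and the entropy of the resulting matching distribution serves as a per-pixel confidence cue. The layer sizes, the shift/padding scheme, and the entropy computation are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CostNet2D(nn.Module):
    """Shared 2D network mapping a concatenated feature pair to a cost map."""
    def __init__(self, feat_channels: int):
        super().__init__()
        # Hypothetical layer sizes; the paper's network may differ.
        self.net = nn.Sequential(
            nn.Conv2d(2 * feat_channels, 64, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 3, padding=1),  # one matching cost per pixel
        )

    def forward(self, left_feat, right_feat_shifted):
        return self.net(torch.cat([left_feat, right_feat_shifted], dim=1))

def displacement_invariant_costs(left_feat, right_feat, cost_net, max_disp):
    """Apply the same 2D cost network to every disparity-shifted pair.

    left_feat, right_feat: (B, C, H, W) feature maps from a siamese encoder.
    Returns a (B, max_disp, H, W) cost volume built with 2D convolutions only.
    """
    costs = []
    for d in range(max_disp):
        if d == 0:
            shifted = right_feat
        else:
            # Shift right features d pixels to the right; zero-pad the gap,
            # so left pixel x is compared against right pixel x - d.
            shifted = F.pad(right_feat[:, :, :, :-d], (d, 0, 0, 0))
        costs.append(cost_net(left_feat, shifted))  # (B, 1, H, W)
    return torch.cat(costs, dim=1)

def disparity_and_entropy(cost_volume):
    """Soft-argmin disparity plus a per-pixel entropy map.

    An assumption of how an entropy-based confidence cue could be derived;
    the paper's refinement strategy may differ in detail.
    """
    prob = F.softmax(-cost_volume, dim=1)  # low cost -> high probability
    disps = torch.arange(cost_volume.shape[1],
                         device=cost_volume.device).view(1, -1, 1, 1).float()
    disparity = (prob * disps).sum(dim=1)                  # (B, H, W)
    entropy = -(prob * torch.log(prob + 1e-8)).sum(dim=1)  # (B, H, W)
    return disparity, entropy
```

Because identical weights score every shift, the per-disparity passes in the loop can also be folded into the batch dimension for speed; the displacement invariance is exactly what makes that sharing possible.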
Related papers
- Occupancy-Based Dual Contouring [12.944046673902415]
We introduce a dual contouring method that provides state-of-the-art performance for occupancy functions.
Our method is learning-free and carefully designed to maximize the use of GPU parallelization.
arXiv Detail & Related papers (2024-09-20T11:32:21Z) - Image-Coupled Volume Propagation for Stereo Matching [0.24366811507669117]
We propose a new way to process the 4D cost volume where we merge two different concepts in one framework to achieve a symbiotic relationship.
A feature matching part is responsible for identifying matching pixel pairs along the baseline, while a concurrent image volume part is inspired by depth-from-mono CNNs.
Our end-to-end trained CNN is ranked 2nd on KITTI2012 and ETH3D benchmarks while being significantly faster than the 1st-ranked method.
arXiv Detail & Related papers (2022-12-30T13:23:25Z) - Differentiable Point-Based Radiance Fields for Efficient View Synthesis [57.56579501055479]
We propose a differentiable rendering algorithm for efficient novel view synthesis.
Our method is up to 300x faster than NeRF in both training and inference.
For dynamic scenes, our method trains two orders of magnitude faster than STNeRF and renders at a near-interactive rate.
arXiv Detail & Related papers (2022-05-28T04:36:13Z) - Nesterov Accelerated ADMM for Fast Diffeomorphic Image Registration [63.15453821022452]
Recent deep learning-based approaches have achieved sub-second runtimes for diffeomorphic image registration (DiffIR).
We propose a simple iterative scheme that functionally composes intermediate non-stationary velocity fields.
We then propose a convex optimisation model that uses a regularisation term of arbitrary order to impose smoothness on these velocity fields.
arXiv Detail & Related papers (2021-09-26T19:56:45Z) - Displacement-Invariant Matching Cost Learning for Accurate Optical Flow
Estimation [109.64756528516631]
Learned matching costs have been shown to be critical to the success of state-of-the-art deep stereo matching methods.
This paper proposes a novel solution that bypasses the need to build a 5D feature volume.
Our approach achieves state-of-the-art accuracy on various datasets, and outperforms all published optical flow methods on the Sintel benchmark.
arXiv Detail & Related papers (2020-10-28T09:57:00Z) - Human Body Model Fitting by Learned Gradient Descent [48.79414884222403]
We propose a novel algorithm for the fitting of 3D human shape to images.
We show that this algorithm is fast (120 ms average convergence), robust across datasets, and achieves state-of-the-art results on public evaluation datasets.
arXiv Detail & Related papers (2020-08-19T14:26:47Z) - Real-time Dense Reconstruction of Tissue Surface from Stereo Optical
Video [10.181846237133167]
We propose an approach to reconstruct dense three-dimensional (3D) model of tissue surface from stereo optical videos in real-time.
The basic idea is to first extract 3D information from video frames by using stereo matching, and then to mosaic the reconstructed 3D models.
Experimental results on ex vivo and in vivo data showed that the reconstructed 3D models have high-resolution texture with an accuracy error of less than 2 mm.
arXiv Detail & Related papers (2020-07-16T19:14:05Z) - A Real-time Action Representation with Temporal Encoding and Deep
Compression [115.3739774920845]
We propose a new real-time convolutional architecture, called Temporal Convolutional 3D Network (T-C3D), for action representation.
T-C3D learns video action representations in a hierarchical multi-granularity manner while maintaining high processing speed.
Our method improves on state-of-the-art real-time methods on the UCF101 action recognition benchmark by 5.4% in accuracy, runs 2 times faster at inference, and requires less than 5 MB of model storage.
arXiv Detail & Related papers (2020-06-17T06:30:43Z) - Light3DPose: Real-time Multi-Person 3D PoseEstimation from Multiple
Views [5.510992382274774]
We present an approach to perform 3D pose estimation of multiple people from a few calibrated camera views.
Our architecture aggregates feature-maps from a 2D pose estimator backbone into a comprehensive representation of the 3D scene.
The proposed method is inherently efficient: as a pure bottom-up approach, it is computationally independent of the number of people in the scene.
arXiv Detail & Related papers (2020-04-06T14:12:19Z)