Decoupling Bidirectional Geometric Representations of 4D cost volume with 2D convolution
- URL: http://arxiv.org/abs/2509.02415v1
- Date: Tue, 02 Sep 2025 15:21:49 GMT
- Title: Decoupling Bidirectional Geometric Representations of 4D cost volume with 2D convolution
- Authors: Xiaobao Wei, Changyong Shu, Zhaokun Yue, Chang Huang, Weiwei Liu, Shuai Yang, Lirong Yang, Peng Gao, Wenbin Zhang, Gaochao Zhu, Chengxiang Wang,
- Abstract summary: We present DBStereo, a deployment-friendly 4D cost aggregation network based on pure 2D convolutions that achieves real-time performance and impressive accuracy simultaneously.
- Score: 40.103929972279126
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: High-performance real-time stereo matching methods invariably rely on 3D regularization of the cost volume, which is unfriendly to mobile devices, while 2D-regularization-based methods struggle in ill-posed regions. In this paper, we present DBStereo, a deployment-friendly 4D cost aggregation network based on pure 2D convolutions. Specifically, we first provide a thorough analysis of the decoupling characteristics of the 4D cost volume, then design a lightweight bidirectional geometry aggregation block to capture spatial and disparity representations respectively. Through decoupled learning, our approach achieves real-time performance and impressive accuracy simultaneously. Extensive experiments demonstrate that our proposed DBStereo outperforms all existing aggregation-based methods in both inference time and accuracy, even surpassing the iterative method IGEV-Stereo. Our study breaks with the empirical practice of using 3D convolutions for the 4D cost volume and provides a simple yet strong baseline of the proposed decoupled aggregation paradigm for further study. Code will be available soon at (\href{https://github.com/happydummy/DBStereo}{https://github.com/happydummy/DBStereo}).
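The decoupling idea described in the abstract can be sketched with plain array reshapes: a 4D cost volume can be viewed so that ordinary 2D convolutions aggregate either the spatial axes or the disparity axis. The shapes and the NumPy stand-in below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Hypothetical shapes for a 4D cost volume: (batch, channels, disparity, H, W)
B, C, D, H, W = 1, 8, 16, 32, 64
cost = np.random.rand(B, C, D, H, W).astype(np.float32)

# Spatial branch: fold disparity into the batch axis -> (B*D, C, H, W).
# A standard 2D convolution over (H, W) then aggregates each disparity
# plane spatially, with no 3D convolution required.
spatial_view = cost.transpose(0, 2, 1, 3, 4).reshape(B * D, C, H, W)

# Disparity branch: fold width into the batch axis -> (B*W, C, D, H).
# A 2D convolution over (D, H) then aggregates along the disparity axis.
disp_view = cost.transpose(0, 4, 1, 2, 3).reshape(B * W, C, D, H)

print(spatial_view.shape)  # (16, 8, 32, 64)
print(disp_view.shape)     # (64, 8, 16, 32)
```

Each reshaped view is a valid input for an off-the-shelf 2D convolution layer, which is what makes this style of aggregation deployment-friendly.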
Related papers
- LightStereo: Channel Boost Is All You Need for Efficient 2D Cost Aggregation [27.00836175513738]
LightStereo is a cutting-edge stereo-matching network crafted to accelerate the matching process. Our breakthrough lies in enhancing performance through a dedicated focus on the channel dimension of the 3D cost volume. LightStereo achieves a competitive EPE metric on the SceneFlow dataset while requiring only 22 GFLOPs and 17 ms of runtime.
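The channel-focused aggregation hinted at above can be illustrated with a generic squeeze-and-excitation-style reweighting of the disparity (channel) dimension of a 3D cost volume. This is a stand-in sketch, not LightStereo's actual module; all names and shapes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H, W = 16, 32, 64
cost = rng.random((D, H, W)).astype(np.float32)  # 3D cost volume

# Squeeze: global average over the spatial dims, one scalar per channel.
squeezed = cost.mean(axis=(1, 2))                # shape (D,)

# Excite: gate each disparity channel; a bare sigmoid stands in for a
# learned gating network in this hypothetical sketch.
gate = 1.0 / (1.0 + np.exp(-squeezed))           # shape (D,)
boosted = cost * gate[:, None, None]             # (D, H, W), channels reweighted

print(boosted.shape)  # (16, 32, 64)
```

The point of the sketch is only that the reweighting acts purely along the channel axis, leaving the spatial layout untouched.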
arXiv Detail & Related papers (2024-06-28T11:11:24Z)
- Fully Sparse Fusion for 3D Object Detection [69.32694845027927]
Currently prevalent multimodal 3D detection methods are built upon LiDAR-based detectors that usually use dense Bird's-Eye-View feature maps.
Fully sparse architecture is gaining attention as they are highly efficient in long-range perception.
In this paper, we study how to effectively leverage image modality in the emerging fully sparse architecture.
arXiv Detail & Related papers (2023-04-24T17:57:43Z) - Image-Coupled Volume Propagation for Stereo Matching [0.24366811507669117]
We propose a new way to process the 4D cost volume where we merge two different concepts in one framework to achieve a symbiotic relationship.
A feature matching part is responsible for identifying matching pixels pairs along the baseline while a concurrent image volume part is inspired by depth-from-mono CNNs.
Our end-to-end trained CNN is ranked 2nd on KITTI2012 and ETH3D benchmarks while being significantly faster than the 1st-ranked method.
arXiv Detail & Related papers (2022-12-30T13:23:25Z) - Displacement-Invariant Cost Computation for Efficient Stereo Matching [122.94051630000934]
Deep learning methods have dominated stereo matching leaderboards by yielding unprecedented disparity accuracy.
But their inference time is typically slow, on the order of seconds for a pair of 540p images.
We propose a displacement-invariant cost module to compute the matching costs without needing a 4D feature volume.
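The displacement-invariant idea can be sketched as applying one shared 2D cost computation per candidate disparity, so no 4D feature volume is ever materialized. The plain correlation below is an illustrative stand-in for the paper's learned cost module; all shapes and names are assumptions.

```python
import numpy as np

def shift_right_feat(feat, d):
    """Shift right-view features by disparity d along width, zero-padding."""
    if d == 0:
        return feat
    out = np.zeros_like(feat)
    out[..., d:] = feat[..., :-d]
    return out

rng = np.random.default_rng(0)
C, H, W, D = 8, 32, 64, 16
left = rng.random((C, H, W)).astype(np.float32)
right = rng.random((C, H, W)).astype(np.float32)

# The same 2D cost computation (here a simple correlation) is applied at
# every displacement, then the per-displacement maps are stacked.
cost = np.stack([(left * shift_right_feat(right, d)).sum(axis=0)
                 for d in range(D)])

print(cost.shape)  # (16, 32, 64)
```

Because the cost function does not depend on the displacement index, the expensive 4D (C, D, H, W) feature volume never has to be held in memory at once.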
arXiv Detail & Related papers (2020-12-01T23:58:16Z) - Displacement-Invariant Matching Cost Learning for Accurate Optical Flow
Estimation [109.64756528516631]
Learning matching costs has been shown to be critical to the success of the state-of-the-art deep stereo matching methods.
This paper proposes a novel solution that is able to bypass the requirement of building a 5D feature volume.
Our approach achieves state-of-the-art accuracy on various datasets, and outperforms all published optical flow methods on the Sintel benchmark.
arXiv Detail & Related papers (2020-10-28T09:57:00Z) - Towards Fast, Accurate and Stable 3D Dense Face Alignment [73.01620081047336]
We propose a novel regression framework named 3DDFA-V2 which makes a balance among speed, accuracy and stability.
We present a virtual synthesis method to transform one still image to a short-video which incorporates in-plane and out-of-plane face moving.
arXiv Detail & Related papers (2020-09-21T15:37:37Z) - Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of effective samples in 3D space is relatively small.
We propose to start with an initial prediction and refine it gradually towards the ground truth, changing only one 3D parameter in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z)
- Content-Aware Inter-Scale Cost Aggregation for Stereo Matching [42.02981855948903]
Our method achieves reliable detail recovery when upsampling by aggregating information across different scales.
A novel decomposition strategy is proposed to efficiently construct the 3D filter weights and aggregate the 3D cost volume.
Experiment results on Scene Flow dataset, KITTI2015 and Middlebury demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2020-06-05T02:38:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including any of its information) and is not responsible for any consequences of its use.