Ghost-Stereo: GhostNet-based Cost Volume Enhancement and Aggregation for Stereo Matching Networks
- URL: http://arxiv.org/abs/2405.14520v1
- Date: Thu, 23 May 2024 13:02:30 GMT
- Title: Ghost-Stereo: GhostNet-based Cost Volume Enhancement and Aggregation for Stereo Matching Networks
- Authors: Xingguang Jiang, Xiaofeng Bian, Chenggang Guo
- Abstract summary: Current methods for depth estimation based on stereo matching suffer from a large number of parameters and slow running time.
We propose Ghost-Stereo, a novel end-to-end stereo matching network.
Ghost-Stereo achieves performance comparable to state-of-the-art real-time methods on several public benchmarks.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Depth estimation based on stereo matching is a classic but popular computer vision problem, which has a wide range of real-world applications. Current stereo matching methods generally adopt the deep Siamese neural network architecture, and have achieved impressive performance by constructing feature matching cost volumes and using 3D convolutions for cost aggregation. However, most existing methods suffer from a large number of parameters and slow running time due to the sequential use of 3D convolutions. In this paper, we propose Ghost-Stereo, a novel end-to-end stereo matching network. The feature extraction part of the network uses GhostNet to form a U-shaped structure. The core of Ghost-Stereo is a GhostNet feature-based cost volume enhancement (Ghost-CVE) module and a GhostNet-inspired lightweight cost volume aggregation (Ghost-CVA) module. For the Ghost-CVE part, cost volumes are constructed and fused by the GhostNet-based features to enhance the spatial context awareness. For the Ghost-CVA part, a lightweight 3D convolution bottleneck block based on GhostNet is proposed to reduce the computational complexity in this module. By combining it with the context and geometry fusion module, a classical hourglass-shaped cost volume aggregation structure is constructed. Ghost-Stereo achieves performance comparable to state-of-the-art real-time methods on several public benchmarks, and shows a better generalization ability.
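As a rough illustration of why a GhostNet-inspired 3D convolution block can cut parameters, the sketch below compares weight counts for a dense 3D convolution and a Ghost-style replacement. The abstract does not specify the layout of the Ghost-CVA block, so this assumes the Ghost module from the original GhostNet paper (a dense primary convolution followed by depthwise "cheap" operations) carried over to 3D kernels. The function names, the ratio, and the kernel sizes are illustrative choices, not values taken from Ghost-Stereo.

```python
def conv3d_params(c_in, c_out, k):
    """Weights in a dense 3D convolution with a k x k x k kernel (bias ignored)."""
    return c_in * c_out * k ** 3


def ghost_conv3d_params(c_in, c_out, k, ratio=2, cheap_k=3):
    """Weights in a Ghost-style replacement for the same layer (assumed layout).

    A dense primary convolution produces c_out / ratio "intrinsic" channels;
    depthwise "cheap" convolutions derive the remaining channels from them.
    """
    if c_out % ratio != 0:
        raise ValueError("c_out must be divisible by ratio in this sketch")
    intrinsic = c_out // ratio
    primary = conv3d_params(c_in, intrinsic, k)
    # Each intrinsic channel yields (ratio - 1) ghost channels,
    # and each ghost channel uses one depthwise cheap_k^3 kernel.
    cheap = intrinsic * (ratio - 1) * cheap_k ** 3
    return primary + cheap


# Hypothetical layer: 32 -> 32 channels, 3 x 3 x 3 kernels.
dense = conv3d_params(32, 32, 3)
ghost = ghost_conv3d_params(32, 32, 3)
print(dense, ghost, round(dense / ghost, 2))
```

For this hypothetical layer the counts are 27648 and 14256 weights, a reduction of roughly 1.94x, which approaches the ratio of 2 as the channel count grows. Actual savings in Ghost-Stereo depend on its real block design, which is described in the paper rather than in this summary.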
Related papers
- DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets [95.84755169585492]
We present Dynamic Sparse Voxel Transformer (DSVT), a single-stride window-based voxel Transformer backbone for outdoor 3D perception.
Our model achieves state-of-the-art performance with a broad range of 3D perception tasks.
arXiv Detail & Related papers (2023-01-15T09:31:58Z)
- GhostNetV2: Enhance Cheap Operation with Long-Range Attention [59.65543143580889]
We propose a hardware-friendly attention mechanism (dubbed DFC attention) and then present a new GhostNetV2 architecture for mobile applications.
The proposed DFC attention is constructed based on fully-connected layers, which can not only execute fast on common hardware but also capture the dependence between long-range pixels.
We further revisit the bottleneck in previous GhostNet and propose to enhance expanded features produced by cheap operations with DFC attention.
arXiv Detail & Related papers (2022-11-23T12:16:59Z)
- RepGhost: A Hardware-Efficient Ghost Module via Re-parameterization [13.605461609002539]
Feature reuse has been a key technique in light-weight convolutional neural networks (CNNs) architecture design.
Current methods usually utilize a concatenation operator to keep large channel numbers cheaply (thus large network capacity) by reusing feature maps from other layers.
This paper provides a new perspective to realize feature reuse implicitly and more efficiently instead of concatenation.
arXiv Detail & Related papers (2022-11-11T09:44:23Z)
- HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions [109.33112814212129]
We show that input-adaptive, long-range and high-order spatial interactions can be efficiently implemented with a convolution-based framework.
We present the Recursive Gated Convolution ($g^n$Conv) that performs high-order spatial interactions with gated convolutions.
Based on the operation, we construct a new family of generic vision backbones named HorNet.
arXiv Detail & Related papers (2022-07-28T17:59:02Z)
- GhostNets on Heterogeneous Devices via Cheap Operations [129.15798618025127]
We propose a novel CPU-efficient Ghost (C-Ghost) module to generate more feature maps from cheap operations.
Experiments conducted on benchmarks demonstrate the effectiveness of the proposed C-Ghost module and the G-Ghost stage.
C-GhostNet and G-GhostNet can achieve the optimal trade-off of accuracy and latency for CPU and GPU, respectively.
arXiv Detail & Related papers (2022-01-10T11:46:38Z)
- Ghost-dil-NetVLAD: A Lightweight Neural Network for Visual Place Recognition [3.6249801498927923]
We propose a lightweight weakly supervised end-to-end neural network consisting of a front-ended perception model called GhostCNN and a learnable VLAD layer as a back-end.
To enhance our proposed lightweight model further, we add dilated convolutions to the Ghost module to get features containing more spatial semantic information, improving accuracy.
arXiv Detail & Related papers (2021-12-22T06:05:02Z)
- GhostShiftAddNet: More Features from Energy-Efficient Operations [1.2891210250935146]
Deep convolutional neural networks (CNNs) are computationally and memory intensive.
This paper proposes GhostShiftAddNet, where the motivation is to implement a hardware-efficient deep network.
We introduce a new bottleneck block, GhostSA, that converts all multiplications in the block to cheap operations.
arXiv Detail & Related papers (2021-09-20T12:50:42Z)
- Correlate-and-Excite: Real-Time Stereo Matching via Guided Cost Volume Excitation [65.83008812026635]
We construct Guided Cost volume Excitation (GCE) and show that simple channel excitation of cost volume guided by image can improve performance considerably.
We present an end-to-end network that we call Correlate-and-Excite (CoEx).
arXiv Detail & Related papers (2021-08-12T14:32:26Z)
- Bilateral Grid Learning for Stereo Matching Networks [22.92443311789097]
We present a novel edge-preserving cost volume upsampling module based on the slicing operation in the learned bilateral grid.
The slicing layer is parameter-free, which allows us to obtain a high quality cost volume of high resolution.
We design a real-time network based on this module, which outperforms existing published real-time deep stereo matching networks.
arXiv Detail & Related papers (2021-01-01T09:08:01Z)
- Dense Hybrid Recurrent Multi-view Stereo Net with Dynamic Consistency Checking [54.58791377183574]
Our novel hybrid recurrent multi-view stereo net consists of two core modules: 1) a light DRENet (Dense Reception Expanded) module to extract dense feature maps of original size with multi-scale context information, 2) a HU-LSTM (Hybrid U-LSTM) to regularize 3D matching volume into predicted depth map.
Our method exhibits performance competitive with the state-of-the-art method while dramatically reducing memory consumption, using only 19.4% of the memory that R-MVSNet consumes.
arXiv Detail & Related papers (2020-07-21T14:59:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.