CGI-Stereo: Accurate and Real-Time Stereo Matching via Context and
Geometry Interaction
- URL: http://arxiv.org/abs/2301.02789v1
- Date: Sat, 7 Jan 2023 06:28:04 GMT
- Title: CGI-Stereo: Accurate and Real-Time Stereo Matching via Context and
Geometry Interaction
- Authors: Gangwei Xu, Huan Zhou, Xin Yang
- Abstract summary: CGI-Stereo is a novel neural network architecture that can concurrently achieve real-time performance, state-of-the-art accuracy, and strong generalization ability.
The core of CGI-Stereo is a Context and Geometry Fusion block which adaptively fuses context and geometry information.
The proposed CGF can be easily embedded into many existing stereo matching networks.
- Score: 8.484952030063114
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose CGI-Stereo, a novel neural network architecture
that can concurrently achieve real-time performance, state-of-the-art accuracy,
and strong generalization ability. The core of our CGI-Stereo is a Context and
Geometry Fusion (CGF) block which adaptively fuses context and geometry
information for more accurate and efficient cost aggregation and meanwhile
provides feedback to feature learning to guide more effective contextual
feature extraction. The proposed CGF can be easily embedded into many existing
stereo matching networks, such as PSMNet, GwcNet and ACVNet. The resulting
networks are improved in accuracy by a large margin. In particular, the model that
integrates our CGF with ACVNet ranks 1st on the KITTI 2012 leaderboard
among all published methods. We further propose an informative and concise
cost volume, named Attention Feature Volume (AFV), which exploits a correlation
volume as attention weights to filter a feature volume. Based on CGF and AFV,
the proposed CGI-Stereo outperforms all other published real-time methods on
KITTI benchmarks and shows better generalization ability than other real-time
methods. The code is available at https://github.com/gangweiX/CGI-Stereo.
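As a rough illustration of the AFV idea described in the abstract (a correlation volume used as attention weights to filter a feature volume), the following NumPy sketch may help. It is not the authors' implementation: the function names, the cosine-normalized correlation, the sigmoid used to turn correlations into weights, and the choice to broadcast the left features across disparities are all assumptions made for this example.

```python
import numpy as np


def correlation_volume(left, right, max_disp):
    """Cosine correlation between left/right features for each disparity.

    left, right: arrays of shape (C, H, W). Returns an array of shape (D, H, W).
    Hypothetical helper for illustration only.
    """
    c, h, w = left.shape
    if max_disp > w:
        raise ValueError("max_disp must not exceed the feature width")
    eps = 1e-8
    # L2-normalize along the channel axis so each correlation lies in [-1, 1].
    left_n = left / (np.linalg.norm(left, axis=0, keepdims=True) + eps)
    right_n = right / (np.linalg.norm(right, axis=0, keepdims=True) + eps)
    corr = np.zeros((max_disp, h, w), dtype=np.float64)
    for d in range(max_disp):
        # Left pixel x is compared with right pixel x - d; columns x < d stay 0.
        corr[d, :, d:] = (left_n[:, :, d:] * right_n[:, :, : w - d]).sum(axis=0)
    return corr


def attention_feature_volume(left, right, max_disp):
    """Filter a feature volume with correlation-derived attention weights.

    Returns an array of shape (C, D, H, W).
    """
    corr = correlation_volume(left, right, max_disp)
    # Assumption: a sigmoid maps correlations to weights in (0, 1).
    weights = 1.0 / (1.0 + np.exp(-corr))
    # Assumption: the feature volume is the left feature map repeated over D.
    return left[:, None, :, :] * weights[None, :, :, :]


rng = np.random.default_rng(0)
left = rng.standard_normal((8, 4, 16))   # C=8, H=4, W=16
right = left.copy()                      # identical views: true disparity is 0
afv = attention_feature_volume(left, right, max_disp=4)
print(afv.shape)  # (8, 4, 4, 16)
```

In this sketch, columns without a valid match at a given disparity keep a correlation of 0 and therefore a weight of 0.5. A real network would more likely learn the weighting (for example with convolutions over the correlation volume) rather than apply a fixed sigmoid.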
Related papers
- GraFPrint: A GNN-Based Approach for Audio Identification [11.71702857714935]
GraFPrint is an audio identification framework that leverages the structural learning capabilities of Graph Neural Networks (GNNs) to create robust audio fingerprints.
GraFPrint demonstrates superior performance on large-scale datasets at various levels of granularity, proving to be both lightweight and scalable.
arXiv Detail & Related papers (2024-10-14T18:20:09Z)
- Audio-Visual Efficient Conformer for Robust Speech Recognition [91.3755431537592]
We propose to improve the noise robustness of the recently proposed Efficient Conformer Connectionist Temporal Classification architecture by processing both audio and visual modalities.
Our experiments show that using audio and visual modalities allows better speech recognition in the presence of environmental noise and significantly accelerates training, reaching lower WER with 4 times fewer training steps.
arXiv Detail & Related papers (2023-01-04T05:36:56Z)
- TC-SKNet with GridMask for Low-complexity Classification of Acoustic scene [15.010375209235924]
We combine Selective Kernel Network with Temporal-Convolution (TC-SKNet) to adjust the receptive field of convolution kernels.
GridMask is a data augmentation strategy that masks part of the raw data or feature area.
As a result, TC-SKNet reaches a peak accuracy of 59.87%, on par with the SOTA, while using only 20.9 K parameters.
arXiv Detail & Related papers (2022-10-05T14:24:17Z)
- Accurate and Efficient Stereo Matching via Attention Concatenation Volume [33.615312186946866]
We present a novel cost volume construction method, named attention concatenation volume (ACV).
ACV generates attention weights from correlation clues to suppress redundant information and enhance matching-related information in the concatenation volume.
We further design a fast version of ACV to enable real-time performance, named Fast-ACV, which generates high likelihood disparity hypotheses.
arXiv Detail & Related papers (2022-09-23T08:14:30Z)
- SVTS: Scalable Video-to-Speech Synthesis [105.29009019733803]
We introduce a scalable video-to-speech framework consisting of two components: a video-to-spectrogram predictor and a pre-trained neural vocoder.
We are the first to show intelligible results on the challenging LRS3 dataset.
arXiv Detail & Related papers (2022-05-04T13:34:07Z)
- SCGC : Self-Supervised Contrastive Graph Clustering [1.1470070927586016]
Graph clustering discovers groups or communities within networks.
Deep learning methods such as autoencoders cannot incorporate rich structural information.
We propose Self-Supervised Contrastive Graph Clustering (SCGC).
arXiv Detail & Related papers (2022-04-27T01:38:46Z)
- Group Contextualization for Video Recognition [80.3842253625557]
Group contextualization (GC) can boost the performance of 2D-CNN (e.g., TSN) and TSM.
GC embeds features with four different kinds of contexts in parallel.
Group contextualization can boost the performance of 2D-CNN (e.g., TSN) to a level comparable to the state-of-the-art video networks.
arXiv Detail & Related papers (2022-03-18T01:49:40Z)
- Compact Graph Structure Learning via Mutual Information Compression [79.225671302689]
Graph Structure Learning (GSL) has attracted considerable attention for its capacity to optimize graph structure and learn the parameters of Graph Neural Networks (GNNs).
We propose a Compact GSL architecture by MI compression, named CoGSL.
We conduct extensive experiments on several datasets under clean and attacked conditions, which demonstrate the effectiveness and robustness of CoGSL.
arXiv Detail & Related papers (2022-01-14T16:22:33Z)
- Global Filter Networks for Image Classification [90.81352483076323]
We present a conceptually simple yet computationally efficient architecture that learns long-term spatial dependencies in the frequency domain with log-linear complexity.
Our results demonstrate that GFNet can be a very competitive alternative to transformer-style models and CNNs in efficiency, generalization ability and robustness.
arXiv Detail & Related papers (2021-07-01T17:58:16Z)
- Adaptive Visibility Graph Neural Network and It's Application in Modulation Classification [2.3228726690478547]
We propose an Adaptive Visibility Graph (AVG) algorithm that can adaptively map time series into graphs.
We then adopt AVGNet for radio signal modulation classification which is an important task in the field of wireless communication.
arXiv Detail & Related papers (2021-06-16T06:00:49Z)
- Heuristic Semi-Supervised Learning for Graph Generation Inspired by Electoral College [80.67842220664231]
We propose a novel pre-processing technique, namely ELectoral COllege (ELCO), which automatically expands new nodes and edges to refine the label similarity within a dense subgraph.
In all setups tested, our method boosts the average score of base models by a large margin of 4.7 points and consistently outperforms the state-of-the-art.
arXiv Detail & Related papers (2020-06-10T14:48:48Z)