EDNet: Efficient Disparity Estimation with Cost Volume Combination and
Attention-based Spatial Residual
- URL: http://arxiv.org/abs/2010.13338v4
- Date: Thu, 4 Mar 2021 05:30:42 GMT
- Title: EDNet: Efficient Disparity Estimation with Cost Volume Combination and
Attention-based Spatial Residual
- Authors: Songyan Zhang, Zhicheng Wang, Qiang Wang, Jinshuo Zhang, Gang Wei,
Xiaowen Chu
- Abstract summary: Existing disparity estimation works mostly leverage the 4D concatenation volume and construct a very deep 3D convolution neural network (CNN) for disparity regression.
In this paper, we propose a network named EDNet for efficient disparity estimation.
Experiments on the Scene Flow and KITTI datasets show that EDNet outperforms the previous 3D CNN based works.
- Score: 17.638034176859932
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing state-of-the-art disparity estimation works mostly leverage the 4D
concatenation volume and construct a very deep 3D convolution neural network
(CNN) for disparity regression, which is inefficient due to the high memory
consumption and slow inference speed. In this paper, we propose a network named
EDNet for efficient disparity estimation. Firstly, we construct a combined
volume which incorporates contextual information from the squeezed
concatenation volume and feature similarity measurement from the correlation
volume. The combined volume can be next aggregated by 2D convolutions which are
faster and require less memory than 3D convolutions. Secondly, we propose an
attention-based spatial residual module to generate attention-aware residual
features. The attention mechanism is applied to provide intuitive spatial
evidence about inaccurate regions with the help of error maps at multiple
scales and thus improve the residual learning efficiency. Extensive experiments
on the Scene Flow and KITTI datasets show that EDNet outperforms the previous
3D CNN based works and achieves state-of-the-art performance with significantly
faster speed and less memory consumption.
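The combined-volume idea in the abstract can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the tensor layout, and the use of a fixed per-channel weight vector (`squeeze_weights`) in place of the learned squeezing layers are all hypothetical. It shows how a 4D concatenation volume can be squeezed to one channel per disparity and stacked with a correlation volume, giving a 3D tensor whose leading axis can be treated as channels by ordinary 2D convolutions.

```python
import numpy as np

def correlation_volume(left, right, max_disp):
    """Feature similarity per disparity: output shape (max_disp, H, W)."""
    C, H, W = left.shape
    vol = np.zeros((max_disp, H, W), dtype=left.dtype)
    for d in range(max_disp):
        # Left pixel x is compared with right pixel x - d; columns < d stay zero.
        vol[d, :, d:] = (left[:, :, d:] * right[:, :, :W - d]).mean(axis=0)
    return vol

def concatenation_volume(left, right, max_disp):
    """Stacked left/right features per disparity: shape (2C, max_disp, H, W)."""
    C, H, W = left.shape
    vol = np.zeros((2 * C, max_disp, H, W), dtype=left.dtype)
    for d in range(max_disp):
        vol[:C, d, :, d:] = left[:, :, d:]
        vol[C:, d, :, d:] = right[:, :, :W - d]
    return vol

def build_combined_volume(left, right, max_disp, squeeze_weights):
    """Squeeze the 4D concatenation volume and stack it with the correlation volume.

    squeeze_weights (shape (2C,)) is a hypothetical stand-in for learned
    layers that reduce the channel axis to a single channel.
    """
    concat = concatenation_volume(left, right, max_disp)
    # Weighted sum over the channel axis: (2C, D, H, W) -> (D, H, W).
    squeezed = np.tensordot(squeeze_weights, concat, axes=(0, 0))
    corr = correlation_volume(left, right, max_disp)
    # Result has shape (2 * max_disp, H, W), a layout that 2D convolutions accept.
    return np.concatenate([squeezed, corr], axis=0)
```

Because the result has no separate disparity axis, the aggregation stage can use 2D convolutions over `(2 * max_disp, H, W)` rather than 3D convolutions over `(2C, max_disp, H, W)`, which is the memory and speed argument the abstract makes.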
Related papers
- ALOcc: Adaptive Lifting-based 3D Semantic Occupancy and Cost Volume-based Flow Prediction [89.89610257714006]
Existing methods prioritize higher accuracy to cater to the demands of these tasks.
We introduce a series of targeted improvements for 3D semantic occupancy prediction and flow estimation.
Our purely convolutional architecture framework, named ALOcc, achieves an optimal tradeoff between speed and accuracy.
arXiv Detail & Related papers (2024-11-12T11:32:56Z)
- UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation [93.88170217725805]
We propose a 3D medical image segmentation approach, named UNETR++, that offers both high-quality segmentation masks as well as efficiency in terms of parameters, compute cost, and inference speed.
The core of our design is the introduction of a novel efficient paired attention (EPA) block that efficiently learns spatial and channel-wise discriminative features.
Our evaluations on five benchmarks, Synapse, BTCV, ACDC, BRaTs, and Decathlon-Lung, reveal the effectiveness of our contributions in terms of both efficiency and accuracy.
arXiv Detail & Related papers (2022-12-08T18:59:57Z)
- Pushing the Limits of Asynchronous Graph-based Object Detection with Event Cameras [62.70541164894224]
We introduce several architecture choices which allow us to scale the depth and complexity of such models while maintaining low computation.
Our method runs 3.7 times faster than a dense graph neural network, taking only 8.4 ms per forward pass.
arXiv Detail & Related papers (2022-11-22T15:14:20Z)
- Spatial Pruned Sparse Convolution for Efficient 3D Object Detection [41.62839541489369]
3D scenes are dominated by a large number of background points, which are redundant for the detection task that mainly needs to focus on foreground objects.
In this paper, we analyze major components of existing 3D CNNs and find that 3D CNNs ignore the redundancy of data and further amplify it in the down-sampling process, which brings a huge amount of extra and unnecessary computational overhead.
We propose a new convolution operator named spatial pruned sparse convolution (SPS-Conv), which includes two variants, spatial pruned submanifold sparse convolution (SPSS-Conv) and spatial pruned regular sparse convolution (SPRS
arXiv Detail & Related papers (2022-09-28T16:19:06Z)
- OSS-Net: Memory Efficient High Resolution Semantic Segmentation of 3D Medical Data [21.42609249273068]
Convolutional neural networks (CNNs) are the current state-of-the-art meta-algorithm for volumetric segmentation of medical data.
A key limitation of 3D CNNs on voxelised data is that the memory consumption grows cubically with the training data resolution.
We propose Occupancy Networks for Semantic Segmentation (OSS-Nets) to accurately and memory-efficiently segment 3D medical data.
arXiv Detail & Related papers (2021-10-20T16:14:26Z)
- ES-Net: An Efficient Stereo Matching Network [4.8986598953553555]
Existing stereo matching networks typically use slow and computationally expensive 3D convolutions to improve the performance.
We propose the Efficient Stereo Network (ESNet), which achieves high performance and efficient inference at the same time.
arXiv Detail & Related papers (2021-03-05T20:11:39Z)
- Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks [87.50632573601283]
We present a novel method for multi-view depth estimation from a single video.
Our method achieves temporally coherent depth estimation results by using a novel Epipolar Spatio-Temporal (EST) transformer.
To reduce the computational cost, inspired by recent Mixture-of-Experts models, we design a compact hybrid network.
arXiv Detail & Related papers (2020-11-26T04:04:21Z)
- VolumeNet: A Lightweight Parallel Network for Super-Resolution of Medical Volumetric Data [20.34783243852236]
We propose a 3D convolutional neural network (CNN) for SR of medical volumetric data called ParallelNet using parallel connections.
We show that the proposed VolumeNet significantly reduces the number of model parameters and achieves high precision results.
arXiv Detail & Related papers (2020-10-16T12:53:15Z)
- FBNetV3: Joint Architecture-Recipe Search using Predictor Pretraining [65.39532971991778]
We present an accuracy predictor that scores architecture and training recipes jointly, guiding both sample selection and ranking.
We run fast evolutionary searches in just CPU minutes to generate architecture-recipe pairs for a variety of resource constraints.
FBNetV3 makes up a family of state-of-the-art compact neural networks that outperform both automatically and manually-designed competitors.
arXiv Detail & Related papers (2020-06-03T05:20:21Z)
- FADNet: A Fast and Accurate Network for Disparity Estimation [18.05392578461659]
We propose an efficient and accurate deep network for disparity estimation named FADNet.
It exploits efficient 2D based correlation layers with stacked blocks to preserve fast computation.
It contains multi-scale predictions so as to exploit a multi-scale weight scheduling training technique to improve the accuracy.
arXiv Detail & Related papers (2020-03-24T10:27:11Z)
- Spatial-Spectral Residual Network for Hyperspectral Image Super-Resolution [82.1739023587565]
We propose a novel spectral-spatial residual network for hyperspectral image super-resolution (SSRNet).
Our method can effectively explore spatial-spectral information by using 3D convolution instead of 2D convolution, which enables the network to better extract potential information.
In each unit, we employ spatial and temporal separable 3D convolution to extract spatial and spectral information, which not only reduces unaffordable memory usage and high computational cost, but also makes the network easier to train.
arXiv Detail & Related papers (2020-01-14T03:34:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.