ES-Net: An Efficient Stereo Matching Network
- URL: http://arxiv.org/abs/2103.03922v1
- Date: Fri, 5 Mar 2021 20:11:39 GMT
- Title: ES-Net: An Efficient Stereo Matching Network
- Authors: Zhengyu Huang, Theodore B. Norris, Panqu Wang
- Abstract summary: Existing stereo matching networks typically use slow and computationally expensive 3D convolutions to improve the performance.
We propose the Efficient Stereo Network (ESNet), which achieves high performance and efficient inference at the same time.
- Score: 4.8986598953553555
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dense stereo matching with deep neural networks is of great interest to the
research community. Existing stereo matching networks typically use slow and
computationally expensive 3D convolutions to improve the performance, which is
not friendly to real-world applications such as autonomous driving. In this
paper, we propose the Efficient Stereo Network (ESNet), which achieves high
performance and efficient inference at the same time. ESNet relies only on 2D
convolution and computes multi-scale cost volume efficiently using a
warping-based method to improve the performance in regions with fine-details.
In addition, we address the matching ambiguity issue in the occluded region by
proposing ESNet-M, a variant of ESNet that additionally estimates an occlusion
mask without supervision. We further improve the network performance by
proposing a new training scheme that includes dataset scheduling and
unsupervised pre-training. Compared with other low-cost dense stereo depth
estimation methods, our proposed approach achieves state-of-the-art performance
on the Scene Flow [1], DrivingStereo [2], and KITTI-2015 dataset [3]. Our code
will be made available.
Related papers
- ALOcc: Adaptive Lifting-based 3D Semantic Occupancy and Cost Volume-based Flow Prediction [89.89610257714006]
Existing methods prioritize higher accuracy to cater to the demands of these tasks.
We introduce a series of targeted improvements for 3D semantic occupancy prediction and flow estimation.
Our purelytemporalal architecture framework, named ALOcc, achieves an optimal tradeoff between speed and accuracy.
arXiv Detail & Related papers (2024-11-12T11:32:56Z) - Latency-aware Unified Dynamic Networks for Efficient Image Recognition [72.8951331472913]
LAUDNet is a framework to bridge the theoretical and practical efficiency gap in dynamic networks.
It integrates three primary dynamic paradigms-spatially adaptive computation, dynamic layer skipping, and dynamic channel skipping.
It can notably reduce the latency of models like ResNet by over 50% on platforms such as V100,3090, and TX2 GPUs.
arXiv Detail & Related papers (2023-08-30T10:57:41Z) - SVNet: Where SO(3) Equivariance Meets Binarization on Point Cloud
Representation [65.4396959244269]
The paper tackles the challenge by designing a general framework to construct 3D learning architectures.
The proposed approach can be applied to general backbones like PointNet and DGCNN.
Experiments on ModelNet40, ShapeNet, and the real-world dataset ScanObjectNN, demonstrated that the method achieves a great trade-off between efficiency, rotation, and accuracy.
arXiv Detail & Related papers (2022-09-13T12:12:19Z) - DS-Net++: Dynamic Weight Slicing for Efficient Inference in CNNs and
Transformers [105.74546828182834]
We show a hardware-efficient dynamic inference regime, named dynamic weight slicing, which adaptively slice a part of network parameters for inputs with diverse difficulty levels.
We present dynamic slimmable network (DS-Net) and dynamic slice-able network (DS-Net++) by input-dependently adjusting filter numbers of CNNs and multiple dimensions in both CNNs and transformers.
arXiv Detail & Related papers (2021-09-21T09:57:21Z) - A Deep Value-network Based Approach for Multi-Driver Order Dispatching [55.36656442934531]
We propose a deep reinforcement learning based solution for order dispatching.
We conduct large scale online A/B tests on DiDi's ride-dispatching platform.
Results show that CVNet consistently outperforms other recently proposed dispatching methods.
arXiv Detail & Related papers (2021-06-08T16:27:04Z) - PLADE-Net: Towards Pixel-Level Accuracy for Self-Supervised Single-View
Depth Estimation with Neural Positional Encoding and Distilled Matting Loss [49.66736599668501]
We propose a self-supervised single-view pixel-level accurate depth estimation network, called PLADE-Net.
Our method shows unprecedented accuracy levels, exceeding 95% in terms of the $delta1$ metric on the KITTI dataset.
arXiv Detail & Related papers (2021-03-12T15:54:46Z) - Bilateral Grid Learning for Stereo Matching Networks [22.92443311789097]
We present a novel edge-preserving cost volume upsampling module based on the slicing operation in the learned bilateral grid.
The slicing layer is parameter-free, which allows us to obtain a high quality cost volume of high resolution.
We design a real-time network based on this module, which outperforms existing published real-time deep stereo matching networks.
arXiv Detail & Related papers (2021-01-01T09:08:01Z) - S3CNet: A Sparse Semantic Scene Completion Network for LiDAR Point
Clouds [0.16799377888527683]
We present S3CNet, a sparse convolution based neural network that predicts the semantically completed scene from a single, unified LiDAR point cloud.
We show that our proposed method outperforms all counterparts on the 3D task, achieving state-of-the art results on the Semantic KITTI benchmark.
arXiv Detail & Related papers (2020-12-16T20:14:41Z) - EDNet: Efficient Disparity Estimation with Cost Volume Combination and
Attention-based Spatial Residual [17.638034176859932]
Existing disparity estimation works mostly leverage the 4D concatenation volume and construct a very deep 3D convolution neural network (CNN) for disparity regression.
In this paper, we propose a network named EDNet for efficient disparity estimation.
Experiments on the Scene Flow and KITTI datasets show that EDNet outperforms the previous 3D CNN based works.
arXiv Detail & Related papers (2020-10-26T04:49:44Z) - Fully Dynamic Inference with Deep Neural Networks [19.833242253397206]
Two compact networks, called Layer-Net (L-Net) and Channel-Net (C-Net), predict on a per-instance basis which layers or filters/channels are redundant and therefore should be skipped.
On the CIFAR-10 dataset, LC-Net results in up to 11.9$times$ fewer floating-point operations (FLOPs) and up to 3.3% higher accuracy compared to other dynamic inference methods.
On the ImageNet dataset, LC-Net achieves up to 1.4$times$ fewer FLOPs and up to 4.6% higher Top-1 accuracy than the other methods.
arXiv Detail & Related papers (2020-07-29T23:17:48Z) - AANet: Adaptive Aggregation Network for Efficient Stereo Matching [33.39794232337985]
Current state-of-the-art stereo models are mostly based on costly 3D convolutions.
We propose a sparse points based intra-scale cost aggregation method to alleviate the edge-fattening issue.
We also approximate traditional cross-scale cost aggregation algorithm with neural network layers to handle large textureless regions.
arXiv Detail & Related papers (2020-04-20T18:07:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.