SMD-Nets: Stereo Mixture Density Networks
- URL: http://arxiv.org/abs/2104.03866v1
- Date: Thu, 8 Apr 2021 16:15:46 GMT
- Title: SMD-Nets: Stereo Mixture Density Networks
- Authors: Fabio Tosi, Yiyi Liao, Carolin Schmitt, Andreas Geiger
- Abstract summary: We propose Stereo Mixture Density Networks (SMD-Nets), a simple yet effective learning framework compatible with a wide class of 2D and 3D architectures.
Specifically, we exploit bimodal mixture densities as output representation and show that this allows for sharp and precise disparity estimates near discontinuities.
We carry out comprehensive experiments on a new high-resolution and highly realistic synthetic stereo dataset, consisting of stereo pairs at 8Mpx resolution, as well as on real-world stereo datasets.
- Score: 68.56947049719936
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although stereo matching accuracy has greatly improved with deep learning in the
last few years, recovering sharp boundaries and high-resolution outputs
efficiently remains challenging. In this paper, we propose Stereo Mixture
Density Networks (SMD-Nets), a simple yet effective learning framework
compatible with a wide class of 2D and 3D architectures which ameliorates both
issues. Specifically, we exploit bimodal mixture densities as output
representation and show that this allows for sharp and precise disparity
estimates near discontinuities while explicitly modeling the aleatoric
uncertainty inherent in the observations. Moreover, we formulate disparity
estimation as a continuous problem in the image domain, allowing our model to
query disparities at arbitrary spatial precision. We carry out comprehensive
experiments on a new high-resolution and highly realistic synthetic stereo
dataset, consisting of stereo pairs at 8Mpx resolution, as well as on
real-world stereo datasets. Our experiments demonstrate increased depth
accuracy near object boundaries and prediction of ultra high-resolution
disparity maps on standard GPUs. We demonstrate the flexibility of our
technique by improving the performance of a variety of stereo backbones.
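The bimodal output representation described in the abstract can be illustrated with a minimal sketch. It assumes Laplacian mixture components and a "pick the denser mode" rule at inference; neither detail is stated in the abstract above, and all function and parameter names here are hypothetical.

```python
import math

def laplace_pdf(d, mu, b):
    """Density of a Laplace distribution with location mu and scale b > 0 (assumed component type)."""
    return math.exp(-abs(d - mu) / b) / (2.0 * b)

def bimodal_pdf(d, pi, mu1, b1, mu2, b2):
    """Two-component mixture: pi weights the first mode, (1 - pi) the second."""
    return pi * laplace_pdf(d, mu1, b1) + (1.0 - pi) * laplace_pdf(d, mu2, b2)

def nll(d_gt, pi, mu1, b1, mu2, b2, eps=1e-12):
    """Negative log-likelihood of a ground-truth disparity under the mixture."""
    return -math.log(bimodal_pdf(d_gt, pi, mu1, b1, mu2, b2) + eps)

def select_mode(pi, mu1, b1, mu2, b2):
    """Return the location of the mode with the higher mixture density (assumed inference rule)."""
    dens1 = bimodal_pdf(mu1, pi, mu1, b1, mu2, b2)
    dens2 = bimodal_pdf(mu2, pi, mu1, b1, mu2, b2)
    return mu1 if dens1 >= dens2 else mu2

# Illustrative pixel next to a depth edge: foreground near 40 px, background near 10 px.
pi, mu1, b1, mu2, b2 = 0.7, 40.0, 0.5, 10.0, 0.5
sharp = select_mode(pi, mu1, b1, mu2, b2)  # commits to one of the two surfaces
blurred = pi * mu1 + (1.0 - pi) * mu2      # mixture mean falls between the surfaces
```

Selecting a mode rather than averaging is what keeps the estimate on one side of a discontinuity, and the per-mode scales give a rough handle on the uncertainty of each hypothesis.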
Related papers
- SD-MVS: Segmentation-Driven Deformation Multi-View Stereo with Spherical
Refinement and EM optimization [6.886220026399106]
We introduce Segmentation-Driven Deformation Multi-View Stereo (SD-MVS) to tackle challenges in 3D reconstruction of textureless areas.
We are the first to adopt the Segment Anything Model (SAM) to distinguish semantic instances in scenes.
We propose a unique refinement strategy that combines spherical coordinates and gradient descent on normals and pixelwise search interval on depths.
(2024-01-12T05:25:57Z)
- MEStereo-Du2CNN: A Novel Dual Channel CNN for Learning Robust Depth
Estimates from Multi-exposure Stereo Images for HDR 3D Applications [0.22940141855172028]
We develop a novel deep architecture for multi-exposure stereo depth estimation.
For the stereo depth estimation component of our architecture, a mono-to-stereo transfer learning approach is deployed.
In terms of performance, the proposed model surpasses state-of-the-art monocular and stereo depth estimation methods.
(2022-06-21T13:23:22Z)
- RiCS: A 2D Self-Occlusion Map for Harmonizing Volumetric Objects [68.85305626324694]
Ray-marching in Camera Space (RiCS) is a new method that represents the 3D self-occlusions of foreground objects as a 2D self-occlusion map.
We show that our representation map not only allows us to enhance the image quality but also to model temporally coherent complex shadow effects.
(2022-05-14T05:35:35Z)
- Towards Scale Consistent Monocular Visual Odometry by Learning from the
Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system.
(2022-03-11T01:51:54Z)
- Neural Disparity Refinement for Arbitrary Resolution Stereo [67.55946402652778]
We introduce a novel architecture for neural disparity refinement aimed at facilitating deployment of 3D computer vision on cheap and widespread consumer devices.
Our approach relies on a continuous formulation that enables estimating a refined disparity map at any output resolution.
(2021-10-28T18:00:00Z)
- PLADE-Net: Towards Pixel-Level Accuracy for Self-Supervised Single-View
Depth Estimation with Neural Positional Encoding and Distilled Matting Loss [49.66736599668501]
We propose a self-supervised single-view pixel-level accurate depth estimation network, called PLADE-Net.
Our method shows unprecedented accuracy levels, exceeding 95% in terms of the $\delta_1$ metric on the KITTI dataset.
(2021-03-12T15:54:46Z)
- Fusion of Range and Stereo Data for High-Resolution Scene-Modeling [20.824550995195057]
This paper addresses the problem of range-stereo fusion, for the construction of high-resolution depth maps.
We combine low-resolution depth data with high-resolution stereo data, in a maximum a posteriori (MAP) formulation.
The accuracy of the method is not compromised, owing to three properties of the data-term in the energy function.
(2020-12-12T09:37:42Z)
- Improving Deep Stereo Network Generalization with Geometric Priors [93.09496073476275]
Large datasets of diverse real-world scenes with dense ground truth are difficult to obtain.
Many algorithms rely on small real-world datasets of similar scenes or synthetic datasets.
We propose to incorporate prior knowledge of scene geometry into an end-to-end stereo network to help networks generalize better.
(2020-08-25T15:24:02Z)
- Expanding Sparse Guidance for Stereo Matching [24.74333370941674]
We propose a novel sparsity expansion technique to expand the sparse cues concerning RGB images for local feature enhancement.
Our approach significantly boosts the existing state-of-the-art stereo algorithms with extremely sparse cues.
(2020-04-24T06:41:11Z)
- Du$^2$Net: Learning Depth Estimation from Dual-Cameras and Dual-Pixels [16.797169907541164]
We present a novel approach based on neural networks for depth estimation that combines stereo from dual cameras with stereo from a dual-pixel sensor.
Our network uses a novel architecture to fuse these two sources of information and can overcome the limitations of pure binocular stereo matching.
(2020-03-31T15:39:43Z)