Du$^2$Net: Learning Depth Estimation from Dual-Cameras and Dual-Pixels
- URL: http://arxiv.org/abs/2003.14299v1
- Date: Tue, 31 Mar 2020 15:39:43 GMT
- Title: Du$^2$Net: Learning Depth Estimation from Dual-Cameras and Dual-Pixels
- Authors: Yinda Zhang, Neal Wadhwa, Sergio Orts-Escolano, Christian Häne, Sean
Fanello, and Rahul Garg
- Abstract summary: We present a novel approach based on neural networks for depth estimation that combines stereo from dual cameras with stereo from a dual-pixel sensor.
Our network uses a novel architecture to fuse these two sources of information and can overcome the limitations of pure binocular stereo matching.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Computational stereo has reached a high level of accuracy, but degrades in
the presence of occlusions, repeated textures, and correspondence errors along
edges. We present a novel approach based on neural networks for depth
estimation that combines stereo from dual cameras with stereo from a dual-pixel
sensor, which is increasingly common on consumer cameras. Our network uses a
novel architecture to fuse these two sources of information and can overcome
the above-mentioned limitations of pure binocular stereo matching. Our method
provides a dense depth map with sharp edges, which is crucial for computational
photography applications like synthetic shallow-depth-of-field or 3D Photos.
Additionally, we avoid the inherent ambiguity due to the aperture problem in
stereo cameras by designing the stereo baseline to be orthogonal to the
dual-pixel baseline. We present experiments and comparisons with
state-of-the-art approaches to show that our method offers a substantial
improvement over previous works.
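The abstract's point about orthogonal baselines and the aperture problem can be illustrated with a toy example (this is not the paper's method, just a hypothetical sketch): for an image containing only a horizontal edge, shifting the image along the edge direction leaves the matching cost flat, so a stereo pair whose baseline is parallel to the edge cannot recover disparity there, while a baseline orthogonal to the edge can.

```python
import numpy as np

# Toy image with a single horizontal edge at row 16.
img = np.zeros((32, 32))
img[16:, :] = 1.0

def ssd_cost(ref, shift_axis, shift):
    """Sum-of-squared-differences cost between the image and a shifted copy."""
    shifted = np.roll(ref, shift, axis=shift_axis)
    return float(((ref - shifted) ** 2).sum())

# Shifts along the edge (horizontal baseline): cost stays zero -> ambiguous match.
horiz = [ssd_cost(img, 1, s) for s in range(1, 5)]
# Shifts across the edge (orthogonal baseline): cost grows with the shift,
# so the disparity is observable.
vert = [ssd_cost(img, 0, s) for s in range(1, 5)]

print(horiz)  # all exactly 0.0
print(vert)   # strictly increasing positive values
```

This is the ambiguity the paper sidesteps by design: with the dual-camera baseline orthogonal to the dual-pixel baseline, an edge that is ambiguous for one source of parallax is well constrained by the other.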
Related papers
- Stereo-Knowledge Distillation from dpMV to Dual Pixels for Light Field Video Reconstruction [12.519930982515802]
This work hypothesizes that distilling high-precision dark stereo knowledge, implicitly or explicitly, to efficient dual-pixel student networks enables faithful reconstructions.
We collect the first and largest 3-view dual-pixel video dataset, dpMV, to validate our explicit dark knowledge distillation hypothesis.
We show that these methods outperform purely monocular solutions, especially in challenging foreground-background separation regions using faithful guidance from dual pixels.
arXiv Detail & Related papers (2024-05-20T06:34:47Z) - SDGE: Stereo Guided Depth Estimation for 360$^\circ$ Camera Sets [65.64958606221069]
Multi-camera systems are often used in autonomous driving to achieve a 360$^\circ$ perception.
These 360$^\circ$ camera sets often have limited or low-quality overlap regions, making multi-view stereo methods infeasible for the entire image.
We propose the Stereo Guided Depth Estimation (SGDE) method, which enhances depth estimation of the full image by explicitly utilizing multi-view stereo results on the overlap.
arXiv Detail & Related papers (2024-02-19T02:41:37Z) - Monocular 3D Object Detection with Depth from Motion [74.29588921594853]
We take advantage of camera ego-motion for accurate object depth estimation and detection.
Our framework, named Depth from Motion (DfM), then uses the established geometry to lift 2D image features to the 3D space and detects 3D objects thereon.
Our framework outperforms state-of-the-art methods by a large margin on the KITTI benchmark.
arXiv Detail & Related papers (2022-07-26T15:48:46Z) - MEStereo-Du2CNN: A Novel Dual Channel CNN for Learning Robust Depth
Estimates from Multi-exposure Stereo Images for HDR 3D Applications [0.22940141855172028]
We develop a novel deep architecture for multi-exposure stereo depth estimation.
For the stereo depth estimation component of our architecture, a mono-to-stereo transfer learning approach is deployed.
In terms of performance, the proposed model surpasses state-of-the-art monocular and stereo depth estimation methods.
arXiv Detail & Related papers (2022-06-21T13:23:22Z) - IB-MVS: An Iterative Algorithm for Deep Multi-View Stereo based on
Binary Decisions [0.0]
We present a novel deep-learning-based method for Multi-View Stereo.
Our method estimates high resolution and highly precise depth maps iteratively, by traversing the continuous space of feasible depth values at each pixel in a binary decision fashion.
We compare our method with state-of-the-art Multi-View Stereo methods on the DTU, Tanks and Temples and the challenging ETH3D benchmarks and show competitive results.
arXiv Detail & Related papers (2021-11-29T10:04:24Z) - VolumeFusion: Deep Depth Fusion for 3D Scene Reconstruction [71.83308989022635]
In this paper, we advocate that replicating the traditional two-stage framework with deep neural networks improves both the interpretability and the accuracy of the results.
Our network operates in two steps: 1) local computation of depth maps with a deep MVS technique, and 2) fusion of the depth maps and image features to build a single TSDF volume.
In order to improve the matching performance between images acquired from very different viewpoints, we introduce a rotation-invariant 3D convolution kernel called PosedConv.
arXiv Detail & Related papers (2021-08-19T11:33:58Z) - SMD-Nets: Stereo Mixture Density Networks [68.56947049719936]
We propose Stereo Mixture Density Networks (SMD-Nets), a simple yet effective learning framework compatible with a wide class of 2D and 3D architectures.
Specifically, we exploit bimodal mixture densities as output representation and show that this allows for sharp and precise disparity estimates near discontinuities.
We carry out comprehensive experiments on a new high-resolution and highly realistic synthetic stereo dataset, consisting of stereo pairs at 8Mpx resolution, as well as on real-world stereo datasets.
arXiv Detail & Related papers (2021-04-08T16:15:46Z) - Robust Consistent Video Depth Estimation [65.53308117778361]
We present an algorithm for estimating consistent dense depth maps and camera poses from a monocular video.
Our algorithm combines two complementary techniques: (1) flexible deformation-splines for low-frequency large-scale alignment and (2) geometry-aware depth filtering for high-frequency alignment of fine depth details.
In contrast to prior approaches, our method does not require camera poses as input and achieves robust reconstruction for challenging hand-held cell phone captures containing a significant amount of noise, shake, motion blur, and rolling shutter deformations.
arXiv Detail & Related papers (2020-12-10T18:59:48Z) - Increased-Range Unsupervised Monocular Depth Estimation [8.105699831214608]
In this work, we propose to integrate the advantages of the small and wide baselines.
By training the network using three horizontally aligned views, we obtain accurate depth predictions for both close and far ranges.
Our strategy allows multi-baseline depth to be inferred from a single image.
arXiv Detail & Related papers (2020-06-23T07:01:32Z) - Deep 3D Capture: Geometry and Reflectance from Sparse Multi-View Images [59.906948203578544]
We introduce a novel learning-based method to reconstruct the high-quality geometry and complex, spatially-varying BRDF of an arbitrary object.
We first estimate per-view depth maps using a deep multi-view stereo network.
These depth maps are used to coarsely align the different views.
We propose a novel multi-view reflectance estimation network architecture.
arXiv Detail & Related papers (2020-03-27T21:28:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.