P$^{2}$Net: Patch-match and Plane-regularization for Unsupervised Indoor
Depth Estimation
- URL: http://arxiv.org/abs/2007.07696v1
- Date: Wed, 15 Jul 2020 14:10:43 GMT
- Title: P$^{2}$Net: Patch-match and Plane-regularization for Unsupervised Indoor
Depth Estimation
- Authors: Zehao Yu, Lei Jin, and Shenghua Gao
- Abstract summary: This paper tackles the unsupervised depth estimation task in indoor environments.
The paper argues that the poor performance stems from non-discriminative point-based matching.
Experiments on NYUv2 and ScanNet show that our P$^2$Net outperforms existing approaches by a large margin.
- Score: 37.95666188829359
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper tackles the unsupervised depth estimation task in indoor
environments. The task is extremely challenging because of the vast areas of
non-texture regions in these scenes. These areas could overwhelm the
optimization process in the commonly used unsupervised depth estimation
framework proposed for outdoor environments. However, even when those regions
are masked out, the performance is still unsatisfactory. In this paper, we
argue that the poor performance suffers from the non-discriminative point-based
matching. To this end, we propose P$^2$Net. We first extract points with large
local gradients and adopt patches centered at each point as its representation.
Multiview consistency loss is then defined over patches. This operation
significantly improves the robustness of the network training. Furthermore,
because those textureless regions in indoor scenes (e.g., wall, floor, roof)
usually correspond to planar regions, we propose to leverage superpixels
as a plane prior. We enforce the predicted depth to be well fitted by a plane
within each superpixel. Extensive experiments on NYUv2 and ScanNet show that
our P$^2$Net outperforms existing approaches by a large margin. Code is
available at https://github.com/svip-lab/Indoor-SfMLearner.
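To make the patch-based matching idea concrete, below is a minimal PyTorch sketch of a patch-based multiview consistency term. It is an illustration, not the authors' implementation (see the linked repository for that): the function names, the 3x3 patch size, and the use of axis-aligned patches around the reprojected centers are simplifying assumptions made here.

import torch
import torch.nn.functional as F

def extract_keypoints(gray, num_points=512):
    # gray: (B, 1, H, W). Pick pixels with the largest local gradients.
    gx = gray[:, :, :, 1:] - gray[:, :, :, :-1]          # (B, 1, H, W-1)
    gy = gray[:, :, 1:, :] - gray[:, :, :-1, :]          # (B, 1, H-1, W)
    mag = gx[:, :, :-1, :] ** 2 + gy[:, :, :, :-1] ** 2  # (B, 1, H-1, W-1)
    B, _, Hm, Wm = mag.shape
    idx = mag.reshape(B, -1).topk(num_points, dim=1).indices
    xs, ys = idx % Wm, idx // Wm
    return torch.stack([xs, ys], dim=-1).float()         # (B, N, 2), (x, y)

def sample_patches(img, centers, patch=3):
    # Bilinearly sample a square patch of pixels around each center.
    B, C, H, W = img.shape
    r = patch // 2
    dy, dx = torch.meshgrid(torch.arange(-r, r + 1),
                            torch.arange(-r, r + 1), indexing="ij")
    offs = torch.stack([dx, dy], dim=-1).reshape(1, 1, -1, 2).to(img)
    grid = centers.unsqueeze(2) + offs                   # (B, N, P, 2)
    grid = torch.stack([2 * grid[..., 0] / (W - 1) - 1,  # normalize for
                        2 * grid[..., 1] / (H - 1) - 1], # grid_sample
                       dim=-1)
    return F.grid_sample(img, grid, align_corners=True)  # (B, C, N, P)

def patch_consistency_loss(tgt, src, centers, reprojected_centers):
    # Compare each target patch with the patch around its reprojection in
    # the source view; the reprojected centers would come from the predicted
    # depth and relative camera pose (not shown here).
    return (sample_patches(tgt, centers)
            - sample_patches(src, reprojected_centers)).abs().mean()

Computing the reprojected centers from the predicted depth and relative pose is omitted; also, typical unsupervised pipelines combine an L1 term with an SSIM-style term, whereas only a plain L1 difference is shown here.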
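Similarly, a minimal sketch of the plane-regularization idea: within each superpixel, fit a plane to the back-projected depth by least squares and penalize the fitting residual. The segmentation is assumed to come from an off-the-shelf superpixel method, and the z = a*x + b*y + c parameterization plus the per-segment loop are simplifications rather than the paper's exact formulation.

import torch

def backproject(depth, K_inv):
    # depth: (H, W); K_inv: (3, 3) inverse camera intrinsics.
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1)
    rays = pix.reshape(-1, 3).float() @ K_inv.T      # per-pixel camera rays
    return rays * depth.reshape(-1, 1)               # (H*W, 3) 3D points

def plane_fit_loss(depth, segments, K_inv, min_pixels=50):
    # segments: (H, W) integer superpixel labels.
    pts = backproject(depth, K_inv)
    labels = segments.reshape(-1)
    loss, count = depth.new_zeros(()), 0
    for s in labels.unique():
        mask = labels == s
        if mask.sum() < min_pixels:                  # skip tiny segments
            continue
        P = pts[mask]                                # (M, 3)
        # Least-squares fit of z = a*x + b*y + c over the segment.
        A = torch.cat([P[:, :2], torch.ones_like(P[:, :1])], dim=1)
        coef = torch.linalg.lstsq(A, P[:, 2:3]).solution
        loss = loss + (A @ coef - P[:, 2:3]).abs().mean()
        count += 1
    return loss / max(count, 1)

Note that this parameterization degenerates for planes nearly parallel to the optical axis; the sketch is only meant to convey the structure of the regularizer.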
Related papers
- NeRF-Det++: Incorporating Semantic Cues and Perspective-aware Depth
Supervision for Indoor Multi-View 3D Detection [72.0098999512727]
NeRF-Det has achieved impressive performance in indoor multi-view 3D detection by utilizing NeRF to enhance representation learning.
We present three corresponding solutions: semantic enhancement, perspective-aware sampling, and ordinal depth supervision.
The resulting algorithm, NeRF-Det++, exhibits appealing performance on the ScanNetV2 and ARKitScenes datasets.
arXiv Detail & Related papers (2024-02-22T11:48:06Z) - VA-DepthNet: A Variational Approach to Single Image Depth Prediction [163.14849753700682]
VA-DepthNet is a simple, effective, and accurate deep neural network approach for the single-image depth prediction problem.
The paper demonstrates the usefulness of the proposed approach via extensive evaluation and ablation analysis over several benchmark datasets.
arXiv Detail & Related papers (2023-02-13T17:55:58Z) - When Expressivity Meets Trainability: Fewer than $n$ Neurons Can Work [59.29606307518154]
We show that as long as the width $m \geq 2n/d$ (where $d$ is the input dimension), its expressivity is strong, i.e., there exists at least one global minimizer with zero training loss.
We also consider a constrained optimization formulation where the feasible region is the nice local region, and prove that every KKT point is a nearly global minimizer.
arXiv Detail & Related papers (2022-10-21T14:41:26Z) - Monocular Depth Distribution Alignment with Low Computation [15.05244258071472]
We model the main source of the accuracy gap between light-weight networks and heavy-weight networks.
By perceiving the differences in depth features between pairs of regions, DANet predicts a reasonable scene structure.
Thanks to the alignment of the depth distribution shape and the scene depth range, DANet sharply alleviates distribution drift and achieves performance comparable to prior heavy-weight methods.
arXiv Detail & Related papers (2022-03-09T06:18:26Z) - PLNet: Plane and Line Priors for Unsupervised Indoor Depth Estimation [15.751045404065465]
This paper proposes PLNet, which leverages plane and line priors to enhance depth estimation.
Experiments on NYU Depth V2 and ScanNet show that PLNet outperforms existing methods.
arXiv Detail & Related papers (2021-10-12T09:02:24Z) - StructDepth: Leveraging the structural regularities for self-supervised
indoor depth estimation [7.028319464940422]
Self-supervised monocular depth estimation has achieved impressive performance on outdoor datasets.
But its performance degrades notably in indoor environments because of the lack of textures.
We leverage the structural regularities exhibited in indoor scenes to train a better depth network.
arXiv Detail & Related papers (2021-08-19T09:26:13Z) - Boundary-induced and scene-aggregated network for monocular depth
prediction [20.358133522462513]
We propose the Boundary-induced and Scene-aggregated network (BS-Net) to predict the dense depth of a single RGB image.
Experimental results on the NYUD v2 dataset and the iBims-1 dataset illustrate the state-of-the-art performance of the proposed approach.
arXiv Detail & Related papers (2021-02-26T01:43:17Z) - Deep Depth Estimation from Visual-Inertial SLAM [11.814395824799988]
We study the case in which the sparse depth is computed from a visual-inertial simultaneous localization and mapping (VI-SLAM) system.
The resulting point cloud is sparse, noisy, and non-uniformly distributed.
We use the available gravity estimate from the VI-SLAM to warp the input image to the orientation prevailing in the training dataset.
arXiv Detail & Related papers (2020-07-31T21:28:25Z) - Occlusion-Aware Depth Estimation with Adaptive Normal Constraints [85.44842683936471]
We present a new learning-based method for multi-frame depth estimation from a color video.
Our method outperforms the state-of-the-art in terms of depth estimation accuracy.
arXiv Detail & Related papers (2020-04-02T07:10:45Z) - Depth Based Semantic Scene Completion with Position Importance Aware
Loss [52.06051681324545]
PALNet is a novel hybrid network for semantic scene completion.
It extracts both 2D and 3D features at multiple stages using fine-grained depth information.
This is beneficial for recovering key details like the boundaries of objects and the corners of the scene.
arXiv Detail & Related papers (2020-01-29T07:05:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.