P$^{2}$Net: Patch-match and Plane-regularization for Unsupervised Indoor
Depth Estimation
- URL: http://arxiv.org/abs/2007.07696v1
- Date: Wed, 15 Jul 2020 14:10:43 GMT
- Title: P$^{2}$Net: Patch-match and Plane-regularization for Unsupervised Indoor
Depth Estimation
- Authors: Zehao Yu, Lei Jin, and Shenghua Gao
- Abstract summary: This paper tackles the unsupervised depth estimation task in indoor environments.
The paper argues that the poor performance stems from non-discriminative point-based matching.
Experiments on NYUv2 and ScanNet show that our P$^2$Net outperforms existing approaches by a large margin.
- Score: 37.95666188829359
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper tackles the unsupervised depth estimation task in indoor
environments. The task is extremely challenging because of the vast areas of
non-texture regions in these scenes. These areas could overwhelm the
optimization process in the commonly used unsupervised depth estimation
framework proposed for outdoor environments. However, even when those regions
are masked out, the performance is still unsatisfactory. In this paper, we
argue that the poor performance suffers from the non-discriminative point-based
matching. To this end, we propose P$^2$Net. We first extract points with large
local gradients and adopt patches centered at each point as its representation.
Multiview consistency loss is then defined over patches. This operation
significantly improves the robustness of the network training. Furthermore,
because those textureless regions in indoor scenes (e.g., wall, floor, roof)
usually correspond to planar regions, we propose to leverage superpixels
as a plane prior. We enforce the predicted depth to be well fitted by a plane
within each superpixel. Extensive experiments on NYUv2 and ScanNet show that
our P$^2$Net outperforms existing approaches by a large margin. Code is
available at https://github.com/svip-lab/Indoor-SfMLearner.
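To make the patch-based matching idea concrete, below is a minimal PyTorch sketch of a patch-based multiview consistency term. It is an illustration, not the authors' implementation (see the linked repository for that): the function names, the 3x3 patch size, and the use of axis-aligned patches around the reprojected centers are simplifying assumptions made here.

import torch
import torch.nn.functional as F

def extract_keypoints(gray, num_points=512):
    # gray: (B, 1, H, W). Pick pixels with the largest local gradients.
    gx = gray[:, :, :, 1:] - gray[:, :, :, :-1]          # (B, 1, H, W-1)
    gy = gray[:, :, 1:, :] - gray[:, :, :-1, :]          # (B, 1, H-1, W)
    mag = gx[:, :, :-1, :] ** 2 + gy[:, :, :, :-1] ** 2  # (B, 1, H-1, W-1)
    B, _, Hm, Wm = mag.shape
    idx = mag.reshape(B, -1).topk(num_points, dim=1).indices
    xs, ys = idx % Wm, idx // Wm
    return torch.stack([xs, ys], dim=-1).float()         # (B, N, 2), (x, y)

def sample_patches(img, centers, patch=3):
    # Bilinearly sample a square patch of pixels around each center.
    B, C, H, W = img.shape
    r = patch // 2
    dy, dx = torch.meshgrid(torch.arange(-r, r + 1),
                            torch.arange(-r, r + 1), indexing="ij")
    offs = torch.stack([dx, dy], dim=-1).reshape(1, 1, -1, 2).to(img)
    grid = centers.unsqueeze(2) + offs                   # (B, N, P, 2)
    grid = torch.stack([2 * grid[..., 0] / (W - 1) - 1,  # normalize for
                        2 * grid[..., 1] / (H - 1) - 1], # grid_sample
                       dim=-1)
    return F.grid_sample(img, grid, align_corners=True)  # (B, C, N, P)

def patch_consistency_loss(tgt, src, centers, reprojected_centers):
    # Compare each target patch with the patch around its reprojection in
    # the source view; the reprojected centers would come from the predicted
    # depth and relative camera pose (not shown here).
    return (sample_patches(tgt, centers)
            - sample_patches(src, reprojected_centers)).abs().mean()

Computing the reprojected centers from the predicted depth and relative pose is omitted; also, typical unsupervised pipelines combine an L1 term with an SSIM-style term, whereas only a plain L1 difference is shown here.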
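Similarly, a minimal sketch of the plane-regularization idea: within each superpixel, fit a plane to the back-projected depth by least squares and penalize the fitting residual. The segmentation is assumed to come from an off-the-shelf superpixel method, and the z = a*x + b*y + c parameterization plus the per-segment loop are simplifications rather than the paper's exact formulation.

import torch

def backproject(depth, K_inv):
    # depth: (H, W); K_inv: (3, 3) inverse camera intrinsics.
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1)
    rays = pix.reshape(-1, 3).float() @ K_inv.T      # per-pixel camera rays
    return rays * depth.reshape(-1, 1)               # (H*W, 3) 3D points

def plane_fit_loss(depth, segments, K_inv, min_pixels=50):
    # segments: (H, W) integer superpixel labels.
    pts = backproject(depth, K_inv)
    labels = segments.reshape(-1)
    loss, count = depth.new_zeros(()), 0
    for s in labels.unique():
        mask = labels == s
        if mask.sum() < min_pixels:                  # skip tiny segments
            continue
        P = pts[mask]                                # (M, 3)
        # Least-squares fit of z = a*x + b*y + c over the segment.
        A = torch.cat([P[:, :2], torch.ones_like(P[:, :1])], dim=1)
        coef = torch.linalg.lstsq(A, P[:, 2:3]).solution
        loss = loss + (A @ coef - P[:, 2:3]).abs().mean()
        count += 1
    return loss / max(count, 1)

Note that this parameterization degenerates for planes nearly parallel to the optical axis; the sketch is only meant to convey the structure of the regularizer.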
Related papers
- NeRF-Det++: Incorporating Semantic Cues and Perspective-aware Depth
Supervision for Indoor Multi-View 3D Detection [72.0098999512727]
NeRF-Det has achieved impressive performance in indoor multi-view 3D detection by utilizing NeRF to enhance representation learning.
We present three corresponding solutions: semantic enhancement, perspective-aware sampling, and ordinal depth supervision.
The resulting algorithm, NeRF-Det++, exhibits appealing performance on the ScanNetV2 and ARKitScenes datasets.
arXiv Detail & Related papers (2024-02-22T11:48:06Z) - VA-DepthNet: A Variational Approach to Single Image Depth Prediction [163.14849753700682]
VA-DepthNet is a simple, effective, and accurate deep neural network approach for the single-image depth prediction problem.
The paper demonstrates the usefulness of the proposed approach via extensive evaluation and ablation analysis over several benchmark datasets.
arXiv Detail & Related papers (2023-02-13T17:55:58Z) - When Expressivity Meets Trainability: Fewer than $n$ Neurons Can Work [59.29606307518154]
We show that as long as the width $m \geq 2n/d$ (where $d$ is the input dimension), its expressivity is strong, i.e., there exists at least one global minimizer with zero training loss.
We also consider a constrained optimization formulation where the feasible region is the nice local region, and prove that every KKT point is a nearly global minimizer.
arXiv Detail & Related papers (2022-10-21T14:41:26Z) - Monocular Depth Distribution Alignment with Low Computation [15.05244258071472]
We model the main source of the accuracy gap between light-weight networks and heavy-weight networks.
By perceiving the differences in depth features between pairs of regions, DANet predicts a reasonable scene structure.
Thanks to the alignment of the depth distribution shape and the scene depth range, DANet sharply alleviates distribution drift and achieves performance comparable to prior heavy-weight methods.
arXiv Detail & Related papers (2022-03-09T06:18:26Z) - PLNet: Plane and Line Priors for Unsupervised Indoor Depth Estimation [15.751045404065465]
This paper proposes PLNet, which leverages plane and line priors to enhance depth estimation.
Experiments on NYU Depth V2 and ScanNet show that PLNet outperforms existing methods.
arXiv Detail & Related papers (2021-10-12T09:02:24Z) - StructDepth: Leveraging the structural regularities for self-supervised
indoor depth estimation [7.028319464940422]
Self-supervised monocular depth estimation has achieved impressive performance on outdoor datasets.
But its performance degrades notably in indoor environments because of the lack of textures.
We leverage the structural regularities exhibited in indoor scenes to train a better depth network.
arXiv Detail & Related papers (2021-08-19T09:26:13Z) - Boundary-induced and scene-aggregated network for monocular depth
prediction [20.358133522462513]
We propose the Boundary-induced and Scene-aggregated network (BS-Net) to predict the dense depth of a single RGB image.
Experimental results on the NYUD v2 dataset and the iBims-1 dataset illustrate the state-of-the-art performance of the proposed approach.
arXiv Detail & Related papers (2021-02-26T01:43:17Z) - Deep Depth Estimation from Visual-Inertial SLAM [11.814395824799988]
We study the case in which the sparse depth is computed from a visual-inertial simultaneous localization and mapping (VI-SLAM) system.
The resulting point cloud is sparse, noisy, and non-uniformly distributed.
We use the available gravity estimate from the VI-SLAM to warp the input image to the orientation prevailing in the training dataset.
arXiv Detail & Related papers (2020-07-31T21:28:25Z) - Occlusion-Aware Depth Estimation with Adaptive Normal Constraints [85.44842683936471]
We present a new learning-based method for multi-frame depth estimation from a color video.
Our method outperforms the state-of-the-art in terms of depth estimation accuracy.
arXiv Detail & Related papers (2020-04-02T07:10:45Z) - Depth Based Semantic Scene Completion with Position Importance Aware
Loss [52.06051681324545]
PALNet is a novel hybrid network for semantic scene completion.
It extracts both 2D and 3D features at multiple stages using fine-grained depth information.
This is beneficial for recovering key details like the boundaries of objects and the corners of the scene.
arXiv Detail & Related papers (2020-01-29T07:05:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.