StructDepth: Leveraging the structural regularities for self-supervised
indoor depth estimation
- URL: http://arxiv.org/abs/2108.08574v1
- Date: Thu, 19 Aug 2021 09:26:13 GMT
- Title: StructDepth: Leveraging the structural regularities for self-supervised
indoor depth estimation
- Authors: Boying Li, Yuan Huang, Zeyu Liu, Danping Zou, and Wenxian Yu
- Abstract summary: Self-supervised monocular depth estimation has achieved impressive performance on outdoor datasets.
However, its performance degrades notably in indoor environments because of the lack of textures.
We leverage the structural regularities exhibited in indoor scenes to train a better depth network.
- Score: 7.028319464940422
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised monocular depth estimation has achieved impressive
performance on outdoor datasets. Its performance, however, degrades notably in
indoor environments because of the lack of textures. Without rich textures, the
photometric consistency is too weak to train a good depth network. Inspired by
the early works on indoor modeling, we leverage the structural regularities
exhibited in indoor scenes, to train a better depth network. Specifically, we
adopt two extra supervisory signals for self-supervised training: 1) the
Manhattan normal constraint and 2) the co-planar constraint. The Manhattan
normal constraint enforces the major surfaces (the floor, ceiling, and walls)
to be aligned with dominant directions. The co-planar constraint states that
3D points should be well fitted by a plane if they lie within the same
planar region. To generate the supervisory signals, we adopt two components to
classify the major surface normal into dominant directions and detect the
planar regions on the fly during training. As the predicted depth becomes more
accurate after more training epochs, the supervisory signals also improve and
in turn feed back to yield a better depth model. Extensive experiments on
indoor benchmark datasets show that our network outperforms state-of-the-art
methods. The source code is available at
https://github.com/SJTU-ViSYS/StructDepth .
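The two supervisory signals above can be expressed as simple losses. Below is a minimal NumPy sketch, assuming axis-aligned dominant directions and SVD-based plane fitting; the function names and exact loss forms are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def coplanar_loss(points):
    """Mean point-to-plane distance for 3D points (N, 3) assumed to lie
    within one detected planar region (the co-planar constraint)."""
    centroid = points.mean(axis=0)
    centered = points - centroid
    # The right-singular vector for the smallest singular value of the
    # centered point cloud is the least-squares plane normal.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    normal = vt[-1]
    return np.mean(np.abs(centered @ normal))

def manhattan_normal_loss(normals):
    """Penalize unit surface normals (N, 3) of major surfaces that deviate
    from the nearest dominant direction (the Manhattan normal constraint)."""
    axes = np.eye(3)  # dominant directions, assumed axis-aligned here
    # |cos| similarity to each axis; keep the best-matching axis per normal.
    cos = np.abs(normals @ axes.T)
    return np.mean(1.0 - cos.max(axis=1))

# Points sampled from the plane z = 2 incur ~zero co-planar loss.
pts = np.column_stack([np.random.rand(100), np.random.rand(100),
                       np.full(100, 2.0)])
print(coplanar_loss(pts))  # ~0 (numerical precision)
# A normal aligned with the y axis incurs zero Manhattan loss.
print(manhattan_normal_loss(np.array([[0.0, 1.0, 0.0]])))  # 0.0
```

In training, both terms would be evaluated on points and normals derived from the predicted depth, so the gradients flow back into the depth network alongside the photometric loss.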
Related papers
- GAM-Depth: Self-Supervised Indoor Depth Estimation Leveraging a Gradient-Aware Mask and Semantic Constraints [12.426365333096264]
We propose GAM-Depth, developed upon two novel components: gradient-aware mask and semantic constraints.
The gradient-aware mask enables adaptive and robust supervision for both key areas and textureless regions.
The incorporation of semantic constraints for indoor self-supervised depth estimation improves depth discrepancies at object boundaries.
arXiv Detail & Related papers (2024-02-22T07:53:34Z)
- Deeper into Self-Supervised Monocular Indoor Depth Estimation [7.30562653023176]
Self-supervised learning of indoor depth from monocular sequences is quite challenging.
In this work, our proposed method, named IndoorDepth, consists of two innovations.
Experiments on the NYUv2 benchmark demonstrate that our IndoorDepth outperforms the previous state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2023-12-03T04:55:32Z)
- NDDepth: Normal-Distance Assisted Monocular Depth Estimation and Completion [18.974297347310287]
We introduce novel physics (geometry)-driven deep learning frameworks for monocular depth estimation and completion.
Our method outperforms prior state-of-the-art monocular depth estimation and completion methods.
arXiv Detail & Related papers (2023-11-13T09:01:50Z)
- SC-DepthV3: Robust Self-supervised Monocular Depth Estimation for Dynamic Scenes [58.89295356901823]
Self-supervised monocular depth estimation has shown impressive results in static scenes.
It relies on the multi-view consistency assumption to train networks; however, this assumption is violated in dynamic object regions.
We introduce an external pretrained monocular depth estimation model for generating single-image depth prior.
Our model can predict sharp and accurate depth maps, even when trained on monocular videos of highly dynamic scenes.
arXiv Detail & Related papers (2022-11-07T16:17:47Z)
- DevNet: Self-supervised Monocular Depth Learning via Density Volume Construction [51.96971077984869]
Self-supervised depth learning from monocular images normally relies on the 2D pixel-wise photometric relation between temporally adjacent image frames.
This work proposes Density Volume Construction Network (DevNet), a novel self-supervised monocular depth learning framework.
arXiv Detail & Related papers (2022-09-14T00:08:44Z)
- Joint Prediction of Monocular Depth and Structure using Planar and Parallax Geometry [4.620624344434533]
Supervised depth estimation methods can achieve good performance when trained on high-quality ground truth, such as LiDAR data.
We propose a novel approach combining structure information from a promising Plane and Parallax geometry pipeline with depth information into a U-Net supervised learning network.
Our model performs well on depth prediction for thin objects and edges, and is more robust than the structure-prediction baseline.
arXiv Detail & Related papers (2022-07-13T17:04:05Z)
- P3Depth: Monocular Depth Estimation with a Piecewise Planarity Prior [133.76192155312182]
We propose a method that learns to selectively leverage information from coplanar pixels to improve the predicted depth.
An extensive evaluation of our method shows that we set the new state of the art in supervised monocular depth estimation.
arXiv Detail & Related papers (2022-04-05T10:03:52Z)
- PLNet: Plane and Line Priors for Unsupervised Indoor Depth Estimation [15.751045404065465]
This paper proposes PLNet that leverages the plane and line priors to enhance the depth estimation.
Experiments on NYU Depth V2 and ScanNet show that PLNet outperforms existing methods.
arXiv Detail & Related papers (2021-10-12T09:02:24Z)
- MonoIndoor: Towards Good Practice of Self-Supervised Monocular Depth Estimation for Indoor Environments [55.05401912853467]
Self-supervised depth estimation for indoor environments is more challenging than its outdoor counterpart.
The depth range of indoor sequences varies considerably across frames, making it difficult for the depth network to induce consistent depth cues.
The maximum distance in outdoor scenes mostly stays the same as the camera usually sees the sky.
The motions of outdoor sequences are predominantly translational, especially for driving datasets such as KITTI.
arXiv Detail & Related papers (2021-07-26T18:45:14Z)
- Occlusion-Aware Depth Estimation with Adaptive Normal Constraints [85.44842683936471]
We present a new learning-based method for multi-frame depth estimation from a color video.
Our method outperforms the state-of-the-art in terms of depth estimation accuracy.
arXiv Detail & Related papers (2020-04-02T07:10:45Z)
- Don't Forget The Past: Recurrent Depth Estimation from Monocular Video [92.84498980104424]
We put three different types of depth estimation into a common framework.
Our method produces a time series of depth maps.
It can be applied to monocular videos only or be combined with different types of sparse depth patterns.
arXiv Detail & Related papers (2020-01-08T16:50:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.