Toward Practical Self-Supervised Monocular Indoor Depth Estimation
- URL: http://arxiv.org/abs/2112.02306v1
- Date: Sat, 4 Dec 2021 11:02:56 GMT
- Title: Toward Practical Self-Supervised Monocular Indoor Depth Estimation
- Authors: Cho-Ying Wu, Jialiang Wang, Michael Hall, Ulrich Neumann, Shuochen Su
- Abstract summary: We show that self-supervised monocular depth estimation methods generalize poorly to unseen indoor scenes.
We propose a structure distillation approach to learn knacks from a pretrained depth estimator that produces structured but metric-agnostic depth.
By combining distillation with the self-supervised branch that learns metrics from left-right consistency, we attain structured and metric depth for generic indoor scenes.
- Score: 17.222362224649544
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The majority of self-supervised monocular depth estimation methods focus on
driving scenarios. We show that such methods generalize poorly to unseen
complex indoor scenes, where objects are cluttered and arbitrarily arranged in
the near field. To obtain more robustness, we propose a structure distillation
approach to learn knacks from a pretrained depth estimator that produces
structured but metric-agnostic depth due to its in-the-wild mixed-dataset
training. By combining distillation with the self-supervised branch that learns
metrics from left-right consistency, we attain structured and metric depth for
generic indoor scenes and make inferences in real-time. To facilitate learning
and evaluation, we collect SimSIN, a dataset from simulation with thousands of
environments, and UniSIN, a dataset that contains about 500 real scan sequences
of generic indoor environments. We experiment in both sim-to-real and
real-to-real settings, and show improvements both qualitatively and
quantitatively, as well as in downstream applications using our depth maps.
This work provides a full study, covering methods, data, and applications. We
believe the work lays a solid basis for practical indoor depth estimation via
self-supervision.
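The abstract's core idea, combining a structure-distillation term from a metric-agnostic teacher with a left-right consistency term that supplies metric scale, can be sketched as a training objective. This is a hedged illustration, not the authors' exact losses: `normalize_depth`, `distillation_loss`, and the `lam` weight are assumptions, and the disparity-based warping of the right image is omitted for brevity.

```python
import numpy as np

def normalize_depth(d):
    # Median normalization removes the teacher's unknown scale, so only
    # depth *structure* is compared, as the distillation branch requires.
    return d / np.median(d)

def distillation_loss(student_depth, teacher_depth):
    # Structure-only term: both maps are scale-normalized before the L1.
    return np.abs(normalize_depth(student_depth) - normalize_depth(teacher_depth)).mean()

def photometric_loss(left_img, right_img_warped):
    # Left-right consistency term: L1 between the left image and the right
    # image warped into the left view; the known stereo baseline is what
    # anchors metric scale (warping step omitted here).
    return np.abs(left_img - right_img_warped).mean()

def total_loss(student_depth, teacher_depth, left_img, right_img_warped, lam=0.5):
    # lam is a hypothetical weight balancing the two branches.
    return photometric_loss(left_img, right_img_warped) + lam * distillation_loss(student_depth, teacher_depth)
```

Note that `distillation_loss` is invariant to a global rescaling of either depth map, which is exactly why a second, metric-grounded term is needed.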
Related papers
- UnCLe: Unsupervised Continual Learning of Depth Completion [5.677777151863184]
UnCLe is a standardized benchmark for Unsupervised Continual Learning of a multimodal depth estimation task.
We benchmark depth completion models under the practical scenario of unsupervised learning over continuous streams of data.
arXiv Detail & Related papers (2024-10-23T17:56:33Z)
- TanDepth: Leveraging Global DEMs for Metric Monocular Depth Estimation in UAVs [5.6168844664788855]
This work presents TanDepth, a practical, online scale recovery method for obtaining metric depth results from relative estimations at inference-time.
Tailored for Unmanned Aerial Vehicle (UAV) applications, our method leverages sparse measurements from Global Digital Elevation Models (GDEM) by projecting them to the camera view.
An adaptation to the Cloth Simulation Filter is presented, which allows selecting ground points from the estimated depth map to then correlate with the projected reference points.
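The general idea behind this kind of inference-time scale recovery, correlating sparse metric reference depths with a relative depth map to estimate one global scale, can be sketched as follows. This is an assumption-laden simplification, not TanDepth's actual procedure: the GDEM projection and Cloth Simulation Filter steps are replaced by a given list of reference points.

```python
import numpy as np

def recover_scale(relative_depth, ref_points):
    """Estimate a global metric scale for a relative depth map.

    ref_points: iterable of (row, col, metric_depth_m) tuples, standing in
    for projected GDEM ground points (hypothetical input format).
    """
    ratios = [z / relative_depth[r, c] for r, c, z in ref_points]
    return float(np.median(ratios))  # median is robust to outlier points

def to_metric(relative_depth, ref_points):
    # Apply the recovered scale to the whole map.
    return recover_scale(relative_depth, ref_points) * relative_depth
```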
arXiv Detail & Related papers (2024-09-08T15:54:43Z)
- ScaleDepth: Decomposing Metric Depth Estimation into Scale Prediction and Relative Depth Estimation [62.600382533322325]
We propose a novel monocular depth estimation method called ScaleDepth.
Our method decomposes metric depth into scene scale and relative depth, and predicts them through a semantic-aware scale prediction module.
Our method achieves metric depth estimation for both indoor and outdoor scenes in a unified framework.
arXiv Detail & Related papers (2024-07-11T05:11:56Z)
- SM4Depth: Seamless Monocular Metric Depth Estimation across Multiple Cameras and Scenes by One Model [72.0795843450604]
Current approaches face challenges in maintaining consistent accuracy across diverse scenes.
These methods rely on extensive datasets comprising millions, if not tens of millions, of training samples.
This paper presents SM4Depth, a model that seamlessly works for both indoor and outdoor scenes.
arXiv Detail & Related papers (2024-03-13T14:08:25Z)
- SC-DepthV3: Robust Self-supervised Monocular Depth Estimation for Dynamic Scenes [58.89295356901823]
Self-supervised monocular depth estimation has shown impressive results in static scenes.
It relies on the multi-view consistency assumption for training networks; however, that assumption is violated in dynamic object regions.
We introduce an external pretrained monocular depth estimation model for generating single-image depth prior.
Our model can predict sharp and accurate depth maps, even when training from monocular videos of highly-dynamic scenes.
arXiv Detail & Related papers (2022-11-07T16:17:47Z)
- MonoIndoor++: Towards Better Practice of Self-Supervised Monocular Depth Estimation for Indoor Environments [45.89629401768049]
Self-supervised monocular depth estimation has seen significant progress in recent years, especially in outdoor environments.
However, depth prediction results are not satisfying in indoor scenes where most of the existing data are captured with hand-held devices.
We propose a novel framework, MonoIndoor++, to improve the performance of self-supervised monocular depth estimation for indoor environments.
arXiv Detail & Related papers (2022-07-18T21:34:43Z)
- SelfTune: Metrically Scaled Monocular Depth Estimation through Self-Supervised Learning [53.78813049373321]
We propose a self-supervised learning method for the pre-trained supervised monocular depth networks to enable metrically scaled depth estimation.
Our approach is useful for various applications such as mobile robot navigation and is applicable to diverse environments.
arXiv Detail & Related papers (2022-03-10T12:28:42Z)
- Unsupervised Scale-consistent Depth Learning from Video [131.3074342883371]
We propose a monocular depth estimator SC-Depth, which requires only unlabelled videos for training.
Thanks to the capability of scale-consistent prediction, we show that our monocular-trained deep networks are readily integrated into the ORB-SLAM2 system.
The proposed hybrid Pseudo-RGBD SLAM shows compelling results in KITTI, and it generalizes well to the KAIST dataset without additional training.
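Scale-consistent prediction of this kind is typically encouraged with a geometry consistency penalty between adjacent frames. The sketch below is in the spirit of SC-Depth but is a hedged illustration, not its verified implementation: `depth_a_projected` stands in for the depth of frame a warped and projected into frame b, a step omitted here.

```python
import numpy as np

def geometry_consistency(depth_b, depth_a_projected):
    # Normalized absolute difference: the value lies in [0, 1), treats
    # near and far regions symmetrically, and penalizes scale drift
    # between consecutive frames.
    diff = np.abs(depth_a_projected - depth_b) / (depth_a_projected + depth_b)
    return diff.mean()
```

When the two depth maps agree the penalty is zero; a constant scale mismatch of factor k yields (k - 1) / (k + 1), so drift is punished regardless of absolute depth.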
arXiv Detail & Related papers (2021-05-25T02:17:56Z)
- RealMonoDepth: Self-Supervised Monocular Depth Estimation for General Scenes [11.995578248462946]
Existing supervised methods for monocular depth estimation require accurate depth measurements for training.
Self-supervised approaches have demonstrated impressive results but do not generalise to scenes with different depth ranges or camera baselines.
We introduce RealMonoDepth, a self-supervised monocular depth estimation approach which learns to estimate the real scene depth for a diverse range of indoor and outdoor scenes.
arXiv Detail & Related papers (2020-04-14T02:03:10Z)
- Don't Forget The Past: Recurrent Depth Estimation from Monocular Video [92.84498980104424]
We put three different types of depth estimation into a common framework.
Our method produces a time series of depth maps.
It can be applied to monocular videos only or be combined with different types of sparse depth patterns.
arXiv Detail & Related papers (2020-01-08T16:50:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.