Cascade Network for Self-Supervised Monocular Depth Estimation
- URL: http://arxiv.org/abs/2009.06223v1
- Date: Mon, 14 Sep 2020 06:50:05 GMT
- Title: Cascade Network for Self-Supervised Monocular Depth Estimation
- Authors: Chunlai Chai, Yukuan Lou, Shijin Zhang
- Abstract summary: We propose a new self-supervised learning method based on cascade networks.
Compared with the previous self-supervised methods, our method has improved accuracy and reliability.
We show a cascaded neural network that divides the target scene into parts of different sight distances and trains them separately to generate a better depth map.
- Score: 0.07161783472741746
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Obtaining real scene depth maps from a monocular camera is a
classical computer vision problem that has attracted wide attention in recent
years. However, training such models usually requires a large number of
manually labeled samples. To reduce the dependence on manually labeled data,
some researchers have turned to self-supervised learning models. Nevertheless,
the accuracy and reliability of these methods have not reached the expected
standard. In this paper, we propose a new
self-supervised learning method based on cascade networks. Compared with
previous self-supervised methods, ours improves accuracy and reliability,
which we verify experimentally. We present a cascaded neural network that
divides the target scene into parts at different sight distances and trains
them separately to generate a better depth map. Our approach is
divided into the following four steps. In the first step, we use the
self-supervised model to estimate the depth of the scene roughly. In the second
step, the depth of the scene generated in the first step is used as a label to
divide the scene into different depth parts. The third step is to use models
with different parameters to generate depth maps of different depth parts in
the target scene, and the fourth step is to fuse these depth maps. Through an
ablation study, we demonstrate the effectiveness of each component
individually and show high-quality, state-of-the-art results on the KITTI
benchmark.
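The four steps above can be sketched as follows. This is a minimal illustration, not the paper's actual networks: `rough_model` and the per-band `band_models` are hypothetical stand-ins assumed to map an image to a dense depth map, and fusion is reduced to a mask-based copy.

```python
import numpy as np

def partition_by_depth(rough_depth, thresholds):
    """Step 2: use the rough depth estimate as a pseudo-label to split
    the scene into depth bands (returned as boolean masks)."""
    edges = [-np.inf, *thresholds, np.inf]
    return [(rough_depth > lo) & (rough_depth <= hi)
            for lo, hi in zip(edges[:-1], edges[1:])]

def cascade_depth(image, rough_model, band_models, thresholds):
    """Hypothetical four-step cascade: rough estimate -> partition ->
    per-band prediction -> fusion of the per-band depth maps."""
    rough = rough_model(image)                      # step 1: coarse depth
    masks = partition_by_depth(rough, thresholds)   # step 2: depth bands
    fused = np.zeros_like(rough)
    for mask, model in zip(masks, band_models):     # step 3: per-band models
        fused[mask] = model(image)[mask]            # step 4: fuse by mask
    return fused
```

In this sketch each band model only contributes pixels inside its own mask, so the fused map is a piecewise composition of the specialized predictions.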
Related papers
- RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation [50.35403070279804]
3D occupancy prediction is an emerging task that aims to estimate the occupancy states and semantics of 3D scenes using multi-view images.
We propose RadOcc, a Rendering assisted distillation paradigm for 3D Occupancy prediction.
arXiv Detail & Related papers (2023-12-19T03:39:56Z)
- FS-Depth: Focal-and-Scale Depth Estimation from a Single Image in Unseen Indoor Scene [57.26600120397529]
It has long been an ill-posed problem to predict absolute depth maps from single images in real (unseen) indoor scenes.
We develop a focal-and-scale depth estimation model to learn absolute depth maps well from single images in unseen indoor scenes.
arXiv Detail & Related papers (2023-07-27T04:49:36Z)
- SwinDepth: Unsupervised Depth Estimation using Monocular Sequences via Swin Transformer and Densely Cascaded Network [29.798579906253696]
It is challenging to acquire dense ground truth depth labels for supervised training, and the unsupervised depth estimation using monocular sequences emerges as a promising alternative.
In this paper, we employ a convolution-free Swin Transformer as an image feature extractor so that the network can capture both local geometric features and global semantic features for depth estimation.
Also, we propose a Densely Cascaded Multi-scale Network (DCMNet) that connects every feature map directly with another from different scales via a top-down cascade pathway.
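As a rough illustration of that idea (not DCMNet's actual layers), a densely connected top-down pathway can be sketched with plain arrays: each finer scale aggregates upsampled maps from every coarser scale, not just the adjacent one. Summation and nearest-neighbour upsampling are assumed stand-ins for the network's learned aggregation and upsampling.

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbour 2x upsampling; a stand-in for a learned upsampler.
    return x.repeat(2, axis=0).repeat(2, axis=1)

def dense_top_down(features):
    """Densely cascaded top-down pathway (sketch). `features` is ordered
    coarsest-first; scale i receives upsampled maps from all coarser
    scales 0..i-1, aggregated here by summation."""
    outputs = [features[0]]
    for i, f in enumerate(features[1:], start=1):
        agg = f.copy()
        for j, coarse in enumerate(outputs):
            up = coarse
            for _ in range(i - j):           # upsample to scale i's resolution
                up = upsample2x(up)
            agg = agg + up
        outputs.append(agg)
    return outputs
```

The dense connections mean every scale's output already mixes information from all coarser resolutions, which is the "every feature map connects directly with another from different scales" property described above.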
arXiv Detail & Related papers (2023-01-17T06:01:46Z)
- SC-DepthV3: Robust Self-supervised Monocular Depth Estimation for Dynamic Scenes [58.89295356901823]
Self-supervised monocular depth estimation has shown impressive results in static scenes.
It relies on the multi-view consistency assumption for training networks; however, this assumption is violated in dynamic object regions.
We introduce an external pretrained monocular depth estimation model for generating single-image depth prior.
Our model can predict sharp and accurate depth maps, even when training from monocular videos of highly-dynamic scenes.
arXiv Detail & Related papers (2022-11-07T16:17:47Z)
- Towards Accurate Reconstruction of 3D Scene Shape from A Single Monocular Image [91.71077190961688]
We propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image.
We then exploit 3D point cloud data to predict the depth shift and the camera's focal length, which allow us to recover 3D scene shapes.
We test our depth model on nine unseen datasets and achieve state-of-the-art performance on zero-shot evaluation.
arXiv Detail & Related papers (2022-08-28T16:20:14Z)
- RA-Depth: Resolution Adaptive Self-Supervised Monocular Depth Estimation [27.679479140943503]
We propose a resolution adaptive self-supervised monocular depth estimation method (RA-Depth) by learning the scale invariance of the scene depth.
RA-Depth achieves state-of-the-art performance, and also exhibits a good ability of resolution adaptation.
arXiv Detail & Related papers (2022-07-25T08:49:59Z)
- SelfTune: Metrically Scaled Monocular Depth Estimation through Self-Supervised Learning [53.78813049373321]
We propose a self-supervised learning method for the pre-trained supervised monocular depth networks to enable metrically scaled depth estimation.
Our approach is useful for various applications such as mobile robot navigation and is applicable to diverse environments.
arXiv Detail & Related papers (2022-03-10T12:28:42Z)
- Self-Supervised Learning for Monocular Depth Estimation from Aerial Imagery [0.20072624123275526]
We present a method for self-supervised learning for monocular depth estimation from aerial imagery.
For this, we only use an image sequence from a single moving camera and learn to simultaneously estimate depth and pose information.
By sharing the weights between pose and depth estimation, we achieve a relatively small model, which favors real-time application.
arXiv Detail & Related papers (2020-08-17T12:20:46Z)
- Don't Forget The Past: Recurrent Depth Estimation from Monocular Video [92.84498980104424]
We put three different types of depth estimation into a common framework.
Our method produces a time series of depth maps.
It can be applied to monocular videos only or be combined with different types of sparse depth patterns.
arXiv Detail & Related papers (2020-01-08T16:50:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.