Unsupervised Monocular Depth Perception: Focusing on Moving Objects
- URL: http://arxiv.org/abs/2108.13062v1
- Date: Mon, 30 Aug 2021 08:45:02 GMT
- Title: Unsupervised Monocular Depth Perception: Focusing on Moving Objects
- Authors: Hualie Jiang, Laiyan Ding, Zhenglong Sun, Rui Huang
- Abstract summary: In this paper, we show that deliberately manipulating photometric errors can more effectively handle the difficulties posed by occlusion and scene dynamics.
We first propose an outlier masking technique that considers the occluded or dynamic pixels as statistical outliers in the photometric error map.
With the outlier masking, the network learns the depth of objects that move in the opposite direction to the camera more accurately.
- Score: 5.489557739480878
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As a flexible passive 3D sensing means, unsupervised learning of depth from
monocular videos is becoming an important research topic. It utilizes the
photometric errors between the target view and the synthesized views from its
adjacent source views as the loss instead of the difference from the ground
truth. Occlusion and scene dynamics in real-world scenes still adversely affect
the learning, despite significant progress made recently. In this paper, we
show that deliberately manipulating photometric errors can deal with these
difficulties more effectively. We first propose an outlier masking technique
that considers the occluded or dynamic pixels as statistical outliers in the
photometric error map. With the outlier masking, the network learns the depth
of objects that move in the opposite direction to the camera more accurately.
To the best of our knowledge, such cases have not been seriously considered in
previous work, even though they pose a high risk in applications like
autonomous driving. We also propose an efficient weighted multi-scale scheme to
reduce the artifacts in the predicted depth maps. Extensive experiments on the
KITTI dataset and additional experiments on the Cityscapes dataset have
verified the proposed approach's effectiveness on depth or ego-motion
estimation. Furthermore, for the first time, we evaluate the predicted depth on
the regions of dynamic objects and static background separately for both
supervised and unsupervised methods. The evaluation further verifies the
effectiveness of our proposed technical approach and provides some interesting
observations that might inspire future research in this direction.
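The loss construction described above can be sketched concretely. The abstract does not specify the exact statistical criterion used to flag outliers, so the sketch below uses a per-image quantile threshold on the photometric error map as a stand-in assumption; the function names (`outlier_mask`, `masked_photometric_loss`) and the L1 error choice are illustrative, not the paper's exact formulation.

```python
import numpy as np

def outlier_mask(error_map, quantile=0.9):
    """Keep pixels whose photometric error falls below a high quantile.

    Pixels above the threshold (likely occluded or belonging to dynamic
    objects) are treated as statistical outliers and excluded.
    """
    threshold = np.quantile(error_map, quantile)
    return error_map <= threshold

def masked_photometric_loss(target, synthesized, quantile=0.9):
    """Mean per-pixel L1 photometric error, computed only over inliers."""
    error = np.abs(target - synthesized).mean(axis=-1)  # average over channels
    mask = outlier_mask(error, quantile)
    return float(error[mask].mean())
```

Because the mask drops the highest-error pixels, a few occluded or moving-object pixels with large reprojection error no longer dominate the loss, which is the intuition behind the masking technique described in the abstract.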
Related papers
- SC-DepthV3: Robust Self-supervised Monocular Depth Estimation for
Dynamic Scenes [58.89295356901823]
Self-supervised monocular depth estimation has shown impressive results in static scenes.
It relies on the multi-view consistency assumption to train networks; however, this assumption is violated in dynamic object regions.
We introduce an external pretrained monocular depth estimation model for generating single-image depth prior.
Our model can predict sharp and accurate depth maps, even when training from monocular videos of highly-dynamic scenes.
arXiv Detail & Related papers (2022-11-07T16:17:47Z)
- On the Sins of Image Synthesis Loss for Self-supervised Depth Estimation [60.780823530087446]
We show that improvements in image synthesis do not necessitate improvement in depth estimation.
We attribute this diverging phenomenon to aleatoric uncertainties, which originate from data.
This observed divergence has not been previously reported or studied in depth.
arXiv Detail & Related papers (2021-09-13T17:57:24Z)
- Self-supervised Learning of Occlusion Aware Flow Guided 3D Geometry Perception with Adaptive Cross Weighted Loss from Monocular Videos [5.481942307939029]
Self-supervised deep learning-based 3D scene understanding methods can overcome the difficulty of acquiring the densely labeled ground-truth.
In this paper, we explore learnable occlusion-aware optical-flow-guided self-supervised depth and camera pose estimation.
Our method shows promising results on KITTI, Make3D, and Cityscapes datasets under multiple tasks.
arXiv Detail & Related papers (2021-08-09T09:21:24Z)
- Probabilistic and Geometric Depth: Detecting Objects in Perspective [78.00922683083776]
3D object detection is an important capability needed in various practical applications such as driver assistance systems.
Monocular 3D detection, as an economical solution compared to conventional settings relying on binocular vision or LiDAR, has drawn increasing attention recently but still yields unsatisfactory results.
This paper first presents a systematic study on this problem and observes that the current monocular 3D detection problem can be simplified as an instance depth estimation problem.
arXiv Detail & Related papers (2021-07-29T16:30:33Z)
- Geometry Uncertainty Projection Network for Monocular 3D Object Detection [138.24798140338095]
We propose a Geometry Uncertainty Projection Network (GUP Net) to tackle the error amplification problem at both inference and training stages.
Specifically, a GUP module is proposed to obtain the geometry-guided uncertainty of the inferred depth.
At the training stage, we propose a Hierarchical Task Learning strategy to reduce the instability caused by error amplification.
arXiv Detail & Related papers (2021-07-29T06:59:07Z)
- SAFENet: Self-Supervised Monocular Depth Estimation with Semantic-Aware Feature Extraction [27.750031877854717]
We propose SAFENet that is designed to leverage semantic information to overcome the limitations of the photometric loss.
Our key idea is to exploit semantic-aware depth features that integrate the semantic and geometric knowledge.
Experiments on the KITTI dataset demonstrate that our methods compete with or even outperform state-of-the-art methods.
arXiv Detail & Related papers (2020-10-06T17:22:25Z)
- Targeted Adversarial Perturbations for Monocular Depth Prediction [74.61708733460927]
We study the effect of adversarial perturbations on the task of monocular depth prediction.
Specifically, we explore the ability of small, imperceptible additive perturbations to selectively alter the perceived geometry of the scene.
We show that such perturbations can not only globally re-scale the predicted distances from the camera, but also alter the prediction to match a different target scene.
arXiv Detail & Related papers (2020-06-12T19:29:43Z)
- DiPE: Deeper into Photometric Errors for Unsupervised Learning of Depth and Ego-motion from Monocular Videos [9.255509741319583]
This paper shows that carefully manipulating photometric errors can tackle these difficulties better.
The primary improvement is achieved by a statistical technique that can mask out the invisible or nonstationary pixels in the photometric error map.
We also propose an efficient weighted multi-scale scheme to reduce the artifacts in the predicted depth maps.
arXiv Detail & Related papers (2020-03-03T07:05:15Z)
- Single Image Depth Estimation Trained via Depth from Defocus Cues [105.67073923825842]
Estimating depth from a single RGB image is a fundamental task in computer vision.
In this work, we rely on depth-from-defocus cues instead of different views.
We present results that are on par with supervised methods on KITTI and Make3D datasets and outperform unsupervised learning approaches.
arXiv Detail & Related papers (2020-01-14T20:22:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.