Unsupervised Learning of Depth, Optical Flow and Pose with Occlusion
from 3D Geometry
- URL: http://arxiv.org/abs/2003.00766v3
- Date: Thu, 20 Aug 2020 05:26:00 GMT
- Title: Unsupervised Learning of Depth, Optical Flow and Pose with Occlusion
from 3D Geometry
- Authors: Guangming Wang, Chi Zhang, Hesheng Wang, Jingchuan Wang, Yong Wang,
Xinlei Wang
- Abstract summary: In this paper, pixels in the middle frame are modeled into three parts: the rigid region, the non-rigid region, and the occluded region.
In joint unsupervised training of depth and pose, we can segment the occluded region explicitly.
In the occluded region, depth and camera motion provide more reliable motion estimation and can therefore guide the unsupervised learning of optical flow.
- Score: 29.240108776329045
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In autonomous driving, monocular sequences carry rich
information. Monocular depth estimation, camera ego-motion estimation, and
optical flow estimation over consecutive frames have recently attracted
considerable attention. By analyzing these tasks, pixels in the middle frame
are modeled as three parts: the rigid region, the non-rigid region, and the
occluded region. In joint unsupervised training of depth and pose, the
occluded region can be segmented explicitly. The occlusion information is
used in the unsupervised learning of depth, pose, and optical flow, because
images reconstructed via depth-pose and via optical flow are invalid in
occluded regions. A less-than-mean mask is designed to further exclude
mismatched pixels caused by moving objects or illumination changes when
training the depth and pose networks; the same mechanism also excludes some
trivial mismatched pixels when training the optical flow network. Maximum
normalization is proposed for the depth smoothness term to restrain depth
degradation in textureless regions. In the occluded region, depth and camera
motion provide more reliable motion estimation, so they can guide the
unsupervised learning of optical flow. Experiments on the KITTI dataset
demonstrate that the three-region model, with full and explicit segmentation
of the occluded, rigid, and non-rigid regions and corresponding unsupervised
losses, significantly improves performance on all three tasks. The source
code is available at: https://github.com/guangmingw/DOPlearning.
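Two of the abstract's ingredients, the less-than-mean mask and the maximum-normalized smoothness term, are simple enough to sketch in code. The PyTorch sketch below is written from the abstract's description rather than from the released source, so tensor shapes, names, and the exact edge-aware weighting are assumptions; see the repository above for the authors' implementation.

```python
import torch

def less_than_mean_mask(photometric_error):
    """Sketch of the 'less-than-mean' mask: keep only pixels whose
    reconstruction error is below the per-image mean error.

    photometric_error: (B, 1, H, W) per-pixel reconstruction error.
    Returns a float mask of the same shape: 1 keeps a pixel, 0 drops it.
    """
    # Pixels above the mean error are treated as mismatches caused by
    # motion or illumination change and are excluded from the loss.
    mean_err = photometric_error.mean(dim=[2, 3], keepdim=True)
    return (photometric_error < mean_err).float()

def max_norm_smoothness(depth, image):
    """Edge-aware depth smoothness with maximum normalization.

    Dividing by the per-image maximum (instead of the mean used in some
    earlier work) is the abstract's proposal for restraining depth
    degradation in textureless regions.
    depth: (B, 1, H, W), image: (B, 3, H, W).
    """
    # Normalize depth by its per-image maximum.
    norm_depth = depth / (depth.amax(dim=[2, 3], keepdim=True) + 1e-7)

    # First-order depth gradients along x and y.
    dx = torch.abs(norm_depth[:, :, :, :-1] - norm_depth[:, :, :, 1:])
    dy = torch.abs(norm_depth[:, :, :-1, :] - norm_depth[:, :, 1:, :])

    # Edge-aware weights: relax smoothness across strong image edges.
    wx = torch.exp(-torch.abs(image[:, :, :, :-1] - image[:, :, :, 1:]).mean(1, keepdim=True))
    wy = torch.exp(-torch.abs(image[:, :, :-1, :] - image[:, :, 1:, :]).mean(1, keepdim=True))

    return (dx * wx).mean() + (dy * wy).mean()
```

In training, the masked photometric loss would then be something like `(mask * err).sum() / mask.sum().clamp(min=1.0)`; per the abstract, the same less-than-mean idea is reused to drop trivial mismatched pixels when training the flow network.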
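The abstract also states that, in occluded regions where photometric reconstruction is invalid, the rigid motion implied by depth and pose can guide flow learning. The sketch below derives rigid flow by back-projecting pixels with the predicted depth, applying the relative pose, and re-projecting; `occluded_flow_loss` then supervises the flow network against this target only inside the occlusion mask. The function names, the `[R|t]` pose parameterization, and the assumption that occluded pixels follow the rigid camera motion are mine, not taken from the released code.

```python
import torch

def rigid_flow(depth, pose, K, K_inv):
    """Rigid flow induced by camera motion over a static scene.

    depth:  (B, 1, H, W) predicted depth of the middle frame.
    pose:   (B, 3, 4) relative camera motion [R | t] to the other frame.
    K, K_inv: (B, 3, 3) camera intrinsics and their inverse.
    Returns (B, 2, H, W) flow from the middle frame to the other frame.
    """
    b, _, h, w = depth.shape
    device, dtype = depth.device, depth.dtype

    # Pixel grid in homogeneous coordinates: (B, 3, H*W).
    ys, xs = torch.meshgrid(
        torch.arange(h, device=device, dtype=dtype),
        torch.arange(w, device=device, dtype=dtype),
        indexing="ij",
    )
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0)
    pix = pix.reshape(1, 3, -1).expand(b, -1, -1)

    # Back-project to 3D, transform by the relative pose, re-project.
    cam = (K_inv @ pix) * depth.reshape(b, 1, -1)               # (B, 3, H*W)
    ones = torch.ones(b, 1, h * w, device=device, dtype=dtype)
    proj = K @ (pose @ torch.cat([cam, ones], dim=1))           # (B, 3, H*W)
    uv = proj[:, :2] / proj[:, 2:3].clamp(min=1e-7)

    return (uv - pix[:, :2]).reshape(b, 2, h, w)

def occluded_flow_loss(flow_pred, depth, pose, K, K_inv, occ_mask):
    """Guide the flow network with depth-pose motion in occluded regions.

    Assumption: occluded pixels belong to the static background, so the
    rigid flow is a valid target there.  occ_mask: (B, 1, H, W) in {0, 1}.
    """
    target = rigid_flow(depth, pose, K, K_inv).detach()
    diff = (flow_pred - target).abs()
    return (occ_mask * diff).sum() / occ_mask.sum().clamp(min=1.0)
```

Detaching the rigid-flow target keeps this term from back-propagating into the depth and pose networks, so it acts purely as guidance for the flow network, which matches the abstract's one-way "depth and pose instruct flow" description.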
Related papers
- CbwLoss: Constrained Bidirectional Weighted Loss for Self-supervised Learning of Depth and Pose [13.581694284209885]
Photometric differences are used to train neural networks for estimating depth and camera pose from unlabeled monocular videos.
This paper handles moving objects and occlusions by exploiting the differences between the flow fields and depth structures generated by affine transformation and view synthesis.
The effect of textureless regions on model optimization is mitigated by measuring differences between features that carry more semantic and contextual information, without adding networks.
arXiv Detail & Related papers (2022-12-12T12:18:24Z)
- OPA-3D: Occlusion-Aware Pixel-Wise Aggregation for Monocular 3D Object Detection [51.153003057515754]
OPA-3D is a single-stage, end-to-end, occlusion-aware pixel-wise aggregation network.
It jointly estimates dense scene depth with depth-bounding-box residuals and object bounding boxes.
It outperforms state-of-the-art methods on the main Car category.
arXiv Detail & Related papers (2022-11-02T14:19:13Z)
- DevNet: Self-supervised Monocular Depth Learning via Density Volume Construction [51.96971077984869]
Self-supervised depth learning from monocular images normally relies on the 2D pixel-wise photometric relation between temporally adjacent image frames.
This work proposes the Density Volume Construction Network (DevNet), a novel self-supervised monocular depth learning framework.
arXiv Detail & Related papers (2022-09-14T00:08:44Z)
- End-to-end Learning for Joint Depth and Image Reconstruction from Diffracted Rotation [10.896567381206715]
We propose a novel end-to-end learning approach for depth from diffracted rotation.
Our approach requires a significantly less complex model and less training data, yet it is superior to existing methods on the task of monocular depth estimation.
arXiv Detail & Related papers (2022-04-14T16:14:37Z)
- Learning Occlusion-Aware Coarse-to-Fine Depth Map for Self-supervised Monocular Depth Estimation [11.929584800629673]
We propose a novel network that learns an occlusion-aware coarse-to-fine depth map for self-supervised monocular depth estimation.
The proposed OCFD-Net not only employs a discrete depth constraint for learning a coarse-level depth map, but also employs a continuous depth constraint for learning a scene depth residual.
arXiv Detail & Related papers (2022-03-21T12:43:42Z)
- Self-supervised Learning of Occlusion Aware Flow Guided 3D Geometry Perception with Adaptive Cross Weighted Loss from Monocular Videos [5.481942307939029]
Self-supervised deep learning-based 3D scene understanding methods can overcome the difficulty of acquiring densely labeled ground truth.
In this paper, we explore learnable occlusion-aware, optical-flow-guided self-supervised depth and camera pose estimation.
Our method shows promising results on the KITTI, Make3D, and Cityscapes datasets across multiple tasks.
arXiv Detail & Related papers (2021-08-09T09:21:24Z)
- Self-Supervised Monocular Depth Estimation of Untextured Indoor Rotated Scenes [6.316693022958222]
Self-supervised deep learning methods have leveraged stereo images for training monocular depth estimation.
These methods do not match the performance of supervised methods in indoor environments with camera rotation.
We propose a novel Filled Disparity Loss term that corrects for the ambiguity of the image reconstruction error loss in textureless regions.
arXiv Detail & Related papers (2021-06-24T12:27:16Z)
- Unsupervised Monocular Depth Reconstruction of Non-Rigid Scenes [87.91841050957714]
We present an unsupervised monocular framework for dense depth estimation of dynamic scenes.
We derive a training objective that aims to opportunistically preserve pairwise distances between reconstructed 3D points.
Our method provides promising results, demonstrating its capability to reconstruct 3D from challenging videos of non-rigid scenes.
arXiv Detail & Related papers (2020-12-31T16:02:03Z)
- Learning to Recover 3D Scene Shape from a Single Image [98.20106822614392]
We propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image.
We then use 3D point cloud encoders to predict the missing depth shift and focal length, which allow us to recover a realistic 3D scene shape.
arXiv Detail & Related papers (2020-12-17T02:35:13Z)
- SelfDeco: Self-Supervised Monocular Depth Completion in Challenging Indoor Environments [50.761917113239996]
We present a novel algorithm for self-supervised monocular depth completion.
Our approach trains a neural network that requires only sparse depth measurements and corresponding monocular video sequences, without dense depth labels.
The algorithm is designed for challenging indoor environments with textureless regions, glossy and transparent surfaces, non-Lambertian surfaces, moving people, long and diverse depth ranges, and scenes captured under complex ego-motion.
arXiv Detail & Related papers (2020-11-10T08:55:07Z)
- Single Image Depth Estimation Trained via Depth from Defocus Cues [105.67073923825842]
Estimating depth from a single RGB image is a fundamental task in computer vision.
In this work, we rely on depth-from-focus cues instead of different views.
We present results that are on par with supervised methods on the KITTI and Make3D datasets and outperform unsupervised learning approaches.
arXiv Detail & Related papers (2020-01-14T20:22:54Z)