Self-supervised Learning of Occlusion Aware Flow Guided 3D Geometry
Perception with Adaptive Cross Weighted Loss from Monocular Videos
- URL: http://arxiv.org/abs/2108.03893v2
- Date: Tue, 10 Aug 2021 05:02:03 GMT
- Title: Self-supervised Learning of Occlusion Aware Flow Guided 3D Geometry
Perception with Adaptive Cross Weighted Loss from Monocular Videos
- Authors: Jiaojiao Fang, Guizhong Liu
- Abstract summary: Self-supervised deep learning-based 3D scene understanding methods can overcome the difficulty of acquiring densely labeled ground truth.
In this paper, we explore learnable occlusion-aware optical flow guided self-supervised depth and camera pose estimation.
Our method shows promising results on KITTI, Make3D, and Cityscapes datasets under multiple tasks.
- Score: 5.481942307939029
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised deep learning-based 3D scene understanding methods
can overcome the difficulty of acquiring densely labeled ground truth and have
made considerable advances. However, occlusions and moving objects remain major
limitations. In this paper, we address these limitations with learnable
occlusion-aware optical flow guided self-supervised depth and camera pose
estimation trained by an adaptive cross-weighted loss. First, we train the
learnable occlusion-mask-fused optical flow network with an occlusion-aware
photometric loss that exploits the temporally supplemental information and the
backward-forward consistency of adjacent views. Then, we design an adaptive
cross-weighted loss between the depth-pose and optical flow losses of the
geometric and photometric errors to distinguish moving objects, which violate
the static-scene assumption. Our method shows promising results on the KITTI,
Make3D, and Cityscapes datasets across multiple tasks, and generalizes well
under a variety of challenging scenarios.
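The paper does not spell out the adaptive cross-weighting here; a minimal numpy sketch of one plausible formulation (the function names and the normalization are our assumptions, not the paper's exact form) is:

```python
import numpy as np

def cross_weights(err_dp, err_flow, eps=1e-6):
    # Per-pixel weight for each branch grows with the *other* branch's error:
    # on moving objects the rigid (depth-pose) error is high while the flow
    # error is low, so the depth-pose loss is down-weighted there.
    total = err_dp + err_flow + eps
    w_dp = err_flow / total
    w_flow = err_dp / total
    return w_dp, w_flow

def cross_weighted_loss(err_dp, err_flow):
    # Combine both branches with their adaptive per-pixel weights.
    w_dp, w_flow = cross_weights(err_dp, err_flow)
    return float((w_dp * err_dp + w_flow * err_flow).mean())
```

On a static pixel with comparable errors both weights approach 0.5; on a moving object the depth-pose term is suppressed while the flow term still provides a training signal.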
Related papers
- AugUndo: Scaling Up Augmentations for Monocular Depth Completion and Estimation [51.143540967290114]
We propose a method that unlocks a wide range of previously-infeasible geometric augmentations for unsupervised depth computation and estimation.
This is achieved by reversing, or "undo"-ing, geometric transformations applied to the coordinates of the output depth, warping the depth map back to the original reference frame.
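As a toy illustration of this undo step (using a horizontal flip to stand in for the general geometric augmentations; the names are ours):

```python
import numpy as np

def undo_hflip(depth_aug):
    # Reverse a horizontal flip applied to the network input by flipping the
    # predicted depth back into the original reference frame.
    return depth_aug[:, ::-1]

reference = np.arange(12.0).reshape(3, 4)   # depth in the original frame
depth_from_aug = reference[:, ::-1]         # prediction made on flipped input
recovered = undo_hflip(depth_from_aug)      # supervision now applies in the
                                            # original frame
```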
arXiv Detail & Related papers (2023-10-15T05:15:45Z) - 3D shape reconstruction of semi-transparent worms [0.950214811819847]
3D shape reconstruction typically requires identifying object features or textures in multiple images of a subject.
Here we overcome these challenges by rendering a candidate shape with adaptive blurring and transparency for comparison with the images.
We model the slender Caenorhabditis elegans as a 3D curve using an intrinsic parametrisation that naturally admits biologically-informed constraints and regularisation.
arXiv Detail & Related papers (2023-04-28T13:29:36Z) - OPA-3D: Occlusion-Aware Pixel-Wise Aggregation for Monocular 3D Object
Detection [51.153003057515754]
OPA-3D is a single-stage, end-to-end, Occlusion-Aware Pixel-Wise Aggregation network.
It jointly estimates dense scene depth with depth-bounding box residuals and object bounding boxes.
It outperforms state-of-the-art methods on the main Car category.
arXiv Detail & Related papers (2022-11-02T14:19:13Z) - Towards Non-Line-of-Sight Photography [48.491977359971855]
Non-line-of-sight (NLOS) imaging is based on capturing the multi-bounce indirect reflections from the hidden objects.
Active NLOS imaging systems rely on the capture of the time of flight of light through the scene.
We propose a new problem formulation, called NLOS photography, to specifically address this deficiency.
arXiv Detail & Related papers (2021-09-16T08:07:13Z) - Unsupervised Monocular Depth Perception: Focusing on Moving Objects [5.489557739480878]
In this paper, we show that deliberately manipulating photometric errors can deal with these difficulties more effectively.
We first propose an outlier masking technique that considers the occluded or dynamic pixels as statistical outliers in the photometric error map.
With the outlier masking, the network more accurately learns the depth of objects that move in the opposite direction to the camera.
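A minimal sketch of such outlier masking (the percentile threshold is an illustrative choice, not the paper's statistic):

```python
import numpy as np

def outlier_mask(photo_err, q=95.0):
    # Treat the largest photometric errors, which often come from occluded
    # or dynamic pixels, as statistical outliers and drop them from the loss.
    return photo_err <= np.percentile(photo_err, q)

err = np.array([[0.10, 0.20],
                [0.15, 5.00]])   # one dynamic-object outlier
mask = outlier_mask(err, q=75.0)
masked_loss = float(err[mask].mean())
```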
arXiv Detail & Related papers (2021-08-30T08:45:02Z) - Virtual Normal: Enforcing Geometric Constraints for Accurate and Robust
Depth Prediction [87.08227378010874]
We show the importance of the high-order 3D geometric constraints for depth prediction.
By designing a loss term that enforces a simple geometric constraint, we significantly improve the accuracy and robustness of monocular depth estimation.
We show state-of-the-art results of learning metric depth on NYU Depth-V2 and KITTI.
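The virtual-normal constraint can be sketched as follows (the triplet sampling and loss form are simplified assumptions):

```python
import numpy as np

def virtual_normal(p0, p1, p2):
    # Unit normal of the "virtual plane" through three non-collinear points
    # of the 3D point cloud reconstructed from predicted depth.
    n = np.cross(p1 - p0, p2 - p0)
    return n / np.linalg.norm(n)

def virtual_normal_loss(pred_pts, gt_pts, triplets):
    # Mean L1 gap between virtual normals of the predicted and ground-truth
    # point clouds over sampled point triplets: a long-range, high-order
    # geometric constraint rather than a per-pixel one.
    diffs = [np.abs(virtual_normal(*pred_pts[list(t)])
                    - virtual_normal(*gt_pts[list(t)])).sum()
             for t in triplets]
    return float(np.mean(diffs))
```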
arXiv Detail & Related papers (2021-03-07T00:08:21Z) - Learning to Recover 3D Scene Shape from a Single Image [98.20106822614392]
We propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image.
We then use 3D point cloud encoders to predict the missing depth shift and focal length that allow us to recover a realistic 3D scene shape.
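The first stage's depth is only defined up to an unknown scale and shift; a least-squares alignment illustrates what must be recovered (here fitted against a reference, whereas the paper's point cloud encoders predict the missing values):

```python
import numpy as np

def align_scale_shift(pred, ref):
    # Solve min over (s, t) of ||s * pred + t - ref||^2 to resolve the
    # scale/shift ambiguity of the first-stage depth prediction.
    A = np.stack([pred.ravel(), np.ones(pred.size)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, ref.ravel(), rcond=None)
    return s * pred + t

pred = np.linspace(0.2, 1.0, 12).reshape(3, 4)  # ambiguous depth
ref = 2.0 * pred + 3.0                          # metric reference
aligned = align_scale_shift(pred, ref)
```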
arXiv Detail & Related papers (2020-12-17T02:35:13Z) - SAFENet: Self-Supervised Monocular Depth Estimation with Semantic-Aware
Feature Extraction [27.750031877854717]
We propose SAFENet that is designed to leverage semantic information to overcome the limitations of the photometric loss.
Our key idea is to exploit semantic-aware depth features that integrate the semantic and geometric knowledge.
Experiments on KITTI dataset demonstrate that our methods compete or even outperform the state-of-the-art methods.
arXiv Detail & Related papers (2020-10-06T17:22:25Z) - Learning to See Through Obstructions [117.77024641706451]
We present a learning-based approach for removing unwanted obstructions from a short sequence of images captured by a moving camera.
Our method leverages the motion differences between the background and the obstructing elements to recover both layers.
We show that training on synthetically generated data transfers well to real images.
arXiv Detail & Related papers (2020-04-02T17:59:12Z) - DiPE: Deeper into Photometric Errors for Unsupervised Learning of Depth
and Ego-motion from Monocular Videos [9.255509741319583]
This paper shows that carefully manipulating photometric errors can tackle these difficulties better.
The primary improvement is achieved by a statistical technique that can mask out the invisible or nonstationary pixels in the photometric error map.
We also propose an efficient weighted multi-scale scheme to reduce the artifacts in the predicted depth maps.
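One plausible reading of the weighted multi-scale scheme (the decay factor and normalization are our assumptions):

```python
import numpy as np

def weighted_multiscale_loss(errors_per_scale, decay=0.5):
    # Geometrically down-weight coarser scales so low-resolution predictions
    # cannot dominate training and leave artifacts in the final depth map.
    w = np.array([decay ** s for s in range(len(errors_per_scale))])
    w /= w.sum()
    return float(sum(wi * e.mean() for wi, e in zip(w, errors_per_scale)))
```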
arXiv Detail & Related papers (2020-03-03T07:05:15Z) - Unsupervised Learning of Depth, Optical Flow and Pose with Occlusion
from 3D Geometry [29.240108776329045]
In this paper, pixels in the middle frame are modeled into three parts: the rigid region, the non-rigid region, and the occluded region.
In joint unsupervised training of depth and pose, we can segment the occluded region explicitly.
In the occluded region, as depth and camera motion can provide more reliable motion estimation, they can be used to instruct unsupervised learning of optical flow.
arXiv Detail & Related papers (2020-03-02T11:18:13Z)
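In the occluded region, the flow induced by depth and camera motion (rigid flow) can stand in as supervision for the flow network; a sketch with generic notation (not the paper's):

```python
import numpy as np

def rigid_flow(depth, K, R, t):
    # Back-project each pixel with its depth, apply the camera motion (R, t),
    # re-project with intrinsics K, and subtract the source pixel grid.
    H, W = depth.shape
    xs, ys = np.meshgrid(np.arange(W, dtype=float), np.arange(H, dtype=float))
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T
    pts = np.linalg.inv(K) @ pix * depth.reshape(1, -1)   # back-project
    proj = K @ (R @ pts + t.reshape(3, 1))                # move and re-project
    proj = proj[:2] / proj[2:3]
    return (proj - pix[:2]).T.reshape(H, W, 2)
```

With identity rotation and zero translation the induced flow is zero everywhere, as expected for a static camera.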
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.