USegScene: Unsupervised Learning of Depth, Optical Flow and Ego-Motion
with Semantic Guidance and Coupled Networks
- URL: http://arxiv.org/abs/2207.07469v1
- Date: Fri, 15 Jul 2022 13:25:47 GMT
- Title: USegScene: Unsupervised Learning of Depth, Optical Flow and Ego-Motion
with Semantic Guidance and Coupled Networks
- Authors: Johan Vertens, Wolfram Burgard
- Abstract summary: USegScene is a framework for semantically guided unsupervised learning of depth, optical flow and ego-motion estimation for stereo camera images.
We present results on the popular KITTI dataset and show that our approach outperforms other methods by a large margin.
- Score: 31.600708674008384
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper we propose USegScene, a framework for semantically guided
unsupervised learning of depth, optical flow and ego-motion estimation for
stereo camera images using convolutional neural networks. Our framework
leverages semantic information for improved regularization of depth and optical
flow maps, multimodal fusion and occlusion filling considering dynamic rigid
object motions as independent SE(3) transformations. Furthermore, complementary
to pure photo-metric matching, we propose matching of semantic features,
pixel-wise classes and object instance borders between the consecutive images.
In contrast to previous methods, we propose a network architecture that jointly
predicts all outputs using shared encoders and allows passing information
across the task-domains, e.g., the prediction of optical flow can benefit from
the prediction of the depth. Furthermore, we explicitly learn the depth and
optical flow occlusion maps inside the network, which are leveraged in order to
improve the predictions in the respective regions. We present results on the
popular KITTI dataset and show that our approach outperforms other methods by a
large margin.
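The abstract notes that optical flow can benefit from depth and that rigid motions are modeled as SE(3) transformations. As a rough illustration of that coupling (a minimal sketch of the standard pinhole-camera relation, not the authors' network or code), the flow induced by a rigid motion can be computed from a depth map. The function name `rigid_flow`, the intrinsics `K`, and the example values are assumptions made for this sketch only.

```python
import numpy as np

def rigid_flow(depth, K, R, t):
    """Flow (h, w, 2) induced by a rigid motion (R, t) given per-pixel depth.

    Hypothetical helper for illustration; assumes a pinhole camera with
    intrinsics K and a motion that maps frame-1 points into frame 2.
    """
    h, w = depth.shape
    # Homogeneous pixel grid, shape (3, h*w), in row-major order.
    u, v = np.meshgrid(np.arange(w, dtype=np.float64),
                       np.arange(h, dtype=np.float64))
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)])
    # Back-project each pixel to a 3D point in the first camera frame.
    pts = (np.linalg.inv(K) @ pix) * depth.ravel()
    # Apply the SE(3) transformation, then project into the second frame.
    pts2 = R @ pts + t.reshape(3, 1)
    proj = K @ pts2
    uv2 = proj[:2] / proj[2:3]
    # Flow is the per-pixel displacement between the two projections.
    return (uv2 - pix[:2]).T.reshape(h, w, 2)

# Example with assumed values: constant depth and a small sideways translation.
K = np.array([[100.0, 0.0, 2.0],
              [0.0, 100.0, 1.5],
              [0.0, 0.0, 1.0]])
flow = rigid_flow(np.full((3, 4), 10.0), K, np.eye(3), np.array([0.5, 0.0, 0.0]))
```

For a pure translation along x with constant depth Z, the horizontal flow reduces to f_x * t_x / Z, which gives a quick sanity check. An independently moving rigid object would use its own (R, t) on the pixels of its instance mask.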
Related papers
- De-coupling and De-positioning Dense Self-supervised Learning [65.56679416475943]
Dense Self-Supervised Learning (SSL) methods address the limitations of using image-level feature representations when handling images with multiple objects.
We show that they suffer from coupling and positional bias, which arise from the receptive field increasing with layer depth and zero-padding.
We demonstrate the benefits of our method on COCO and on a new challenging benchmark, OpenImage-MINI, for object classification, semantic segmentation, and object detection.
arXiv Detail & Related papers (2023-03-29T18:07:25Z)
- DELAD: Deep Landweber-guided deconvolution with Hessian and sparse prior [0.22940141855172028]
We present a model for non-blind image deconvolution that incorporates the classic iterative method into a deep learning application.
We build our network based on the iterative Landweber deconvolution algorithm, which is integrated with trainable convolutional layers to enhance the recovered image structures and details.
arXiv Detail & Related papers (2022-09-30T11:15:03Z)
- Content-aware Warping for View Synthesis [110.54435867693203]
We propose content-aware warping, which adaptively learns the weights for pixels of a relatively large neighborhood from their contextual information via a lightweight neural network.
Based on this learnable warping module, we propose a new end-to-end learning-based framework for novel view synthesis from two source views.
Experimental results on structured light field datasets with wide baselines and unstructured multi-view datasets show that the proposed method significantly outperforms state-of-the-art methods both quantitatively and visually.
arXiv Detail & Related papers (2022-01-22T11:35:05Z)
- Unsupervised Joint Learning of Depth, Optical Flow, Ego-motion from Video [9.94001125780824]
Estimating geometric elements such as depth, camera motion, and optical flow from images is an important part of the robot's visual perception.
We use a joint self-supervised method to estimate the three geometric elements.
arXiv Detail & Related papers (2021-05-30T12:39:48Z)
- Self-Guided Instance-Aware Network for Depth Completion and Enhancement [6.319531161477912]
Existing methods directly interpolate the missing depth measurements based on pixel-wise image content and the corresponding neighboring depth values.
We propose a novel self-guided instance-aware network (SG-IANet) that utilizes a self-guided mechanism to extract the instance-level features needed for depth restoration.
arXiv Detail & Related papers (2021-05-25T19:41:38Z)
- Depth-conditioned Dynamic Message Propagation for Monocular 3D Object Detection [86.25022248968908]
We learn context- and depth-aware feature representation to solve the problem of monocular 3D object detection.
We show state-of-the-art results among the monocular-based approaches on the KITTI benchmark dataset.
arXiv Detail & Related papers (2021-03-30T16:20:24Z)
- Variational Structured Attention Networks for Deep Visual Representation Learning [49.80498066480928]
We propose a unified deep framework to jointly learn both spatial attention maps and channel attention in a principled manner.
Specifically, we integrate the estimation and the interaction of the attentions within a probabilistic representation learning framework.
We implement the inference rules within the neural network, thus allowing for end-to-end learning of the probabilistic and the CNN front-end parameters.
arXiv Detail & Related papers (2021-03-05T07:37:24Z)
- Optical Flow Estimation from a Single Motion-blurred Image [66.2061278123057]
Motion blur in an image may be of practical interest for fundamental computer vision problems.
We propose a novel framework to estimate optical flow from a single motion-blurred image in an end-to-end manner.
arXiv Detail & Related papers (2021-03-04T12:45:18Z)
- SOSD-Net: Joint Semantic Object Segmentation and Depth Estimation from Monocular images [94.36401543589523]
We introduce the concept of semantic objectness to exploit the geometric relationship of these two tasks.
We then propose a Semantic Object and Depth Estimation Network (SOSD-Net) based on the objectness assumption.
To the best of our knowledge, SOSD-Net is the first network that exploits the geometry constraint for simultaneous monocular depth estimation and semantic segmentation.
arXiv Detail & Related papers (2021-01-19T02:41:03Z)
- SSGP: Sparse Spatial Guided Propagation for Robust and Generic Interpolation [15.71870284091698]
Interpolation of sparse pixel information towards a dense target resolution finds its application across multiple disciplines in computer vision.
Our work is inspired by the latest trends in depth completion that tackle the problem of dense guidance for sparse information.
We create a generic cross-domain architecture that can be applied for a multitude of problems like optical flow, scene flow, or depth completion.
arXiv Detail & Related papers (2020-08-21T07:39:41Z)
- Semantics-Driven Unsupervised Learning for Monocular Depth and Ego-Motion Estimation [33.83396613039467]
We propose a semantics-driven unsupervised learning approach for monocular depth and ego-motion estimation from videos.
Recent unsupervised learning methods employ photometric errors between the synthesized view and the actual image as a supervision signal for training.
arXiv Detail & Related papers (2020-06-08T05:55:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.