Unsupervised Joint Learning of Depth, Optical Flow, Ego-motion from
Video
- URL: http://arxiv.org/abs/2105.14520v1
- Date: Sun, 30 May 2021 12:39:48 GMT
- Title: Unsupervised Joint Learning of Depth, Optical Flow, Ego-motion from
Video
- Authors: Jianfeng Li, Junqiao Zhao, Shuangfu Song, Tiantian Feng
- Abstract summary: Estimating geometric elements such as depth, camera motion, and optical flow from images is an important part of a robot's visual perception.
We use a joint self-supervised method to estimate the three geometric elements.
- Score: 9.94001125780824
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Estimating geometric elements such as depth, camera motion, and
optical flow from images is an important part of a robot's visual perception.
We use a joint self-supervised method to estimate all three geometric
elements. The depth network, optical flow network, and camera motion network
are independent of each other but are jointly optimized during the training
phase. Compared with independent training, joint training can make full use
of the geometric relationships between the elements and provide both dynamic
and static information about the scene. In this paper, we improve the joint
self-supervision method in three respects: network structure, dynamic object
segmentation, and geometric constraints. In terms of network structure, we
apply an attention mechanism to the camera motion network, which helps
exploit the similarity of camera movement between frames; following the
attention mechanism in Transformers, we propose a plug-and-play convolutional
attention module (see the first sketch after the abstract). In terms of
dynamic objects, because dynamic objects affect the optical flow
self-supervised framework and the depth-pose self-supervised framework
differently, we propose a threshold algorithm to detect dynamic regions and
mask them in the respective loss functions (second sketch below). In terms of
geometric constraints, we use traditional methods to estimate the fundamental
matrix from corresponding points and use it to constrain the camera motion
network (third sketch below). We demonstrate the effectiveness of our method
on the KITTI dataset. Compared with other joint self-supervised methods, our
method achieves state-of-the-art performance in pose and optical flow
estimation, and our depth estimation also achieves competitive results. Code
will be available at https://github.com/jianfenglihg/Unsupervised_geometry.
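The abstract does not spell out the attention module's internals. As a rough,
hedged illustration, the following PyTorch sketch shows what a plug-and-play
convolutional attention block in the Transformer style could look like: 1x1
convolutions produce queries, keys, and values, and scaled dot-product
attention runs over spatial positions. The class name `ConvAttention` and all
hyperparameters are hypothetical, not taken from the paper's code.
```python
import torch
import torch.nn as nn


class ConvAttention(nn.Module):
    """Hypothetical plug-and-play convolutional attention block.

    Scaled dot-product attention over the spatial positions of a feature
    map, with 1x1 convolutions producing queries, keys, and values. A
    generic sketch in the spirit of the abstract, not the paper's module.
    """

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.q = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.k = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.v = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)   # (B, HW, C/r)
        k = self.k(x).flatten(2)                   # (B, C/r, HW)
        v = self.v(x).flatten(2).transpose(1, 2)   # (B, HW, C)
        attn = torch.softmax(q @ k / q.shape[-1] ** 0.5, dim=-1)  # (B, HW, HW)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return x + self.gamma * out  # residual output, same shape as input
```
Because `gamma` starts at zero, the block initially acts as an identity
mapping, so it can be dropped into an existing backbone, such as the camera
motion network's encoder, without disturbing its behavior at initialization.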
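For the dynamic-object threshold, a common instantiation of this idea compares
the flow network's full optical flow with the rigid flow synthesized from the
predicted depth and camera pose; a minimal sketch under that assumption
follows. The function name, threshold value, and exact rule are hypothetical.
```python
import torch


def dynamic_mask(full_flow: torch.Tensor,
                 rigid_flow: torch.Tensor,
                 thresh: float = 3.0) -> torch.Tensor:
    """Hypothetical threshold rule for dynamic-region detection.

    full_flow:  flow predicted by the optical flow network, (B, 2, H, W).
    rigid_flow: flow synthesized from the predicted depth and camera pose
                under a static-scene assumption, (B, 2, H, W).

    Pixels where the two flows disagree by more than `thresh` pixels are
    treated as dynamic; the paper's exact rule and threshold may differ.
    """
    residual = (full_flow - rigid_flow).norm(dim=1, keepdim=True)  # (B,1,H,W)
    return (residual > thresh).float()


# The mask would then be applied differently in the two frameworks:
# the depth-pose photometric loss keeps only static pixels, since moving
# objects violate its rigid reprojection model ...
#   depth_pose_loss = (photo_error * (1.0 - mask)).mean()
# ... while the flow loss, which models per-pixel motion, can down-weight
# or handle dynamic regions separately rather than discard them.
```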
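For the geometric constraint, one plausible reading is: fit the fundamental
matrix classically with RANSAC over the correspondences, then penalize the
predicted pose through the epipolar constraint x2^T E x1 = 0 with E = [t]_x R.
The sketch below illustrates that geometry with OpenCV and NumPy; the function
name and details are hypothetical, and a real training loss would use
differentiable tensor operations on the pose network's output instead of
NumPy.
```python
import cv2
import numpy as np


def epipolar_pose_residual(pts1, pts2, K, R_pred, t_pred):
    """Hypothetical epipolar check on the predicted camera motion.

    pts1, pts2: matched pixel coordinates in two frames, each (N, 2),
                e.g. sampled from the predicted optical flow.
    K:          3x3 camera intrinsics.
    R_pred, t_pred: rotation (3, 3) and translation (3,) from the pose net.
    """
    # Classical RANSAC fit of the fundamental matrix; its inlier set
    # filters out mismatches and points on dynamic objects.
    _, inliers = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.999)
    inliers = inliers.ravel().astype(bool)

    # Essential matrix implied by the predicted pose: E = [t]_x R.
    tx = np.array([[0.0, -t_pred[2], t_pred[1]],
                   [t_pred[2], 0.0, -t_pred[0]],
                   [-t_pred[1], t_pred[0], 0.0]])
    E_pred = tx @ R_pred

    # Normalized homogeneous coordinates of the inlier matches.
    K_inv = np.linalg.inv(K)
    n = int(inliers.sum())
    x1 = K_inv @ np.vstack([pts1[inliers].T, np.ones(n)])  # (3, n)
    x2 = K_inv @ np.vstack([pts2[inliers].T, np.ones(n)])

    # Epipolar residual x2^T E x1 per match; zero for a perfect pose.
    residual = np.einsum('in,ij,jn->n', x2, E_pred, x1)
    return np.abs(residual).mean()
```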
Related papers
- Graphical Object-Centric Actor-Critic [55.2480439325792]
We propose a novel object-centric reinforcement learning algorithm combining actor-critic and model-based approaches.
We use a transformer encoder to extract object representations and graph neural networks to approximate the dynamics of an environment.
Our algorithm outperforms the state-of-the-art model-free actor-critic algorithm in a visually complex 3D robotic environment and in a 2D environment with compositional structure.
arXiv Detail & Related papers (2023-10-26T06:05:12Z)
- Dyna-DepthFormer: Multi-frame Transformer for Self-Supervised Depth Estimation in Dynamic Scenes [19.810725397641406]
We propose a novel Dyna-Depthformer framework, which predicts scene depth and 3D motion field jointly.
Our contributions are two-fold. First, we leverage multi-view correlation through a series of self- and cross-attention layers in order to obtain enhanced depth feature representations.
Second, we propose a warping-based Motion Network to estimate the motion field of dynamic objects without using semantic prior.
arXiv Detail & Related papers (2023-01-14T09:43:23Z)
- ParticleSfM: Exploiting Dense Point Trajectories for Localizing Moving Cameras in the Wild [57.37891682117178]
We present a robust dense indirect structure-from-motion method for videos that is based on dense correspondence from pairwise optical flow.
A novel neural network architecture is proposed for processing irregular point trajectory data.
Experiments on the MPI Sintel dataset show that our system produces significantly more accurate camera trajectories.
arXiv Detail & Related papers (2022-07-19T09:19:45Z)
- USegScene: Unsupervised Learning of Depth, Optical Flow and Ego-Motion with Semantic Guidance and Coupled Networks [31.600708674008384]
USegScene is a framework for semantically guided unsupervised learning of depth, optical flow and ego-motion estimation for stereo camera images.
We present results on the popular KITTI dataset and show that our approach outperforms other methods by a large margin.
arXiv Detail & Related papers (2022-07-15T13:25:47Z)
- Attentive and Contrastive Learning for Joint Depth and Motion Field Estimation [76.58256020932312]
Estimating the motion of the camera together with the 3D structure of the scene from a monocular vision system is a complex task.
We present a self-supervised learning framework for 3D object motion field estimation from monocular videos.
arXiv Detail & Related papers (2021-10-13T16:45:01Z)
- Self-Supervised Learning of Depth and Ego-Motion from Video by Alternative Training and Geometric Constraints from 3D to 2D [5.481942307939029]
Self-supervised learning of depth and ego-motion from unlabeled monocular video has achieved promising results.
In this paper, we aim to improve the depth-pose learning performance without the auxiliary tasks.
We design a log-scale 3D structural consistency loss to put more emphasis on the smaller depth values during training.
arXiv Detail & Related papers (2021-08-04T11:40:53Z)
- Optical Flow Estimation from a Single Motion-blurred Image [66.2061278123057]
Motion blur in an image may be of practical interest in fundamental computer vision problems.
We propose a novel framework to estimate optical flow from a single motion-blurred image in an end-to-end manner.
arXiv Detail & Related papers (2021-03-04T12:45:18Z)
- Learning Monocular Depth in Dynamic Scenes via Instance-Aware Projection Consistency [114.02182755620784]
We present an end-to-end joint training framework that explicitly models 6-DoF motion of multiple dynamic objects, ego-motion and depth in a monocular camera setup without supervision.
Our framework is shown to outperform the state-of-the-art depth and motion estimation methods.
arXiv Detail & Related papers (2021-02-04T14:26:42Z)
- Nothing But Geometric Constraints: A Model-Free Method for Articulated Object Pose Estimation [89.82169646672872]
We propose an unsupervised vision-based system to estimate the joint configurations of the robot arm from a sequence of RGB or RGB-D images without knowing the model a priori.
We combine a classical geometric formulation with deep learning and extend the use of epipolar multi-rigid-body constraints to solve this task.
arXiv Detail & Related papers (2020-11-30T20:46:48Z)
- 3D Scene Geometry-Aware Constraint for Camera Localization with Deep Learning [11.599633757222406]
Recently, end-to-end approaches based on convolutional neural networks have been studied extensively to match or even exceed traditional 3D-geometry-based methods.
In this work, we propose a compact network for absolute camera pose regression.
Inspired by those traditional methods, a 3D scene geometry-aware constraint is also introduced by exploiting all available information, including motion, depth, and image contents.
arXiv Detail & Related papers (2020-05-13T04:15:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.