Joint Self-supervised Depth and Optical Flow Estimation towards Dynamic Objects
- URL: http://arxiv.org/abs/2310.00011v1
- Date: Thu, 7 Sep 2023 04:00:52 GMT
- Title: Joint Self-supervised Depth and Optical Flow Estimation towards Dynamic Objects
- Authors: Zhengyang Lu and Ying Chen
- Abstract summary: In this work, we construct a joint inter-frame-supervised depth and optical flow estimation framework.
For motion segmentation, we adaptively segment the preliminarily estimated optical flow map into large connected regions.
Our proposed joint depth and optical flow estimation outperforms existing depth estimators on the KITTI Depth dataset.
- Score: 3.794605440322862
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning-based depth estimation has attracted significant
attention. Dynamic objects pose the hardest problem for inter-frame-supervised
depth estimation due to the uncertainty in adjacent frames. Integrating
optical flow information with depth estimation is therefore a feasible
solution, as optical flow is an essential motion representation. In this work,
we construct a joint inter-frame-supervised depth and optical flow estimation
framework, which predicts depth under various motions by minimizing pixel warp
errors in bilateral photometric re-projections and optical flow vectors. For
motion segmentation, we adaptively segment the preliminarily estimated optical
flow map into large connected regions. In self-supervised depth estimation,
different motion regions are predicted independently and then composited into
a complete depth map. Further, the pose and depth estimates re-synthesize the
optical flow maps, which serve to compute reconstruction errors against the
preliminary predictions. Our proposed joint depth and optical flow
estimation outperforms existing depth estimators on the KITTI Depth dataset,
both with and without Cityscapes pretraining. Additionally, our optical flow
results demonstrate competitive performance on the KITTI Flow 2015 dataset.
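As a rough illustration of the framework's re-synthesis step (a minimal sketch, not the authors' code): given a predicted depth map, camera intrinsics, and a relative pose, the rigid optical flow induced by camera motion can be recovered by back-projecting each pixel, transforming it into the second frame, and re-projecting. Comparing this re-synthesized flow against the preliminary flow prediction yields the reconstruction error described above. All names below are hypothetical.

```python
# Minimal sketch (not the authors' code): re-synthesizing rigid optical flow
# from a predicted depth map and a relative camera pose.
import torch

def rigid_flow_from_depth_pose(depth, K, R, t):
    """depth: (H, W) predicted depth; K: (3, 3) intrinsics;
    R: (3, 3), t: (3,) relative pose from frame 1 to frame 2.
    Returns flow: (H, W, 2) rigid optical flow induced by camera motion."""
    H, W = depth.shape
    ys, xs = torch.meshgrid(torch.arange(H, dtype=depth.dtype),
                            torch.arange(W, dtype=depth.dtype), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1).reshape(-1, 3)
    # Back-project pixels to 3D camera coordinates of frame 1.
    cam = (torch.linalg.inv(K) @ pix.T).T * depth.reshape(-1, 1)
    # Transform the points into frame 2 and project back to the image plane.
    cam2 = (R @ cam.T).T + t
    pix2 = (K @ cam2.T).T
    pix2 = pix2[:, :2] / pix2[:, 2:3].clamp(min=1e-6)
    return (pix2 - pix[:, :2]).reshape(H, W, 2)
```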
Related papers
- ScaleDepth: Decomposing Metric Depth Estimation into Scale Prediction and Relative Depth Estimation [62.600382533322325]
We propose a novel monocular depth estimation method called ScaleDepth.
Our method decomposes metric depth into scene scale and relative depth, and predicts them through a semantic-aware scale prediction module.
Our method achieves metric depth estimation for both indoor and outdoor scenes in a unified framework.
arXiv Detail & Related papers (2024-07-11T05:11:56Z)
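The decomposition in ScaleDepth amounts to predicting metric depth as the product of a global scene scale and a relative depth map. A minimal PyTorch sketch of that factorization, with a hypothetical backbone and plain heads (the paper's semantic-aware scale prediction module is not reproduced):

```python
import torch
import torch.nn as nn

class ScaleDecomposedDepth(nn.Module):
    """Sketch: metric depth = scene scale x relative depth.
    `backbone` maps an image to features; both heads are hypothetical."""
    def __init__(self, backbone, feat_dim=256):
        super().__init__()
        self.backbone = backbone
        self.rel_head = nn.Conv2d(feat_dim, 1, 3, padding=1)  # relative depth
        self.scale_head = nn.Sequential(                      # scene scale
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(feat_dim, 1), nn.Softplus(),
        )

    def forward(self, image):
        feats = self.backbone(image)               # (B, feat_dim, h, w)
        rel = torch.sigmoid(self.rel_head(feats))  # relative depth in (0, 1)
        scale = self.scale_head(feats)             # (B, 1) metric scale
        return scale.view(-1, 1, 1, 1) * rel       # metric depth map
```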
- Self-supervised Monocular Depth Estimation on Water Scenes via Specular Reflection Prior [3.2120448116996103]
This paper proposes the first self-supervision for deep-learning depth estimation on water scenes via intra-frame priors.
In the first stage, a water segmentation network is performed to separate the reflection components from the entire image.
The photometric re-projection error, incorporating SmoothL1 and a novel photometric adaptive SSIM, is formulated to optimize pose and depth estimation.
arXiv Detail & Related papers (2024-04-10T17:25:42Z)
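The photometric re-projection error named in the water-scenes paper combines SmoothL1 with an SSIM term. A minimal sketch of such a loss; the 0.85 weighting is an assumed value common in self-supervised depth work, and the paper's novel photometric adaptive SSIM is not reproduced:

```python
import torch
import torch.nn.functional as F

def ssim(x, y, C1=0.01 ** 2, C2=0.03 ** 2):
    """Simplified per-pixel SSIM over 3x3 neighborhoods (images in [0, 1])."""
    mu_x = F.avg_pool2d(x, 3, 1, padding=1)
    mu_y = F.avg_pool2d(y, 3, 1, padding=1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, padding=1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1, padding=1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, padding=1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + C1) * (2 * sigma_xy + C2)
    den = (mu_x ** 2 + mu_y ** 2 + C1) * (sigma_x + sigma_y + C2)
    return num / den

def photometric_loss(target, warped, alpha=0.85):
    """SmoothL1 + SSIM photometric re-projection error; alpha is an assumed
    weighting (the paper's adaptive SSIM weighting is not reproduced)."""
    dssim = ((1 - ssim(target, warped)) / 2).clamp(0, 1).mean()
    return alpha * dssim + (1 - alpha) * F.smooth_l1_loss(warped, target)
```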
- $\mathrm{F^2Depth}$: Self-supervised Indoor Monocular Depth Estimation via Optical Flow Consistency and Feature Map Synthesis [17.18692080133249]
We propose a self-supervised indoor monocular depth estimation framework called $\mathrm{F^2Depth}$.
A self-supervised optical flow estimation network is introduced to supervise depth learning.
The accuracy results show that our model can generalize well to monocular images captured in unknown indoor scenes.
arXiv Detail & Related papers (2024-03-27T11:00:33Z)
- Skin the sheep not only once: Reusing Various Depth Datasets to Drive the Learning of Optical Flow [25.23550076996421]
We propose to leverage the geometric connection between optical flow estimation and stereo matching.
We turn the monocular depth datasets into stereo ones via virtual disparity.
We also introduce virtual camera motion into stereo data to produce additional flows along the vertical direction.
arXiv Detail & Related papers (2023-10-03T06:56:07Z)
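Turning a monocular depth dataset into a stereo one rests on the standard relation between disparity and depth, d = f * B / z; warping the image by that virtual disparity synthesizes the second view, and the disparity doubles as dense horizontal flow labels. A sketch with assumed KITTI-like focal length and baseline (occlusion handling omitted):

```python
import numpy as np

def depth_to_virtual_disparity(depth, focal=720.0, baseline=0.54):
    """Convert metric depth to virtual disparity: d = f * B / z.
    focal and baseline are assumed values (KITTI-like), not from the paper."""
    return focal * baseline / np.maximum(depth, 1e-6)

def virtual_right_view(left, disparity):
    """Forward-warp the left image by the virtual disparity to synthesize a
    right view; the disparity then serves as dense horizontal flow labels."""
    H, W = disparity.shape
    right = np.zeros_like(left)
    xs = np.arange(W)
    for y in range(H):
        xt = np.round(xs - disparity[y]).astype(int)  # shift left -> right
        valid = (xt >= 0) & (xt < W)
        right[y, xt[valid]] = left[y, xs[valid]]      # nearest-neighbor splat
    return right
```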
- EndoDepthL: Lightweight Endoscopic Monocular Depth Estimation with CNN-Transformer [0.0]
We propose a novel lightweight solution named EndoDepthL that integrates CNN and Transformers to predict multi-scale depth maps.
Our approach includes optimizing the network architecture, incorporating multi-scale dilated convolution, and a multi-channel attention mechanism.
To better evaluate the performance of monocular depth estimation in endoscopic imaging, we propose a novel complexity evaluation metric.
arXiv Detail & Related papers (2023-08-04T21:38:29Z)
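A sketch of the two architectural ingredients EndoDepthL names, multi-scale dilated convolution and a channel attention mechanism; the dilation rates and reduction ratio here are assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

class MultiScaleDilatedBlock(nn.Module):
    """Parallel dilated convolutions (a hypothetical configuration) followed
    by squeeze-and-excitation-style channel attention."""
    def __init__(self, channels, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in dilations
        )
        self.fuse = nn.Conv2d(channels * len(dilations), channels, 1)
        self.attn = nn.Sequential(                  # channel attention
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        multi = torch.cat([b(x) for b in self.branches], dim=1)
        fused = self.fuse(multi)
        return fused * self.attn(fused) + x         # attention-gated residual
```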
- USegScene: Unsupervised Learning of Depth, Optical Flow and Ego-Motion with Semantic Guidance and Coupled Networks [31.600708674008384]
USegScene is a framework for semantically guided unsupervised learning of depth, optical flow and ego-motion estimation for stereo camera images.
We present results on the popular KITTI dataset and show that our approach outperforms other methods by a large margin.
arXiv Detail & Related papers (2022-07-15T13:25:47Z)
- Dimensions of Motion: Learning to Predict a Subspace of Optical Flow from a Single Image [50.9686256513627]
We introduce the problem of predicting, from a single video frame, a low-dimensional subspace of optical flow which includes the actual instantaneous optical flow.
We show how several natural scene assumptions allow us to identify an appropriate flow subspace via a set of basis flow fields parameterized by disparity.
This provides a new approach to learning these tasks in an unsupervised fashion using monocular input video without requiring camera intrinsics or poses.
arXiv Detail & Related papers (2021-12-02T18:52:54Z)
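The disparity-parameterized flow basis can be made concrete for camera translation: the instantaneous flow for translation along each axis is the per-pixel disparity times a fixed direction field, so three such bases span all translational flows. A sketch with assumed intrinsics (constant factors such as the stereo baseline are absorbed into the combination coefficients):

```python
import numpy as np

def translational_flow_basis(disparity, fx=720.0, fy=720.0, cx=620.0, cy=190.0):
    """Build three basis flow fields, one per camera translation axis, each
    scaling with disparity (inverse depth); intrinsics are assumed values.
    Any instantaneous translational flow is a linear combination of these."""
    H, W = disparity.shape
    x, y = np.meshgrid(np.arange(W) - cx, np.arange(H) - cy)
    zeros = np.zeros_like(disparity)
    basis_tx = np.stack([-fx * disparity, zeros], axis=-1)        # along x
    basis_ty = np.stack([zeros, -fy * disparity], axis=-1)        # along y
    basis_tz = np.stack([x * disparity, y * disparity], axis=-1)  # along z
    return basis_tx, basis_ty, basis_tz
```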
- Sensor-Guided Optical Flow [53.295332513139925]
This paper proposes a framework to guide an optical flow network with external cues to achieve superior accuracy on known or unseen domains.
We show how these can be obtained by combining depth measurements from active sensors with geometry and hand-crafted optical flow algorithms.
arXiv Detail & Related papers (2021-09-30T17:59:57Z)
- Deep Two-View Structure-from-Motion Revisited [83.93809929963969]
Two-view structure-from-motion (SfM) is the cornerstone of 3D reconstruction and visual SLAM.
We propose to revisit the problem of deep two-view SfM by leveraging the well-posedness of the classic pipeline.
Our method consists of 1) an optical flow estimation network that predicts dense correspondences between two frames; 2) a normalized pose estimation module that computes relative camera poses from the 2D optical flow correspondences; and 3) a scale-invariant depth estimation network that leverages epipolar geometry to reduce the search space, refine the dense correspondences, and estimate relative depth maps.
arXiv Detail & Related papers (2021-04-01T15:31:20Z)
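The normalized pose estimation step can be approximated with the classic essential-matrix pipeline; a sketch using OpenCV's RANSAC five-point solver on correspondences sampled from the dense flow (the sampling stride and threshold are assumed values, and translation is recovered only up to scale):

```python
import numpy as np
import cv2

def pose_from_flow(flow, K, step=8):
    """Recover relative camera pose from dense optical flow (H, W, 2) using
    the classic essential-matrix pipeline; translation is up to scale."""
    H, W = flow.shape[:2]
    ys, xs = np.mgrid[0:H:step, 0:W:step]
    pts1 = np.stack([xs.ravel(), ys.ravel()], axis=-1).astype(np.float64)
    pts2 = pts1 + flow[ys.ravel(), xs.ravel()]        # flow-displaced points
    E, inliers = cv2.findEssentialMat(pts1, pts2, K,
                                      method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    return R, t
```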
- CodeVIO: Visual-Inertial Odometry with Learned Optimizable Dense Depth [83.77839773394106]
We present a lightweight, tightly-coupled deep depth network and visual-inertial odometry system.
We provide the network with previously marginalized sparse features from VIO to increase the accuracy of initial depth prediction.
We show that it can run in real-time with single-thread execution while utilizing GPU acceleration only for the network and code Jacobian.
arXiv Detail & Related papers (2020-12-18T09:42:54Z)
- Joint Unsupervised Learning of Optical Flow and Egomotion with Bi-Level Optimization [59.9673626329892]
We exploit the global relationship between optical flow and camera motion using epipolar geometry.
We use implicit differentiation to enable back-propagation through the lower-level geometric optimization layer independent of its implementation.
arXiv Detail & Related papers (2020-02-26T22:28:00Z)
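The global relationship between flow and camera motion referred to above is the epipolar constraint x2^T E x1 = 0 with E = [t]_x R. A differentiable sketch of the per-correspondence residual that such a lower-level geometric layer could minimize (names here are hypothetical):

```python
import torch

def skew(t):
    """3x3 skew-symmetric matrix [t]_x for a translation vector t (shape (3,))."""
    tx, ty, tz = t
    zero = t.new_zeros(())
    return torch.stack([
        torch.stack([zero, -tz, ty]),
        torch.stack([tz, zero, -tx]),
        torch.stack([-ty, tx, zero]),
    ])

def epipolar_residual(pts1, pts2, R, t, K):
    """Epipolar constraint x2^T E x1 with E = [t]_x R, for pixel
    correspondences (N, 2) induced by the flow; differentiable in R and t."""
    E = skew(t) @ R
    Kinv = torch.linalg.inv(K)
    ones = torch.ones(pts1.shape[0], 1, dtype=pts1.dtype)
    x1 = (Kinv @ torch.cat([pts1, ones], dim=1).T).T  # normalized coordinates
    x2 = (Kinv @ torch.cat([pts2, ones], dim=1).T).T
    return torch.einsum("ni,ij,nj->n", x2, E, x1)     # per-match residual
```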
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.