Self-Supervised Multi-Frame Monocular Scene Flow
- URL: http://arxiv.org/abs/2105.02216v1
- Date: Wed, 5 May 2021 17:49:55 GMT
- Title: Self-Supervised Multi-Frame Monocular Scene Flow
- Authors: Junhwa Hur, Stefan Roth
- Abstract summary: We introduce a multi-frame monocular scene flow network based on self-supervised learning.
We observe state-of-the-art accuracy among monocular scene flow methods based on self-supervised learning.
- Score: 61.588808225321735
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Estimating 3D scene flow from a sequence of monocular images has been gaining
increased attention due to the simple, economical capture setup. Owing to the
severe ill-posedness of the problem, the accuracy of current methods has been
limited, especially that of efficient, real-time approaches. In this paper, we
introduce a multi-frame monocular scene flow network based on self-supervised
learning, improving the accuracy over previous networks while retaining
real-time efficiency. Based on an advanced two-frame baseline with a
split-decoder design, we propose (i) a multi-frame model using a triple frame
input and convolutional LSTM connections, (ii) an occlusion-aware census loss
for better accuracy, and (iii) a gradient detaching strategy to improve
training stability. On the KITTI dataset, we observe state-of-the-art accuracy
among monocular scene flow methods based on self-supervised learning.
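Neither the abstract nor this page includes code, but contributions (i) and (iii) can be illustrated together in a short PyTorch sketch. Everything below (module names, channel counts, the single-convolution gate layout) is an illustrative assumption, not the authors' implementation: a convolutional LSTM cell carries a hidden state across the three input frames, and detaching that state before each step stops gradients from flowing back through time, which is the essence of the gradient detaching strategy.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Minimal convolutional LSTM cell: the four gates are computed by a
    single convolution over the concatenated input and hidden state."""
    def __init__(self, in_ch: int, hid_ch: int, k: int = 3):
        super().__init__()
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, h, c):
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

# Toy rollout over a triple-frame input with gradient detaching: detaching
# (h, c) before each step stops gradients from flowing back through time,
# which the abstract credits with more stable training.
cell = ConvLSTMCell(in_ch=32, hid_ch=32)
frames = [torch.randn(1, 32, 24, 80) for _ in range(3)]  # fake feature maps
h = torch.zeros(1, 32, 24, 80)
c = torch.zeros(1, 32, 24, 80)
for x in frames:
    h, c = cell(x, h.detach(), c.detach())
```

The occlusion-aware census loss (ii) is not sketched here; broadly, it compares census-transformed image patches and restricts the photometric penalty to pixels estimated to be visible in both frames.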
Related papers
- ALOcc: Adaptive Lifting-based 3D Semantic Occupancy and Cost Volume-based Flow Prediction [89.89610257714006]
Existing methods prioritize accuracy to meet the demands of these perception tasks.
We introduce a series of targeted improvements for 3D semantic occupancy prediction and flow estimation.
Our framework, named ALOcc, achieves an optimal tradeoff between speed and accuracy.
arXiv Detail & Related papers (2024-11-12T11:32:56Z)
- Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion [57.232688209606515]
We present HTCL, a novel Hierarchical Temporal Context Learning paradigm for improving camera-based semantic scene completion.
Our method ranks 1st on the SemanticKITTI benchmark and even surpasses LiDAR-based methods in terms of mIoU.
arXiv Detail & Related papers (2024-07-02T09:11:17Z)
- EMR-MSF: Self-Supervised Recurrent Monocular Scene Flow Exploiting Ego-Motion Rigidity [13.02735046166494]
Self-supervised monocular scene flow estimation has received increasing attention for its simple and economical sensor setup.
We propose an improved model, named EMR-MSF, that borrows network architecture designs from the supervised-learning literature.
On the KITTI scene flow benchmark, our approach improves the SF-all metric of the state-of-the-art self-supervised monocular method by 44%.
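The entry's title points at the core idea: for the static part of the scene, 3D motion is fully determined by depth and camera ego-motion. A minimal sketch of that rigidity prior follows (generic multi-view geometry, not EMR-MSF's actual network; the function name and shapes are illustrative assumptions):

```python
import torch

def rigid_scene_flow(depth, K, R, t):
    """3D flow induced by camera ego-motion (R, t) for a static scene.

    depth: (H, W) depth map; K: (3, 3) intrinsics; R: (3, 3) rotation;
    t: (3,) translation. Returns per-pixel 3D flow of shape (H, W, 3).
    """
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                          torch.arange(W, dtype=torch.float32), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1)    # homogeneous pixels
    pts = depth.unsqueeze(-1) * (pix @ torch.inverse(K).T)   # back-project to 3D
    pts_moved = pts @ R.T + t                                # apply rigid motion
    return pts_moved - pts                                   # rigid 3D scene flow
```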
arXiv Detail & Related papers (2023-09-04T00:30:06Z)
- Modeling Continuous Motion for 3D Point Cloud Object Tracking [54.48716096286417]
This paper presents a novel approach that views each tracklet as a continuous stream.
At each timestamp, only the current frame is fed into the network to interact with multi-frame historical features stored in a memory bank.
To enhance the utilization of multi-frame features for robust tracking, a contrastive sequence enhancement strategy is proposed.
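The memory-bank pattern described above is easy to make concrete. A minimal sketch, assuming a simple FIFO store of detached per-frame features (the paper's actual bank and its interaction mechanism may differ):

```python
import collections
import torch

class FeatureMemoryBank:
    """FIFO bank of per-frame feature maps: a common way to expose
    multi-frame history while feeding only the current frame forward."""
    def __init__(self, capacity: int = 8):
        self.bank = collections.deque(maxlen=capacity)

    def write(self, feat: torch.Tensor):
        self.bank.append(feat.detach())  # stored features need no gradients

    def read(self) -> torch.Tensor:
        # Stack history so the current frame can cross-attend to it.
        return torch.stack(tuple(self.bank), dim=0)

bank = FeatureMemoryBank(capacity=8)
bank.write(torch.randn(256, 16, 16))   # after processing frame t
history = bank.read()                  # (T, 256, 16, 16), read at frame t+1
```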
arXiv Detail & Related papers (2023-03-14T02:58:27Z)
- RAFT-MSF: Self-Supervised Monocular Scene Flow using Recurrent Optimizer [21.125470798719967]
We introduce a self-supervised monocular scene flow method that substantially improves the accuracy over the previous approaches.
Based on RAFT, a state-of-the-art optical flow model, we design a new decoder to iteratively update 3D motion fields and disparity maps simultaneously.
Our method achieves state-of-the-art accuracy among all self-supervised monocular scene flow methods, improving accuracy by 34.2%.
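The RAFT-style recurrent decoder can be pictured as a loop in which one shared update block repeatedly emits residual corrections for both outputs. The sketch below is a loose stand-in (a plain convolution instead of RAFT's ConvGRU and correlation lookup; names, shapes, and iteration count are assumptions):

```python
import torch
import torch.nn as nn

class UpdateBlock(nn.Module):
    """One shared refinement step: maps the hidden state to residual
    corrections for the 3D motion field and the disparity map."""
    def __init__(self, hid_ch: int = 64):
        super().__init__()
        self.gru = nn.Conv2d(hid_ch + 4, hid_ch, 3, padding=1)  # stand-in for a ConvGRU
        self.head = nn.Conv2d(hid_ch, 4, 3, padding=1)          # 3 motion + 1 disparity

    def forward(self, h, motion, disp):
        h = torch.tanh(self.gru(torch.cat([h, motion, disp], dim=1)))
        delta = self.head(h)
        return h, motion + delta[:, :3], disp + delta[:, 3:]

# Iteratively refine both estimates from a zero initialization.
B, H, W = 1, 24, 80
update = UpdateBlock()
h = torch.zeros(B, 64, H, W)
motion = torch.zeros(B, 3, H, W)   # per-pixel 3D motion field
disp = torch.zeros(B, 1, H, W)     # disparity map
for _ in range(8):                 # number of iterations is a hyperparameter
    h, motion, disp = update(h, motion, disp)
```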
arXiv Detail & Related papers (2022-05-03T15:43:57Z)
- RAUM-VO: Rotational Adjusted Unsupervised Monocular Visual Odometry [0.0]
We present RAUM-VO, an approach based on a model-free epipolar constraint for frame-to-frame motion estimation.
RAUM-VO shows a considerable accuracy improvement compared to other unsupervised pose networks on the KITTI dataset.
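A model-free epipolar constraint for frame-to-frame motion typically means solving for the essential matrix from keypoint matches instead of regressing pose with a network. Below is a generic OpenCV sketch of that pipeline (the five-point RANSAC solver; not necessarily RAUM-VO's exact procedure, and the matched points are assumed given):

```python
import cv2
import numpy as np

def frame_to_frame_motion(pts1: np.ndarray, pts2: np.ndarray, K: np.ndarray):
    """Recover relative camera rotation and (unit-scale) translation from
    matched keypoints in two frames via the essential matrix.

    pts1, pts2: (N, 2) float arrays of matched pixel coordinates.
    K: (3, 3) camera intrinsics.
    """
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                      prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    return R, t  # monocular translation is recovered only up to scale
```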
arXiv Detail & Related papers (2022-03-14T15:03:24Z)
- Efficient Two-Stream Network for Violence Detection Using Separable Convolutional LSTM [0.0]
We propose an efficient two-stream deep learning architecture leveraging Separable Convolutional LSTM (SepConvLSTM) and pre-trained MobileNet.
SepConvLSTM is constructed by replacing convolution operation at each gate of ConvLSTM with a depthwise separable convolution.
Our model surpasses the previous best accuracy on the larger and more challenging RWF-2000 dataset by more than 2%.
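The SepConvLSTM construction is concrete enough to sketch: each dense gate convolution of a ConvLSTM is replaced by a depthwise separable convolution, i.e. a per-channel spatial convolution followed by a 1x1 pointwise convolution. In the ConvLSTMCell sketch earlier on this page, swapping nn.Conv2d for the module below would produce such a gate (an illustrative assumption, not the paper's code):

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable convolution: a per-channel spatial (depthwise)
    convolution followed by a 1x1 (pointwise) convolution."""
    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, k, padding=k // 2, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))
```

The efficiency claim follows from the factorization: per output position, a dense gate costs on the order of k^2 * C_in * C_out multiplications, while the separable version costs k^2 * C_in + C_in * C_out.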
arXiv Detail & Related papers (2021-02-21T12:01:48Z)
- FlowStep3D: Model Unrolling for Self-Supervised Scene Flow Estimation [87.74617110803189]
Estimating the 3D motion of points in a scene, known as scene flow, is a core problem in computer vision.
We present a recurrent architecture that learns a single step of an unrolled iterative alignment procedure for refining scene flow predictions.
arXiv Detail & Related papers (2020-11-19T23:23:48Z)
- Consistency Guided Scene Flow Estimation [159.24395181068218]
CGSF is a self-supervised framework for the joint reconstruction of 3D scene structure and motion from stereo video.
We show that the proposed model can reliably predict disparity and scene flow in challenging imagery.
It achieves better generalization than the state-of-the-art, and adapts quickly and robustly to unseen domains.
arXiv Detail & Related papers (2020-06-19T17:28:07Z)
- Self-Supervised Monocular Scene Flow Estimation [27.477810324117016]
We propose a novel monocular scene flow method that yields competitive accuracy and real-time performance.
By taking an inverse problem view, we design a single convolutional neural network (CNN) that successfully estimates depth and 3D motion simultaneously.
arXiv Detail & Related papers (2020-04-08T17:55:54Z)