Deep Two-View Structure-from-Motion Revisited
- URL: http://arxiv.org/abs/2104.00556v1
- Date: Thu, 1 Apr 2021 15:31:20 GMT
- Title: Deep Two-View Structure-from-Motion Revisited
- Authors: Jianyuan Wang, Yiran Zhong, Yuchao Dai, Stan Birchfield, Kaihao Zhang,
Nikolai Smolyanskiy, Hongdong Li
- Abstract summary: Two-view structure-from-motion (SfM) is the cornerstone of 3D reconstruction and visual SLAM.
We propose to revisit the problem of deep two-view SfM by leveraging the well-posedness of the classic pipeline.
Our method consists of 1) an optical flow estimation network that predicts dense correspondences between two frames; 2) a normalized pose estimation module that computes relative camera poses from the 2D optical flow correspondences; and 3) a scale-invariant depth estimation network that leverages epipolar geometry to reduce the search space, refine the dense correspondences, and estimate relative depth maps.
- Score: 83.93809929963969
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Two-view structure-from-motion (SfM) is the cornerstone of 3D reconstruction
and visual SLAM. Existing deep learning-based approaches formulate the problem
by either recovering absolute pose scales from two consecutive frames or
predicting a depth map from a single image, both of which are ill-posed
problems. In contrast, we propose to revisit the problem of deep two-view SfM
by leveraging the well-posedness of the classic pipeline. Our method consists
of 1) an optical flow estimation network that predicts dense correspondences
between two frames; 2) a normalized pose estimation module that computes
relative camera poses from the 2D optical flow correspondences; and 3) a
scale-invariant depth estimation network that leverages epipolar geometry to
reduce the search space, refine the dense correspondences, and estimate
relative depth maps. Extensive experiments show that our method outperforms all
state-of-the-art two-view SfM methods by a clear margin on KITTI depth, KITTI
VO, MVS, Scenes11, and SUN3D datasets in both relative pose and depth
estimation.
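The pipeline mirrors classical well-posed two-view geometry: given dense correspondences, the relative pose follows (up to an unobservable translation scale, hence "normalized") from the five-point solver, and relative depth follows from triangulation. Below is a minimal classical sketch of those steps using OpenCV; it is not the paper's learned networks, and Farneback flow stands in here for the flow estimation network. Function names and thresholds are illustrative assumptions.

```python
import cv2
import numpy as np

def two_view_pose_and_depth(img1_gray, img2_gray, K):
    """Classical stand-in for the pipeline: dense flow ->
    normalized relative pose -> triangulated relative depth."""
    # 1) Dense correspondences (stand-in for the learned flow network).
    flow = cv2.calcOpticalFlowFarneback(img1_gray, img2_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = img1_gray.shape
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    pts1 = np.stack([xs, ys], -1).reshape(-1, 2).astype(np.float32)
    pts2 = (pts1 + flow.reshape(-1, 2)).astype(np.float32)

    # Subsample for the five-point solver; RANSAC handles flow outliers.
    idx = np.random.choice(len(pts1), 2000, replace=False)
    p1, p2 = pts1[idx], pts2[idx]

    # 2) Normalized pose: essential matrix, then cheirality check.
    E, inliers = cv2.findEssentialMat(p1, p2, K, cv2.RANSAC, 0.999, 1.0)
    _, R, t, inliers = cv2.recoverPose(E, p1, p2, K, mask=inliers)
    # t has unit norm: translation scale is unobservable from two views.

    # 3) Scale-invariant depth by triangulation under epipolar geometry.
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    X = cv2.triangulatePoints(P1, P2, p1.T, p2.T)  # homogeneous, 4xN
    depth = X[2] / X[3]                            # relative depth, up to scale
    return R, t, depth
```

Because the translation (and hence depth) is recovered only up to scale, training and evaluation in this setting are naturally scale-invariant, which is what makes the formulation well-posed compared with absolute-scale regression.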
Related papers
- DCPI-Depth: Explicitly Infusing Dense Correspondence Prior to Unsupervised Monocular Depth Estimation [17.99904937160487]
DCPI-Depth is a framework that infuses dense correspondence priors into unsupervised monocular depth estimation by coupling two bidirectional, collaborative streams.
It achieves state-of-the-art performance and generalizability across multiple public datasets, outperforming all prior art.
arXiv Detail & Related papers (2024-05-27T08:55:17Z)
- GEOcc: Geometrically Enhanced 3D Occupancy Network with Implicit-Explicit Depth Fusion and Contextual Self-Supervision [49.839374549646884]
This paper presents GEOcc, a Geometric-Enhanced Occupancy network tailored for vision-only surround-view perception.
Our approach achieves state-of-the-art performance on the Occ3D-nuScenes dataset while requiring the lowest image resolution and the lightest image backbone.
arXiv Detail & Related papers (2024-05-17T07:31:20Z)
- FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models [67.96827539201071]
We propose a novel test-time optimization approach for 3D scene reconstruction.
Our method achieves state-of-the-art cross-dataset reconstruction on five zero-shot testing datasets.
arXiv Detail & Related papers (2023-08-10T17:55:02Z)
- DeepMLE: A Robust Deep Maximum Likelihood Estimator for Two-view Structure from Motion [9.294501649791016]
Two-view structure from motion (SfM) is the cornerstone of 3D reconstruction and visual SLAM (vSLAM).
We formulate the two-view SfM problem as a maximum likelihood estimation (MLE) and solve it with the proposed framework, denoted as DeepMLE.
Our method significantly outperforms the state-of-the-art end-to-end two-view SfM approaches in accuracy and generalization capability.
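The summary does not spell out the objective. A generic two-view SfM likelihood, written as a worked equation under assumptions labeled in the comments (this is an illustrative form, not necessarily DeepMLE's exact model), looks like:

```latex
% Illustrative two-view SfM MLE (an assumption, not DeepMLE's exact model).
% F: observed optical flow; d: per-pixel depth; (R, t): relative pose;
% K: intrinsics; \tilde{x}: homogeneous pixel; \pi: perspective division;
% \rho: robust negative log-likelihood of the per-pixel noise model.
\hat{d}, \hat{R}, \hat{t}
  = \arg\max_{d, R, t} \; p\left(F \mid d, R, t\right)
  = \arg\min_{d, R, t} \sum_{x} \rho\Big(
      F(x) - \big[ \pi\big( K ( R \, d(x) K^{-1} \tilde{x} + t ) \big) - x \big]
    \Big)
```

The bracketed term is the rigid flow induced by the hypothesized depth and pose, so maximizing the likelihood amounts to robustly matching predicted and observed flow.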
arXiv Detail & Related papers (2022-10-11T15:07:25Z)
- Exploiting Correspondences with All-pairs Correlations for Multi-view Depth Estimation [19.647670347925754]
Multi-view depth estimation plays a critical role in reconstructing and understanding the 3D world.
We design a novel iterative multi-view depth estimation framework mimicking the optimization process.
Extensive experiments on ScanNet, DeMoN, ETH3D, and 7Scenes demonstrate the superiority of our method.
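The core ingredient suggested by the title is an all-pairs correlation volume between feature maps of two views, as popularized by RAFT-style matching. A minimal, hedged sketch follows; tensor shapes and the attention-style normalization are illustrative assumptions, not the paper's exact design:

```python
import torch

def all_pairs_correlation(feat1, feat2):
    """All-pairs correlation volume between two feature maps.

    feat1, feat2: (B, C, H, W) tensors from a shared encoder.
    Returns: (B, H, W, H, W) volume; entry [b, i, j, k, l] is the
    similarity between pixel (i, j) in view 1 and (k, l) in view 2."""
    B, C, H, W = feat1.shape
    f1 = feat1.flatten(2)                        # (B, C, H*W)
    f2 = feat2.flatten(2)                        # (B, C, H*W)
    corr = torch.einsum('bci,bcj->bij', f1, f2)  # dot products, all pairs
    corr = corr / C ** 0.5                       # scale like attention logits
    return corr.view(B, H, W, H, W)

# Usage: corr[0, i, j] is an (H, W) heatmap of match scores for pixel (i, j),
# which an iterative update operator can repeatedly index and refine.
```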
arXiv Detail & Related papers (2022-05-05T07:38:31Z)
- VolumeFusion: Deep Depth Fusion for 3D Scene Reconstruction [71.83308989022635]
In this paper, we advocate that replicating the traditional two-stage framework with deep neural networks improves both the interpretability and the accuracy of the results.
Our network operates in two steps: 1) local computation of depth maps with a deep MVS technique, and 2) fusion of the depth maps and image features to build a single TSDF volume.
In order to improve the matching performance between images acquired from very different viewpoints, we introduce a rotation-invariant 3D convolution kernel called PosedConv.
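The second stage is conceptually standard TSDF fusion of posed depth maps into one volume. The sketch below follows the classic Curless-Levoy weighted-averaging scheme rather than the paper's learned fusion; the function name and the truncation value are assumptions:

```python
import numpy as np

def fuse_depth_into_tsdf(tsdf, weights, grid_pts, depth, K, T_wc, trunc=0.04):
    """One classic TSDF update from a single posed depth map.

    tsdf, weights: (N,) running signed distance and weight per voxel.
    grid_pts: (N, 3) voxel centers in world coordinates.
    depth: (H, W) depth map; K: 3x3 intrinsics; T_wc: 4x4 world-to-camera."""
    H, W = depth.shape
    # World -> camera -> pixel.
    pts_c = (T_wc[:3, :3] @ grid_pts.T + T_wc[:3, 3:4]).T  # (N, 3)
    z = pts_c[:, 2]
    uv = (K @ pts_c.T).T
    u = np.round(uv[:, 0] / z).astype(int)
    v = np.round(uv[:, 1] / z).astype(int)
    valid = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    d = np.where(valid, depth[v.clip(0, H - 1), u.clip(0, W - 1)], 0)
    valid &= d > 0

    # Truncated signed distance along the ray, then weighted running mean.
    sdf = np.clip((d - z) / trunc, -1.0, 1.0)
    upd = valid & (sdf > -1.0)  # skip voxels far behind the observed surface
    tsdf[upd] = (tsdf[upd] * weights[upd] + sdf[upd]) / (weights[upd] + 1)
    weights[upd] += 1
    return tsdf, weights
```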
arXiv Detail & Related papers (2021-08-19T11:33:58Z)
- Robust Consistent Video Depth Estimation [65.53308117778361]
We present an algorithm for estimating consistent dense depth maps and camera poses from a monocular video.
Our algorithm combines two complementary techniques: (1) flexible deformation-splines for low-frequency large-scale alignment and (2) geometry-aware depth filtering for high-frequency alignment of fine depth details.
In contrast to prior approaches, our method does not require camera poses as input and achieves robust reconstruction for challenging hand-held cell phone captures containing a significant amount of noise, shake, motion blur, and rolling shutter deformations.
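The low-frequency alignment can be pictured as a coarse per-frame deformation grid, smoothly interpolated to full resolution and applied multiplicatively to the depth map. A hedged torch sketch under that picture follows; the paper's actual spline parameterization may differ:

```python
import torch
import torch.nn.functional as F

def apply_coarse_scale_spline(depth, log_scale_grid):
    """Low-frequency depth alignment via a coarse multiplicative grid.

    depth: (B, 1, H, W) per-frame depth.
    log_scale_grid: (B, 1, h, w) learnable coarse grid with h, w << H, W;
    bicubic upsampling acts as a smooth, spline-like interpolant."""
    B, _, H, W = depth.shape
    scale = torch.exp(F.interpolate(log_scale_grid, size=(H, W),
                                    mode='bicubic', align_corners=True))
    return depth * scale  # smooth, large-scale correction only

# The coarse grid would be optimized (e.g. with Adam) against geometric
# consistency losses across frames, leaving fine depth details untouched.
```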
arXiv Detail & Related papers (2020-12-10T18:59:48Z)
- Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks [87.50632573601283]
We present a novel method for multi-view depth estimation from a single video.
Our method achieves temporally coherent depth estimation results by using a novel Epipolar Spatio-Temporal (EST) transformer.
To reduce the computational cost, inspired by recent Mixture-of-Experts models, we design a compact hybrid network.
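An epipolar transformer attends along epipolar lines rather than over the full image, so the geometric core is computing those lines from the relative pose. A hedged numpy sketch of generating candidate sample locations for one pixel (the EST architecture itself is not reproduced here; names are illustrative):

```python
import numpy as np

def epipolar_samples(x1, K1, K2, R, t, width, n=32):
    """Sample n pixel locations along the epipolar line of pixel x1.

    x1: (2,) pixel in view 1; (R, t): pose of view 2 w.r.t. view 1.
    Returns (n, 2) candidate match locations in view 2, usable as the
    restricted key/value set for epipolar attention."""
    tx = np.array([[0, -t[2], t[1]],
                   [t[2], 0, -t[0]],
                   [-t[1], t[0], 0]])
    F = np.linalg.inv(K2).T @ tx @ R @ np.linalg.inv(K1)  # fundamental matrix
    a, b, c = F @ np.array([x1[0], x1[1], 1.0])           # line ax + by + c = 0
    us = np.linspace(0, width - 1, n)
    vs = -(a * us + c) / b    # assumes a non-vertical line (b != 0)
    return np.stack([us, vs], axis=1)
```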
arXiv Detail & Related papers (2020-11-26T04:04:21Z)
- DeepRelativeFusion: Dense Monocular SLAM using Single-Image Relative Depth Prediction [4.9188958016378495]
We propose a dense monocular SLAM system, named DeepRelativeFusion, that is capable of recovering a globally consistent 3D structure.
We use visual SLAM to reliably recover camera poses and semi-dense depth maps, and then use relative depth prediction to densify the semi-dense depth maps and refine the pose graph.
Our system outperforms the state-of-the-art dense SLAM systems quantitatively in dense reconstruction accuracy by a large margin.
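Densifying with a relative (scale- and shift-ambiguous) depth prediction requires aligning it to the metrically consistent semi-dense SLAM depth. A common closed-form choice is a least-squares scale-and-shift fit over the pixels where both are available; this is a hedged sketch of that standard step, and the paper may use a different alignment:

```python
import numpy as np

def align_relative_depth(rel_depth, slam_depth, valid_mask):
    """Fit d_slam ~ s * d_rel + b over semi-dense pixels, then densify.

    rel_depth: (H, W) network prediction, defined up to scale and shift.
    slam_depth: (H, W) semi-dense SLAM depth (zeros where unknown).
    valid_mask: (H, W) bool mask of pixels with reliable SLAM depth."""
    x = rel_depth[valid_mask]
    y = slam_depth[valid_mask]
    A = np.stack([x, np.ones_like(x)], axis=1)
    (s, b), *_ = np.linalg.lstsq(A, y, rcond=None)  # closed-form fit
    dense = s * rel_depth + b      # aligned, fully dense depth map
    dense[valid_mask] = y          # keep the reliable SLAM depth as-is
    return dense
```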
arXiv Detail & Related papers (2020-06-07T05:22:29Z)