Uncertainty-Driven Dense Two-View Structure from Motion
- URL: http://arxiv.org/abs/2302.00523v1
- Date: Wed, 1 Feb 2023 15:52:24 GMT
- Title: Uncertainty-Driven Dense Two-View Structure from Motion
- Authors: Weirong Chen, Suryansh Kumar, Fisher Yu
- Abstract summary: This work introduces an effective and practical solution to the dense two-view structure from motion (SfM) problem.
With the carefully estimated camera pose and predicted per-pixel optical flow correspondences, a dense depth map of the scene is computed.
The proposed approach achieves remarkable depth accuracy and state-of-the-art camera pose results, surpassing SuperPoint and SuperGlue accuracy on benchmark datasets such as DeMoN, YFCC100M, and ScanNet.
- Score: 27.17774054474532
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work introduces an effective and practical solution to the dense
two-view structure from motion (SfM) problem. One vital question addressed is
how to mindfully use per-pixel optical flow correspondence between two frames
for accurate pose estimation -- as perfect per-pixel correspondence between two
images is difficult, if not impossible, to establish. With the carefully
estimated camera pose and predicted per-pixel optical flow correspondences, a
dense depth map of the scene is computed. Later, an iterative refinement procedure
is introduced to further improve optical flow matching confidence, camera pose,
and depth, exploiting their inherent dependency in rigid SfM. The fundamental
idea presented is to benefit from per-pixel uncertainty in the optical flow
estimation and provide robustness to the dense SfM system via an online
refinement. Concretely, we introduce a pipeline consisting of (i) an
uncertainty-aware dense optical flow estimation approach that provides
per-pixel correspondence with their confidence score of matching; (ii) a
weighted dense bundle adjustment formulation that depends on optical flow
uncertainty and bidirectional optical flow consistency to refine both pose and
depth; (iii) a depth estimation network that enforces consistency with the
estimated poses and optical flow under the epipolar constraint. Extensive
experiments show that the proposed approach achieves remarkable depth accuracy
and state-of-the-art camera pose results, surpassing SuperPoint and SuperGlue
accuracy when tested on benchmark datasets such as DeMoN, YFCC100M, and
ScanNet.
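To make step (ii) concrete, the weighted dense bundle adjustment can be pictured as minimizing the gap between flow-predicted matches and pose-plus-depth reprojections, with each pixel scaled by its matching confidence and a forward-backward flow consistency weight. The Python sketch below is a minimal illustration of that reading, not the authors' implementation; the helper names, the exponential weighting, and the Gauss-Newton hand-off are assumptions.
```python
import numpy as np

# Minimal sketch of an uncertainty-weighted dense bundle adjustment residual
# (illustrative assumptions throughout; not the paper's implementation).

def project(K, R, t, depth, pix):
    """Back-project pixels at the given depths, move them by (R, t), reproject."""
    ones = np.ones((pix.shape[0], 1))
    rays = (np.linalg.inv(K) @ np.hstack([pix, ones]).T).T  # rays in camera 1
    pts = rays * depth[:, None]                             # 3D points, camera 1
    pts2 = (R @ pts.T).T + t                                # 3D points, camera 2
    uv = (K @ pts2.T).T
    return uv[:, :2] / uv[:, 2:3]

def consistency_weight(flow_fwd, flow_bwd_at_match, alpha=0.5):
    """Down-weight pixels whose forward and backward flows disagree;
    flow_bwd_at_match is the backward flow sampled at the forward match."""
    err = np.linalg.norm(flow_fwd + flow_bwd_at_match, axis=-1)
    return np.exp(-alpha * err)

def weighted_ba_residual(K, R, t, depth, pix, flow_fwd, conf, w_consist):
    """Per-pixel residual between the flow match and the reprojection,
    scaled by flow confidence times the bidirectional consistency weight.
    A Gauss-Newton (or similar) solver would minimize its squared norm
    jointly over (R, t) and depth."""
    matched = pix + flow_fwd                # correspondence from optical flow
    reproj = project(K, R, t, depth, pix)   # correspondence from pose + depth
    return (conf * w_consist)[:, None] * (matched - reproj)
```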
Related papers
- Unveiling the Depths: A Multi-Modal Fusion Framework for Challenging Scenarios [103.72094710263656]
This paper presents a novel approach that identifies and integrates dominant cross-modality depth features with a learning-based framework.
We propose a novel confidence loss that steers a confidence predictor network to yield a confidence map specifying latent potential depth areas.
With the resulting confidence map, we propose a multi-modal fusion network that fuses the final depth in an end-to-end manner.
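As a rough picture of how such a confidence map can drive the fusion, the minimal sketch below gates two depth sources with a per-pixel confidence in [0, 1]; the simple convex blend and the names are illustrative assumptions, not the paper's fusion network.
```python
import torch

# Minimal sketch of confidence-guided depth fusion (a convex blend is an
# assumption for illustration; the paper fuses with a learned network).
def fuse_depths(depth_a, depth_b, confidence):
    """depth_a, depth_b: (B, 1, H, W) depth maps from two modalities.
    confidence: (B, 1, H, W) in [0, 1], e.g. from a sigmoid head, marking
    where modality A is trusted."""
    return confidence * depth_a + (1.0 - confidence) * depth_b
```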
arXiv Detail & Related papers (2024-02-19T04:39:16Z)
- DeepMLE: A Robust Deep Maximum Likelihood Estimator for Two-view Structure from Motion [9.294501649791016]
Two-view structure from motion (SfM) is the cornerstone of 3D reconstruction and visual SLAM (vSLAM).
We formulate the two-view SfM problem as a maximum likelihood estimation (MLE) and solve it with the proposed framework, denoted as DeepMLE.
Our method significantly outperforms the state-of-the-art end-to-end two-view SfM approaches in accuracy and generalization capability.
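In general terms, casting two-view SfM as maximum likelihood means choosing the pose and depths that make the observed matches most probable; one generic form (our notation, not necessarily DeepMLE's) is:
```latex
% Generic MLE form of two-view SfM (illustrative notation, not the paper's):
% given matches x_i <-> x'_i, intrinsics K, pose (R, t), per-pixel depths d_i,
\hat{R}, \hat{t}, \hat{d}
  \;=\; \arg\max_{R, t, d}\; \prod_i
  p\!\left( x'_i \;\middle|\; \pi\!\left( K \left( d_i\, R\, K^{-1} \tilde{x}_i + t \right) \right) \right)
% where \tilde{x}_i is x_i in homogeneous coordinates and \pi is perspective
% division. A Gaussian observation model reduces this to weighted least
% squares over reprojection residuals.
```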
arXiv Detail & Related papers (2022-10-11T15:07:25Z)
- Frequency-Aware Self-Supervised Monocular Depth Estimation [41.97188738587212]
We present two versatile methods to enhance self-supervised monocular depth estimation models.
The high generalizability of our methods is achieved by solving fundamental and ubiquitous problems in the photometric loss function.
We are the first to propose blurring images to improve depth estimators, supported by an interpretable analysis.
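One way to picture the blurring idea: low-pass filter both images before the photometric comparison so that high-frequency noise and aliasing do not dominate the loss gradients. The sketch below is a minimal illustration with an assumed Gaussian kernel and L1 loss, not the paper's exact formulation.
```python
import torch
import torch.nn.functional as F

# Minimal sketch: compare blurred images in the photometric loss so that
# high-frequency noise does not dominate the gradient signal (kernel size,
# sigma, and the L1 form are assumptions for illustration).
def gaussian_kernel(size=5, sigma=1.5):
    coords = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
    g = torch.exp(-coords ** 2 / (2 * sigma ** 2))
    k = torch.outer(g, g)
    return (k / k.sum())[None, None].repeat(3, 1, 1, 1)  # one kernel per RGB channel

def blurred_photometric_loss(pred, target, kernel):
    """pred, target: (B, 3, H, W) synthesized and observed images."""
    blur = lambda x: F.conv2d(x, kernel, padding=kernel.shape[-1] // 2, groups=3)
    return (blur(pred) - blur(target)).abs().mean()
```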
arXiv Detail & Related papers (2022-10-11T14:30:26Z)
- Dense Optical Flow from Event Cameras [55.79329250951028]
We propose to incorporate feature correlation and sequential processing into dense optical flow estimation from event cameras.
Our proposed approach computes dense optical flow and reduces the end-point error by 23% on MVSEC.
arXiv Detail & Related papers (2021-08-24T07:39:08Z)
- High-Resolution Optical Flow from 1D Attention and Correlation [89.61824964952949]
We propose a new method for high-resolution optical flow estimation with significantly less computation.
We first perform a 1D attention operation in the vertical direction of the target image, and then a simple 1D correlation in the horizontal direction of the attended image.
Experiments on Sintel, KITTI, and real-world 4K resolution images demonstrate the effectiveness and superiority of our proposed method.
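The 1D factorization can be sketched directly: attention aggregates target features along each column, then correlation is taken along each row of the attended features, yielding an (H, W, W) cost volume instead of the full 4D (H, W, H, W) one. Shapes and the attention form below are illustrative assumptions, not the paper's exact architecture.
```python
import torch

# Minimal sketch of 1D attention (vertical) followed by 1D correlation
# (horizontal); illustrative shapes, not the paper's exact model.
def vertical_attention(feat1, feat2):
    """feat1, feat2: (B, C, H, W). Attend over each column of feat2."""
    B, C, H, W = feat1.shape
    q = feat1.permute(0, 3, 2, 1).reshape(B * W, H, C)  # column-wise queries
    k = feat2.permute(0, 3, 2, 1).reshape(B * W, H, C)  # column-wise keys/values
    attn = torch.softmax(q @ k.transpose(1, 2) / C ** 0.5, dim=-1)  # (BW, H, H)
    out = attn @ k                                      # vertically attended
    return out.reshape(B, W, H, C).permute(0, 3, 2, 1)  # back to (B, C, H, W)

def horizontal_correlation(feat1, attended):
    """Cost volume along rows only: corr[b, h, w, v] compares pixel (h, w)
    of feat1 with pixel (h, v) of the attended target features."""
    C = feat1.shape[1]
    return torch.einsum('bchw,bchv->bhwv', feat1, attended) / C ** 0.5
```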
arXiv Detail & Related papers (2021-04-28T17:56:34Z)
- Deep Two-View Structure-from-Motion Revisited [83.93809929963969]
Two-view structure-from-motion (SfM) is the cornerstone of 3D reconstruction and visual SLAM.
We propose to revisit the problem of deep two-view SfM by leveraging the well-posedness of the classic pipeline.
Our method consists of 1) an optical flow estimation network that predicts dense correspondences between two frames; 2) a normalized pose estimation module that computes relative camera poses from the 2D optical flow correspondences; and 3) a scale-invariant depth estimation network that leverages epipolar geometry to reduce the search space, refine the dense correspondences, and estimate relative depth maps.
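For step 2), classical essential-matrix recovery gives a concrete picture of turning flow correspondences into a relative pose; the sketch below uses OpenCV's five-point RANSAC solver as a stand-in for the paper's normalized pose module, with the subsampling stride and threshold as assumptions.
```python
import cv2
import numpy as np

# Minimal sketch: relative pose from dense optical flow correspondences via
# essential-matrix estimation (OpenCV stands in for the paper's module).
def pose_from_flow(flow, K, step=8):
    """flow: (H, W, 2) forward optical flow; K: (3, 3) camera intrinsics."""
    H, W = flow.shape[:2]
    ys, xs = np.mgrid[0:H:step, 0:W:step]             # subsample the flow field
    pts1 = np.stack([xs.ravel(), ys.ravel()], axis=-1).astype(np.float64)
    pts2 = pts1 + flow[ys.ravel(), xs.ravel()]        # flow-based matches
    E, inliers = cv2.findEssentialMat(pts1, pts2, K,
                                      method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    return R, t  # rotation and unit-norm translation (scale is unobservable)
```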
arXiv Detail & Related papers (2021-04-01T15:31:20Z)
- Robust Consistent Video Depth Estimation [65.53308117778361]
We present an algorithm for estimating consistent dense depth maps and camera poses from a monocular video.
Our algorithm combines two complementary techniques: (1) flexible deformation-splines for low-frequency large-scale alignment and (2) geometry-aware depth filtering for high-frequency alignment of fine depth details.
In contrast to prior approaches, our method does not require camera poses as input and achieves robust reconstruction for challenging hand-held cell phone captures containing a significant amount of noise, shake, motion blur, and rolling shutter deformations.
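A minimal way to picture the low-frequency alignment in (1): fit a coarse, smooth multiplicative correction that pulls a depth map toward a reference at large scales while leaving fine detail untouched. The spline degree, 4x4 control grid, and log-scale parameterization below are assumptions, not the paper's deformation model.
```python
import numpy as np
from scipy.interpolate import RectBivariateSpline

# Minimal sketch of spline-based low-frequency depth alignment (illustrative;
# assumes H and W are divisible by the control-grid size).
def spline_align(depth, target, grid=(4, 4)):
    """Smoothly rescale `depth` toward `target` at low spatial frequencies."""
    H, W = depth.shape
    log_ratio = np.log(np.clip(target, 1e-6, None) / np.clip(depth, 1e-6, None))
    # Block-average the log-ratio onto a coarse control grid.
    ctrl = log_ratio.reshape(grid[0], H // grid[0],
                             grid[1], W // grid[1]).mean(axis=(1, 3))
    gy = (np.arange(grid[0]) + 0.5) * (H / grid[0])   # control-point centers
    gx = (np.arange(grid[1]) + 0.5) * (W / grid[1])
    spline = RectBivariateSpline(gy, gx, ctrl, kx=2, ky=2)
    correction = spline(np.arange(H), np.arange(W))   # dense smooth log-scale
    return depth * np.exp(correction)
```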
arXiv Detail & Related papers (2020-12-10T18:59:48Z)
- Spatially-Variant CNN-based Point Spread Function Estimation for Blind Deconvolution and Depth Estimation in Optical Microscopy [6.09170287691728]
We present a method that improves the resolution of light microscopy images of thin, yet non-flat objects.
We estimate the parameters of a spatially-variant point spread function (PSF) model using a convolutional neural network (CNN).
Our method recovers PSF parameters from the image itself with up to a squared Pearson correlation coefficient of 0.99 in ideal conditions.
arXiv Detail & Related papers (2020-10-08T14:20:16Z)
- Joint Unsupervised Learning of Optical Flow and Egomotion with Bi-Level Optimization [59.9673626329892]
We exploit the global relationship between optical flow and camera motion using epipolar geometry.
We use implicit differentiation to enable back-propagation through the lower-level geometric optimization layer independent of its implementation.
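The implicit-differentiation trick can be shown on a toy scalar problem: at the lower-level optimum the inner gradient vanishes, and differentiating that optimality condition gives the upper-level gradient without back-propagating through the solver's iterations. The one-dimensional objective below is an assumption for illustration, not the paper's geometric optimization layer.
```python
import torch

# Toy lower-level problem (an assumption for illustration):
#   w*(x) = argmin_w g(w, x),  with  g(w, x) = (w - x)^2 + 0.1 * w^4.
# At the optimum dg/dw = 0, so by the implicit function theorem
#   dw*/dx = -(d^2g/dw dx) / (d^2g/dw^2),
# independent of how the inner solver computed w*.

def g(w, x):
    return (w - x) ** 2 + 0.1 * w ** 4

def solve_lower_level(x, steps=500, lr=0.05):
    """Inner solver: plain gradient descent (any black-box solver would do)."""
    w = torch.zeros(())
    for _ in range(steps):
        w = w.detach().requires_grad_(True)
        grad, = torch.autograd.grad(g(w, x), w)
        w = w - lr * grad
    return w.detach()

x = torch.tensor(1.0, requires_grad=True)
w_star = solve_lower_level(x)

# Implicit gradient from the optimality condition dg/dw(w*, x) = 0.
w = w_star.clone().requires_grad_(True)
gw, = torch.autograd.grad(g(w, x), w, create_graph=True)   # inner gradient
gww, = torch.autograd.grad(gw, w, retain_graph=True)       # d^2g/dw^2
gwx, = torch.autograd.grad(gw, x)                          # d^2g/dw dx
dw_dx = -gwx / gww
print(f"w* = {w_star.item():.4f}, dw*/dx = {dw_dx.item():.4f}")
```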
arXiv Detail & Related papers (2020-02-26T22:28:00Z)