DeepMLE: A Robust Deep Maximum Likelihood Estimator for Two-view
Structure from Motion
- URL: http://arxiv.org/abs/2210.05517v1
- Date: Tue, 11 Oct 2022 15:07:25 GMT
- Title: DeepMLE: A Robust Deep Maximum Likelihood Estimator for Two-view
Structure from Motion
- Authors: Yuxi Xiao, Li Li, Xiaodi Li and Jian Yao
- Abstract summary: Two-view structure from motion (SfM) is the cornerstone of 3D reconstruction and visual SLAM (vSLAM)
We formulate the two-view SfM problem as a maximum likelihood estimation (MLE) and solve it with the proposed framework, denoted as DeepMLE.
Our method significantly outperforms the state-of-the-art end-to-end two-view SfM approaches in accuracy and generalization capability.
- Score: 9.294501649791016
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Two-view structure from motion (SfM) is the cornerstone of 3D reconstruction
and visual SLAM (vSLAM). Many existing end-to-end learning-based methods
usually formulate it as a brute-force regression problem. However, such
inadequate use of the traditional geometry model leaves these models fragile
in unseen environments. To improve the generalization capability and
robustness of end-to-end two-view SfM networks, we formulate the two-view SfM
problem as a
maximum likelihood estimation (MLE) and solve it with the proposed framework,
denoted as DeepMLE. First, we propose to use deep multi-scale correlation
maps to depict the visual similarities of the 2D image matches determined by
the ego-motion. In addition, to increase the robustness of our framework,
we formulate the likelihood function of the correlations of 2D image matches as
a Gaussian and Uniform mixture distribution which takes the uncertainty caused
by illumination changes, image noise and moving objects into account.
Meanwhile, an uncertainty prediction module is presented to predict the
pixel-wise distribution parameters. Finally, we iteratively refine the depth
and relative camera pose using the gradient-like information to maximize the
likelihood function of the correlations. Extensive experimental results on
several datasets prove that our method significantly outperforms the
state-of-the-art end-to-end two-view SfM approaches in accuracy and
generalization capability.
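The Gaussian-plus-Uniform mixture described in the abstract can be sketched as follows; the function and parameter names (`mixture_nll`, `alpha`, `support`) are illustrative choices, not the paper's notation, and the scalar residual stands in for the correlation values the paper models:

```python
import math

def mixture_nll(residual, sigma, alpha, support=2.0):
    """Negative log-likelihood of a residual under a Gaussian + Uniform
    mixture. `alpha` is the inlier probability, `sigma` the Gaussian
    standard deviation, and `support` the width of the uniform outlier
    component that absorbs illumination changes, noise, and moving
    objects. All names are illustrative, not the paper's notation."""
    gauss = math.exp(-0.5 * (residual / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
    uniform = 1.0 / support
    return -math.log(alpha * gauss + (1.0 - alpha) * uniform)
```

Because the uniform component bounds the penalty, a large outlier residual costs far less than it would under a pure Gaussian, which is what makes the likelihood robust; refinement then adjusts depth and pose to lower this NLL over all matches.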
Related papers
- Double-Shot 3D Shape Measurement with a Dual-Branch Network [14.749887303860717]
We propose a dual-branch Convolutional Neural Network (CNN)-Transformer network (PDCNet) to process different structured light (SL) modalities.
Within PDCNet, a Transformer branch is used to capture global perception in the fringe images, while a CNN branch is designed to collect local details in the speckle images.
We show that our method can reduce fringe order ambiguity while producing high-accuracy results on a self-made dataset.
arXiv Detail & Related papers (2024-07-19T10:49:26Z) - Hierarchical Integration Diffusion Model for Realistic Image Deblurring [71.76410266003917]
Diffusion models (DMs) have been introduced in image deblurring and exhibited promising performance.
We propose the Hierarchical Integration Diffusion Model (HI-Diff), for realistic image deblurring.
Experiments on synthetic and real-world blur datasets demonstrate that our HI-Diff outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-05-22T12:18:20Z) - Single Image Depth Prediction Made Better: A Multivariate Gaussian Take [163.14849753700682]
We introduce an approach that performs continuous modeling of per-pixel depth.
Our method (named MG) ranks among the top entries on the KITTI depth-prediction benchmark leaderboard.
arXiv Detail & Related papers (2023-03-31T16:01:03Z) - Frequency-Aware Self-Supervised Monocular Depth Estimation [41.97188738587212]
We present two versatile methods to enhance self-supervised monocular depth estimation models.
The high generalizability of our methods is achieved by solving fundamental and ubiquitous problems in the photometric loss function.
We are the first to propose blurring images to improve depth estimators with an interpretable analysis.
arXiv Detail & Related papers (2022-10-11T14:30:26Z) - A Model for Multi-View Residual Covariances based on Perspective
Deformation [88.21738020902411]
We derive a model for the covariance of the visual residuals in multi-view SfM, odometry and SLAM setups.
We validate our model with synthetic and real data and integrate it into photometric and feature-based Bundle Adjustment.
arXiv Detail & Related papers (2022-02-01T21:21:56Z) - Deep Two-View Structure-from-Motion Revisited [83.93809929963969]
Two-view structure-from-motion (SfM) is the cornerstone of 3D reconstruction and visual SLAM.
We propose to revisit the problem of deep two-view SfM by leveraging the well-posedness of the classic pipeline.
Our method consists of 1) an optical flow estimation network that predicts dense correspondences between two frames; 2) a normalized pose estimation module that computes relative camera poses from the 2D optical flow correspondences, and 3) a scale-invariant depth estimation network that leverages epipolar geometry to reduce the search space, refine the dense correspondences, and estimate relative depth maps.
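The normalized pose-estimation step in the pipeline above rests on classic epipolar geometry: from dense 2D correspondences one can recover an essential matrix, from which the relative pose follows. A minimal eight-point-style sketch of that geometry is below; `essential_from_matches` and its interface are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def essential_from_matches(p1, p2):
    """Estimate the essential matrix from normalized image
    correspondences p1, p2 (each an Nx2 array, N >= 8) via the
    linear eight-point method. A sketch of the classic geometry a
    pose-from-correspondences module builds on."""
    x1, y1 = p1[:, 0], p1[:, 1]
    x2, y2 = p2[:, 0], p2[:, 1]
    # Each row encodes the epipolar constraint x2^T E x1 = 0,
    # linear in the 9 entries of E (row-major).
    a = np.stack([x2 * x1, x2 * y1, x2,
                  y2 * x1, y2 * y1, y2,
                  x1, y1, np.ones_like(x1)], axis=1)
    _, _, vt = np.linalg.svd(a)
    e = vt[-1].reshape(3, 3)
    # Project onto the essential-matrix manifold:
    # two equal singular values, one zero.
    u, _, vt = np.linalg.svd(e)
    return u @ np.diag([1.0, 1.0, 0.0]) @ vt
```

Decomposing the returned matrix (e.g. with an SVD-based `recoverPose`-style routine) yields the rotation and the translation direction; the scale ambiguity is why the pipeline pairs this with a scale-invariant depth network.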
arXiv Detail & Related papers (2021-04-01T15:31:20Z) - Adaptive Context-Aware Multi-Modal Network for Depth Completion [107.15344488719322]
We propose to adopt the graph propagation to capture the observed spatial contexts.
We then apply the attention mechanism on the propagation, which encourages the network to model the contextual information adaptively.
Finally, we introduce the symmetric gated fusion strategy to exploit the extracted multi-modal features effectively.
Our model, named Adaptive Context-Aware Multi-Modal Network (ACMNet), achieves the state-of-the-art performance on two benchmarks.
arXiv Detail & Related papers (2020-08-25T06:00:06Z) - PaMIR: Parametric Model-Conditioned Implicit Representation for
Image-based Human Reconstruction [67.08350202974434]
We propose Parametric Model-Conditioned Implicit Representation (PaMIR), which combines the parametric body model with the free-form deep implicit function.
We show that our method achieves state-of-the-art performance for image-based 3D human reconstruction in the cases of challenging poses and clothing types.
arXiv Detail & Related papers (2020-07-08T02:26:19Z) - DeepRelativeFusion: Dense Monocular SLAM using Single-Image Relative
Depth Prediction [4.9188958016378495]
We propose a dense monocular SLAM system, named DeepRelativeFusion, that is capable of recovering a globally consistent 3D structure.
We use visual SLAM to reliably recover the camera poses and semi-dense depth maps, and then use relative depth prediction to densify the semi-dense depth maps and refine the pose graph.
Our system outperforms the state-of-the-art dense SLAM systems quantitatively in dense reconstruction accuracy by a large margin.
arXiv Detail & Related papers (2020-06-07T05:22:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.