Soft Expectation and Deep Maximization for Image Feature Detection
- URL: http://arxiv.org/abs/2104.10291v1
- Date: Wed, 21 Apr 2021 00:35:32 GMT
- Title: Soft Expectation and Deep Maximization for Image Feature Detection
- Authors: Alexander Mai, Allen Yang, Dominique E. Meyer
- Abstract summary: We propose SEDM, an iterative semi-supervised learning process that flips the question and first looks for repeatable 3D points, then trains a detector to localize them in image space.
Our results show that this new model trained using SEDM is able to better localize the underlying 3D points in a scene.
- Score: 68.8204255655161
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Central to the application of many multi-view geometry algorithms is the
extraction of matching points between multiple viewpoints, enabling classical
tasks such as camera pose estimation and 3D reconstruction. Over the decades,
many approaches that characterize these points have been proposed based on
hand-tuned appearance models and more recently data-driven learning methods. We
propose SEDM, an iterative semi-supervised learning process that flips the
question and first looks for repeatable 3D points, then trains a detector to
localize them in image space. Our technique poses the problem as one of
expectation maximization (EM), where the likelihood of the detector locating
the 3D points is the objective function to be maximized. We utilize the
geometry of the scene to refine the estimates of the location of these 3D
points and produce a new pseudo ground truth during the expectation step, then
train a detector to predict this pseudo ground truth in the maximization step.
We apply our detector to standard benchmarks in visual localization, sparse 3D
reconstruction, and mean matching accuracy. Our results show that this new
model trained using SEDM is able to better localize the underlying 3D points in
a scene, improving mean SfM quality by $-0.15\pm0.11$ mean reprojection error
when compared to SuperPoint or $-0.38\pm0.23$ when compared to R2D2.
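The geometric idea behind the expectation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes known 3x4 camera matrices, a single 3D point, and plain linear (DLT) triangulation, and the helper names `project`, `triangulate`, and `pseudo_ground_truth` are hypothetical. Noisy per-view detections are triangulated into one 3D point, which is then reprojected to give geometrically consistent 2D targets (the pseudo ground truth) together with the mean reprojection error.

```python
import numpy as np

def project(P, X):
    """Project a 3D point X (3,) with a 3x4 camera matrix P; returns a pixel (2,)."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def triangulate(Ps, uvs):
    """Linear (DLT) triangulation of one 3D point from two or more views."""
    rows = []
    for P, (u, v) in zip(Ps, uvs):
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    # The solution is the right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    X = Vt[-1]
    return X[:3] / X[3]

def pseudo_ground_truth(Ps, uvs):
    """Refine detections: triangulate, reproject, and report mean reprojection error."""
    X = triangulate(Ps, uvs)
    targets = np.array([project(P, X) for P in Ps])
    err = np.linalg.norm(targets - np.asarray(uvs), axis=1).mean()
    return X, targets, err

# Hypothetical setup: three cameras sharing intrinsics K, translated along x.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
Ps = [K @ np.hstack([np.eye(3), np.array([[-0.5 * i], [0.0], [0.0]])]) for i in range(3)]
X_true = np.array([0.2, -0.1, 4.0])

# Simulated detector output: true projections plus about one pixel of noise.
rng = np.random.default_rng(0)
detections = [project(P, X_true) + rng.normal(scale=1.0, size=2) for P in Ps]

X_est, targets, mean_err = pseudo_ground_truth(Ps, detections)
print("estimated 3D point:", X_est)
print("mean reprojection error (px):", mean_err)
```

In a full pipeline of the kind the abstract outlines, targets like these would serve as training labels in the maximization step; the detector training itself is omitted here.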
Related papers
- MVSDet: Multi-View Indoor 3D Object Detection via Efficient Plane Sweeps [51.44887282336391]
A key challenge of multi-view indoor 3D object detection is to infer accurate geometry information from images for precise 3D detection.
Previous methods rely on NeRF for geometry reasoning.
We propose MVSDet which utilizes plane sweep for geometry-aware 3D object detection.
arXiv Detail & Related papers (2024-10-28T21:58:41Z)
- Learning to Produce Semi-dense Correspondences for Visual Localization [11.415451542216559]
This study addresses the challenge of performing visual localization in demanding conditions such as night-time scenarios, adverse weather, and seasonal changes.
We propose a novel method that extracts reliable semi-dense 2D-3D matching points based on dense keypoint matches.
The network utilizes both geometric and visual cues to effectively infer 3D coordinates for unobserved keypoints from the observed ones.
arXiv Detail & Related papers (2024-02-13T10:40:10Z)
- Improved Scene Landmark Detection for Camera Localization [11.56648898250606]
A method based on scene landmark detection (SLD) was recently proposed to address these limitations.
It involves training a convolutional neural network (CNN) to detect a few predetermined, salient, scene-specific 3D points or landmarks.
We show that the accuracy gap was due to insufficient model capacity and noisy labels during training.
arXiv Detail & Related papers (2024-01-31T18:59:12Z)
- EP2P-Loc: End-to-End 3D Point to 2D Pixel Localization for Large-Scale Visual Localization [44.05930316729542]
We propose EP2P-Loc, a novel large-scale visual localization method for 3D point clouds.
To increase the number of inliers, we propose a simple algorithm to remove invisible 3D points in the image.
For the first time in this task, we employ a differentiable for end-to-end training.
arXiv Detail & Related papers (2023-09-14T07:06:36Z)
- LFM-3D: Learnable Feature Matching Across Wide Baselines Using 3D Signals [9.201550006194994]
Learnable matchers often underperform when there exist only small regions of co-visibility between image pairs.
We propose LFM-3D, a Learnable Feature Matching framework that uses models based on graph neural networks.
We show that the resulting improved correspondences lead to much higher relative posing accuracy for in-the-wild image pairs.
arXiv Detail & Related papers (2023-03-22T17:46:27Z)
- Improving Feature-based Visual Localization by Geometry-Aided Matching [21.1967752160412]
We introduce a novel 2D-3D matching method, Geometry-Aided Matching (GAM), which uses both appearance information and geometric context to improve 2D-3D feature matching.
GAM can greatly strengthen the recall of 2D-3D matches while maintaining high precision.
Our proposed localization method achieves state-of-the-art results on multiple visual localization datasets.
arXiv Detail & Related papers (2022-11-16T07:02:12Z)
- Multi-initialization Optimization Network for Accurate 3D Human Pose and Shape Estimation [75.44912541912252]
We propose a three-stage framework named Multi-Initialization Optimization Network (MION).
In the first stage, we strategically select different coarse 3D reconstruction candidates which are compatible with the 2D keypoints of the input sample.
In the second stage, we design a mesh refinement transformer (MRT) to respectively refine each coarse reconstruction result via a self-attention mechanism.
Finally, a Consistency Estimation Network (CEN) is proposed to find the best result from multiple candidates by evaluating whether the visual evidence in the RGB image matches a given 3D reconstruction.
arXiv Detail & Related papers (2021-12-24T02:43:58Z)
- Uncertainty-Aware Camera Pose Estimation from Points and Lines [101.03675842534415]
Perspective-n-Point-and-Line (PnPL) aims at fast, accurate and robust camera localization with respect to a 3D model from 2D-3D feature coordinates.
arXiv Detail & Related papers (2021-07-08T15:19:36Z)
- PLUME: Efficient 3D Object Detection from Stereo Images [95.31278688164646]
Existing methods tackle the problem in two steps: first depth estimation is performed, a pseudo LiDAR point cloud representation is computed from the depth estimates, and then object detection is performed in 3D space.
We propose a model that unifies these two tasks in the same metric space.
Our approach achieves state-of-the-art performance on the challenging KITTI benchmark, with significantly reduced inference time compared with existing methods.
arXiv Detail & Related papers (2021-01-17T05:11:38Z)
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)