EP2P-Loc: End-to-End 3D Point to 2D Pixel Localization for Large-Scale Visual Localization
- URL: http://arxiv.org/abs/2309.07471v1
- Date: Thu, 14 Sep 2023 07:06:36 GMT
- Title: EP2P-Loc: End-to-End 3D Point to 2D Pixel Localization for Large-Scale Visual Localization
- Authors: Minjung Kim, Junseo Koo, Gunhee Kim
- Abstract summary: We propose EP2P-Loc, a novel large-scale visual localization method for 3D point clouds.
To increase the number of inliers, we propose a simple algorithm to remove invisible 3D points in the image.
For the first time in this task, we employ a differentiable PnP for end-to-end training.
- Score: 44.05930316729542
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual localization is the task of estimating a 6-DoF camera pose of a query
image within a provided 3D reference map. Thanks to recent advances in various
3D sensors, 3D point clouds are becoming a more accurate and affordable option
for building the reference map, but research to match the points of 3D point
clouds with pixels in 2D images for visual localization remains challenging.
Existing approaches that jointly learn 2D-3D feature matching suffer from low
inlier counts due to representational differences between the two modalities, and
methods that sidestep this problem by recasting matching as classification
refine poses poorly. In this work, we propose EP2P-Loc, a novel large-scale visual
localization method that mitigates such appearance discrepancy and enables
end-to-end training for pose estimation. To increase the number of inliers, we
propose a simple algorithm to remove invisible 3D points in the image, and find
all 2D-3D correspondences without keypoint detection. To reduce memory usage
and search complexity, we take a coarse-to-fine approach where we extract
patch-level features from 2D images, then perform 2D patch classification on
each 3D point, and obtain the exact corresponding 2D pixel coordinates through
positional encoding. Finally, for the first time in this task, we employ a
differentiable PnP for end-to-end training. In the experiments on newly curated
large-scale indoor and outdoor benchmarks based on 2D-3D-S and KITTI, we show
that our method achieves state-of-the-art performance compared to existing
visual localization and image-to-point cloud registration methods.
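As a side note on the visibility step: the idea of removing invisible 3D points can be illustrated with a small sketch. The snippet below is a minimal, generic z-buffer visibility test in NumPy, assuming a pinhole camera with known intrinsics `K` and world-to-camera extrinsics `R`, `t`; it illustrates the general technique, not the paper's actual algorithm, and the depth tolerance `tol` is an arbitrary assumption.

```python
import numpy as np

def visible_points(points, K, R, t, height, width, tol=0.05):
    """Mask of 3D points that project inside the image and survive a z-buffer test.

    points: (N, 3) world coordinates; K: (3, 3) intrinsics;
    R: (3, 3), t: (3,) world-to-camera extrinsics.
    tol is an assumed depth tolerance in scene units.
    """
    cam = points @ R.T + t                      # world -> camera frame
    in_front = cam[:, 2] > 1e-6                 # drop points behind the camera
    proj = cam @ K.T
    u = (proj[:, 0] / np.maximum(proj[:, 2], 1e-6)).astype(int)
    v = (proj[:, 1] / np.maximum(proj[:, 2], 1e-6)).astype(int)
    in_image = in_front & (0 <= u) & (u < width) & (0 <= v) & (v < height)

    # Keep, per pixel, only the nearest projected depth.
    zbuf = np.full((height, width), np.inf)
    idx = np.flatnonzero(in_image)
    for i in idx:
        zbuf[v[i], u[i]] = min(zbuf[v[i], u[i]], cam[i, 2])

    # A point is visible if its depth (almost) matches the z-buffer entry.
    visible = np.zeros(len(points), dtype=bool)
    visible[idx] = cam[idx, 2] <= zbuf[v[idx], u[idx]] + tol
    return visible
```

A point that projects inside the image can still be occluded by closer geometry; the z-buffer comparison is what catches that case. Production-grade hidden-point removal is usually more involved (e.g., splatting each point over several pixels), but the principle is the same.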
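The differentiable-PnP idea can also be sketched. The toy below recovers a 6-DoF pose by minimizing reprojection error over given 2D-3D correspondences with gradient descent in PyTorch, so the pose objective stays differentiable end to end; it is not the specific differentiable PnP layer used in EP2P-Loc (the abstract does not detail it), and the axis-angle parameterization, optimizer, and iteration count are assumptions.

```python
import torch

def skew(w):
    """3-vector -> 3x3 skew-symmetric matrix, so matrix_exp(skew(w)) is a rotation."""
    zero = torch.zeros((), dtype=w.dtype)
    return torch.stack([
        torch.stack([zero, -w[2],  w[1]]),
        torch.stack([w[2],  zero, -w[0]]),
        torch.stack([-w[1], w[0],  zero]),
    ])

def refine_pose(pts3d, pts2d, K, steps=200, lr=1e-2):
    """pts3d: (N, 3) points, pts2d: (N, 2) pixels, K: (3, 3) intrinsics."""
    w = torch.zeros(3, requires_grad=True)   # axis-angle rotation parameters
    t = torch.zeros(3, requires_grad=True)   # translation
    opt = torch.optim.Adam([w, t], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        R = torch.matrix_exp(skew(w))        # differentiable rotation matrix
        cam = pts3d @ R.T + t                # world -> camera frame
        proj = cam @ K.T
        uv = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)
        loss = ((uv - pts2d) ** 2).sum(dim=1).mean()   # mean reprojection error
        loss.backward()
        opt.step()
    return torch.matrix_exp(skew(w.detach())), t.detach()
```

Practical end-to-end pipelines differentiate through the PnP solver more carefully (for instance via implicit differentiation) so that a pose loss can update the feature networks that produced the correspondences, but the sketch captures the core idea: the pose is the minimizer of a smooth reprojection objective.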
Related papers
- ImOV3D: Learning Open-Vocabulary Point Clouds 3D Object Detection from Only 2D Images [19.02348585677397]
Open-vocabulary 3D object detection (OV-3Det) aims to generalize beyond the limited number of base categories labeled during the training phase.
The biggest bottleneck is the scarcity of annotated 3D data, whereas 2D image datasets are abundant and richly annotated.
We propose a novel framework ImOV3D to leverage pseudo multimodal representation containing both images and point clouds (PC) to close the modality gap.
arXiv Detail & Related papers (2024-10-31T15:02:05Z)
- Robust 3D Point Clouds Classification based on Declarative Defenders [18.51700931775295]
3D point clouds are unstructured and sparse, while 2D images are structured and dense.
In this paper, we explore three distinct algorithms for mapping 3D point clouds into 2D images.
The proposed approaches demonstrate superior accuracy and robustness against adversarial attacks.
arXiv Detail & Related papers (2024-10-13T01:32:38Z)
- ConDense: Consistent 2D/3D Pre-training for Dense and Sparse Features from Multi-View Images [47.682942867405224]
ConDense is a framework for 3D pre-training utilizing existing 2D networks and large-scale multi-view datasets.
We propose a novel 2D-3D joint training scheme to extract co-embedded 2D and 3D features in an end-to-end pipeline.
arXiv Detail & Related papers (2024-08-30T05:57:01Z)
- CheckerPose: Progressive Dense Keypoint Localization for Object Pose Estimation with Graph Neural Network [66.24726878647543]
Estimating the 6-DoF pose of a rigid object from a single RGB image is a crucial yet challenging task.
Recent studies have shown the great potential of dense correspondence-based solutions.
We propose a novel pose estimation algorithm named CheckerPose, which improves on three main aspects.
arXiv Detail & Related papers (2023-03-29T17:30:53Z)
- Improving Feature-based Visual Localization by Geometry-Aided Matching [21.1967752160412]
We introduce a novel 2D-3D matching method, Geometry-Aided Matching (GAM), which uses both appearance information and geometric context to improve 2D-3D feature matching.
GAM can greatly strengthen the recall of 2D-3D matches while maintaining high precision.
Our proposed localization method achieves state-of-the-art results on multiple visual localization datasets.
arXiv Detail & Related papers (2022-11-16T07:02:12Z)
- CorrI2P: Deep Image-to-Point Cloud Registration via Dense Correspondence [51.91791056908387]
We propose the first feature-based dense correspondence framework for addressing the image-to-point cloud registration problem, dubbed CorrI2P.
Specifically, given a pair of a 2D image and a 3D point cloud, we first transform them into high-dimensional feature space and feed the features into a symmetric overlapping region detector to determine the region where the image and point cloud overlap.
arXiv Detail & Related papers (2022-07-12T11:49:31Z)
- Unsupervised Learning of Fine Structure Generation for 3D Point Clouds by 2D Projection Matching [66.98712589559028]
We propose an unsupervised approach for 3D point cloud generation with fine structures.
Our method can recover fine 3D structures from 2D silhouette images at different resolutions.
arXiv Detail & Related papers (2021-08-08T22:15:31Z)
- Soft Expectation and Deep Maximization for Image Feature Detection [68.8204255655161]
We propose SEDM, an iterative semi-supervised learning process that flips the question and first looks for repeatable 3D points, then trains a detector to localize them in image space.
Our results show that this new model trained using SEDM is able to better localize the underlying 3D points in a scene.
arXiv Detail & Related papers (2021-04-21T00:35:32Z)
- P2-Net: Joint Description and Detection of Local Features for Pixel and Point Matching [78.18641868402901]
This work takes the initiative to establish fine-grained correspondences between 2D images and 3D point clouds.
An ultra-wide reception mechanism in combination with a novel loss function are designed to mitigate the intrinsic information variations between pixel and point local regions.
arXiv Detail & Related papers (2021-03-01T14:59:40Z)