Long-term Visual Localization with Mobile Sensors
- URL: http://arxiv.org/abs/2304.07691v1
- Date: Sun, 16 Apr 2023 04:35:10 GMT
- Title: Long-term Visual Localization with Mobile Sensors
- Authors: Shen Yan, Yu Liu, Long Wang, Zehong Shen, Zhen Peng, Haomin Liu,
Maojun Zhang, Guofeng Zhang, Xiaowei Zhou
- Abstract summary: We propose to leverage additional sensors on a mobile phone, mainly GPS, compass, and gravity sensor, to solve this challenging problem.
With the initial pose, we are also able to devise a direct 2D-3D matching network to efficiently establish 2D-3D correspondences.
We benchmark our method as well as several state-of-the-art baselines and demonstrate the effectiveness of the proposed approach.
- Score: 30.839849072256435
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the remarkable advances in image matching and pose estimation,
image-based localization of a camera in a temporally varying outdoor
environment is still a challenging problem due to the huge appearance disparity
between query and reference images caused by illumination, seasonal, and
structural changes. In this work, we propose to leverage additional sensors on
a mobile phone, mainly GPS, compass, and gravity sensor, to solve this
challenging problem. We show that these mobile sensors provide decent initial
poses and effective constraints to reduce the searching space in image matching
and final pose estimation. With the initial pose, we are also able to devise a
direct 2D-3D matching network to efficiently establish 2D-3D correspondences
instead of the tedious 2D-2D matching used in existing systems. As no public dataset
exists for the studied problem, we collect a new dataset that provides a
variety of mobile sensor data and significant scene appearance variations, and
develop a system to acquire ground-truth poses for query images. We benchmark
our method as well as several state-of-the-art baselines and demonstrate the
effectiveness of the proposed approach. The code and dataset will be released
publicly.
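The paper's own formulation is not reproduced in this summary, but the sensor-derived pose prior it describes can be pictured concretely. The sketch below is a minimal illustration, assuming idealized conventions (an east-north-up world frame, gravity reported in the device frame, compass azimuth clockwise from north); all function names are hypothetical and real platforms differ in sensor-frame signs.

```python
import numpy as np

def rot_between(a, b):
    """Smallest rotation taking unit vector a onto unit vector b (Rodrigues)."""
    a = a / np.linalg.norm(a); b = b / np.linalg.norm(b)
    v, c = np.cross(a, b), float(a @ b)
    if np.isclose(c, -1.0):                       # antiparallel: rotate by pi
        axis = np.cross(a, [1.0, 0.0, 0.0])
        if np.linalg.norm(axis) < 1e-8:
            axis = np.cross(a, [0.0, 1.0, 0.0])
        axis /= np.linalg.norm(axis)
        return 2.0 * np.outer(axis, axis) - np.eye(3)
    K = np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])
    return np.eye(3) + K + K @ K / (1.0 + c)

def gps_to_enu(lat, lon, alt, lat0, lon0, alt0):
    """Approximate local east-north-up offset (metres) from a reference fix."""
    R_E = 6378137.0                               # WGS-84 equatorial radius
    east = np.radians(lon - lon0) * R_E * np.cos(np.radians(lat0))
    north = np.radians(lat - lat0) * R_E
    return np.array([east, north, alt - alt0])

def sensor_pose_prior(gps, heading_deg, gravity_dev, origin):
    """Rough world-from-device pose: GPS -> translation, gravity -> tilt,
    compass -> yaw.  Sign conventions vary across platforms."""
    t = gps_to_enu(*gps, *origin)
    # tilt: align the measured gravity direction with world "down" (0, 0, -1)
    R_tilt = rot_between(gravity_dev, np.array([0.0, 0.0, -1.0]))
    th = -np.radians(heading_deg)                 # azimuth: clockwise from north
    R_yaw = np.array([[np.cos(th), -np.sin(th), 0.0],
                      [np.sin(th),  np.cos(th), 0.0],
                      [0.0, 0.0, 1.0]])
    return R_yaw @ R_tilt, t
```

A prior of this kind is only metre- and degree-accurate, which is exactly why the abstract treats it as a way to shrink the search space for matching and final pose estimation rather than as the answer itself.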
Related papers
- HandDGP: Camera-Space Hand Mesh Prediction with Differentiable Global Positioning [1.4515751892711464]
We propose an end-to-end solution that addresses the 2D-3D correspondence problem.
This solution enables back-propagation from camera-space outputs to the rest of the network through a new differentiable global positioning module.
We validate the effectiveness of our framework in evaluations against several baselines and state-of-the-art approaches.
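The summary does not spell out HandDGP's module, but the core idea of a differentiable global positioning step can be sketched: recovering a camera-space root translation from 2D-3D correspondences reduces to a linear least-squares solve, which is differentiable end to end. A hedged toy version with hypothetical names; a real implementation would use an autodiff framework's least-squares op.

```python
import numpy as np

def global_position(X_rel, uv_norm):
    """Least-squares camera-space translation t such that root-relative 3D
    points X_rel + t project onto normalized image points uv_norm.

    Each correspondence (X, (a, b)) gives two equations linear in t:
        t_x - a*t_z = a*X_z - X_x
        t_y - b*t_z = b*X_z - X_y
    """
    A, rhs = [], []
    for (Xx, Xy, Xz), (a, b) in zip(X_rel, uv_norm):
        A.append([1.0, 0.0, -a]); rhs.append(a * Xz - Xx)
        A.append([0.0, 1.0, -b]); rhs.append(b * Xz - Xy)
    t, *_ = np.linalg.lstsq(np.array(A), np.array(rhs), rcond=None)
    return t
```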
arXiv Detail & Related papers (2024-07-22T17:59:01Z)
- VICAN: Very Efficient Calibration Algorithm for Large Camera Networks [49.17165360280794]
We introduce a novel methodology that extends Pose Graph Optimization techniques.
We consider the bipartite graph encompassing cameras, object poses evolving dynamically, and camera-object relative transformations at each time step.
Our framework retains compatibility with traditional PGO solvers, but its efficacy benefits from a custom-tailored optimization scheme.
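VICAN's custom optimization scheme is not reproduced here; the sketch below only shows the kind of camera-object edge residual a bipartite pose graph would score at each time step. Function names are hypothetical.

```python
import numpy as np

def se3_inv(T):
    """Invert a 4x4 rigid transform."""
    R, t = T[:3, :3], T[:3, 3]
    Ti = np.eye(4); Ti[:3, :3] = R.T; Ti[:3, 3] = -R.T @ t
    return Ti

def edge_residual(T_w_cam, T_w_obj, T_cam_obj_meas):
    """Discrepancy between the predicted and measured camera-object
    relative transform for one graph edge."""
    T_pred = se3_inv(T_w_cam) @ T_w_obj           # camera-from-object, predicted
    E = se3_inv(T_cam_obj_meas) @ T_pred          # identity if consistent
    rot_err = np.arccos(np.clip((np.trace(E[:3, :3]) - 1) / 2, -1, 1))
    trans_err = np.linalg.norm(E[:3, 3])
    return rot_err, trans_err
```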
arXiv Detail & Related papers (2024-03-25T17:47:03Z)
- SCENES: Subpixel Correspondence Estimation With Epipolar Supervision [18.648772607057175]
Extracting point correspondences from two or more views of a scene is a fundamental computer vision problem.
Existing local feature matching approaches, trained with correspondence supervision on large-scale datasets, obtain highly accurate matches on the test sets; that supervision, however, presumes ground-truth correspondences derived from known 3D structure.
We relax this assumption by removing the requirement of 3D structure, e.g., depth maps or point clouds, and only require camera pose information, which can be obtained from odometry.
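Pose-only supervision of this kind can be realized by penalizing the epipolar error of putative matches. A minimal sketch of the first-order (Sampson) epipolar distance, assuming normalized camera coordinates and a known relative pose; the paper's exact loss may differ.

```python
import numpy as np

def skew(t):
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

def sampson_distance(x1, x2, R, t):
    """First-order geometric epipolar error of a putative match.
    x1, x2: homogeneous normalized image points (3,) in views 1 and 2.
    R, t:   pose of view 2 relative to view 1."""
    E = skew(t) @ R                       # essential matrix from relative pose
    Ex1, Etx2 = E @ x1, E.T @ x2
    num = float(x2 @ E @ x1) ** 2
    den = Ex1[0]**2 + Ex1[1]**2 + Etx2[0]**2 + Etx2[1]**2
    return num / den
```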
arXiv Detail & Related papers (2024-01-19T18:57:46Z)
- Cross-Modal Semi-Dense 6-DoF Tracking of an Event Camera in Challenging Conditions [29.608665442108727]
Event-based cameras are bio-inspired visual sensors that perform well in HDR conditions and have high temporal resolution.
The present work demonstrates the feasibility of purely event-based tracking if an alternative sensor is permitted for mapping.
The method relies on geometric 3D-2D registration of semi-dense maps and events, and achieves highly reliable and accurate cross-modal tracking results.
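The paper's registration is more sophisticated than this, but the flavor of a cross-modal 3D-2D alignment cost can be sketched: score a pose hypothesis by how closely the projected semi-dense map lands on event pixels, using a distance transform of the event image. Names and the boolean event mask are assumptions.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def alignment_cost(pts_w, R, t, K, event_mask):
    """Mean distance from projected 3D map points to the nearest event pixel.
    event_mask: boolean image, True where events fired."""
    dt = distance_transform_edt(~event_mask)      # 0 at event pixels
    pc = R @ pts_w.T + t[:, None]                 # world -> camera
    uv = (K @ pc).T
    uv = uv[uv[:, 2] > 0]                         # keep points in front
    uv = (uv[:, :2] / uv[:, 2:3]).round().astype(int)
    h, w = event_mask.shape
    ok = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    return dt[uv[ok, 1], uv[ok, 0]].mean()
```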
arXiv Detail & Related papers (2024-01-16T01:48:45Z)
- Towards Generalizable Multi-Camera 3D Object Detection via Perspective Debiasing [28.874014617259935]
Multi-Camera 3D Object Detection (MC3D-Det) has gained prominence with the advent of bird's-eye view (BEV) approaches.
We propose a novel method that aligns 3D detection with 2D camera plane results, ensuring consistent and accurate detections.
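One reading of "aligning 3D detection with 2D camera plane results" is a reprojection-consistency check between BEV detections and per-camera 2D detections. A toy version under that assumption, with hypothetical names; the method's actual debiasing is more involved.

```python
import numpy as np

def project_center(center_w, T_cam_w, K):
    """Project a 3D box center into one camera's image plane
    (assumes the center lies in front of the camera)."""
    pc = T_cam_w[:3, :3] @ center_w + T_cam_w[:3, 3]
    uv = K @ pc
    return uv[:2] / uv[2]

def perspective_consistency(center_w, T_cam_w, K, uv_2d):
    """L1 gap between a reprojected 3D detection and the matched
    2D camera-plane detection."""
    return np.abs(project_center(center_w, T_cam_w, K) - uv_2d).sum()
```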
arXiv Detail & Related papers (2023-10-17T15:31:28Z)
- Multi-Modal Dataset Acquisition for Photometrically Challenging Objects [56.30027922063559]
This paper addresses the limitations of current datasets for 3D vision tasks in terms of accuracy, size, realism, and suitable imaging modalities for photometrically challenging objects.
We propose a novel annotation and acquisition pipeline that enhances existing 3D perception and 6D object pose datasets.
arXiv Detail & Related papers (2023-08-21T10:38:32Z)
- View Consistent Purification for Accurate Cross-View Localization [59.48131378244399]
This paper proposes a fine-grained self-localization method for outdoor robotics that addresses limitations of existing cross-view localization approaches.
It is the first sparse, visual-only method to enhance perception in dynamic environments.
arXiv Detail & Related papers (2023-08-16T02:51:52Z)
- Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection [58.81316192862618]
Two critical sensors for 3D perception in autonomous driving are the camera and the LiDAR.
Fusing these two modalities can significantly boost the performance of 3D perception models.
We benchmark the robustness of state-of-the-art fusion methods for the first time.
arXiv Detail & Related papers (2022-05-30T09:35:37Z)
- MetaPose: Fast 3D Pose from Multiple Views without 3D Supervision [72.5863451123577]
We show how to train a neural model that can perform accurate 3D pose and camera estimation.
Our method outperforms both classical bundle adjustment and weakly-supervised monocular 3D baselines.
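For context on the bundle-adjustment comparison: the classical alternative to a learned estimator is linear (DLT) triangulation of each joint from calibrated views. A standard textbook sketch, shown for contrast rather than as MetaPose's method.

```python
import numpy as np

def triangulate(P_list, uv_list):
    """Linear (DLT) triangulation of one joint from several views.
    P_list: 3x4 camera projection matrices; uv_list: 2D detections (u, v)."""
    A = []
    for P, (u, v) in zip(P_list, uv_list):
        A.append(u * P[2] - P[0])         # two rows per view
        A.append(v * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    X = Vt[-1]                            # null vector = homogeneous point
    return X[:3] / X[3]
```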
arXiv Detail & Related papers (2021-08-10T18:39:56Z)
- Lidar-Monocular Surface Reconstruction Using Line Segments [5.542669744873386]
We propose to leverage common geometric features that are detected in both the LIDAR scans and image data, allowing data from the two sensors to be processed in a higher-level space.
We show that our method delivers results that are comparable to a state-of-the-art LIDAR survey while not requiring highly accurate ground truth pose estimates.
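A common way to couple line segments across the two sensors is to project a LIDAR segment's 3D endpoints into the image and measure their signed distance to the detected 2D line. A sketch under that assumption, not necessarily the paper's exact residual.

```python
import numpy as np

def line_residual(seg3d, line2d, K, R, t):
    """Distances of a projected 3D segment's endpoints to a detected 2D line.
    seg3d: two 3D endpoints (world frame); line2d: two 2D endpoints (pixels)."""
    # homogeneous 2D line through the detected segment
    p, q = np.append(line2d[0], 1.0), np.append(line2d[1], 1.0)
    l = np.cross(p, q)
    l /= np.linalg.norm(l[:2])            # so l @ x is distance in pixels
    res = []
    for X in seg3d:
        x = K @ (R @ X + t)               # project endpoint
        res.append(float(l @ (x / x[2]))) # signed point-to-line distance
    return res
```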
arXiv Detail & Related papers (2021-04-06T19:49:53Z)
- Deep Bingham Networks: Dealing with Uncertainty and Ambiguity in Pose Estimation [74.76155168705975]
Deep Bingham Networks (DBN) can handle pose-related uncertainties and ambiguities arising in almost all real life applications concerning 3D data.
DBN extends state-of-the-art direct pose regression networks with a multi-hypothesis prediction head that can yield different distribution modes.
We propose new training strategies to avoid mode or posterior collapse during training and to improve numerical stability.
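The summary does not detail those strategies. One widely used remedy for mode collapse in multi-hypothesis heads is a relaxed winner-takes-all loss, where the best hypothesis receives most of the gradient and the rest a small share. A hedged sketch with quaternion outputs and hypothetical names; DBN's actual training works with Bingham likelihoods.

```python
import numpy as np

def quat_dist(q1, q2):
    """Angular distance between unit quaternions (handles the sign ambiguity)."""
    return 2.0 * np.arccos(np.clip(abs(float(q1 @ q2)), 0.0, 1.0))

def wta_loss(pred_quats, gt_quat, eps=0.05):
    """Relaxed winner-takes-all: weight (1 - eps) on the best hypothesis,
    the remaining eps spread over the others."""
    d = np.array([quat_dist(q, gt_quat) for q in pred_quats])
    w = np.full(len(d), eps / max(len(d) - 1, 1))
    w[d.argmin()] = 1.0 - eps
    return float(w @ d)
```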
arXiv Detail & Related papers (2020-12-20T19:20:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.