Improving Worst Case Visual Localization Coverage via Place-specific
Sub-selection in Multi-camera Systems
- URL: http://arxiv.org/abs/2206.13883v1
- Date: Tue, 28 Jun 2022 10:59:39 GMT
- Title: Improving Worst Case Visual Localization Coverage via Place-specific
Sub-selection in Multi-camera Systems
- Authors: Stephen Hausler, Ming Xu, Sourav Garg, Punarjay Chakravarty, Shubham
Shrivastava, Ankit Vora, Michael Milford
- Abstract summary: 6-DoF visual localization systems utilize principled approaches rooted in 3D geometry to perform accurate camera pose estimation of images to a map.
We demonstrate substantially improved worst-case localization performance compared to using off-the-shelf pipelines.
Our proposed approach is particularly applicable to the crowdsharing model of autonomous vehicle deployment.
- Score: 29.519262914510396
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 6-DoF visual localization systems utilize principled approaches rooted in 3D
geometry to perform accurate camera pose estimation of images to a map. Current
techniques use hierarchical pipelines and learned 2D feature extractors to
improve scalability and increase performance. However, despite gains in typical
recall@0.25m type metrics, these systems still have limited utility for
real-world applications like autonomous vehicles because of their 'worst' areas
of performance - the locations where they provide insufficient recall at a
certain required error tolerance. Here we investigate the utility of using
'place-specific configurations', where a map is segmented into a number of
places, each with its own configuration for modulating the pose estimation
step, in this case selecting a camera within a multi-camera system. On the Ford
AV benchmark dataset, we demonstrate substantially improved worst-case
localization performance compared to using off-the-shelf pipelines - minimizing
the percentage of the dataset which has low recall at a certain error
tolerance, as well as improved overall localization performance. Our proposed
approach is particularly applicable to the crowdsharing model of autonomous
vehicle deployment, where a fleet of AVs regularly traverses a known
route.
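The place-specific selection idea can be sketched as a lookup from map segments to whichever camera localizes best there. The following is a minimal, illustrative sketch; the place names, camera names, recall numbers, and helper function are all hypothetical, not the paper's implementation.

```python
# Hypothetical sketch of place-specific camera sub-selection; the place
# names, cameras, and recall numbers below are illustrative, not the
# paper's data.

def assign_best_camera(place_recalls):
    """For each place, pick the camera whose recall at the required
    error tolerance is highest."""
    return {place: max(recalls, key=recalls.get)
            for place, recalls in place_recalls.items()}

# Per-place, per-camera recall at e.g. a 0.25 m error tolerance.
place_recalls = {
    "place_00": {"front": 0.95, "left": 0.80, "rear": 0.60},
    "place_01": {"front": 0.40, "left": 0.90, "rear": 0.70},
}

config = assign_best_camera(place_recalls)
# At query time, localize with the camera configured for the matched place.
print(config["place_01"])  # left
```

Precomputing this per-place choice is what lets the system avoid a camera that performs poorly in a specific worst-case area while keeping the rest of the pipeline unchanged.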
Related papers
- Improved Scene Landmark Detection for Camera Localization [11.56648898250606]
A method based on scene landmark detection (SLD) was recently proposed to address the limitations of prior camera localization approaches.
It involves training a convolutional neural network (CNN) to detect a few predetermined, salient, scene-specific 3D points or landmarks.
We show that the accuracy gap was due to insufficient model capacity and noisy labels during training.
arXiv Detail & Related papers (2024-01-31T18:59:12Z) - LoLep: Single-View View Synthesis with Locally-Learned Planes and
Self-Attention Occlusion Inference [66.45326873274908]
We propose a novel method, LoLep, which regresses Locally-Learned planes from a single RGB image to represent scenes accurately.
Compared to MINE, our approach has an LPIPS reduction of 4.8%-9.0% and an RV reduction of 73.9%-83.5%.
arXiv Detail & Related papers (2023-07-23T03:38:55Z) - Fast and Lightweight Scene Regressor for Camera Relocalization [1.6708069984516967]
Estimating the camera pose directly with respect to pre-built 3D models can be prohibitively expensive for several applications.
This study proposes a simple scene regression method that requires only a multi-layer perceptron network for mapping scene coordinates.
The proposed approach uses sparse descriptors to regress the scene coordinates, instead of a dense RGB image.
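As a rough illustration of this style of scene regression, the sketch below maps a sparse feature descriptor to a 3D scene coordinate with a tiny multi-layer perceptron. The descriptor dimension, layer sizes, and random weights are assumptions for illustration, not the paper's trained model.

```python
import numpy as np

# Illustrative sketch of scene coordinate regression: a tiny MLP maps a
# sparse local feature descriptor (assumed 128-D) to a 3D scene
# coordinate. Sizes and weights are assumptions, not a trained model.
rng = np.random.default_rng(0)
D, H = 128, 256
W1, b1 = rng.normal(0.0, 0.1, (H, D)), np.zeros(H)
W2, b2 = rng.normal(0.0, 0.1, (3, H)), np.zeros(3)

def regress_scene_coord(descriptor):
    h = np.maximum(W1 @ descriptor + b1, 0.0)  # ReLU hidden layer
    return W2 @ h + b2                         # predicted (x, y, z)

xyz = regress_scene_coord(rng.normal(size=D))
print(xyz.shape)  # (3,)
```

Regressing from descriptors rather than dense RGB keeps the map to a small network plus sparse features, which is where the storage savings over explicit 3D models come from.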
arXiv Detail & Related papers (2022-12-04T14:41:20Z) - Monocular BEV Perception of Road Scenes via Front-to-Top View Projection [57.19891435386843]
We present a novel framework that reconstructs a local map formed by road layout and vehicle occupancy in the bird's-eye view.
Our model runs at 25 FPS on a single GPU, which is efficient and applicable for real-time panorama HD map reconstruction.
arXiv Detail & Related papers (2022-11-15T13:52:41Z) - A comparison of uncertainty estimation approaches for DNN-based camera
localization [6.053739577423792]
This work compares the performance of three uncertainty estimation methods: Monte Carlo Dropout (MCD), Deep Ensemble (DE), and Deep Evidential Regression (DER).
We achieve accurate camera localization and calibrated uncertainty, to the point that some of these methods can be used to detect localization failures.
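Of the three compared methods, Monte Carlo Dropout is the simplest to sketch: keep dropout active at test time, run several stochastic forward passes, and read the spread of the predictions as an uncertainty estimate. The toy model below is a hypothetical illustration, not the paper's network.

```python
import random
import statistics

# Minimal Monte Carlo Dropout sketch (an illustration, not the paper's
# model): run T stochastic forward passes with dropout active at test
# time and use the spread of the predictions as the uncertainty.

def forward_with_dropout(x, weights, p=0.5, rng=random):
    # Randomly drop each weight with probability p, as dropout would.
    return sum(w * x for w in weights if rng.random() > p)

def mc_dropout_predict(x, weights, T=200, seed=0):
    rng = random.Random(seed)
    preds = [forward_with_dropout(x, weights, rng=rng) for _ in range(T)]
    return statistics.mean(preds), statistics.stdev(preds)

mean, std = mc_dropout_predict(1.0, [0.2, 0.3, 0.5])
# A large std relative to the mean flags an unreliable prediction,
# e.g. a likely localization failure.
```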
arXiv Detail & Related papers (2022-11-02T16:15:28Z) - Satellite Image Based Cross-view Localization for Autonomous Vehicle [59.72040418584396]
This paper shows that by using an off-the-shelf high-definition satellite image as a ready-to-use map, we are able to achieve cross-view vehicle localization with satisfactory accuracy.
Our method is validated on KITTI and Ford Multi-AV Seasonal datasets as ground view and Google Maps as the satellite view.
arXiv Detail & Related papers (2022-07-27T13:16:39Z) - Coarse-to-fine Semantic Localization with HD Map for Autonomous Driving
in Structural Scenes [1.1024591739346292]
We propose a cost-effective vehicle localization system for autonomous driving that uses an HD map and cameras as the primary sensors.
We formulate vision-based localization as a data association problem that maps visual semantics to landmarks in the HD map.
We evaluate our method on two datasets and demonstrate that the proposed approach yields promising localization results in different driving scenarios.
arXiv Detail & Related papers (2021-07-06T11:58:55Z) - Unsupervised Metric Relocalization Using Transform Consistency Loss [66.19479868638925]
Training networks to perform metric relocalization traditionally requires accurate image correspondences.
We propose a self-supervised solution, which exploits a key insight: localizing a query image within a map should yield the same absolute pose, regardless of the reference image used for registration.
We evaluate our framework on synthetic and real-world data, showing our approach outperforms other supervised methods when a limited amount of ground-truth information is available.
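The key insight, that a query's absolute pose should not depend on which reference image was used for registration, can be turned into a simple consistency penalty. The sketch below reduces poses to 2D translation vectors for illustration; the loss form is an assumption, not the paper's exact formulation.

```python
import numpy as np

# Illustrative sketch of the transform-consistency insight: the absolute
# pose of a query should agree no matter which reference image it was
# registered against. Poses are simplified to 2D translations here.

def consistency_loss(query_pose_estimates):
    """Mean squared deviation of per-reference pose estimates from
    their mean; zero when all references agree."""
    poses = np.asarray(query_pose_estimates, dtype=float)
    return float(np.mean((poses - poses.mean(axis=0)) ** 2))

# Two references yielding (nearly) the same absolute pose: low loss.
low = consistency_loss([[1.0, 2.0], [1.01, 2.02]])
# Disagreeing estimates signal an inconsistent model: high loss.
high = consistency_loss([[1.0, 2.0], [3.0, 5.0]])
assert high > low
```

Because the penalty needs no ground-truth poses, only agreement between registrations, it can serve as a self-supervised training signal.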
arXiv Detail & Related papers (2020-11-01T19:24:27Z) - Domain-invariant Similarity Activation Map Contrastive Learning for
Retrieval-based Long-term Visual Localization [30.203072945001136]
In this work, a general architecture is first formulated probabilistically to extract domain-invariant features through multi-domain image translation.
A novel gradient-weighted similarity activation mapping loss (Grad-SAM) is then incorporated for finer localization with high accuracy.
Extensive experiments have been conducted to validate the effectiveness of the proposed approach on the CMU-Seasons dataset.
Our performance is on par with or even outperforms state-of-the-art image-based localization baselines at medium and high precision.
arXiv Detail & Related papers (2020-09-16T14:43:22Z) - Learning Condition Invariant Features for Retrieval-Based Localization
from 1M Images [85.81073893916414]
We develop a novel method for learning more accurate and better-generalizing localization features.
On the challenging Oxford RobotCar night condition, our method outperforms the well-known triplet loss by 24.4% in localization accuracy within 5m.
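For context, the triplet loss used as the baseline here pushes an anchor's distance to a positive example below its distance to a negative example by a margin. A minimal sketch, with margin and distance values chosen purely for illustration:

```python
# Standard triplet loss, shown as the baseline the proposed method is
# compared against; margin and distance values are illustrative.

def triplet_loss(d_pos, d_neg, margin=0.5):
    # Penalize unless the anchor-positive distance beats the
    # anchor-negative distance by at least `margin`.
    return max(d_pos - d_neg + margin, 0.0)

print(triplet_loss(0.25, 1.0))  # 0.0 (margin satisfied)
print(triplet_loss(1.0, 0.75))  # 0.75 (violation penalized)
```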
arXiv Detail & Related papers (2020-08-27T14:46:22Z) - Multi-View Optimization of Local Feature Geometry [70.18863787469805]
We address the problem of refining the geometry of local image features from multiple views without known scene or camera geometry.
Our proposed method naturally complements the traditional feature extraction and matching paradigm.
We show that our method consistently improves the triangulation and camera localization performance for both hand-crafted and learned local features.
arXiv Detail & Related papers (2020-03-18T17:22:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.