Convolutional Cross-View Pose Estimation
- URL: http://arxiv.org/abs/2303.05915v3
- Date: Fri, 22 Dec 2023 09:00:03 GMT
- Title: Convolutional Cross-View Pose Estimation
- Authors: Zimin Xia, Olaf Booij, and Julian F. P. Kooij
- Abstract summary: We propose a novel end-to-end method for cross-view pose estimation.
Our method is validated on the VIGOR and KITTI datasets.
On the Oxford RobotCar dataset, our method can reliably estimate the ego-vehicle's pose over time.
- Score: 9.599356978682108
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel end-to-end method for cross-view pose estimation. Given a
ground-level query image and an aerial image that covers the query's local
neighborhood, the 3 Degrees-of-Freedom camera pose of the query is estimated by
matching its image descriptor to descriptors of local regions within the aerial
image. The orientation-aware descriptors are obtained by using a
translationally equivariant convolutional ground image encoder and contrastive
learning. The Localization Decoder produces a dense probability distribution in
a coarse-to-fine manner with a novel Localization Matching Upsampling module. A
smaller Orientation Decoder produces a vector field to condition the
orientation estimate on the localization. Our method is validated on the VIGOR
and KITTI datasets, where it surpasses the state-of-the-art baseline by 72% and
36% in median localization error for comparable orientation estimation
accuracy. The predicted probability distribution can represent localization
ambiguity and enables the rejection of possibly erroneous predictions. Without
re-training, the model can run inference on ground images with different fields of view
and utilize orientation priors if available. On the Oxford RobotCar dataset,
our method can reliably estimate the ego-vehicle's pose over time, achieving a
median localization error under 1 meter and a median orientation error of
around 1 degree at 14 FPS.
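The abstract describes the matching step only in outline. The sketch below is a rough illustration of that step under stated assumptions, not the authors' implementation.

- It scores one ground-image descriptor against a grid of aerial-region descriptors.
- It turns the scores into a dense probability map with a temperature-scaled softmax and picks the most likely cell.
- It rejects the prediction when the peak probability is low.
- It converts a 2D orientation vector read at the chosen cell into a heading angle.

The following are hypothetical choices made for illustration: cosine similarity as the matching score, the names `localization_heatmap`, `predict_cell` and `heading_from_vector`, the temperature of 0.1, and the 0.5 rejection threshold. The actual method uses learned convolutional encoders and a coarse-to-fine Localization Matching Upsampling module, which this sketch does not model.

```python
import math
from typing import List, Optional, Tuple

Vector = List[float]
Grid = List[List[float]]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Matching score between two descriptors (assumed: cosine similarity)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def localization_heatmap(ground: Vector, aerial_grid: List[List[Vector]],
                         temperature: float = 0.1) -> Grid:
    """Softmax over all aerial cells, giving a dense map that sums to 1.

    The temperature value is a hypothetical choice for illustration.
    """
    scores = [[cosine_similarity(ground, d) / temperature for d in row]
              for row in aerial_grid]
    peak = max(max(row) for row in scores)  # subtract the max for numerical stability
    exps = [[math.exp(s - peak) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]


def predict_cell(heatmap: Grid,
                 min_peak_prob: float = 0.5) -> Optional[Tuple[int, int]]:
    """Return (row, col) of the most likely cell, or None if the map is too flat.

    Thresholding the peak probability is one simple (assumed) rejection rule.
    """
    best_prob, best_cell = -1.0, (0, 0)
    for r, row in enumerate(heatmap):
        for c, p in enumerate(row):
            if p > best_prob:
                best_prob, best_cell = p, (r, c)
    return best_cell if best_prob >= min_peak_prob else None


def heading_from_vector(vx: float, vy: float) -> float:
    """Convert a 2D orientation vector at the chosen cell to degrees in [0, 360)."""
    return math.degrees(math.atan2(vy, vx)) % 360.0


if __name__ == "__main__":
    ground = [1.0, 0.0, 0.0]
    aerial = [[[0.0, 1.0, 0.0], [1.0, 0.1, 0.0]],
              [[0.0, 0.0, 1.0], [0.5, 0.5, 0.5]]]
    probs = localization_heatmap(ground, aerial)
    print(predict_cell(probs))             # cell (0, 1) dominates here
    print(heading_from_vector(0.0, 1.0))   # approximately 90 degrees
```

Normalizing over every aerial cell makes the output a proper distribution. If several regions match equally well, the probability mass is spread across them and the peak stays low. This is the property the abstract relies on when it says the distribution can represent ambiguity. The fixed peak threshold used here is only one possible way to act on that property.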
Related papers
- No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images [100.80376573969045]
NoPoSplat is a feed-forward model capable of reconstructing 3D scenes parameterized by 3D Gaussians from multi-view images.
Our model achieves real-time 3D Gaussian reconstruction during inference.
This work makes significant advances in pose-free generalizable 3D reconstruction and demonstrates its applicability to real-world scenarios.
(2024-10-31T17:58:22Z)
- Breaking the Frame: Visual Place Recognition by Overlap Prediction [53.17564423756082]
We propose a novel visual place recognition approach based on overlap prediction, called VOP.
VOP processes co-visible image sections by obtaining patch-level embeddings using a Vision Transformer backbone.
Our approach uses a voting mechanism to assess overlap scores for potential database images.
(2024-06-23T20:00:20Z)
- Boosting 3-DoF Ground-to-Satellite Camera Localization Accuracy via Geometry-Guided Cross-View Transformer [66.82008165644892]
We propose a method to increase the accuracy of a ground camera's location and orientation by estimating the relative rotation and translation between the ground-level image and its matched/retrieved satellite image.
Experimental results demonstrate that our method significantly outperforms the state-of-the-art.
(2023-07-16T11:52:27Z)
- PNI : Industrial Anomaly Detection using Position and Neighborhood Information [6.316693022958221]
We propose a new algorithm, PNI, which estimates the normal distribution using conditional probability given neighborhood features.
We conducted experiments on the MVTec AD benchmark dataset and achieved state-of-the-art performance, with 99.56% and 98.98% AUROC scores in anomaly detection and localization.
(2022-11-22T23:45:27Z)
- Uncertainty-aware Vision-based Metric Cross-view Geolocalization [25.87104194833264]
We present an end-to-end differentiable model that uses the ground and aerial images to predict a probability distribution over possible vehicle poses.
We improve the previous state-of-the-art by a large margin even without ground or aerial data from the test region.
(2022-11-22T10:23:20Z)
- Visual Cross-View Metric Localization with Dense Uncertainty Estimates [11.76638109321532]
This work addresses visual cross-view metric localization for outdoor robotics.
Given a ground-level color image and a satellite patch that contains the local surroundings, the task is to identify the location of the ground camera within the satellite patch.
We devise a novel network architecture with denser satellite descriptors, similarity matching at the bottleneck, and a dense spatial distribution as output to capture multi-modal localization ambiguities.
(2022-08-17T20:12:23Z)
- Sampling Based On Natural Image Statistics Improves Local Surrogate Explainers [111.31448606885672]
Surrogate explainers are a popular post-hoc interpretability method to further understand how a model arrives at a prediction.
We propose two approaches to incorporate natural image statistics, namely (1) altering the method for sampling the local neighbourhood and (2) using perceptual metrics to convey some of the properties of the distribution of natural images.
(2022-08-08T08:10:13Z)
- Self-Supervised Learning of Image Scale and Orientation [35.94215211409985]
We study the problem of learning to assign a characteristic pose, i.e., scale and orientation, for an image region of interest.
It is hard to obtain a large-scale set of image regions with explicit pose annotations from which a model could learn directly.
We propose a self-supervised learning framework with a histogram alignment technique.
(2022-06-15T02:43:39Z)
- ImPosIng: Implicit Pose Encoding for Efficient Camera Pose Estimation [2.6808541153140077]
Implicit Pose Encoding (ImPosing) embeds images and camera poses into a common latent representation with 2 separate neural networks.
Candidate poses are evaluated through the latent space in a hierarchical manner, so the camera position and orientation are refined rather than directly regressed.
(2022-05-05T13:33:25Z)
- Beyond Cross-view Image Retrieval: Highly Accurate Vehicle Localization Using Satellite Image [91.29546868637911]
This paper addresses the problem of vehicle-mounted camera localization by matching a ground-level image with an overhead-view satellite map.
The key idea is to formulate the task as pose estimation and solve it by neural-net based optimization.
Experiments on standard autonomous vehicle localization datasets have confirmed the superiority of the proposed method.
(2022-04-10T19:16:58Z)
- Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection [85.53263670166304]
One-stage detectors formulate object detection as dense classification and localization.
A recent trend in one-stage detectors is to introduce an individual prediction branch to estimate the quality of localization.
This paper delves into the representations of the above three fundamental elements: quality estimation, classification and localization.
(2020-06-08T07:24:33Z)