Learning Cross-Scale Visual Representations for Real-Time Image
Geo-Localization
- URL: http://arxiv.org/abs/2109.04087v1
- Date: Thu, 9 Sep 2021 08:08:54 GMT
- Title: Learning Cross-Scale Visual Representations for Real-Time Image
Geo-Localization
- Authors: Tianyi Zhang and Matthew Johnson-Roberson
- Abstract summary: State estimation approaches based on local sensors are prone to drift on long-range missions as error accumulates.
We introduce the cross-scale dataset and a methodology to produce additional data from cross-modality sources.
We propose a framework that learns cross-scale visual representations without supervision.
- Score: 21.375640354558044
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robot localization remains a challenging task in GPS-denied environments.
State estimation approaches based on local sensors, e.g., cameras or IMUs, are
prone to drift on long-range missions as error accumulates. In this study, we
aim to address this problem by localizing image observations in a 2D
multi-modal geospatial map. We introduce the cross-scale dataset and a
methodology to produce additional data from cross-modality sources. We propose
a framework that learns cross-scale visual representations without supervision.
Experiments are conducted on data from two different domains, underwater and
aerial. In contrast to existing studies in cross-view image geo-localization,
our approach a) performs better on smaller-scale multi-modal maps; b) is more
computationally efficient for real-time applications; and c) can be used directly
in concert with state estimation pipelines.
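The abstract gives no implementation details, so the following is only a minimal sketch of one standard way to learn cross-scale representations without supervision: a contrastive (InfoNCE) objective that pulls an image observation's embedding toward the embedding of its co-registered map tile. Every name here (ConvEncoder, info_nce, the temperature value) is an illustrative assumption, not the authors' code.

```python
# Hypothetical sketch: contrastive alignment of image observations with
# co-registered map tiles via InfoNCE. Not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvEncoder(nn.Module):
    """Tiny CNN that maps an image (or map tile) to a unit-norm embedding."""
    def __init__(self, in_channels: int, dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )
    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

def info_nce(img_z, map_z, temperature: float = 0.07):
    """Matched (image, map-tile) pairs are positives; all others negatives."""
    logits = img_z @ map_z.t() / temperature        # (B, B) similarity matrix
    targets = torch.arange(len(img_z))              # diagonal = positive pairs
    return F.cross_entropy(logits, targets)

# Toy usage: RGB observations vs. single-channel map rasters.
img_enc, map_enc = ConvEncoder(3), ConvEncoder(1)
images, tiles = torch.randn(8, 3, 128, 128), torch.randn(8, 1, 64, 64)
loss = info_nce(img_enc(images), map_enc(tiles))
loss.backward()
```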
Related papers
- OSMLoc: Single Image-Based Visual Localization in OpenStreetMap with Geometric and Semantic Guidances [11.085165252259042]
OSMLoc is a brain-inspired single-image visual localization method with semantic and geometric guidance to improve accuracy, robustness, and generalization ability.
To validate the proposed OSMLoc, we collect a worldwide cross-area and cross-condition (CC) benchmark for extensive evaluation.
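At inference time, a single-image map-based localizer of this kind typically reduces to nearest-neighbour search over precomputed map-tile descriptors. The sketch below shows only that generic retrieval step; it is not the OSMLoc pipeline, and all names are hypothetical.

```python
# Illustrative retrieval step (not the OSMLoc pipeline): localize a query
# image by nearest-neighbour search over precomputed OSM-tile embeddings.
import torch
import torch.nn.functional as F

def localize(query_emb, tile_embs, tile_coords, k=5):
    """Return the k map coordinates whose tiles best match the query.

    query_emb:   (D,)   descriptor of the query image (hypothetical encoder)
    tile_embs:   (N, D) descriptors of rasterized OSM tiles
    tile_coords: (N, 2) geographic coordinates of each tile centre
    """
    sims = F.normalize(tile_embs, dim=1) @ F.normalize(query_emb, dim=0)  # (N,)
    top = sims.topk(k)
    return tile_coords[top.indices], top.values

# Toy usage with random descriptors.
q, db, coords = torch.randn(128), torch.randn(1000, 128), torch.rand(1000, 2)
best_coords, scores = localize(q, db, coords)
```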
arXiv Detail & Related papers (2024-11-13T14:59:00Z)
- Deep Homography Estimation for Visual Place Recognition [49.235432979736395]
We propose a transformer-based deep homography estimation (DHE) network.
It takes the dense feature map extracted by a backbone network as input and fits homography for fast and learnable geometric verification.
Experiments on benchmark datasets show that our method can outperform several state-of-the-art methods.
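The DHE network itself is not described in this summary, so the sketch below substitutes the classic direct linear transform (DLT) to show what homography-based geometric verification computes: fit a homography to candidate correspondences, then accept the place match only if the reprojection error is low.

```python
# Sketch of homography-based geometric verification using the classic DLT,
# standing in for the paper's learned DHE network (details not given above).
import torch

def dlt_homography(src, dst):
    """Fit H such that dst ~ H @ src from N >= 4 point pairs, each (N, 2)."""
    n = src.shape[0]
    x, y = src[:, 0], src[:, 1]
    u, v = dst[:, 0], dst[:, 1]
    zeros, ones = torch.zeros(n), torch.ones(n)
    ax = torch.stack([-x, -y, -ones, zeros, zeros, zeros, u * x, u * y, u], 1)
    ay = torch.stack([zeros, zeros, zeros, -x, -y, -ones, v * x, v * y, v], 1)
    A = torch.cat([ax, ay], 0)                      # (2N, 9) constraint matrix
    _, _, Vh = torch.linalg.svd(A)
    H = Vh[-1].reshape(3, 3)                        # null vector of A
    return H / H[2, 2]

def reprojection_error(H, src, dst):
    """Mean distance between H-warped source points and destination points."""
    src_h = torch.cat([src, torch.ones(len(src), 1)], 1)   # homogeneous coords
    proj = (H @ src_h.t()).t()
    proj = proj[:, :2] / proj[:, 2:3]
    return (proj - dst).norm(dim=1).mean()

# Verification: accept a retrieved match only if the fit explains the points.
src = torch.rand(8, 2) * 100
H_true = torch.tensor([[1.0, 0.1, 5.0], [0.0, 1.0, -3.0], [0.0, 0.0, 1.0]])
dst = (H_true @ torch.cat([src, torch.ones(8, 1)], 1).t()).t()[:, :2]
H = dlt_homography(src, dst)
assert reprojection_error(H, src, dst) < 1.0
```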
arXiv Detail & Related papers (2024-02-25T13:22:17Z)
- Image-based Geolocalization by Ground-to-2.5D Map Matching [21.21416396311102]
Existing methods often utilize cross-view localization techniques to match ground-view query images with 2D maps.
We propose a new approach to learning representative embeddings from multi-modal data.
By encoding crucial geometric cues, our method learns discriminative location embeddings for matching panoramic images and maps.
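One plausible reading of the 2.5D, multi-modal aspect is that a per-pixel height channel is appended to the rendered map before encoding, so geometric cues enter the location embedding. The encoder below is an illustrative assumption, not the paper's architecture.

```python
# Hypothetical sketch: append a height channel to the 2D map raster (making
# it "2.5D") before encoding. Layer sizes and names are illustrative only.
import torch
import torch.nn as nn

class MapEncoder25D(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        # 4 input channels: 3 for the rendered map, 1 for per-pixel height.
        self.backbone = nn.Sequential(
            nn.Conv2d(4, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, dim),
        )
    def forward(self, map_rgb, height):
        return self.backbone(torch.cat([map_rgb, height], dim=1))

enc = MapEncoder25D()
emb = enc(torch.randn(4, 3, 128, 128), torch.randn(4, 1, 128, 128))
```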
arXiv Detail & Related papers (2023-08-11T08:00:30Z)
- Cross-View Visual Geo-Localization for Outdoor Augmented Reality [11.214903134756888]
We address the problem of geo-pose estimation by cross-view matching of query ground images to a geo-referenced aerial satellite image database.
We propose a new transformer neural network-based model and a modified triplet ranking loss for joint location and orientation estimation.
Experiments on several benchmark cross-view geo-localization datasets show that our model achieves state-of-the-art performance.
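The paper's exact modification of the triplet ranking loss is not given in the summary above; the sketch below shows one plausible construction, a standard triplet margin loss plus a wrap-around-safe orientation regression term. All names and weights are assumptions.

```python
# Hypothetical joint location/orientation loss: plain triplet ranking plus an
# angular term encoded via sin/cos to avoid wrap-around at +-pi.
import torch
import torch.nn.functional as F

def loc_orient_loss(anchor, pos, neg, yaw_pred, yaw_gt,
                    margin=0.2, orient_weight=0.1):
    """anchor/pos/neg: (B, D) embeddings; yaw_*: (B,) heading in radians."""
    rank = F.triplet_margin_loss(anchor, pos, neg, margin=margin)
    orient = F.mse_loss(torch.sin(yaw_pred), torch.sin(yaw_gt)) \
           + F.mse_loss(torch.cos(yaw_pred), torch.cos(yaw_gt))
    return rank + orient_weight * orient

loss = loc_orient_loss(torch.randn(8, 128), torch.randn(8, 128),
                       torch.randn(8, 128), torch.rand(8), torch.rand(8))
```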
arXiv Detail & Related papers (2023-03-28T01:58:03Z)
- Unleash the Potential of Image Branch for Cross-modal 3D Object Detection [67.94357336206136]
We present a new cross-modal 3D object detector, namely UPIDet, which aims to unleash the potential of the image branch from two aspects.
First, UPIDet introduces a new 2D auxiliary task called normalized local coordinate map estimation.
Second, we discover that the representational capability of the point cloud backbone can be enhanced through the gradients backpropagated from the training objectives of the image branch.
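The mechanism behind the second point is generic multi-task learning: an auxiliary head adds a loss whose gradients flow back through the shared backbone. A minimal sketch, with a stand-in detection loss and an assumed 3-channel normalized-local-coordinate (NLC) target:

```python
# Multi-task sketch: an auxiliary 2D regression head (here, a normalized-
# local-coordinate map) contributes a second loss that also trains the shared
# backbone. Layer sizes, loss weights, and targets are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

backbone = nn.Conv2d(3, 64, 3, padding=1)       # shared image features
nlc_head = nn.Conv2d(64, 3, 1)                  # predicts a 3-channel NLC map
det_head = nn.Conv2d(64, 7, 1)                  # stand-in detection head

img = torch.randn(2, 3, 64, 64)
feats = backbone(img)
det_loss = det_head(feats).abs().mean()         # placeholder detection loss
nlc_loss = F.smooth_l1_loss(nlc_head(feats), torch.rand(2, 3, 64, 64))
(det_loss + 0.5 * nlc_loss).backward()          # both tasks train the backbone
```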
arXiv Detail & Related papers (2023-01-22T08:26:58Z)
- Semantic Segmentation of Vegetation in Remote Sensing Imagery Using Deep Learning [77.34726150561087]
We propose an approach for creating a multi-modal and large-temporal dataset comprised of publicly available Remote Sensing data.
We use Convolutional Neural Networks (CNN) models that are capable of separating different classes of vegetation.
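The summary names CNNs but no specific architecture. A minimal per-pixel classification sketch, with assumed band and class counts:

```python
# Minimal semantic-segmentation sketch: a small conv net trained with
# per-pixel cross-entropy over vegetation classes. The 4 input bands and
# 5 classes are assumptions, not the paper's configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 5                                  # e.g. vegetation types + background
net = nn.Sequential(
    nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),   # 4 bands, e.g. RGB + NIR
    nn.Conv2d(32, NUM_CLASSES, 1),               # per-pixel class logits
)
imagery = torch.randn(2, 4, 256, 256)
labels = torch.randint(0, NUM_CLASSES, (2, 256, 256))
loss = F.cross_entropy(net(imagery), labels)     # pixel-wise cross-entropy
loss.backward()
```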
arXiv Detail & Related papers (2022-09-28T18:51:59Z)
- Visual Cross-View Metric Localization with Dense Uncertainty Estimates [11.76638109321532]
This work addresses visual cross-view metric localization for outdoor robotics.
Given a ground-level color image and a satellite patch that contains the local surroundings, the task is to identify the location of the ground camera within the satellite patch.
We devise a novel network architecture with denser satellite descriptors, similarity matching at the bottleneck, and a dense spatial distribution as output to capture multi-modal localization ambiguities.
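A dense spatial distribution of this kind can be produced by correlating one ground descriptor against a dense grid of satellite descriptors and taking a softmax over locations; multi-modal ambiguity then appears as multiple peaks. A sketch under those assumptions, not the paper's architecture:

```python
# Sketch of the dense-output idea: correlate a ground descriptor against a
# dense satellite descriptor map and softmax over locations, yielding a
# spatial probability map that can express multi-modal ambiguity.
import torch
import torch.nn.functional as F

def location_distribution(ground_desc, sat_desc_map):
    """ground_desc: (D,); sat_desc_map: (D, H, W) -> (H, W) probabilities."""
    d, h, w = sat_desc_map.shape
    scores = ground_desc @ sat_desc_map.reshape(d, h * w)   # (H*W,) similarities
    return F.softmax(scores, dim=0).reshape(h, w)

prob = location_distribution(torch.randn(64), torch.randn(64, 32, 32))
row, col = divmod(prob.argmax().item(), prob.shape[1])      # most likely cell
```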
arXiv Detail & Related papers (2022-08-17T20:12:23Z)
- Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system.
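The reason stereo (virtual) data resolves scale is that disparity with a known baseline yields metric depth, which can then rescale a monocular VO trajectory. The sketch below shows only that relationship, not the VRVO training or integration details:

```python
# Sketch of the scale mechanism only, not VRVO itself: disparity with a known
# (virtual) stereo baseline gives metric depth, which rescales monocular VO.
import torch

def disparity_to_depth(disparity, focal_px, baseline_m, eps=1e-6):
    """Metric depth from disparity: depth = f * B / d."""
    return focal_px * baseline_m / disparity.clamp(min=eps)

def rescale_translation(t_vo, depth_vo, depth_metric):
    """Scale a VO translation so its depths agree with metric depths."""
    scale = (depth_metric / depth_vo.clamp(min=1e-6)).median()
    return t_vo * scale

disp = torch.rand(1, 1, 96, 320) + 0.1                 # toy disparity map
depth = disparity_to_depth(disp, focal_px=720.0, baseline_m=0.54)
t = rescale_translation(torch.tensor([0.1, 0.0, 0.9]), depth, depth * 3.2)
```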
arXiv Detail & Related papers (2022-03-11T01:51:54Z)
- Spatial-Temporal Correlation and Topology Learning for Person Re-Identification in Videos [78.45050529204701]
We propose a novel framework to pursue discriminative and robust representation by modeling cross-scale spatial-temporal correlation.
CTL utilizes a CNN backbone and a key-points estimator to extract semantic local features from the human body.
It explores a context-reinforced topology to construct multi-scale graphs by considering both global contextual information and the physical connections of the human body.
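A multi-scale graph over body key-points can combine a fixed adjacency from physical (skeleton) connections with a dense, similarity-weighted contextual adjacency. The sketch below is illustrative; the joint count, edge list, and weighting are assumptions, not CTL's definitions:

```python
# Hypothetical multi-scale graph construction over body key-points: a fixed
# adjacency for physical (skeleton) edges plus a dense contextual adjacency
# weighted by feature similarity. The toy skeleton uses 6 joints.
import torch
import torch.nn.functional as F

SKELETON = [(0, 1), (1, 2), (2, 3), (1, 4), (4, 5)]  # illustrative edge list

def physical_adjacency(num_joints=6):
    A = torch.eye(num_joints)                    # self-loops
    for i, j in SKELETON:
        A[i, j] = A[j, i] = 1.0                  # symmetric skeleton edges
    return A

def contextual_adjacency(joint_feats):
    """joint_feats: (J, D) -> (J, J) similarity-weighted dense graph."""
    z = F.normalize(joint_feats, dim=-1)
    return torch.softmax(z @ z.t(), dim=-1)

A_phys = physical_adjacency()
A_ctx = contextual_adjacency(torch.randn(6, 32))
```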
arXiv Detail & Related papers (2021-04-15T14:32:12Z)
- Geography-Aware Self-Supervised Learning [79.4009241781968]
We show that due to their different characteristics, a non-trivial gap persists between contrastive and supervised learning on standard benchmarks.
We propose novel training methods that exploit the spatially aligned structure of remote sensing data.
Our experiments show that our proposed method closes the gap between contrastive and supervised learning on image classification, object detection and semantic segmentation for remote sensing.
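The spatially aligned structure referred to here allows positive pairs to be formed from images of the same location taken at different times, rather than from two augmentations of one image. A sketch of that pair construction, with an assumed tensor layout:

```python
# Sketch of the geography-aware positive-pair idea: two images of the same
# location at different times form a positive pair for contrastive learning
# (e.g. fed into an InfoNCE loss like the first sketch above).
import torch

def temporal_positive_pairs(images_by_location):
    """images_by_location: (L, T, C, H, W) -> two aligned (L, C, H, W) views."""
    L, T = images_by_location.shape[:2]
    t1, t2 = torch.randint(0, T, (2, L))               # two random timestamps
    idx = torch.arange(L)
    return images_by_location[idx, t1], images_by_location[idx, t2]

stack = torch.randn(16, 4, 3, 64, 64)                  # 16 places, 4 dates
view_a, view_b = temporal_positive_pairs(stack)
```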
arXiv Detail & Related papers (2020-11-19T17:29:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.