Visual Geo-localization with Self-supervised Representation Learning
- URL: http://arxiv.org/abs/2308.00090v2
- Date: Fri, 15 Sep 2023 20:48:12 GMT
- Title: Visual Geo-localization with Self-supervised Representation Learning
- Authors: Jiuhong Xiao, Gao Zhu and Giuseppe Loianno
- Abstract summary: We present a novel unified VG-SSL framework with the goal to enhance performance and training efficiency on a large Visual Geo-localization dataset.
Our work incorporates multiple SSL methods tailored for VG: SimCLR, MoCov2, BYOL, SimSiam, Barlow Twins, and VICReg.
- Score: 8.642591824865892
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual Geo-localization (VG) has emerged as a significant research area,
aiming to identify geolocation based on visual features. Most VG approaches use
learnable feature extractors for representation learning. Recently,
Self-Supervised Learning (SSL) methods have also demonstrated comparable
performance to supervised methods by using numerous unlabeled images for
representation learning. In this work, we present a novel unified VG-SSL
framework with the goal to enhance performance and training efficiency on a
large VG dataset by SSL methods. Our work incorporates multiple SSL methods
tailored for VG: SimCLR, MoCov2, BYOL, SimSiam, Barlow Twins, and VICReg. We
systematically analyze the performance of different training strategies and
study the optimal parameter settings for the adaptation of SSL methods for the
VG task. The results demonstrate that our method, without the significant
computation and memory usage associated with Hard Negative Mining (HNM), can
match or even surpass the VG performance of the baseline that employs HNM. The
code is available at https://github.com/arplaboratory/VG_SSL.
Related papers
- Self-supervised Learning for Geospatial AI: A Survey [21.504978593542354]
Self-supervised learning (SSL) has attracted increasing attention for its adoption in geospatial data.
This paper conducts a comprehensive and up-to-date survey of SSL techniques applied to or developed for three primary data (geometric) types prevalent in geospatial vector data.
arXiv Detail & Related papers (2024-08-22T05:28:22Z) - Swarm Intelligence in Geo-Localization: A Multi-Agent Large Vision-Language Model Collaborative Framework [51.26566634946208]
We introduce smileGeo, a novel visual geo-localization framework.
By inter-agent communication, smileGeo integrates the inherent knowledge of these agents with additional retrieved information.
Results show that our approach significantly outperforms current state-of-the-art methods.
arXiv Detail & Related papers (2024-08-21T03:31:30Z) - Image-Based Geolocation Using Large Vision-Language Models [19.071551941682063]
We introduce tool, an innovative framework that significantly enhances image-based geolocation accuracy.
tool employs a systematic chain-of-thought (CoT) approach, mimicking human geoguessing strategies.
It achieves an impressive average score of 4550.5 in the GeoGuessr game, with an 85.37% win rate, and delivers highly precise geolocation predictions.
arXiv Detail & Related papers (2024-08-18T13:39:43Z) - CSP: Self-Supervised Contrastive Spatial Pre-Training for
Geospatial-Visual Representations [90.50864830038202]
We present Contrastive Spatial Pre-Training (CSP), a self-supervised learning framework for geo-tagged images.
We use a dual-encoder to separately encode the images and their corresponding geo-locations, and use contrastive objectives to learn effective location representations from images.
CSP significantly boosts the model performance with 10-34% relative improvement with various labeled training data sampling ratios.
arXiv Detail & Related papers (2023-05-01T23:11:18Z) - Semantic Segmentation of Vegetation in Remote Sensing Imagery Using Deep
Learning [77.34726150561087]
We propose an approach for creating a multi-modal and large-temporal dataset comprised of publicly available Remote Sensing data.
We use Convolutional Neural Networks (CNN) models that are capable of separating different classes of vegetation.
arXiv Detail & Related papers (2022-09-28T18:51:59Z) - Co-visual pattern augmented generative transformer learning for
automobile geo-localization [12.449657263683337]
Cross-view geo-localization (CVGL) aims to estimate the geographical location of the ground-level camera by matching against enormous geo-tagged aerial images.
We present a novel approach using cross-view knowledge generative techniques in combination with transformers, namely mutual generative transformer learning (MGTL) for CVGL.
arXiv Detail & Related papers (2022-03-17T07:29:02Z) - Graph-based Semi-supervised Learning: A Comprehensive Review [51.26862262550445]
Semi-supervised learning (SSL) has tremendous value in practice due to its ability to utilize both labeled data and unlabelled data.
An important class of SSL methods is to naturally represent data as graphs, which corresponds to graph-based semi-supervised learning (GSSL) methods.
GSSL methods have demonstrated their advantages in various domains due to their uniqueness of structure, the universality of applications, and their scalability to large scale data.
arXiv Detail & Related papers (2021-02-26T05:11:09Z) - PGL: Prior-Guided Local Self-supervised Learning for 3D Medical Image
Segmentation [87.50205728818601]
We propose a PriorGuided Local (PGL) self-supervised model that learns the region-wise local consistency in the latent feature space.
Our PGL model learns the distinctive representations of local regions, and hence is able to retain structural information.
arXiv Detail & Related papers (2020-11-25T11:03:11Z) - Multi-Level Graph Convolutional Network with Automatic Graph Learning
for Hyperspectral Image Classification [63.56018768401328]
We propose a Multi-level Graph Convolutional Network (GCN) with Automatic Graph Learning method (MGCN-AGL) for HSI classification.
By employing attention mechanism to characterize the importance among spatially neighboring regions, the most relevant information can be adaptively incorporated to make decisions.
Our MGCN-AGL encodes the long range dependencies among image regions based on the expressive representations that have been produced at local level.
arXiv Detail & Related papers (2020-09-19T09:26:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.