Related papers: VG-SSL: Benchmarking Self-supervised Representation Learning Approaches for Visual Geo-localization

URL: http://arxiv.org/abs/2308.00090v3
Date: Thu, 21 Nov 2024 16:21:07 GMT
Title: VG-SSL: Benchmarking Self-supervised Representation Learning Approaches for Visual Geo-localization
Authors: Jiuhong Xiao, Gao Zhu, Giuseppe Loianno
Abstract summary: This study presents a novel VG-SSL framework, designed for versatile integration and benchmarking of diverse SSL methods for representation learning in VG. We adapt SSL techniques to improve VG on datasets from hand-held and car-mounted cameras used in robotics and autonomous vehicles. Results show that contrastive learning and information maximization methods yield superior geo-specific representation quality, matching or surpassing the performance of state-of-the-art VG techniques.
Score: 7.689824252319191
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Visual Geo-localization (VG) is a critical research area for identifying geo-locations from visual inputs, particularly in autonomous navigation for robotics and vehicles. Current VG methods often learn feature extractors from geo-labeled images to create dense, geographically relevant representations. Recent advances in Self-Supervised Learning (SSL) have demonstrated its capability to achieve performance on par with supervised techniques with unlabeled images. This study presents a novel VG-SSL framework, designed for versatile integration and benchmarking of diverse SSL methods for representation learning in VG, featuring a unique geo-related pair strategy, GeoPair. Through extensive performance analysis, we adapt SSL techniques to improve VG on datasets from hand-held and car-mounted cameras used in robotics and autonomous vehicles. Our results show that contrastive learning and information maximization methods yield superior geo-specific representation quality, matching or surpassing the performance of state-of-the-art VG techniques. To our knowledge, this is the first benchmarking study of SSL in VG, highlighting its potential in enhancing geo-specific visual representations for robotics and autonomous vehicles. The code is publicly available at https://github.com/arplaboratory/VG-SSL.

Related papers

Rank-based Geographical Regularization: Revisiting Contrastive Self-Supervised Learning for Multispectral Remote Sensing Imagery [10.20881592893918]
Self-supervised learning (SSL) has become a powerful paradigm for learning from large, unlabeled datasets. Applying SSL to multispectral remote sensing (RS) images presents unique challenges and opportunities due to the geographical and temporal variability of the data. We introduce GeoRank, a novel regularization method for contrastive SSL that improves upon prior techniques.
arXiv Detail & Related papers (2026-01-05T17:24:50Z)
GLEAM: Learning to Match and Explain in Cross-View Geo-Localization [66.11208984986813]
Cross-View Geo-Localization (CVGL) focuses on identifying correspondences between images captured from distinct perspectives of the same geographical location. We present GLEAM-C, a foundational CVGL model that unifies multiple views and modalities (including UAV imagery, street maps, panoramic views, and ground photographs) by aligning them exclusively with satellite imagery. To address the lack of interpretability in traditional CVGL methods, we propose GLEAM-X, which combines cross-view correspondence prediction with explainable reasoning.
arXiv Detail & Related papers (2025-09-09T07:14:31Z)
VLM-Guided Visual Place Recognition for Planet-Scale Geo-Localization [24.433604332415204]
We propose a novel hybrid geo-localization framework that combines the strengths of vision-language models and visual place recognition. We evaluate our approach on multiple geo-localization benchmarks and show that it consistently outperforms prior state-of-the-art methods.
arXiv Detail & Related papers (2025-07-23T12:23:03Z)
GAIR: Improving Multimodal Geo-Foundation Model with Geo-Aligned Implicit Representations [5.439918212297177]
A key challenge in multimodal GeoFM development is to explicitly model geospatial relationships across modalities. We propose GAIR, a novel multimodal GeoFM architecture integrating overhead RS data, street view imagery, and their geolocation metadata. We evaluate GAIR across 10 geospatial tasks spanning RS image-based, SV image-based, and location embedding-based benchmarks.
arXiv Detail & Related papers (2025-03-20T19:59:39Z)
Self-supervised Learning for Geospatial AI: A Survey [21.504978593542354]
Self-supervised learning (SSL) has attracted increasing attention for its adoption in geospatial data. This paper conducts a comprehensive and up-to-date survey of SSL techniques applied to or developed for three primary data (geometric) types prevalent in geospatial vector data.
arXiv Detail & Related papers (2024-08-22T05:28:22Z)
Swarm Intelligence in Geo-Localization: A Multi-Agent Large Vision-Language Model Collaborative Framework [51.26566634946208]
We introduce smileGeo, a novel visual geo-localization framework. By inter-agent communication, smileGeo integrates the inherent knowledge of these agents with additional retrieved information. Results show that our approach significantly outperforms current state-of-the-art methods.
arXiv Detail & Related papers (2024-08-21T03:31:30Z)
Image-Based Geolocation Using Large Vision-Language Models [19.071551941682063]
We introduce tool, an innovative framework that significantly enhances image-based geolocation accuracy. tool employs a systematic chain-of-thought (CoT) approach, mimicking human geoguessing strategies. It achieves an impressive average score of 4550.5 in the GeoGuessr game, with an 85.37% win rate, and delivers highly precise geolocation predictions.
arXiv Detail & Related papers (2024-08-18T13:39:43Z)
CSP: Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations [90.50864830038202]
We present Contrastive Spatial Pre-Training (CSP), a self-supervised learning framework for geo-tagged images. We use a dual-encoder to separately encode the images and their corresponding geo-locations, and use contrastive objectives to learn effective location representations from images. CSP significantly boosts the model performance with 10-34% relative improvement with various labeled training data sampling ratios.
arXiv Detail & Related papers (2023-05-01T23:11:18Z)
Semantic Segmentation of Vegetation in Remote Sensing Imagery Using Deep Learning [77.34726150561087]
We propose an approach for creating a multi-modal and large-temporal dataset comprised of publicly available Remote Sensing data. We use Convolutional Neural Networks (CNN) models that are capable of separating different classes of vegetation.
arXiv Detail & Related papers (2022-09-28T18:51:59Z)
Co-visual pattern augmented generative transformer learning for automobile geo-localization [12.449657263683337]
Cross-view geo-localization (CVGL) aims to estimate the geographical location of the ground-level camera by matching against enormous geo-tagged aerial images. We present a novel approach using cross-view knowledge generative techniques in combination with transformers, namely mutual generative transformer learning (MGTL) for CVGL.
arXiv Detail & Related papers (2022-03-17T07:29:02Z)
Graph-based Semi-supervised Learning: A Comprehensive Review [51.26862262550445]
Semi-supervised learning (SSL) has tremendous value in practice due to its ability to utilize both labeled data and unlabelled data. An important class of SSL methods is to naturally represent data as graphs, which corresponds to graph-based semi-supervised learning (GSSL) methods. GSSL methods have demonstrated their advantages in various domains due to their uniqueness of structure, the universality of applications, and their scalability to large scale data.
arXiv Detail & Related papers (2021-02-26T05:11:09Z)
PGL: Prior-Guided Local Self-supervised Learning for 3D Medical Image Segmentation [87.50205728818601]
We propose a Prior-Guided Local (PGL) self-supervised model that learns the region-wise local consistency in the latent feature space. Our PGL model learns the distinctive representations of local regions, and hence is able to retain structural information.
arXiv Detail & Related papers (2020-11-25T11:03:11Z)
Multi-Level Graph Convolutional Network with Automatic Graph Learning for Hyperspectral Image Classification [63.56018768401328]
We propose a Multi-level Graph Convolutional Network (GCN) with Automatic Graph Learning method (MGCN-AGL) for HSI classification. By employing an attention mechanism to characterize the importance among spatially neighboring regions, the most relevant information can be adaptively incorporated to make decisions. Our MGCN-AGL encodes the long-range dependencies among image regions based on the expressive representations that have been produced at the local level.
arXiv Detail & Related papers (2020-09-19T09:26:20Z)