RANGE: Retrieval Augmented Neural Fields for Multi-Resolution Geo-Embeddings
- URL: http://arxiv.org/abs/2502.19781v2
- Date: Thu, 03 Apr 2025 20:20:24 GMT
- Title: RANGE: Retrieval Augmented Neural Fields for Multi-Resolution Geo-Embeddings
- Authors: Aayush Dhakal, Srikumar Sastry, Subash Khanal, Adeel Ahmad, Eric Xing, Nathan Jacobs
- Abstract summary: We propose a novel retrieval-augmented strategy called RANGE. We build our method on the intuition that the visual features of a location can be estimated by combining the visual features from multiple similar-looking locations. Our results show that RANGE outperforms the existing state-of-the-art models by significant margins in most tasks.
- Score: 7.431269929582643
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The choice of representation for geographic location significantly impacts the accuracy of models for a broad range of geospatial tasks, including fine-grained species classification, population density estimation, and biome classification. Recent works like SatCLIP and GeoCLIP learn such representations by contrastively aligning geolocation with co-located images. While these methods work exceptionally well, we posit in this paper that the current training strategies fail to fully capture the important visual features. We provide an information-theoretic perspective on why the resulting embeddings from these methods discard visual information that is crucial for many downstream tasks. To solve this problem, we propose a novel retrieval-augmented strategy called RANGE. We build our method on the intuition that the visual features of a location can be estimated by combining the visual features from multiple similar-looking locations. We evaluate our method across a wide variety of tasks. Our results show that RANGE outperforms the existing state-of-the-art models by significant margins in most tasks. We show gains of up to 13.1% on classification tasks and 0.145 $R^2$ on regression tasks. All our code and models will be made available at: https://github.com/mvrl/RANGE.
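The core idea of the abstract, estimating a location's visual features by combining the features of similar-looking locations, can be illustrated with a small sketch. This is not the authors' implementation; the function names, the softmax weighting, and the embedding dimensions are all assumptions for illustration only.

```python
import numpy as np

def retrieval_augmented_embedding(query_loc_emb, db_loc_embs, db_vis_embs,
                                  k=5, temp=0.1):
    """Estimate visual features for a query location by softmax-weighting
    the visual embeddings of its k most similar database locations."""
    # Cosine similarity between the query location and every database location.
    q = query_loc_emb / np.linalg.norm(query_loc_emb)
    db = db_loc_embs / np.linalg.norm(db_loc_embs, axis=1, keepdims=True)
    sims = db @ q
    # Indices of the k most similar locations (argsort is ascending).
    topk = np.argsort(sims)[-k:]
    # Similarity-weighted combination of their visual embeddings.
    w = np.exp(sims[topk] / temp)
    w /= w.sum()
    return w @ db_vis_embs[topk]

# Toy usage with random embeddings standing in for precomputed ones.
rng = np.random.default_rng(0)
db_loc = rng.normal(size=(100, 16))   # location embeddings of the database
db_vis = rng.normal(size=(100, 32))   # co-located visual embeddings
query = rng.normal(size=16)
aug = retrieval_augmented_embedding(query, db_loc, db_vis)
print(aug.shape)
```

The augmented embedding has the dimensionality of the visual features rather than the location encoder, which is consistent with the paper's claim that retrieval recovers visual information the location embedding alone discards.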
Related papers
- Swarm Intelligence in Geo-Localization: A Multi-Agent Large Vision-Language Model Collaborative Framework [51.26566634946208]
We introduce smileGeo, a novel visual geo-localization framework.
By inter-agent communication, smileGeo integrates the inherent knowledge of these agents with additional retrieved information.
Results show that our approach significantly outperforms current state-of-the-art methods.
arXiv Detail & Related papers (2024-08-21T03:31:30Z)
- CurriculumLoc: Enhancing Cross-Domain Geolocalization through Multi-Stage Refinement [11.108860387261508]
Visual geolocalization is a cost-effective and scalable task that involves matching one or more query images, taken at an unknown location, to a set of geo-tagged reference images.
We develop CurriculumLoc, a novel keypoint detection and description method with global semantic awareness and local geometric verification.
We achieve new best recall@1 scores of 62.6% and 94.5% on ALTO under two different distance metrics, respectively.
arXiv Detail & Related papers (2023-11-20T08:40:01Z)
- GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization [61.10806364001535]
Worldwide Geo-localization aims to pinpoint the precise location of images taken anywhere on Earth.
Existing approaches divide the globe into discrete geographic cells, transforming the problem into a classification task.
We propose GeoCLIP, a novel CLIP-inspired Image-to-GPS retrieval approach that enforces alignment between the image and its corresponding GPS locations.
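The CLIP-inspired alignment described above can be sketched as a symmetric contrastive loss over a batch of matched (image, GPS) embedding pairs. This is a hedged illustration, not GeoCLIP's code: the encoders are elided, and the loss below is the standard symmetric InfoNCE objective assumed from the "CLIP-inspired" description.

```python
import numpy as np

def info_nce(img_embs, gps_embs, temp=0.07):
    """Symmetric InfoNCE loss; matched image/GPS pairs share a batch index."""
    img = img_embs / np.linalg.norm(img_embs, axis=1, keepdims=True)
    gps = gps_embs / np.linalg.norm(gps_embs, axis=1, keepdims=True)
    logits = img @ gps.T / temp          # (B, B): pairwise similarities
    idx = np.arange(len(logits))         # positives lie on the diagonal
    # Image-to-GPS direction: log-softmax over each row.
    lp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    loss_i2g = -lp[idx, idx].mean()
    # GPS-to-image direction: log-softmax over each column.
    lpt = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
    loss_g2i = -lpt[idx, idx].mean()
    return (loss_i2g + loss_g2i) / 2

# Toy batch of 8 random embedding pairs.
rng = np.random.default_rng(0)
loss = info_nce(rng.normal(size=(8, 16)), rng.normal(size=(8, 16)))
print(float(loss))
```

At inference, geolocalization then becomes retrieval: embed the query image and return the GPS gallery entry with the highest similarity, rather than classifying into discrete geographic cells.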
arXiv Detail & Related papers (2023-09-27T20:54:56Z)
- GSV-Cities: Toward Appropriate Supervised Visual Place Recognition [3.6739949215165164]
We introduce GSV-Cities, a new image dataset providing the widest geographic coverage to date with highly accurate ground truth.
We then explore the full potential of advances in deep metric learning to train networks specifically for place recognition.
We establish a new state-of-the-art on large-scale benchmarks, such as Pittsburgh, Mapillary-SLS, SPED and Nordland.
arXiv Detail & Related papers (2022-10-19T01:39:29Z)
- GPS: A Policy-driven Sampling Approach for Graph Representation Learning [12.760239169374984]
We propose an adaptive Graph Policy-driven Sampling model (GPS), where the influence of each node in the local neighborhood is realized through the adaptive correlation calculation.
Our proposed model outperforms existing ones by 3%-8% on several key benchmarks, achieving state-of-the-art performance on real-world datasets.
arXiv Detail & Related papers (2021-12-29T09:59:53Z)
- Leveraging EfficientNet and Contrastive Learning for Accurate Global-scale Location Estimation [15.633461635276337]
We propose a mixed classification-retrieval scheme for global-scale image geolocation.
Our approach demonstrates very competitive performance on four public datasets.
arXiv Detail & Related papers (2021-05-17T07:18:43Z)
- Distribution Alignment: A Unified Framework for Long-tail Visual Recognition [52.36728157779307]
We propose a unified distribution alignment strategy for long-tail visual recognition.
We then introduce a generalized re-weight method in the two-stage learning to balance the class prior.
Our approach achieves the state-of-the-art results across all four recognition tasks with a simple and unified framework.
arXiv Detail & Related papers (2021-03-30T14:09:53Z)
- City-Scale Visual Place Recognition with Deep Local Features Based on Multi-Scale Ordered VLAD Pooling [5.274399407597545]
We present a fully-automated system for place recognition at a city-scale based on content-based image retrieval.
First, we present a comprehensive analysis of visual place recognition and sketch out the unique challenges of the task.
Next, we propose a simple pooling approach on top of convolutional neural network activations to embed the spatial information into the image representation vector.
arXiv Detail & Related papers (2020-09-19T15:21:59Z)
- Region Comparison Network for Interpretable Few-shot Image Classification [97.97902360117368]
Few-shot image classification has been proposed to effectively use only a limited number of labeled examples to train models for new classes.
We propose a metric learning based method named Region Comparison Network (RCN), which is able to reveal how few-shot learning works.
We also present a new way to generalize the interpretability from the level of tasks to categories.
arXiv Detail & Related papers (2020-09-08T07:29:05Z)
- Each Part Matters: Local Patterns Facilitate Cross-view Geo-localization [54.00111565818903]
Cross-view geo-localization is to spot images of the same geographic target from different platforms.
Existing methods usually concentrate on mining the fine-grained feature of the geographic target in the image center.
We introduce a simple and effective deep neural network, called Local Pattern Network (LPN), to take advantage of contextual information.
arXiv Detail & Related papers (2020-08-26T16:06:11Z)
- SE-KGE: A Location-Aware Knowledge Graph Embedding Model for Geographic Question Answering and Spatial Semantic Lifting [9.949690056661218]
We propose a location-aware KG embedding model called SE-KGE.
It encodes spatial information such as point coordinates or bounding boxes of geographic entities into the KG embedding space.
We also construct a geographic knowledge graph as well as a set of geographic query-answer pairs called DBGeo to evaluate the performance of SE-KGE.
arXiv Detail & Related papers (2020-04-25T17:46:31Z)
- A Transfer Learning approach to Heatmap Regression for Action Unit intensity estimation [50.261472059743845]
Action Units (AUs) are geometrically-based atomic facial muscle movements.
We propose a novel AU modelling problem that consists of jointly estimating their localisation and intensity.
A Heatmap models whether an AU occurs or not at a given spatial location.
arXiv Detail & Related papers (2020-04-14T16:51:13Z)
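The heatmap formulation in the last entry, where a heatmap models whether an AU occurs at a given spatial location, is commonly realized as a 2D Gaussian centered on the AU's location. The sketch below is an assumed minimal version, not the paper's implementation; the peak amplitude standing in for intensity is likewise an assumption.

```python
import numpy as np

def au_heatmap(h, w, cx, cy, sigma=2.0, intensity=1.0):
    """Render a 2D Gaussian heatmap of shape (h, w) peaking at (cx, cy),
    with the peak amplitude encoding the AU intensity."""
    ys, xs = np.mgrid[0:h, 0:w]  # row (y) and column (x) coordinate grids
    return intensity * np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2)
                              / (2 * sigma ** 2))

# Example: an AU at column 5, row 8 with intensity 0.7 on a 16x16 grid.
hm = au_heatmap(16, 16, cx=5, cy=8, sigma=2.0, intensity=0.7)
print(hm.shape)
```

A regression network trained against such targets then jointly predicts localisation (where the peak is) and intensity (how high it is), which matches the joint estimation problem the entry describes.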
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.