GeoCLIP: Clip-Inspired Alignment between Locations and Images for
Effective Worldwide Geo-localization
- URL: http://arxiv.org/abs/2309.16020v2
- Date: Tue, 21 Nov 2023 23:23:08 GMT
- Title: GeoCLIP: Clip-Inspired Alignment between Locations and Images for
Effective Worldwide Geo-localization
- Authors: Vicente Vivanco Cepeda, Gaurav Kumar Nayak, Mubarak Shah
- Abstract summary: Worldwide Geo-localization aims to pinpoint the precise location of images taken anywhere on Earth.
Existing approaches divide the globe into discrete geographic cells, transforming the problem into a classification task.
We propose GeoCLIP, a novel CLIP-inspired Image-to-GPS retrieval approach that enforces alignment between the image and its corresponding GPS locations.
- Score: 61.10806364001535
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Worldwide Geo-localization aims to pinpoint the precise location of images
taken anywhere on Earth. This task has considerable challenges due to immense
variation in geographic landscapes. The image-to-image retrieval-based
approaches fail to solve this problem on a global scale as it is not feasible
to construct a large gallery of images covering the entire world. Instead,
existing approaches divide the globe into discrete geographic cells,
transforming the problem into a classification task. However, their performance
is limited by the predefined classes and often results in inaccurate
localizations when an image's location significantly deviates from its class
center. To overcome these limitations, we propose GeoCLIP, a novel
CLIP-inspired Image-to-GPS retrieval approach that enforces alignment between
the image and its corresponding GPS locations. GeoCLIP's location encoder
models the Earth as a continuous function by employing positional encoding
through random Fourier features and constructing a hierarchical representation
that captures information at varying resolutions to yield a semantically rich
high-dimensional feature suitable to use even beyond geo-localization. To the
best of our knowledge, this is the first work employing GPS encoding for
geo-localization. We demonstrate the efficacy of our method via extensive
experiments and ablations on benchmark datasets. We achieve competitive
performance with just 20% of training data, highlighting its effectiveness even
in limited-data settings. Furthermore, we qualitatively demonstrate
geo-localization using a text query by leveraging CLIP backbone of our image
encoder. The project webpage is available at:
https://vicentevivan.github.io/GeoCLIP
Related papers
- Swarm Intelligence in Geo-Localization: A Multi-Agent Large Vision-Language Model Collaborative Framework [51.26566634946208]
We introduce smileGeo, a novel visual geo-localization framework.
By inter-agent communication, smileGeo integrates the inherent knowledge of these agents with additional retrieved information.
Results show that our approach significantly outperforms current state-of-the-art methods.
arXiv Detail & Related papers (2024-08-21T03:31:30Z) - G3: An Effective and Adaptive Framework for Worldwide Geolocalization Using Large Multi-Modality Models [40.69217368870192]
We propose a novel framework for worldwide geolocalization based on Retrieval-Augmented Generation (RAG)
G3 consists of three steps, i.e., Geo-alignment, Geo-diversification, and Geo-verification.
Experiments on two well-established datasets verify the superiority of G3 compared to other state-of-the-art methods.
arXiv Detail & Related papers (2024-05-23T15:37:06Z) - PIGEON: Predicting Image Geolocations [44.99833362998488]
We present a new geolocalization system that combines semantic geocell creation, multi-task contrastive pretraining, and a novel loss function.
PIGEOTTO is the first image geolocalization model that effectively generalizes to unseen places.
arXiv Detail & Related papers (2023-07-11T23:36:49Z) - G^3: Geolocation via Guidebook Grounding [92.46774241823562]
We study explicit knowledge from human-written guidebooks that describe the salient and class-discriminative visual features humans use for geolocation.
We propose the task of Geolocation via Guidebook Grounding that uses a dataset of StreetView images from a diverse set of locations.
Our approach substantially outperforms a state-of-the-art image-only geolocation method, with an improvement of over 5% in Top-1 accuracy.
arXiv Detail & Related papers (2022-11-28T16:34:40Z) - A Gis Aided Approach for Geolocalizing an Unmanned Aerial System Using
Deep Learning [0.4297070083645048]
We propose an alternative approach to geolocalize a UAS when GPS signal is degraded or denied.
Considering UAS has a downward-looking camera on its platform that can acquire real-time images as the platform flies, we apply modern deep learning techniques to achieve geolocalization.
We extract GIS information from OpenStreetMap (OSM) to semantically segment matched features into building and terrain classes.
arXiv Detail & Related papers (2022-08-25T17:51:15Z) - Where in the World is this Image? Transformer-based Geo-localization in
the Wild [48.69031054573838]
Predicting the geographic location (geo-localization) from a single ground-level RGB image taken anywhere in the world is a very challenging problem.
We propose TransLocator, a unified dual-branch transformer network that attends to tiny details over the entire image.
We evaluate TransLocator on four benchmark datasets - Im2GPS, Im2GPS3k, YFCC4k, YFCC26k and obtain 5.5%, 14.1%, 4.9%, 9.9% continent-level accuracy improvement.
arXiv Detail & Related papers (2022-04-29T03:27:23Z) - Accurate 3-DoF Camera Geo-Localization via Ground-to-Satellite Image
Matching [102.39635336450262]
We address the problem of ground-to-satellite image geo-localization by matching a query image captured at the ground level against a large-scale database with geotagged satellite images.
Our new method is able to achieve the fine-grained location of a query image, up to pixel size precision of the satellite image.
arXiv Detail & Related papers (2022-03-26T20:10:38Z) - Visual and Object Geo-localization: A Comprehensive Survey [11.120155713865918]
Geo-localization refers to the process of determining where on earth some entity' is located.
This paper provides a comprehensive survey of geo-localization involving images, which involves either determining from where an image has been captured (Image geo-localization) or geo-locating objects within an image (Object geo-localization)
We will provide an in-depth study, including a summary of popular algorithms, a description of proposed datasets, and an analysis of performance results to illustrate the current state of each field.
arXiv Detail & Related papers (2021-12-30T20:46:53Z) - Hierarchical Attention Fusion for Geo-Localization [7.544917072241684]
We introduce a hierarchical attention fusion network using multi-scale features for geo-localization.
We extract the hierarchical feature maps from a convolutional neural network (CNN) and organically fuse the extracted features for image representations.
Our training is self-supervised using adaptive weights to control the attention of feature emphasis from each hierarchical level.
arXiv Detail & Related papers (2021-02-18T07:07:03Z) - Each Part Matters: Local Patterns Facilitate Cross-view Geo-localization [54.00111565818903]
Cross-view geo-localization is to spot images of the same geographic target from different platforms.
Existing methods usually concentrate on mining the fine-grained feature of the geographic target in the image center.
We introduce a simple and effective deep neural network, called Local Pattern Network (LPN), to take advantage of contextual information.
arXiv Detail & Related papers (2020-08-26T16:06:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.