Where in the World is this Image? Transformer-based Geo-localization in
the Wild
- URL: http://arxiv.org/abs/2204.13861v1
- Date: Fri, 29 Apr 2022 03:27:23 GMT
- Authors: Shraman Pramanick, Ewa M. Nowara, Joshua Gleason, Carlos D. Castillo
and Rama Chellappa
- Abstract summary: Predicting the geographic location (geo-localization) from a single ground-level RGB image taken anywhere in the world is a very challenging problem.
We propose TransLocator, a unified dual-branch transformer network that attends to tiny details over the entire image.
We evaluate TransLocator on four benchmark datasets (Im2GPS, Im2GPS3k, YFCC4k, and YFCC26k) and obtain continent-level accuracy improvements of 5.5%, 14.1%, 4.9%, and 9.9%, respectively.
- Score: 48.69031054573838
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Predicting the geographic location (geo-localization) from a single
ground-level RGB image taken anywhere in the world is a very challenging
problem. The challenges include the huge diversity of images across
environmental scenarios, drastic variation in the appearance of the same
location with the time of day, weather, and season, and, most importantly,
the fact that the prediction is made from a single image that may contain only
a few geo-locating cues. For these reasons, most existing works are restricted to
specific cities, imagery, or worldwide landmarks. In this work, we focus on
developing an efficient solution to planet-scale single-image geo-localization.
To this end, we propose TransLocator, a unified dual-branch transformer network
that attends to tiny details over the entire image and produces robust feature
representations under extreme appearance variations. TransLocator takes an RGB
image and its semantic segmentation map as inputs, exchanges information between
its two parallel branches after each transformer layer, and simultaneously performs
geo-localization and scene recognition in a multi-task fashion. We evaluate
TransLocator on four benchmark datasets (Im2GPS, Im2GPS3k, YFCC4k, and YFCC26k) and
obtain continent-level accuracy improvements of 5.5%, 14.1%, 4.9%, and 9.9%,
respectively, over the state of the art. TransLocator is also validated on real-world test images and
found to be more effective than previous methods.
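The abstract reports accuracy at the continent level. A minimal sketch of how such a metric is often computed follows: a prediction counts as correct at a given scale if its great-circle distance to the true location is within a threshold. The threshold values (1, 25, 200, 750, and 2500 km) are the scales commonly used in Im2GPS-style evaluation and are an assumption here, not something stated in the text above; the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Assumed distance thresholds in km (common Im2GPS-style scales, not taken from this page).
THRESHOLDS_KM = {
    "street": 1,
    "city": 25,
    "region": 200,
    "country": 750,
    "continent": 2500,
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before the square root.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def accuracy_at_scales(predicted, actual, thresholds=THRESHOLDS_KM):
    """Fraction of predictions within each distance threshold of the true location."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and of equal length")
    distances = [haversine_km(*p, *a) for p, a in zip(predicted, actual)]
    return {
        name: sum(d <= km for d in distances) / len(distances)
        for name, km in thresholds.items()
    }
```

For example, a prediction of Paris for an image taken in London is roughly 340 km off, so under these assumed thresholds it would count as correct at the country and continent scales but not at the region scale.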
Related papers
- G3: An Effective and Adaptive Framework for Worldwide Geolocalization Using Large Multi-Modality Models [40.69217368870192]
We propose a novel framework for worldwide geolocalization based on Retrieval-Augmented Generation (RAG).
G3 consists of three steps, i.e., Geo-alignment, Geo-diversification, and Geo-verification.
Experiments on two well-established datasets verify the superiority of G3 compared to other state-of-the-art methods.
Date: 2024-05-23T15:37:06Z
- GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization [61.10806364001535]
Worldwide Geo-localization aims to pinpoint the precise location of images taken anywhere on Earth.
Existing approaches divide the globe into discrete geographic cells, transforming the problem into a classification task.
We propose GeoCLIP, a novel CLIP-inspired Image-to-GPS retrieval approach that enforces alignment between the image and its corresponding GPS locations.
Date: 2023-09-27T20:54:56Z
- PIGEON: Predicting Image Geolocations [44.99833362998488]
We present a new geolocalization system that combines semantic geocell creation, multi-task contrastive pretraining, and a novel loss function.
PIGEOTTO is the first image geolocalization model that effectively generalizes to unseen places.
Date: 2023-07-11T23:36:49Z
- Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes [53.53712888703834]
We introduce an end-to-end transformer-based architecture that exploits the relationship between different geographic levels.
We achieve state-of-the-art street-level accuracy on 4 standard geo-localization datasets.
Date: 2023-03-07T21:47:58Z
- TransGeo: Transformer Is All You Need for Cross-view Image Geo-localization [81.70547404891099]
CNN-based methods for cross-view image geo-localization fail to model global correlation.
We propose a pure transformer-based approach (TransGeo) to address these limitations.
TransGeo achieves state-of-the-art results on both urban and rural datasets.
Date: 2022-03-31T21:19:41Z
- Accurate 3-DoF Camera Geo-Localization via Ground-to-Satellite Image Matching [102.39635336450262]
We address the problem of ground-to-satellite image geo-localization by matching a query image captured at the ground level against a large-scale database with geotagged satellite images.
Our new method is able to achieve the fine-grained location of a query image, up to pixel size precision of the satellite image.
Date: 2022-03-26T20:10:38Z
- Hierarchical Attention Fusion for Geo-Localization [7.544917072241684]
We introduce a hierarchical attention fusion network using multi-scale features for geo-localization.
We extract the hierarchical feature maps from a convolutional neural network (CNN) and organically fuse the extracted features for image representations.
Our training is self-supervised using adaptive weights to control the attention of feature emphasis from each hierarchical level.
Date: 2021-02-18T07:07:03Z
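
The GeoCLIP entry above notes that existing approaches divide the globe into discrete geographic cells and treat geo-localization as classification. A minimal sketch of that idea is given below, assuming a fixed latitude/longitude grid. This is a deliberate simplification for illustration and is not the partitioning scheme of any paper listed here; the function names and the `cell_deg` parameter are hypothetical.

```python
import math


def latlon_to_cell(lat, lon, cell_deg=10.0):
    """Map a (latitude, longitude) in degrees to an integer class label on a fixed grid.

    Assumes cell_deg divides 180 evenly, so every cell has the same angular size.
    """
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("latitude or longitude out of range")
    n_rows = math.ceil(180 / cell_deg)
    n_cols = math.ceil(360 / cell_deg)
    # Clamp so that lat == 90 and lon == 180 fall into the last row and column.
    row = min(int((lat + 90) / cell_deg), n_rows - 1)
    col = min(int((lon + 180) / cell_deg), n_cols - 1)
    return row * n_cols + col


def cell_to_center(cell_id, cell_deg=10.0):
    """Return the (latitude, longitude) of a cell's center, used as the predicted location."""
    n_cols = math.ceil(360 / cell_deg)
    row, col = divmod(cell_id, n_cols)
    return (-90 + (row + 0.5) * cell_deg, -180 + (col + 0.5) * cell_deg)
```

A classifier trained on such labels predicts a cell index, and the cell center is then reported as the location, so the achievable precision is bounded by the cell size.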