Rethinking Visual Geo-localization for Large-Scale Applications
- URL: http://arxiv.org/abs/2204.02287v2
- Date: Thu, 7 Apr 2022 12:57:38 GMT
- Title: Rethinking Visual Geo-localization for Large-Scale Applications
- Authors: Gabriele Berton, Carlo Masone, Barbara Caputo
- Abstract summary: We build San Francisco eXtra Large, a new dataset covering a whole city and providing a wide range of challenging cases.
We design a new highly scalable training technique, called CosPlace, which casts the training as a classification problem.
We achieve state-of-the-art performance on a wide range of datasets and find that CosPlace is robust to heavy domain changes.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual Geo-localization (VG) is the task of estimating the position where a
given photo was taken by comparing it with a large database of images of known
locations. To investigate how existing techniques would perform on a real-world
city-wide VG application, we build San Francisco eXtra Large, a new dataset
covering a whole city and providing a wide range of challenging cases, with a
size 30x bigger than the previous largest dataset for visual geo-localization.
We find that current methods fail to scale to such large datasets, therefore we
design a new highly scalable training technique, called CosPlace, which casts
the training as a classification problem avoiding the expensive mining needed
by the commonly used contrastive learning. We achieve state-of-the-art
performance on a wide range of datasets and find that CosPlace is robust to
heavy domain changes. Moreover, we show that, compared to the previous
state-of-the-art, CosPlace requires roughly 80% less GPU memory at train time,
and it achieves better results with 8x smaller descriptors, paving the way for
city-wide real-world visual geo-localization. Dataset, code and trained models
are available for research purposes at https://github.com/gmberton/CosPlace.
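The abstract frames VG as retrieval: a query photo's position is estimated from the most similar images in a geotagged database. The sketch below is a minimal illustration in plain Python and makes several assumptions that the abstract does not state. Descriptors are already extracted by some network. Positions are planar (easting, northing) coordinates in metres, for example UTM. Similarity is cosine similarity. A prediction counts as correct if it falls within 25 m of the ground truth, which is a common convention in VG benchmarks. `localize` and `recall_at_n` are hypothetical helper names, not functions from the CosPlace repository.

```python
import math
from typing import List, Sequence, Tuple

Position = Tuple[float, float]  # assumed planar (easting, northing) in metres, e.g. UTM


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a descriptor to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def localize(query: Sequence[float],
             db_descs: Sequence[Sequence[float]],
             db_positions: Sequence[Position],
             k: int = 1) -> List[Position]:
    """Return the positions of the k database images most similar to the query."""
    q = l2_normalize(query)
    sims = [sum(a * b for a, b in zip(q, l2_normalize(d))) for d in db_descs]
    ranked = sorted(range(len(db_descs)), key=lambda i: sims[i], reverse=True)
    return [db_positions[i] for i in ranked[:k]]


def recall_at_n(preds: Sequence[Position], truth: Position,
                n: int, threshold_m: float = 25.0) -> int:
    """1 if any of the top-n predicted positions is within threshold_m of the truth."""
    return int(any(math.dist(p, truth) <= threshold_m for p in preds[:n]))


# Toy database: three images with 2-D descriptors and known positions.
db_descs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
db_positions = [(0.0, 0.0), (100.0, 0.0), (10.0, 10.0)]
preds = localize([0.9, 0.1], db_descs, db_positions, k=2)
print(preds)                                # [(0.0, 0.0), (10.0, 10.0)]
print(recall_at_n(preds, (5.0, 0.0), n=1))  # 1
```

At city scale, the linear scan in `localize` would normally be replaced by an approximate or GPU nearest-neighbour index. This is one reason the descriptor size mentioned in the abstract matters: smaller descriptors shrink the database that has to be searched.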
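According to the abstract, CosPlace casts training as a classification problem, which removes the expensive mining that contrastive learning requires. The sketch below shows one way to do this. Its design details come from general knowledge of CosPlace rather than from the abstract above, so treat them as assumptions. Each class is a geographic cell of side `cell_m` metres combined with a heading bin of `heading_bin_deg` degrees. The classifier is trained with a CosFace-style large margin cosine loss. The helper names, the 10 m / 30° defaults and the `s`/`m` values are illustrative. This is not a reproduction of the official code.

```python
import math
from typing import List, Sequence, Tuple


def place_class(east_m: float, north_m: float, heading_deg: float,
                cell_m: float = 10.0, heading_bin_deg: float = 30.0) -> Tuple[int, int, int]:
    """Hypothetical labelling: a class is one spatial cell plus one heading bin."""
    return (math.floor(east_m / cell_m),
            math.floor(north_m / cell_m),
            math.floor((heading_deg % 360.0) / heading_bin_deg))


def _unit(v: Sequence[float]) -> List[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cosface_loss(embedding: Sequence[float],
                 class_weights: Sequence[Sequence[float]],
                 label: int, s: float = 30.0, m: float = 0.40) -> float:
    """Large margin cosine loss for one sample: cross-entropy over s * (cos - margin)."""
    e = _unit(embedding)
    logits = []
    for j, w in enumerate(class_weights):
        cos_j = sum(a * b for a, b in zip(e, _unit(w)))
        if j == label:
            cos_j -= m  # margin is subtracted only for the ground-truth class
        logits.append(s * cos_j)
    # Numerically stable log-sum-exp, then negative log-likelihood of the label.
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum - logits[label]


print(place_class(553012.4, 4180505.9, 95.0))  # (55301, 418050, 3)
weights = [[1.0, 0.0], [0.0, 1.0]]  # one weight vector per place class
print(round(cosface_loss([1.0, 0.0], weights, label=0), 6))  # 0.0
print(round(cosface_loss([1.0, 0.0], weights, label=1), 6))  # 42.0
```

Each training image only needs its class label, which a lookup table from `place_class` tuples to integer indices can supply. A batch can therefore be built without searching the database for positives and hard negatives. That search is the mining step the abstract identifies as expensive.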
Related papers
- CityGuessr: City-Level Video Geo-Localization on a Global Scale
We propose the novel problem of worldwide video geolocalization, with the objective of hierarchically predicting the correct city, state/province, country, and continent, given a video.
No large-scale video datasets with extensive worldwide coverage exist for training models to solve this problem.
We introduce a new dataset, CityGuessr68k, comprising 68,269 videos from 166 cities all over the world.
Date: 2024-11-10
- CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians
CityGaussian employs a novel divide-and-conquer training approach and Level-of-Detail (LoD) strategy for efficient large-scale 3DGS training and rendering.
Our approach attains state-of-the-art rendering quality, enabling consistent real-time rendering of large-scale scenes across vastly different scales.
Date: 2024-04-01
- GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization
Worldwide Geo-localization aims to pinpoint the precise location of images taken anywhere on Earth.
Existing approaches divide the globe into discrete geographic cells, transforming the problem into a classification task.
We propose GeoCLIP, a novel CLIP-inspired Image-to-GPS retrieval approach that enforces alignment between the image and its corresponding GPS locations.
Date: 2023-09-27
- PIGEON: Predicting Image Geolocations
We present a new geolocalization system that combines semantic geocell creation, multi-task contrastive pretraining, and a novel loss function.
PIGEOTTO is the first image geolocalization model that effectively generalizes to unseen places.
Date: 2023-07-11
- GSV-Cities: Toward Appropriate Supervised Visual Place Recognition
We introduce GSV-Cities, a new image dataset providing the widest geographic coverage to date with highly accurate ground truth.
We then explore the full potential of advances in deep metric learning to train networks specifically for place recognition.
We establish a new state-of-the-art on large-scale benchmarks, such as Pittsburgh, Mapillary-SLS, SPED and Nordland.
Date: 2022-10-19
- Accurate 3-DoF Camera Geo-Localization via Ground-to-Satellite Image Matching
We address the problem of ground-to-satellite image geo-localization by matching a query image captured at the ground level against a large-scale database with geotagged satellite images.
Our new method localizes a query image at a fine-grained level, up to the pixel-size precision of the satellite image.
Date: 2022-03-26
- CrossLoc: Scalable Aerial Localization Assisted by Multimodal Synthetic Data
We present a visual localization system that learns to estimate camera poses in the real world with the help of synthetic data.
To mitigate the data scarcity issue, we introduce TOPO-DataGen, a versatile synthetic data generation tool.
We also introduce CrossLoc, a cross-modal visual representation learning approach to pose estimation.
Date: 2021-12-16
- PGL: Prior-Guided Local Self-supervised Learning for 3D Medical Image Segmentation
We propose a Prior-Guided Local (PGL) self-supervised model that learns region-wise local consistency in the latent feature space.
Our PGL model learns the distinctive representations of local regions, and hence is able to retain structural information.
Date: 2020-11-25
- Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges
We present an urban-scale photogrammetric point cloud dataset with nearly three billion richly annotated points.
Our dataset consists of large areas from three UK cities, covering about 7.6 km² of the city landscape.
We evaluate the performance of state-of-the-art algorithms on our dataset and provide a comprehensive analysis of the results.
Date: 2020-09-07
- Robust Image Retrieval-based Visual Localization using Kapture
We present a versatile pipeline for visual localization that facilitates the use of different local and global features.
We evaluate our methods on eight public datasets where they rank top on all and first on many of them.
To foster future research, we release code, models, and all datasets used in this paper in the kapture format open source under a permissive BSD license.
Date: 2020-07-27