GSV-Cities: Toward Appropriate Supervised Visual Place Recognition
- URL: http://arxiv.org/abs/2210.10239v1
- Date: Wed, 19 Oct 2022 01:39:29 GMT
- Title: GSV-Cities: Toward Appropriate Supervised Visual Place Recognition
- Authors: Amar Ali-bey, Brahim Chaib-draa, Philippe Giguère
- Abstract summary: We introduce GSV-Cities, a new image dataset providing the widest geographic coverage to date with highly accurate ground truth.
We then explore the full potential of advances in deep metric learning to train networks specifically for place recognition.
We establish a new state-of-the-art on large-scale benchmarks, such as Pittsburgh, Mapillary-SLS, SPED and Nordland.
- Score: 3.6739949215165164
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper aims to investigate representation learning for large-scale visual
place recognition, which consists of determining the location depicted in a
query image by referring to a database of reference images. This is a
challenging task due to the large-scale environmental changes that can occur
over time (e.g., weather, illumination, season, traffic, occlusion). Progress
is currently challenged by the lack of large databases with accurate ground
truth. To address this challenge, we introduce GSV-Cities, a new image dataset
providing the widest geographic coverage to date with highly accurate ground
truth, covering more than 40 cities across all continents over a 14-year
period. We subsequently explore the full potential of recent advances in deep
metric learning to train networks specifically for place recognition, and
evaluate how different loss functions influence performance. In addition, we
show that performance of existing methods substantially improves when trained
on GSV-Cities. Finally, we introduce a new fully convolutional aggregation
layer that outperforms existing techniques, including GeM, NetVLAD and
CosPlace, and establish a new state-of-the-art on large-scale benchmarks, such
as Pittsburgh, Mapillary-SLS, SPED and Nordland. The dataset and code are
available for research purposes at https://github.com/amaralibey/gsv-cities.
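The abstract names GeM, NetVLAD and CosPlace as aggregation baselines but does not spell out the new fully convolutional aggregation layer. As a concrete reference point, here is a minimal PyTorch sketch of GeM (generalized-mean) pooling, one of the baselines the paper compares against; the module name, the default p=3 and the usage lines are common-practice assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeM(nn.Module):
    """Generalized-mean pooling: a learnable exponent p interpolates
    between average pooling (p = 1) and max pooling (p -> infinity)."""
    def __init__(self, p: float = 3.0, eps: float = 1e-6):
        super().__init__()
        self.p = nn.Parameter(torch.tensor(p))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map from a convolutional backbone.
        x = x.clamp(min=self.eps).pow(self.p)
        x = F.avg_pool2d(x, kernel_size=(x.size(-2), x.size(-1)))  # (B, C, 1, 1)
        return x.pow(1.0 / self.p).flatten(1)                      # (B, C) descriptor

# Hypothetical usage: pool backbone features into a global descriptor and
# L2-normalize it before a metric-learning loss (the paper evaluates several
# such losses; triplet and multi-similarity are typical choices).
feats = torch.randn(4, 512, 20, 20)        # placeholder backbone output
desc = F.normalize(GeM()(feats), dim=-1)   # (4, 512) unit-norm descriptors
```

Descriptors like these are compared by cosine or Euclidean distance at retrieval time, which is why the choice of training loss matters as much as the aggregation layer itself.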
Related papers
- GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model [6.135404769437841]
This work tackles the problem of geo-localization with a new paradigm using a large vision-language model (LVLM).
Existing street-view datasets often contain numerous low-quality images that lack visual clues, and offer no support for reasoning-based inference.
To address the data-quality issue, we devise a CLIP-based network to quantify the degree of street-view images being locatable.
To enhance reasoning inference, we integrate external knowledge obtained from real geo-localization games, tapping into valuable human inference capabilities.
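The summary above does not detail GeoReasoner's CLIP-based locatability network, which is trained rather than zero-shot. As a rough illustration of the underlying idea (scoring how "locatable" a street-view image is), the sketch below uses off-the-shelf CLIP with two hypothetical contrastive prompts; the model name and prompts are illustrative assumptions.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Illustrative prompts, not GeoReasoner's actual supervision signal.
PROMPTS = [
    "a street view photo with distinctive signs, architecture, or landmarks",
    "a featureless road with no visual clues about its location",
]

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def locatability(image: Image.Image) -> float:
    """Return a score in [0, 1]: probability mass on the 'locatable' prompt."""
    inputs = processor(text=PROMPTS, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape (1, 2)
    return logits.softmax(dim=-1)[0, 0].item()
```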
arXiv Detail & Related papers (2024-06-03T18:08:56Z) - Forest Inspection Dataset for Aerial Semantic Segmentation and Depth Estimation [6.635604919499181]
We introduce a new large aerial dataset for forest inspection.
It contains both real-world and virtual recordings of natural environments.
We develop a framework to assess the deforestation degree of an area.
arXiv Detail & Related papers (2024-03-11T11:26:44Z) - SatCLIP: Global, General-Purpose Location Embeddings with Satellite Imagery [22.716322265391852]
We introduce Satellite Contrastive Location-Image Pretraining (SatCLIP).
SatCLIP learns an implicit representation of locations by matching CNN- and ViT-inferred visual patterns of openly available satellite imagery with their geographic coordinates.
In experiments, we use SatCLIP embeddings to improve prediction performance on nine diverse location-dependent tasks including temperature prediction, animal recognition, and population density estimation.
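SatCLIP's training signal is CLIP-style contrastive matching between satellite-image embeddings and location embeddings. A minimal sketch of that symmetric InfoNCE objective follows; the function name, temperature and placeholder encoders are assumptions, not SatCLIP's released code.

```python
import torch
import torch.nn.functional as F

def satclip_style_loss(img_emb: torch.Tensor, loc_emb: torch.Tensor,
                       temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE: each satellite image should match the embedding of
    its own coordinates and repel every other location in the batch."""
    img_emb = F.normalize(img_emb, dim=-1)  # (B, D) from a CNN/ViT image encoder
    loc_emb = F.normalize(loc_emb, dim=-1)  # (B, D) from a lat/lon location encoder
    logits = img_emb @ loc_emb.t() / temperature
    targets = torch.arange(img_emb.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```

After training, the location encoder alone supplies the general-purpose embeddings used for the downstream location-dependent tasks.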
arXiv Detail & Related papers (2023-11-28T19:14:40Z) - CurriculumLoc: Enhancing Cross-Domain Geolocalization through Multi-Stage Refinement [11.108860387261508]
Visual geolocalization is a cost-effective and scalable task that involves matching one or more query images taken at some unknown location, to a set of geo-tagged reference images.
We develop CurriculumLoc, a novel keypoint detection and description pipeline with global semantic awareness and local geometric verification.
We achieve new high recall@1 scores of 62.6% and 94.5% on ALTO, under two different distance metrics, respectively.
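The summary mentions local geometric verification but not its exact form. A common way to realize it (offered here as a sketch, not CurriculumLoc's actual procedure) is to fit a homography with RANSAC over matched keypoints and rank or reject candidate references by inlier count:

```python
import cv2
import numpy as np

def geometric_inliers(kpts_query: np.ndarray, kpts_ref: np.ndarray,
                      reproj_thresh: float = 3.0) -> int:
    """Count RANSAC inliers between matched keypoints, given as two (N, 2)
    float32 arrays of pixel coordinates with N >= 4. A candidate reference
    image with few inliers is likely a false retrieval match."""
    _, mask = cv2.findHomography(kpts_query, kpts_ref, cv2.RANSAC, reproj_thresh)
    return 0 if mask is None else int(mask.sum())
```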
arXiv Detail & Related papers (2023-11-20T08:40:01Z) - GeoNet: Benchmarking Unsupervised Adaptation across Geographies [71.23141626803287]
We study the problem of geographic robustness and make three main contributions.
First, we introduce a large-scale dataset GeoNet for geographic adaptation.
Second, we hypothesize that the major source of domain shifts arises from significant variations in scene context.
Third, we conduct an extensive evaluation of several state-of-the-art unsupervised domain adaptation algorithms and architectures.
arXiv Detail & Related papers (2023-03-27T17:59:34Z) - 4Seasons: Benchmarking Visual SLAM and Long-Term Localization for Autonomous Driving in Challenging Conditions [54.59279160621111]
We present a novel visual SLAM and long-term localization benchmark for autonomous driving in challenging conditions based on the large-scale 4Seasons dataset.
The proposed benchmark provides drastic appearance variations caused by seasonal changes and diverse weather and illumination conditions.
We introduce a new unified benchmark for jointly evaluating visual odometry, global place recognition, and map-based visual localization performance.
arXiv Detail & Related papers (2022-12-31T13:52:36Z) - 3DGazeNet: Generalizing Gaze Estimation with Weak-Supervision from Synthetic Views [67.00931529296788]
We propose to train general gaze estimation models which can be directly employed in novel environments without adaptation.
We create a large-scale dataset of diverse faces with gaze pseudo-annotations, which we extract based on the 3D geometry of the scene.
We test our method on the task of gaze generalization, demonstrating an improvement of up to 30% over the state of the art when no ground-truth data are available.
arXiv Detail & Related papers (2022-12-06T14:15:17Z) - Rethinking Visual Geo-localization for Large-Scale Applications [18.09618985653891]
We build San Francisco eXtra Large, a new dataset covering a whole city and providing a wide range of challenging cases.
We design a new highly scalable training technique, called CosPlace, which casts the training as a classification problem.
We achieve state-of-the-art performance on a wide range of datasets and find that CosPlace is robust to heavy domain changes.
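CosPlace's key move is to turn retrieval training into classification. A toy sketch of the cell-assignment step follows; the 10 m cell size is illustrative, and the real method also partitions cells into non-adjacent groups and incorporates heading, both omitted here for brevity.

```python
import numpy as np

def utm_to_class(easting: np.ndarray, northing: np.ndarray,
                 cell_m: float = 10.0) -> np.ndarray:
    """Discretize UTM coordinates into square cells so that each cell becomes
    one class label, trainable with plain cross-entropy (CosPlace-style)."""
    cell_e = np.floor(easting / cell_m).astype(np.int64)
    cell_n = np.floor(northing / cell_m).astype(np.int64)
    # Combine the two cell indices into a single class id (illustrative hash).
    return cell_e * 1_000_003 + cell_n
```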
arXiv Detail & Related papers (2022-04-05T15:33:45Z) - SensatUrban: Learning Semantics from Urban-Scale Photogrammetric Point Clouds [52.624157840253204]
We introduce SensatUrban, an urban-scale UAV photogrammetry point cloud dataset consisting of nearly three billion points collected from three UK cities, covering 7.6 km².
Each point in the dataset has been labelled with fine-grained semantic annotations, resulting in a dataset three times the size of the previous largest photogrammetric point cloud dataset.
arXiv Detail & Related papers (2022-01-12T14:48:11Z) - Improving Deep Stereo Network Generalization with Geometric Priors [93.09496073476275]
Large datasets of diverse real-world scenes with dense ground truth are difficult to obtain.
Many algorithms rely on small real-world datasets of similar scenes or synthetic datasets.
We propose to incorporate prior knowledge of scene geometry into an end-to-end stereo network to help networks generalize better.
arXiv Detail & Related papers (2020-08-25T15:24:02Z) - JHU-CROWD++: Large-Scale Crowd Counting Dataset and A Benchmark Method [92.15895515035795]
We introduce a new large-scale unconstrained crowd counting dataset (JHU-CROWD++) that contains 4,372 images with 1.51 million annotations.
We propose a novel crowd counting network that progressively generates crowd density maps via residual error estimation.
arXiv Detail & Related papers (2020-04-07T14:59:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.