Which country is this picture from? New data and methods for DNN-based
country recognition
- URL: http://arxiv.org/abs/2209.02429v1
- Date: Fri, 2 Sep 2022 10:56:41 GMT
- Title: Which country is this picture from? New data and methods for DNN-based
country recognition
- Authors: Omran Alamayreh, Giovanna Maria Dimitri, Jun Wang, Benedetta Tondi,
Mauro Barni
- Abstract summary: Previous works have focused mostly on the estimation of the geo-coordinates where a picture has been taken.
We introduce a new dataset, the VIPPGeo dataset, containing almost 4 million images.
We use the dataset to train a deep learning architecture casting the country recognition problem as a classification problem.
- Score: 33.73817899937691
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Predicting the country where a picture was taken has many potential
applications, such as detecting false claims, identifying impostors,
preventing disinformation campaigns, and identifying fake news.
Previous works have focused mostly on estimating the geo-coordinates
where a picture was taken. Yet recognizing the country where an image was
taken could be more important, from a semantic and forensic
point of view, than identifying its spatial coordinates. So far only a few
works have addressed this task, mostly by relying on images containing
characteristic landmarks, such as iconic monuments. Within this framework, this
paper makes two main contributions. First, we introduce a new dataset, the
VIPPGeo dataset, containing almost 4 million images, that can be used to train
DL models for country classification. The dataset contains only urban images,
given the relevance of this kind of image for country recognition, and it was
built with care to remove non-significant images, such as images
portraying faces or specific, non-relevant objects like airplanes or ships.
Second, we use the dataset to train a deep learning architecture, casting the
country recognition problem as a classification problem. The experiments we
performed show that our network provides significantly better results than
the current state of the art. In particular, we found that asking the network to
directly identify the country provides better results than estimating the
geo-coordinates first and then using them to trace back to the country where
the picture was taken.
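As a rough illustration of the two strategies the abstract compares, the sketch below contrasts direct country classification with the indirect route of predicting coordinates and then mapping them to a country. Everything in it is a toy assumption made for illustration: the three-country label set, the coarse bounding boxes, the example scores, and the function names are hypothetical and are not the authors' network, data, or lookup procedure.

```python
import math

# Hypothetical toy label set; the paper's actual country list is not given here.
COUNTRIES = ["Italy", "France", "Spain"]

# Very coarse bounding boxes (lat_min, lat_max, lon_min, lon_max).
# Illustrative only: real borders are polygons, and these boxes overlap.
BOXES = {
    "Italy": (36.0, 47.0, 6.5, 18.5),
    "France": (42.0, 51.0, -5.0, 8.0),
    "Spain": (36.0, 43.8, -9.5, 3.5),
}


def softmax(logits):
    """Turn raw per-class scores into probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_country(logits):
    """Direct approach: one score per country, pick the most probable."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return COUNTRIES[best], probs[best]


def country_from_coordinates(lat, lon):
    """Indirect approach: take predicted (lat, lon), then look up the country.

    Returns the first box containing the point, or None if no box matches.
    """
    for name, (lat_min, lat_max, lon_min, lon_max) in BOXES.items():
        if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max:
            return name
    return None


if __name__ == "__main__":
    # Made-up classifier scores for one image.
    print(classify_country([0.2, 2.5, 0.1]))
    # Made-up coordinate prediction (roughly Rome).
    print(country_from_coordinates(41.9, 12.5))
```

In the indirect route, any error in the predicted coordinates carries over into the lookup step, whereas the direct route optimizes the country label itself; the sketch only shows the shape of the two pipelines and does not reproduce the paper's results.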
Related papers
- AddressCLIP: Empowering Vision-Language Models for City-wide Image Address Localization [57.34659640776723]
We propose an end-to-end framework named AddressCLIP to solve the problem with more semantics.
We have built three datasets from Pittsburgh and San Francisco on different scales specifically for the IAL problem.
arXiv Detail & Related papers (2024-07-11T03:18:53Z)
- GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization [61.10806364001535]
Worldwide Geo-localization aims to pinpoint the precise location of images taken anywhere on Earth.
Existing approaches divide the globe into discrete geographic cells, transforming the problem into a classification task.
We propose GeoCLIP, a novel CLIP-inspired Image-to-GPS retrieval approach that enforces alignment between the image and its corresponding GPS locations.
arXiv Detail & Related papers (2023-09-27T20:54:56Z)
- PIGEON: Predicting Image Geolocations [44.99833362998488]
We present a new geolocalization system that combines semantic geocell creation, multi-task contrastive pretraining, and a novel loss function.
PIGEOTTO is the first image geolocalization model that effectively generalizes to unseen places.
arXiv Detail & Related papers (2023-07-11T23:36:49Z)
- CSP: Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations [90.50864830038202]
We present Contrastive Spatial Pre-Training (CSP), a self-supervised learning framework for geo-tagged images.
We use a dual-encoder to separately encode the images and their corresponding geo-locations, and use contrastive objectives to learn effective location representations from images.
CSP significantly boosts the model performance with 10-34% relative improvement with various labeled training data sampling ratios.
arXiv Detail & Related papers (2023-05-01T23:11:18Z)
- 3DoF Localization from a Single Image and an Object Map: the Flatlandia Problem and Dataset [20.986848597435728]
We propose Flatlandia, a novel visual localization challenge.
We investigate whether it is possible to localize a visual query by comparing the layout of its common objects detected against the known spatial layout of objects in the map.
For each, we propose initial baseline models and compare them against state-of-the-art 6DoF and 3DoF methods.
arXiv Detail & Related papers (2023-04-13T09:53:09Z)
- Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes [53.53712888703834]
We introduce an end-to-end transformer-based architecture that exploits the relationship between different geographic levels.
We achieve state-of-the-art street-level accuracy on 4 standard geo-localization datasets.
arXiv Detail & Related papers (2023-03-07T21:47:58Z)
- Accurate 3-DoF Camera Geo-Localization via Ground-to-Satellite Image Matching [102.39635336450262]
We address the problem of ground-to-satellite image geo-localization by matching a query image captured at the ground level against a large-scale database with geotagged satellite images.
Our new method is able to achieve the fine-grained location of a query image, up to pixel size precision of the satellite image.
arXiv Detail & Related papers (2022-03-26T20:10:38Z)
- City-Scale Visual Place Recognition with Deep Local Features Based on Multi-Scale Ordered VLAD Pooling [5.274399407597545]
We present a fully-automated system for place recognition at a city-scale based on content-based image retrieval.
First, we conduct a comprehensive analysis of visual place recognition and sketch out the unique challenges of the task.
Next, we propose a simple pooling approach on top of convolutional neural network activations to embed spatial information into the image representation vector.
arXiv Detail & Related papers (2020-09-19T15:21:59Z)
- Google Landmarks Dataset v2 -- A Large-Scale Benchmark for Instance-Level Recognition and Retrieval [9.922132565411664]
We introduce the Google Landmarks dataset v2 (GLDv2), a new benchmark for large-scale, fine-grained instance recognition and image retrieval.
GLDv2 is the largest such dataset to date by a large margin, including over 5M images and 200k distinct instance labels.
The dataset is sourced from Wikimedia Commons, the world's largest crowdsourced collection of landmark photos.
arXiv Detail & Related papers (2020-04-03T22:52:17Z)
- Geometrically Mappable Image Features [85.81073893916414]
Vision-based localization of an agent in a map is an important problem in robotics and computer vision.
We propose a method that learns image features targeted for image-retrieval-based localization.
arXiv Detail & Related papers (2020-03-21T15:36:38Z)
- Automatic Signboard Detection and Localization in Densely Populated Developing Cities [0.0]
Signboard detection in natural scene images is the foremost task for error-free information retrieval.
We present a novel object detection approach that can detect signboards automatically and is suitable for such cities.
Our proposed method can detect signboards accurately (even if the images contain multiple signboards with diverse shapes and colours in a noisy background), achieving 0.90 mAP (mean average precision).
arXiv Detail & Related papers (2020-03-04T08:04:03Z)
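
Several entries in the list above (the GeoCLIP summary, for example) mention dividing the globe into discrete geographic cells so that geolocalization becomes a classification task. A minimal sketch of that idea follows, assuming a uniform latitude/longitude grid; the grid, the 10-degree cell size, and the function names are assumptions chosen for simplicity, and the cited works may use different partitioning schemes.

```python
def latlon_to_cell(lat, lon, cell_deg=10.0):
    """Map a coordinate to the integer id of its grid cell.

    Assumes cell_deg divides 180 evenly. Cells are numbered row by row,
    starting from the south-west corner (-90, -180).
    """
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("latitude/longitude out of range")
    n_rows = int(180 / cell_deg)
    n_cols = int(360 / cell_deg)
    # Clamp so that lat == 90 or lon == 180 falls into the last row/column.
    row = min(int((lat + 90.0) // cell_deg), n_rows - 1)
    col = min(int((lon + 180.0) // cell_deg), n_cols - 1)
    return row * n_cols + col


def cell_center(cell_id, cell_deg=10.0):
    """Return the (lat, lon) centre of a cell, used as the coordinate guess."""
    n_cols = int(360 / cell_deg)
    row, col = divmod(cell_id, n_cols)
    lat = -90.0 + (row + 0.5) * cell_deg
    lon = -180.0 + (col + 0.5) * cell_deg
    return lat, lon


if __name__ == "__main__":
    cell = latlon_to_cell(43.3, 11.3)  # a point in central Italy
    print(cell, cell_center(cell))
```

A classifier trained on such cell ids can only be as precise as the cell size, which is one reason finer or adaptive partitions are used in practice.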