Interpretable Semantic Photo Geolocalization
- URL: http://arxiv.org/abs/2104.14995v1
- Date: Fri, 30 Apr 2021 13:28:18 GMT
- Title: Interpretable Semantic Photo Geolocalization
- Authors: Jonas Theiner, Eric Müller-Budack, Ralph Ewerth
- Abstract summary: We present two contributions in order to improve the interpretability of a geolocalization model.
We propose a novel, semantic partitioning method which intuitively leads to an improved understanding of the predictions.
We also introduce a novel metric to assess the importance of semantic visual concepts for a certain prediction.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Planet-scale photo geolocalization is the complex task of estimating the
location depicted in an image solely based on its visual content. Due to the
success of convolutional neural networks (CNNs), current approaches achieve
super-human performance. However, previous work has exclusively focused on
optimizing geolocalization accuracy. Moreover, due to the black-box property of
deep learning systems, their predictions are difficult to validate for humans.
State-of-the-art methods treat the task as a classification problem, where the
choice of the classes, that is, the partitioning of the world map, is key to
success. In this paper, we present two contributions to improve
the interpretability of a geolocalization model: (1) We propose a novel,
semantic partitioning method which intuitively leads to an improved
understanding of the predictions, while at the same time state-of-the-art
results are achieved for geolocational accuracy on benchmark test sets; (2) We
introduce a novel metric to assess the importance of semantic visual concepts
for a certain prediction to provide additional interpretable information, which
allows for a large-scale analysis of already trained models.
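The classification formulation described above can be sketched minimally: the world is partitioned into cells, a classifier scores each cell, and the predicted location is the centre of the highest-scoring cell. The cell names, coordinates, and helper below are hypothetical illustrations; the paper's semantic partitioning is far more involved.

```python
from dataclasses import dataclass

@dataclass
class Cell:
    name: str    # e.g. a semantic region label (hypothetical)
    lat: float   # cell centre latitude
    lng: float   # cell centre longitude

def predict_location(class_scores, cells):
    """Map classifier scores over partition cells to a (lat, lng) estimate
    by taking the centre of the highest-scoring cell."""
    best = max(range(len(cells)), key=lambda i: class_scores[i])
    return cells[best].lat, cells[best].lng

cells = [
    Cell("western-europe", 48.8, 2.3),
    Cell("east-coast-us", 40.7, -74.0),
]
scores = [0.2, 0.8]  # hypothetical softmax output of a CNN
print(predict_location(scores, cells))  # (40.7, -74.0)
```

Because the prediction is a cell rather than raw coordinates, the choice of partition directly bounds localization granularity, which is why the partitioning is called the key to success.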
Related papers
- Decom-CAM: Tell Me What You See, In Details! Feature-Level Interpretation via Decomposition Class Activation Map [23.71680014689873]
Class Activation Map (CAM) is widely used to interpret deep model predictions by highlighting object location.
This paper proposes a new two-stage interpretability method called the Decomposition Class Activation Map (Decom-CAM).
Our experiments demonstrate that the proposed Decom-CAM outperforms current state-of-the-art methods significantly.
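The classic CAM that Decom-CAM builds on has a simple closed form: the saliency map for a class is the classifier-weighted sum of the last convolutional feature maps. The sketch below shows that baseline computation on toy arrays, not the proposed two-stage decomposition.

```python
import numpy as np

def class_activation_map(features, weights, class_idx):
    """Baseline CAM for one class: weighted sum of conv feature maps.
    features: (K, H, W) activations of the last conv layer;
    weights:  (C, K) fully-connected classifier weights."""
    return np.tensordot(weights[class_idx], features, axes=1)  # (H, W)

rng = np.random.default_rng(0)
feats = rng.random((4, 7, 7))   # toy feature maps
w = rng.random((10, 4))         # toy classifier weights for 10 classes
cam = class_activation_map(feats, w, class_idx=3)
print(cam.shape)  # (7, 7)
```

The resulting map is typically upsampled to the input resolution and overlaid on the image to highlight the regions driving the class score.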
arXiv Detail & Related papers (2023-05-27T14:33:01Z) - Sampling Based On Natural Image Statistics Improves Local Surrogate Explainers [111.31448606885672]
Surrogate explainers are a popular post-hoc interpretability method to further understand how a model arrives at a prediction.
We propose two approaches to do so, namely (1) altering the method for sampling the local neighbourhood and (2) using perceptual metrics to convey some of the properties of the distribution of natural images.
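A local surrogate explainer in the LIME style can be sketched as follows: sample a neighbourhood around the input, query the black box, and fit a linear model whose coefficients serve as local feature importances. The isotropic Gaussian sampling used here is the naive baseline that the paper proposes to replace with natural-image statistics; all names are illustrative.

```python
import numpy as np

def local_surrogate(black_box, x, n_samples=500, scale=0.1, seed=0):
    """Fit a linear surrogate to black_box around x; the returned
    coefficients approximate local feature importance."""
    rng = np.random.default_rng(seed)
    # Naive neighbourhood sampling: isotropic Gaussian around x.
    X = x + scale * rng.standard_normal((n_samples, x.size))
    y = np.array([black_box(z) for z in X])
    # Least-squares linear fit with an intercept column.
    A = np.hstack([X, np.ones((n_samples, 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1]  # per-feature local importance

# Toy black box: the first feature dominates locally.
f = lambda z: 3.0 * z[0] + 0.1 * np.sin(z[1])
weights = local_surrogate(f, np.array([1.0, 2.0]))
```

For a smooth function the coefficients approach the local gradient, which is why the sampling distribution, the paper's focus, matters so much for faithfulness.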
arXiv Detail & Related papers (2022-08-08T08:10:13Z) - Point-Level Region Contrast for Object Detection Pre-Training [147.47349344401806]
We present point-level region contrast, a self-supervised pre-training approach for the task of object detection.
Our approach performs contrastive learning by directly sampling individual point pairs from different regions.
Compared to an aggregated representation per region, our approach is more robust to the change in input region quality.
arXiv Detail & Related papers (2022-02-09T18:56:41Z) - Learning Semantics for Visual Place Recognition through Multi-Scale Attention [14.738954189759156]
We present the first visual place recognition (VPR) algorithm that learns robust global embeddings from both the visual appearance and the semantic content of the data.
Experiments on various scenarios validate this new approach and demonstrate its performance against state-of-the-art methods.
arXiv Detail & Related papers (2022-01-24T14:13:12Z) - Leveraging EfficientNet and Contrastive Learning for Accurate Global-scale Location Estimation [15.633461635276337]
We propose a mixed classification-retrieval scheme for global-scale image geolocation.
Our approach demonstrates very competitive performance on four public datasets.
arXiv Detail & Related papers (2021-05-17T07:18:43Z) - Distribution Alignment: A Unified Framework for Long-tail Visual Recognition [52.36728157779307]
We propose a unified distribution alignment strategy for long-tail visual recognition.
We then introduce a generalized re-weight method in the two-stage learning to balance the class prior.
Our approach achieves the state-of-the-art results across all four recognition tasks with a simple and unified framework.
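The core idea behind re-weighting by class prior can be sketched briefly. This is the common inverse-frequency baseline that a generalized re-weight scheme interpolates and extends, not the paper's exact method; the function name and `gamma` parameter are illustrative.

```python
import numpy as np

def class_weights(counts, gamma=1.0):
    """Weight each class by its empirical prior p_c raised to -gamma,
    normalized so the weights average to 1. gamma interpolates between
    no re-weighting (0) and full inverse-frequency weighting (1)."""
    counts = np.asarray(counts, dtype=float)
    prior = counts / counts.sum()
    w = prior ** (-gamma)
    return w * len(counts) / w.sum()

w = class_weights([1000, 100, 10])  # head, medium, tail classes
print(w)  # tail class receives the largest weight
```

Scaling the loss of each class by such a weight counteracts the head-class bias that a model trained on the long-tailed distribution would otherwise acquire.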
arXiv Detail & Related papers (2021-03-30T14:09:53Z) - Variational Structured Attention Networks for Deep Visual Representation Learning [49.80498066480928]
We propose a unified deep framework to jointly learn both spatial attention maps and channel attention in a principled manner.
Specifically, we integrate the estimation and the interaction of the attentions within a probabilistic representation learning framework.
We implement the inference rules within the neural network, thus allowing for end-to-end learning of the probabilistic and the CNN front-end parameters.
arXiv Detail & Related papers (2021-03-05T07:37:24Z) - Neural networks for semantic segmentation of historical city maps: Cross-cultural performance and the impact of figurative diversity [0.0]
We present a new semantic segmentation model for historical city maps based on convolutional neural networks.
We show that these networks are able to semantically segment map data of a very large figurative diversity with efficiency.
arXiv Detail & Related papers (2021-01-29T09:08:12Z) - PGL: Prior-Guided Local Self-supervised Learning for 3D Medical Image Segmentation [87.50205728818601]
We propose a Prior-Guided Local (PGL) self-supervised model that learns the region-wise local consistency in the latent feature space.
Our PGL model learns the distinctive representations of local regions, and hence is able to retain structural information.
arXiv Detail & Related papers (2020-11-25T11:03:11Z) - Region Comparison Network for Interpretable Few-shot Image Classification [97.97902360117368]
Few-shot image classification has been proposed to effectively use only a limited number of labeled examples to train models for new classes.
We propose a metric learning based method named Region Comparison Network (RCN), which is able to reveal how few-shot learning works.
We also present a new way to generalize the interpretability from the level of tasks to categories.
arXiv Detail & Related papers (2020-09-08T07:29:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.