Related papers: Interpretable Semantic Photo Geolocalization

Interpretable Semantic Photo Geolocalization

URL: http://arxiv.org/abs/2104.14995v1
Date: Fri, 30 Apr 2021 13:28:18 GMT
Title: Interpretable Semantic Photo Geolocalization
Authors: Jonas Theiner, Eric M\"uller-Budack, Ralph Ewerth
Abstract summary: We present two contributions in order to improve the interpretability of a geolocalization model. We propose a novel, semantic partitioning method which intuitively leads to an improved understanding of the predictions. We also introduce a novel metric to assess the importance of semantic visual concepts for a certain prediction.
Score: 4.286838964398275
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Planet-scale photo geolocalization is the complex task of estimating the location depicted in an image solely based on its visual content. Due to the success of convolutional neural networks (CNNs), current approaches achieve super-human performance. However, previous work has exclusively focused on optimizing geolocalization accuracy. Moreover, due to the black-box property of deep learning systems, their predictions are difficult to validate for humans. State-of-the-art methods treat the task as a classification problem, where the choice of the classes, that is the partitioning of the world map, is the key for success. In this paper, we present two contributions in order to improve the interpretability of a geolocalization model: (1) We propose a novel, semantic partitioning method which intuitively leads to an improved understanding of the predictions, while at the same time state-of-the-art results are achieved for geolocational accuracy on benchmark test sets; (2) We introduce a novel metric to assess the importance of semantic visual concepts for a certain prediction to provide additional interpretable information, which allows for a large-scale analysis of already trained models.

Related papers

EarthMapper: Visual Autoregressive Models for Controllable Bidirectional Satellite-Map Translation [50.433911327489554]
We introduce EarthMapper, a novel framework for controllable satellite-map translation. We also contribute CNSatMap, a large-scale dataset comprising 302,132 precisely aligned satellite-map pairs across 38 Chinese cities. experiments on CNSatMap and the New York dataset demonstrate EarthMapper's superior performance.
arXiv Detail & Related papers (2025-04-28T02:41:12Z)
TIDE: Training Locally Interpretable Domain Generalization Models Enables Test-time Correction [14.396966854171273]
We consider the problem of single-source domain generalization. Existing methods typically rely on extensive augmentations to synthetically cover diverse domains during training. We propose an approach that compels models to leverage such local concepts during prediction.
arXiv Detail & Related papers (2024-11-25T08:46:37Z)
Sampling Based On Natural Image Statistics Improves Local Surrogate Explainers [111.31448606885672]
Surrogate explainers are a popular post-hoc interpretability method to further understand how a model arrives at a prediction. We propose two approaches to do so, namely (1) altering the method for sampling the local neighbourhood and (2) using perceptual metrics to convey some of the properties of the distribution of natural images.
arXiv Detail & Related papers (2022-08-08T08:10:13Z)
Point-Level Region Contrast for Object Detection Pre-Training [147.47349344401806]
We present point-level region contrast, a self-supervised pre-training approach for the task of object detection. Our approach performs contrastive learning by directly sampling individual point pairs from different regions. Compared to an aggregated representation per region, our approach is more robust to the change in input region quality.
arXiv Detail & Related papers (2022-02-09T18:56:41Z)
Learning Semantics for Visual Place Recognition through Multi-Scale Attention [14.738954189759156]
We present the first VPR algorithm that learns robust global embeddings from both visual appearance and semantic content of the data. Experiments on various scenarios validate this new approach and demonstrate its performance against state-of-the-art methods.
arXiv Detail & Related papers (2022-01-24T14:13:12Z)
Leveraging EfficientNet and Contrastive Learning for Accurate Global-scale Location Estimation [15.633461635276337]
We propose a mixed classification-retrieval scheme for global-scale image geolocation. Our approach demonstrates very competitive performance on four public datasets.
arXiv Detail & Related papers (2021-05-17T07:18:43Z)
Distribution Alignment: A Unified Framework for Long-tail Visual Recognition [52.36728157779307]
We propose a unified distribution alignment strategy for long-tail visual recognition. We then introduce a generalized re-weight method in the two-stage learning to balance the class prior. Our approach achieves the state-of-the-art results across all four recognition tasks with a simple and unified framework.
arXiv Detail & Related papers (2021-03-30T14:09:53Z)
Variational Structured Attention Networks for Deep Visual Representation Learning [49.80498066480928]
We propose a unified deep framework to jointly learn both spatial attention maps and channel attention in a principled manner. Specifically, we integrate the estimation and the interaction of the attentions within a probabilistic representation learning framework. We implement the inference rules within the neural network, thus allowing for end-to-end learning of the probabilistic and the CNN front-end parameters.
arXiv Detail & Related papers (2021-03-05T07:37:24Z)
Neural networks for semantic segmentation of historical city maps: Cross-cultural performance and the impact of figurative diversity [0.0]
We present a new semantic segmentation model for historical city maps based on convolutional neural networks. We show that these networks are able to semantically segment map data of a very large figurative diversity with efficiency.
arXiv Detail & Related papers (2021-01-29T09:08:12Z)
PGL: Prior-Guided Local Self-supervised Learning for 3D Medical Image Segmentation [87.50205728818601]
We propose a PriorGuided Local (PGL) self-supervised model that learns the region-wise local consistency in the latent feature space. Our PGL model learns the distinctive representations of local regions, and hence is able to retain structural information.
arXiv Detail & Related papers (2020-11-25T11:03:11Z)
Region Comparison Network for Interpretable Few-shot Image Classification [97.97902360117368]
Few-shot image classification has been proposed to effectively use only a limited number of labeled examples to train models for new classes. We propose a metric learning based method named Region Comparison Network (RCN), which is able to reveal how few-shot learning works. We also present a new way to generalize the interpretability from the level of tasks to categories.
arXiv Detail & Related papers (2020-09-08T07:29:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.