Towards A Fairer Landmark Recognition Dataset
- URL: http://arxiv.org/abs/2108.08874v1
- Date: Thu, 19 Aug 2021 18:42:22 GMT
- Title: Towards A Fairer Landmark Recognition Dataset
- Authors: Zu Kim, André Araujo, Bingyi Cao, Cam Askew, Jack Sim, Mike Green,
N'Mah Fodiatu Yilla, Tobias Weyand
- Abstract summary: We create a landmark recognition dataset with a focus on fair worldwide representation.
We start by defining the fair relevance of a landmark to the world population.
These relevances are estimated by combining anonymized Google Maps user contribution statistics with the contributors' demographic information.
- Score: 9.654500155170172
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We introduce a new landmark recognition dataset, which is created with a
focus on fair worldwide representation. While previous work proposes to collect
as many images as possible from web repositories, we instead argue that such
approaches can lead to biased data. To create a more comprehensive and
equitable dataset, we start by defining the fair relevance of a landmark to the
world population. These relevances are estimated by combining anonymized Google
Maps user contribution statistics with the contributors' demographic
information. We present a stratification approach and analysis which leads to a
much fairer coverage of the world, compared to existing datasets. The resulting
datasets are used to evaluate computer vision models as part of the Google
Landmark Recognition and Retrieval Challenges 2021.
Related papers
- Weak-Annotation of HAR Datasets using Vision Foundation Models [9.948823510429902]
We propose a novel, clustering-based annotation pipeline to significantly reduce the amount of data that needs to be annotated by a human annotator.
We show that using our approach, the annotation of centroid clips suffices to achieve average labelling accuracies close to 90% across three publicly available HAR benchmark datasets.
arXiv Detail & Related papers (2024-08-09T16:46:53Z)
- Diffusion Models as Data Mining Tools [87.77999285241219]
This paper demonstrates how to use generative models trained for image synthesis as tools for visual data mining.
We show that after finetuning conditional diffusion models to synthesize images from a specific dataset, we can use these models to define a typicality measure.
This measure assesses how typical visual elements are for different data labels, such as geographic location, time stamps, semantic labels, or even the presence of a disease.
arXiv Detail & Related papers (2024-07-20T17:14:31Z)
- Fake News Detection: It's All in the Data! [0.06749750044497731]
The survey meticulously outlines the key features of datasets, various labeling systems employed, and prevalent biases that can impact model performance.
A GitHub repository consolidates publicly accessible datasets into a single, user-friendly portal.
arXiv Detail & Related papers (2024-07-02T10:12:06Z)
- infoVerse: A Universal Framework for Dataset Characterization with Multidimensional Meta-information [68.76707843019886]
infoVerse is a universal framework for dataset characterization.
infoVerse captures multidimensional characteristics of datasets by incorporating various model-driven meta-information.
In three real-world applications (data pruning, active learning, and data annotation), the samples chosen on infoVerse space consistently outperform strong baselines.
arXiv Detail & Related papers (2023-05-30T18:12:48Z)
- Benchmarking person re-identification datasets and approaches for practical real-world implementations [1.0079626733116613]
Person Re-Identification (Re-ID) has received a lot of attention.
However, when such Re-ID models are deployed in new cities or environments, the task of searching for people within a network of security cameras is likely to face an important domain shift.
This paper introduces a complete methodology to evaluate Re-ID approaches and training datasets with respect to their suitability for unsupervised deployment for live operations.
arXiv Detail & Related papers (2022-12-20T03:45:38Z)
- Improving Fairness in Large-Scale Object Recognition by CrowdSourced Demographic Information [7.968124582214686]
Representing objects fairly in machine learning datasets will lead to models that are less biased towards a particular culture.
We propose a simple and general approach, based on crowdsourcing the demographic composition of the contributors.
We present analysis which leads to a much fairer coverage of the world compared to existing datasets.
arXiv Detail & Related papers (2022-06-02T22:55:10Z)
- Algorithmic Fairness Datasets: the Story so Far [68.45921483094705]
Data-driven algorithms are studied in diverse domains to support critical decisions, directly impacting people's well-being.
A growing community of researchers has been investigating the equity of existing algorithms and proposing novel ones, advancing the understanding of risks and opportunities of automated decision-making for historically disadvantaged populations.
Progress in fair Machine Learning hinges on data, which can be appropriately used only if adequately documented.
Unfortunately, the algorithmic fairness community suffers from a collective data documentation debt caused by a lack of information on specific resources (opacity) and scatteredness of available information (sparsity).
arXiv Detail & Related papers (2022-02-03T17:25:46Z)
- Exploiting Shared Representations for Personalized Federated Learning [54.65133770989836]
We propose a novel federated learning framework and algorithm for learning a shared data representation across clients and unique local heads for each client.
Our algorithm harnesses the distributed computational power across clients to perform many local updates with respect to the low-dimensional local parameters for every update of the representation.
This result is of interest beyond federated learning to a broad class of problems in which we aim to learn a shared low-dimensional representation among data distributions.
arXiv Detail & Related papers (2021-02-14T05:36:25Z)
- TraND: Transferable Neighborhood Discovery for Unsupervised Cross-domain Gait Recognition [77.77786072373942]
This paper proposes a Transferable Neighborhood Discovery (TraND) framework to bridge the domain gap for unsupervised cross-domain gait recognition.
We design an end-to-end trainable approach to automatically discover the confident neighborhoods of unlabeled samples in the latent space.
Our method achieves state-of-the-art results on two public datasets, i.e., CASIA-B and OU-LP.
arXiv Detail & Related papers (2021-02-09T03:07:07Z)
- City-Scale Visual Place Recognition with Deep Local Features Based on Multi-Scale Ordered VLAD Pooling [5.274399407597545]
We present a fully-automated system for place recognition at a city-scale based on content-based image retrieval.
First, we conduct a comprehensive analysis of visual place recognition and sketch out the unique challenges of the task.
Next, we propose a simple pooling approach on top of convolutional neural network activations to embed the spatial information into the image representation vector.
arXiv Detail & Related papers (2020-09-19T15:21:59Z)
- Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges [52.624157840253204]
We present an urban-scale photogrammetric point cloud dataset with nearly three billion richly annotated points.
Our dataset consists of large areas from three UK cities, covering about 7.6 km² of the city landscape.
We evaluate the performance of state-of-the-art algorithms on our dataset and provide a comprehensive analysis of the results.
arXiv Detail & Related papers (2020-09-07T14:47:07Z)