AI-EDI-SPACE: A Co-designed Dataset for Evaluating the Quality of Public Spaces
- URL: http://arxiv.org/abs/2411.00956v1
- Date: Fri, 01 Nov 2024 18:11:29 GMT
- Title: AI-EDI-SPACE: A Co-designed Dataset for Evaluating the Quality of Public Spaces
- Authors: Shreeyash Gowaikar, Hugo Berard, Rashid Mushkani, Emmanuel Beaudry Marchand, Toumadher Ammar, Shin Koseki
- Abstract summary: Crowdsourcing often employs low-wage workers with poor working conditions and lacks consideration for the representativeness of annotators.
We propose a methodology involving a co-design model that actively engages stakeholders at key stages, integrating principles of Equity, Diversity, and Inclusion (EDI) to ensure diverse viewpoints.
We apply this methodology to develop a dataset and AI model for evaluating public space quality using street view images.
- Score: 2.691611484444756
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Advancements in AI heavily rely on large-scale datasets meticulously curated and annotated for training. However, concerns persist regarding the transparency and context of data collection methodologies, especially when sourced through crowdsourcing platforms. Crowdsourcing often employs low-wage workers with poor working conditions and lacks consideration for the representativeness of annotators, leading to algorithms that fail to represent diverse views and perpetuate biases against certain groups. To address these limitations, we propose a methodology involving a co-design model that actively engages stakeholders at key stages, integrating principles of Equity, Diversity, and Inclusion (EDI) to ensure diverse viewpoints. We apply this methodology to develop a dataset and AI model for evaluating public space quality using street view images, demonstrating its effectiveness in capturing diverse perspectives and fostering higher-quality data.
Related papers
- OpenDataArena: A Fair and Open Arena for Benchmarking Post-Training Dataset Value [74.80873109856563]
OpenDataArena (ODA) is a holistic and open platform designed to benchmark the intrinsic value of post-training data.
ODA establishes a comprehensive ecosystem comprising four key pillars: (i) a unified training-evaluation pipeline that ensures fair, open comparisons across diverse models; (ii) a multi-dimensional scoring framework that profiles data quality along tens of distinct axes; and (iii) an interactive data lineage explorer to visualize dataset genealogy and dissect component sources.
arXiv Detail & Related papers (2025-12-16T03:33:24Z) - Large Language Model Sourcing: A Survey [84.63438376832471]
Large language models (LLMs) have revolutionized artificial intelligence, shifting from supporting objective tasks to empowering subjective decision-making.
Due to the black-box nature of LLMs and the human-like quality of their generated content, issues such as hallucinations, bias, unfairness, and copyright infringement become significant.
This survey presents a systematic investigation into provenance tracking for content generated by LLMs, organized around four interrelated dimensions.
arXiv Detail & Related papers (2025-10-11T10:52:30Z) - VISTA: A Visual Analytics Framework to Enhance Foundation Model-Generated Data Labels [30.699079182148054]
We introduce VISTA, a visual analytics framework that improves data quality to enhance the performance of multi-modal models.
We show how VISTA integrates multi-phased data validation strategies with human expertise, enabling humans to identify, understand, and correct hidden issues within FM-generated labels.
arXiv Detail & Related papers (2025-07-11T20:17:23Z) - Behind the Screens: Uncovering Bias in AI-Driven Video Interview Assessments Using Counterfactuals [0.0]
We introduce a counterfactual-based framework to evaluate and quantify bias in AI-driven personality assessments.
Our approach employs generative adversarial networks (GANs) to generate counterfactual representations of job applicants.
This work provides a scalable tool for fairness auditing of commercial AI hiring platforms.
arXiv Detail & Related papers (2025-05-17T18:46:14Z) - Understanding trade-offs in classifier bias with quality-diversity optimization: an application to talent management [2.334978724544296]
A major struggle for the development of fair AI models lies in the bias implicit in the data available to train such models.
We propose a method for visualizing the biases inherent in a dataset and understanding the potential trade-offs between fairness and accuracy.
arXiv Detail & Related papers (2024-11-25T22:14:02Z) - Discriminative Anchor Learning for Efficient Multi-view Clustering [59.11406089896875]
We propose discriminative anchor learning for multi-view clustering (DALMC).
We learn discriminative view-specific feature representations according to the original dataset.
We build anchors from different views based on these representations, which increase the quality of the shared anchor graph.
arXiv Detail & Related papers (2024-09-25T13:11:17Z) - Quantifying the Cross-sectoral Intersecting Discrepancies within Multiple Groups Using Latent Class Analysis Towards Fairness [6.683051393349788]
This research introduces an innovative approach to quantify cross-sectoral intersecting discrepancies.
We validate our approach using both proprietary and public datasets.
Our findings reveal significant discrepancies between minority ethnic groups, highlighting the need for targeted interventions in real-world AI applications.
arXiv Detail & Related papers (2024-05-24T08:10:31Z) - Bridging the Digital Divide: Performance Variation across Socio-Economic Factors in Vision-Language Models [31.868468221653025]
We evaluate the performance of a vision-language model (CLIP) on a geo-diverse dataset containing household images associated with different income values.
Our results indicate that performance for the poorer groups is consistently lower than for the wealthier groups across various topics and countries.
arXiv Detail & Related papers (2023-11-09T21:10:52Z) - Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv Detail & Related papers (2023-08-28T18:48:34Z) - Fairness meets Cross-Domain Learning: a new perspective on Models and Metrics [80.07271410743806]
We study the relationship between cross-domain learning (CD) and model fairness.
We introduce a benchmark on face and medical images spanning several demographic groups as well as classification and localization tasks.
Our study covers 14 CD approaches alongside three state-of-the-art fairness algorithms and shows how the former can outperform the latter.
arXiv Detail & Related papers (2023-03-25T09:34:05Z) - Human-Centric Multimodal Machine Learning: Recent Advances and Testbed on AI-based Recruitment [66.91538273487379]
There is a certain consensus about the need to develop AI applications with a Human-Centric approach.
Human-Centric Machine Learning needs to be developed based on four main requirements: (i) utility and social good; (ii) privacy and data ownership; (iii) transparency and accountability; and (iv) fairness in AI-driven decision-making processes.
We study how current multimodal algorithms based on heterogeneous sources of information are affected by sensitive elements and inner biases in the data.
arXiv Detail & Related papers (2023-02-13T16:44:44Z) - Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z) - Visual Identification of Problematic Bias in Large Label Spaces [5.841861400363261]
A key challenge in scaling common fairness metrics to modern models and datasets is the requirement for exhaustive ground-truth labeling.
Domain experts need to be able to extract and reason about bias throughout models and datasets to make informed decisions.
We propose guidelines for designing visualizations for such large label spaces, considering both technical and ethical issues.
arXiv Detail & Related papers (2022-01-17T12:51:08Z) - Bayesian Semi-supervised Crowdsourcing [71.20185379303479]
Crowdsourcing has emerged as a powerful paradigm for efficiently labeling large datasets and performing various learning tasks.
This work deals with semi-supervised crowdsourced classification, under two regimes of semi-supervision.
arXiv Detail & Related papers (2020-12-20T23:18:51Z) - No computation without representation: Avoiding data and algorithm biases through diversity [11.12971845021808]
We draw connections between the lack of diversity within academic and professional computing fields and the type and breadth of the biases encountered in datasets.
We use these lessons to develop recommendations that provide concrete steps for the computing community to increase diversity.
arXiv Detail & Related papers (2020-02-26T23:07:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.