WEIRD FAccTs: How Western, Educated, Industrialized, Rich, and
Democratic is FAccT?
- URL: http://arxiv.org/abs/2305.06415v1
- Date: Wed, 10 May 2023 18:52:09 GMT
- Title: WEIRD FAccTs: How Western, Educated, Industrialized, Rich, and
Democratic is FAccT?
- Authors: Ali Akbar Septiandri, Marios Constantinides, Mohammad Tahaei, Daniele
Quercia
- Abstract summary: Studies conducted on Western, Educated, Industrialized, Rich, and Democratic (WEIRD) samples are considered atypical of the world's population.
This study aims to quantify the extent to which the ACM FAccT conference relies on WEIRD samples.
- Score: 8.12219922021227
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Studies conducted on Western, Educated, Industrialized, Rich, and Democratic
(WEIRD) samples are considered atypical of the world's population and may not
accurately represent human behavior. In this study, we aim to quantify the
extent to which the ACM FAccT conference, the leading venue in exploring
Artificial Intelligence (AI) systems' fairness, accountability, and
transparency, relies on WEIRD samples. We collected and analyzed 128 papers
published between 2018 and 2022, accounting for 30.8% of the overall
proceedings published at FAccT in those years (excluding abstracts, tutorials,
and papers without human-subject studies or clear country attribution for the
participants). We found that 84% of the analyzed papers were exclusively based
on participants from Western countries, particularly exclusively from the U.S.
(63%). Only researchers who undertook the effort to collect data about local
participants through interviews or surveys added diversity to an otherwise
U.S.-centric view of science. Therefore, we suggest that researchers collect
data from under-represented populations to obtain an inclusive worldview. To
achieve this goal, scientific communities should champion data collection from
such populations and enforce transparent reporting of data biases.
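The abstract's core measurement, the share of papers whose participants come exclusively from Western countries (and from the U.S. alone), can be sketched as a simple tally. This is a hypothetical illustration, not the authors' actual coding scheme: the `WESTERN` country list and the sample records below are assumptions for demonstration.

```python
# Illustrative WEIRD-share computation: each analyzed paper is tagged with the
# set of countries its human participants came from; we count papers whose
# countries fall entirely within a (here, illustrative) Western set.
WESTERN = {"US", "UK", "DE", "FR", "CA", "AU"}  # assumed subset, not the paper's list

papers = [
    {"id": "p1", "countries": {"US"}},
    {"id": "p2", "countries": {"US", "IN"}},
    {"id": "p3", "countries": {"DE"}},
    {"id": "p4", "countries": {"KE"}},
]

def weird_shares(papers):
    """Return (fraction of papers with exclusively Western participants,
    fraction with exclusively U.S. participants)."""
    total = len(papers)
    western_only = sum(1 for p in papers if p["countries"] <= WESTERN)
    us_only = sum(1 for p in papers if p["countries"] == {"US"})
    return western_only / total, us_only / total

western_share, us_share = weird_shares(papers)
print(f"Western-only: {western_share:.0%}, US-only: {us_share:.0%}")
# → Western-only: 50%, US-only: 25%
```

On the paper's actual corpus of 128 papers, the same tally would yield the reported 84% Western-only and 63% U.S.-only figures.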
Related papers
- Representation Bias in Political Sample Simulations with Large Language Models [54.48283690603358]
This study seeks to identify and quantify biases in simulating political samples with Large Language Models.
Using the GPT-3.5-Turbo model, we leverage data from the American National Election Studies, German Longitudinal Election Study, Zuobiao dataset, and China Family Panel Studies.
arXiv Detail & Related papers (2024-07-16T05:52:26Z)
- WEIRD ICWSM: How Western, Educated, Industrialized, Rich, and Democratic is Social Computing Research? [3.0829845709781725]
We evaluated the dependence on WEIRD populations in research presented at the AAAI ICWSM conference.
We found that 37% of these papers focused solely on data from Western countries.
The studies at ICWSM still predominantly examine populations from countries that are more Educated, Industrialized, and Rich.
arXiv Detail & Related papers (2024-06-04T08:17:47Z)
- The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models [67.38144169029617]
We introduce PRISM, a new dataset which maps the sociodemographics and stated preferences of 1,500 diverse participants from 75 countries.
PRISM contributes (i) wide geographic and demographic participation in human feedback data; (ii) two census-representative samples for understanding collective welfare (UK and US); and (iii) individualised feedback where every rating is linked to a detailed participant profile.
arXiv Detail & Related papers (2024-04-24T17:51:36Z)
- Position: AI/ML Influencers Have a Place in the Academic Process [82.2069685579588]
We investigate the role of social media influencers in enhancing the visibility of machine learning research.
We have compiled a comprehensive dataset of over 8,000 papers, spanning tweets from December 2018 to October 2023.
Our statistical and causal inference analysis reveals a significant increase in citations for papers endorsed by these influencers.
arXiv Detail & Related papers (2024-01-24T20:05:49Z)
- Challenges in Annotating Datasets to Quantify Bias in Under-represented Society [7.9342597513806865]
Benchmark bias datasets have been developed for binary gender classification and ethical/racial considerations.
Motivated by the lack of annotated datasets for quantifying bias in under-represented societies, we created benchmark datasets for the New Zealand (NZ) population.
This research outlines the manual annotation process, provides an overview of the challenges we encountered and lessons learnt, and presents recommendations for future research.
arXiv Detail & Related papers (2023-09-11T22:24:39Z)
- Artificial intelligence adoption in the physical sciences, natural sciences, life sciences, social sciences and the arts and humanities: A bibliometric analysis of research publications from 1960-2021 [73.06361680847708]
In 1960 14% of 333 research fields were related to AI (many in computer science), but this increased to over half of all research fields by 1972, over 80% by 1986 and over 98% in current times.
We conclude that the context of the current surge appears different, and that interdisciplinary AI application is likely to be sustained.
arXiv Detail & Related papers (2023-06-15T14:08:07Z)
- The ethical ambiguity of AI data enrichment: Measuring gaps in research ethics norms and practices [2.28438857884398]
This study explores how, and to what extent, comparable research ethics requirements and norms have developed for AI research and data enrichment.
Leading AI venues have begun to establish protocols for human data collection, but these are inconsistently followed by authors.
arXiv Detail & Related papers (2023-06-01T16:12:55Z)
- How WEIRD is Usable Privacy and Security Research? (Extended Version) [7.669758543344074]
We conducted a literature review to understand the extent to which participant samples in UPS papers were from WEIRD countries.
Geographic and linguistic barriers in study design and recruitment may lead researchers to conduct user studies only locally.
arXiv Detail & Related papers (2023-05-08T19:21:18Z)
- Biomedical image analysis competitions: The state of current participation practice [143.52578599912326]
We designed a survey to shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis.
The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics.
Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures.
arXiv Detail & Related papers (2022-12-16T16:44:46Z)
- Retiring Adult: New Datasets for Fair Machine Learning [47.27417042497261]
UCI Adult has served as the basis for the development and comparison of many algorithmic fairness interventions.
We reconstruct a superset of the UCI Adult data from available US Census sources and reveal idiosyncrasies of the UCI Adult dataset that limit its external validity.
Our primary contribution is a suite of new datasets that extend the existing data ecosystem for research on fair machine learning.
arXiv Detail & Related papers (2021-08-10T19:19:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.