WEIRD FAccTs: How Western, Educated, Industrialized, Rich, and Democratic is FAccT?
- URL: http://arxiv.org/abs/2305.06415v1
- Date: Wed, 10 May 2023 18:52:09 GMT
- Title: WEIRD FAccTs: How Western, Educated, Industrialized, Rich, and Democratic is FAccT?
- Authors: Ali Akbar Septiandri, Marios Constantinides, Mohammad Tahaei, Daniele Quercia
- Abstract summary: Studies conducted on Western, Educated, Industrialized, Rich, and Democratic (WEIRD) samples are considered atypical of the world's population.
This study aims to quantify the extent to which the ACM FAccT conference relies on WEIRD samples.
- Score: 8.12219922021227
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Studies conducted on Western, Educated, Industrialized, Rich, and Democratic
(WEIRD) samples are considered atypical of the world's population and may not
accurately represent human behavior. In this study, we aim to quantify the
extent to which the ACM FAccT conference, the leading venue in exploring
Artificial Intelligence (AI) systems' fairness, accountability, and
transparency, relies on WEIRD samples. We collected and analyzed 128 papers
published between 2018 and 2022, accounting for 30.8% of the overall
proceedings published at FAccT in those years (excluding abstracts, tutorials,
and papers without human-subject studies or clear country attribution for the
participants). We found that 84% of the analyzed papers were exclusively based
on participants from Western countries, particularly exclusively from the U.S.
(63%). Only researchers who undertook the effort to collect data about local
participants through interviews or surveys added diversity to an otherwise
U.S.-centric view of science. Therefore, we suggest that researchers collect
data from under-represented populations to obtain an inclusive worldview. To
achieve this goal, scientific communities should champion data collection from
such populations and enforce transparent reporting of data biases.
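The percentages quoted in the abstract can be turned into approximate paper counts. The following is a minimal sketch of that arithmetic, not the authors' analysis: the variable and function names are hypothetical, and because the published percentages are rounded, the resulting counts are estimates only.

```python
# Figures quoted in the abstract; names below are illustrative assumptions.
ANALYZED_PAPERS = 128          # papers collected and analyzed
SHARE_OF_PROCEEDINGS = 0.308   # analyzed papers as a share of FAccT 2018-2022
SHARE_WESTERN_ONLY = 0.84      # papers based exclusively on Western participants
SHARE_US_ONLY = 0.63           # papers based exclusively on U.S. participants


def implied_count(share: float, total: int) -> int:
    """Return the nearest whole number of papers for a rounded share."""
    return round(share * total)


# Approximate size of the full proceedings implied by the 30.8% figure.
implied_total = round(ANALYZED_PAPERS / SHARE_OF_PROCEEDINGS)

# Approximate number of analyzed papers behind each reported percentage.
western_only = implied_count(SHARE_WESTERN_ONLY, ANALYZED_PAPERS)
us_only = implied_count(SHARE_US_ONLY, ANALYZED_PAPERS)

print(f"Implied proceedings size: about {implied_total} papers")
print(f"Western-only samples: about {western_only} of {ANALYZED_PAPERS}")
print(f"U.S.-only samples: about {us_only} of {ANALYZED_PAPERS}")
```

Since the inputs are rounded, each estimate may differ from the true count by a paper or so.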
Related papers
- Fairness in LLM-Generated Surveys [0.5720786928479238]
Large Language Models (LLMs) excel in text generation and understanding, especially simulating socio-political and economic patterns.
This study examines how LLMs perform across diverse populations by analyzing public surveys from Chile and the United States.
In the United States, political identity and race significantly influence prediction accuracy, while in Chile, gender, education, and religious affiliation play more pronounced roles.
arXiv Detail & Related papers (2025-01-25T23:42:20Z)
- Transforming Social Science Research with Transfer Learning: Social Science Survey Data Integration with AI [0.4944564023471818]
Large-N nationally representative surveys, which have profoundly shaped American politics scholarship, represent related but distinct domains.
Our study introduces a novel application of transfer learning (TL) to address these gaps.
Models pre-trained on the Cooperative Election Study dataset are fine-tuned for use in the American National Election Studies dataset.
arXiv Detail & Related papers (2025-01-11T16:01:44Z)
- Bridging the Data Provenance Gap Across Text, Speech and Video [67.72097952282262]
We conduct the largest and first-of-its-kind longitudinal audit across modalities of popular text, speech, and video datasets.
Our manual analysis covers nearly 4000 public datasets between 1990-2024, spanning 608 languages, 798 sources, 659 organizations, and 67 countries.
We find that multimodal machine learning applications have overwhelmingly turned to web-crawled, synthetic, and social media platforms, such as YouTube, for their training sets.
arXiv Detail & Related papers (2024-12-19T01:30:19Z)
- Representation Bias in Political Sample Simulations with Large Language Models [54.48283690603358]
This study seeks to identify and quantify biases in simulating political samples with Large Language Models.
Using the GPT-3.5-Turbo model, we leverage data from the American National Election Studies, German Longitudinal Election Study, Zuobiao dataset, and China Family Panel Studies.
arXiv Detail & Related papers (2024-07-16T05:52:26Z)
- WEIRD ICWSM: How Western, Educated, Industrialized, Rich, and Democratic is Social Computing Research? [3.0829845709781725]
We evaluated the dependence on WEIRD populations in research presented at the AAAI ICWSM conference.
We found that 37% of these papers focused solely on data from Western countries.
The studies at ICWSM still predominantly examine populations from countries that are more Educated, Industrialized, and Rich.
arXiv Detail & Related papers (2024-06-04T08:17:47Z)
- The PRISM Alignment Dataset: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models [67.38144169029617]
We map the sociodemographics and stated preferences of 1,500 diverse participants from 75 countries, to their contextual preferences and fine-grained feedback in 8,011 live conversations with 21 Large Language Models (LLMs).
With PRISM, we contribute (i) wider geographic and demographic participation in feedback; (ii) census-representative samples for two countries (UK, US); and (iii) individualised ratings that link to detailed participant profiles, permitting personalisation and attribution of sample artefacts.
We use PRISM in three case studies to demonstrate the need for careful consideration of which humans provide what alignment data.
arXiv Detail & Related papers (2024-04-24T17:51:36Z)
- Position: AI/ML Influencers Have a Place in the Academic Process [82.2069685579588]
We investigate the role of social media influencers in enhancing the visibility of machine learning research.
We have compiled a comprehensive dataset of over 8,000 papers, spanning tweets from December 2018 to October 2023.
Our statistical and causal inference analysis reveals a significant increase in citations for papers endorsed by these influencers.
arXiv Detail & Related papers (2024-01-24T20:05:49Z)
- Challenges in Annotating Datasets to Quantify Bias in Under-represented Society [7.9342597513806865]
Benchmark bias datasets have been developed for binary gender classification and ethical/racial considerations.
Motivated by the lack of annotated datasets for quantifying bias in under-represented societies, we created benchmark datasets for the New Zealand (NZ) population.
This research outlines the manual annotation process, provides an overview of the challenges we encountered and lessons learnt, and presents recommendations for future research.
arXiv Detail & Related papers (2023-09-11T22:24:39Z)
- Artificial intelligence adoption in the physical sciences, natural sciences, life sciences, social sciences and the arts and humanities: A bibliometric analysis of research publications from 1960-2021 [73.06361680847708]
In 1960 14% of 333 research fields were related to AI (many in computer science), but this increased to over half of all research fields by 1972, over 80% by 1986 and over 98% in current times.
We conclude that the context of the current surge appears different, and that interdisciplinary AI application is likely to be sustained.
arXiv Detail & Related papers (2023-06-15T14:08:07Z)
- How WEIRD is Usable Privacy and Security Research? (Extended Version) [7.669758543344074]
We conducted a literature review to understand the extent to which participant samples in UPS papers were from WEIRD countries.
Geographic and linguistic barriers in the study methods and recruitment methods may cause researchers to conduct user studies locally.
arXiv Detail & Related papers (2023-05-08T19:21:18Z)
- Biomedical image analysis competitions: The state of current participation practice [143.52578599912326]
We designed a survey to shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis.
The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics.
Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures.
arXiv Detail & Related papers (2022-12-16T16:44:46Z)