AI Gender Bias, Disparities, and Fairness: Does Training Data Matter?
- URL: http://arxiv.org/abs/2312.10833v4
- Date: Mon, 27 Jan 2025 19:48:09 GMT
- Title: AI Gender Bias, Disparities, and Fairness: Does Training Data Matter?
- Authors: Ehsan Latif, Xiaoming Zhai, Lei Liu
- Abstract summary: This study examines the pervasive issue of gender bias in artificial intelligence (AI).
It analyzes more than 1000 human-graded student responses from male and female participants across six assessment items.
Results indicate that scoring accuracy for mixed-trained models shows an insignificant difference from either male- or female-trained models.
- Score: 3.509963616428399
- Abstract: This study examines the pervasive issue of gender bias in artificial intelligence (AI), specifically within automatic scoring systems for student-written responses. The primary objective is to investigate gender biases, disparities, and fairness in AI scoring outcomes when models are trained on generally targeted, mixed-gender datasets. Using fine-tuned versions of BERT and GPT-3.5, this research analyzes more than 1000 human-graded student responses from male and female participants across six assessment items. The study employs three distinct techniques for bias analysis: scoring accuracy difference to evaluate bias, mean score gaps by gender (MSG) to evaluate disparity, and Equalized Odds (EO) to evaluate fairness. The results indicate that the scoring accuracy of mixed-trained models differs insignificantly from that of either male- or female-trained models, suggesting no significant scoring bias. Consistently across BERT and GPT-3.5, mixed-trained models produced smaller MSGs than human raters and yielded non-disparate predictions. In contrast, gender-specifically trained models yielded larger MSGs than human raters, indicating that unbalanced training data may cause algorithmic models to enlarge gender disparities. The EO analysis suggests that mixed-trained models produced fairer outcomes than gender-specifically trained models. Collectively, the findings suggest that gender-unbalanced data do not necessarily generate scoring bias but can enlarge gender disparities and reduce scoring fairness.
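The three checks named in the abstract can be illustrated with a small Python sketch. This is not the authors' code: the `Response` record, the exact-agreement definition of accuracy, MSG taken as the male mean minus the female mean, and the binarization threshold used for Equalized Odds are all illustrative assumptions, and the toy scores are invented.

```python
# Hypothetical sketch of the three checks named in the abstract.
# Assumptions (not from the paper): integer scores, accuracy = exact agreement
# with the human score, MSG = mean(male) - mean(female), and Equalized Odds
# computed on a binarized outcome (score >= threshold, human score as label).
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Response:
    gender: str   # "M" or "F"
    human: int    # human-assigned score (treated as ground truth)
    machine: int  # model-assigned score


def accuracy(rows: Sequence[Response]) -> float:
    """Fraction of responses where the machine score equals the human score."""
    return sum(r.machine == r.human for r in rows) / len(rows)


def mean_score_gap(rows: Sequence[Response], use_machine: bool) -> float:
    """MSG: male mean score minus female mean score for one scorer."""
    def group_mean(g: str) -> float:
        vals = [r.machine if use_machine else r.human for r in rows if r.gender == g]
        return sum(vals) / len(vals)
    return group_mean("M") - group_mean("F")


def equalized_odds_gap(rows: Sequence[Response], threshold: int) -> float:
    """Largest between-group gap in TPR or FPR; 0.0 means equalized odds holds.

    Assumes each gender group contains both positive and negative human labels.
    """
    def rates(g: str) -> Tuple[float, float]:
        grp = [r for r in rows if r.gender == g]
        pos = [r for r in grp if r.human >= threshold]
        neg = [r for r in grp if r.human < threshold]
        tpr = sum(r.machine >= threshold for r in pos) / len(pos)
        fpr = sum(r.machine >= threshold for r in neg) / len(neg)
        return tpr, fpr
    tpr_m, fpr_m = rates("M")
    tpr_f, fpr_f = rates("F")
    return max(abs(tpr_m - tpr_f), abs(fpr_m - fpr_f))


# Invented toy data for illustration only (not from the study).
rows: List[Response] = [
    Response("M", 2, 2), Response("M", 1, 1), Response("M", 0, 1), Response("M", 2, 1),
    Response("F", 2, 2), Response("F", 1, 2), Response("F", 0, 0), Response("F", 1, 1),
]

print(accuracy(rows))                           # → 0.625
print(mean_score_gap(rows, use_machine=False))  # → 0.25 (human MSG)
print(mean_score_gap(rows, use_machine=True))   # → 0.0 (machine MSG)
print(equalized_odds_gap(rows, threshold=2))    # → 0.5
```

Under these assumptions, mirroring the study's comparisons would mean computing each value separately for every trained model (mixed, male-only, female-only) on the same held-out responses: the accuracy difference is the gap between two models' `accuracy` values, and a machine MSG larger in magnitude than the human MSG would signal an enlarged disparity.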
Related papers
- How far can bias go? -- Tracing bias from pretraining data to alignment [54.51310112013655]
This study examines the correlation between gender-occupation bias in pre-training data and its manifestation in LLMs.
Our findings reveal that biases present in pre-training data are amplified in model outputs.
Date: 2024-11-28T16:20:25Z
- GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models [73.23743278545321]
Large language models (LLMs) have exhibited remarkable capabilities in natural language generation, but have also been observed to magnify societal biases.
GenderCARE is a comprehensive framework that encompasses innovative Criteria, bias Assessment, Reduction techniques, and Evaluation metrics.
Date: 2024-08-22T15:35:46Z
- Identifying and examining machine learning biases on Adult dataset [0.7856362837294112]
This research investigates reducing machine learning model bias through ensemble learning.
Our rigorous methodology comprehensively assesses bias across various categorical variables, ultimately revealing a pronounced gender attribute bias.
This study underscores ethical considerations and advocates the implementation of hybrid models for a data-driven society marked by inclusivity and impartiality.
Date: 2023-10-13T19:41:47Z
- The Impact of Debiasing on the Performance of Language Models in Downstream Tasks is Underestimated [70.23064111640132]
We compare the impact of debiasing on performance across multiple downstream tasks using a wide range of benchmark datasets.
Experiments show that the effects of debiasing are consistently underestimated across all tasks.
Date: 2023-09-16T20:25:34Z
- Language Models Get a Gender Makeover: Mitigating Gender Bias with Few-Shot Data Interventions [50.67412723291881]
Societal biases present in pre-trained large language models are a critical issue.
We propose data intervention strategies as a powerful yet simple technique to reduce gender bias in pre-trained models.
Date: 2023-06-07T16:50:03Z
- Exploring Gender Bias in Retrieval Models [2.594412743115663]
Mitigating gender bias in information retrieval is important to avoid propagating stereotypes.
We employ a dataset consisting of two components: (1) relevance of a document to a query and (2) "gender" of a document.
We show that pre-trained models for IR do not perform well in zero-shot retrieval tasks when full fine-tuning of a large pre-trained BERT encoder is performed.
We also illustrate that pre-trained models have gender biases that result in retrieved articles tending to be more often male than female.
Date: 2022-08-02T21:12:05Z
- Improving Gender Fairness of Pre-Trained Language Models without Catastrophic Forgetting [88.83117372793737]
Forgetting information in the original training data may damage the model's downstream performance by a large margin.
We propose GEnder Equality Prompt (GEEP) to improve gender fairness of pre-trained models with less forgetting.
Date: 2021-10-11T15:52:16Z
- Fairness in Cardiac MR Image Analysis: An Investigation of Bias Due to Data Imbalance in Deep Learning Based Segmentation [1.6386696247541932]
"Fairness" in AI refers to assessing algorithms for potential bias based on demographic characteristics such as race and gender.
Deep learning (DL) in cardiac MR segmentation has led to impressive results in recent years, but no work has yet investigated the fairness of such models.
We find statistically significant differences in Dice performance between different racial groups.
Date: 2021-06-23T13:27:35Z
- Evaluating Gender Bias in Natural Language Inference [5.034017602990175]
We propose an evaluation methodology to measure gender bias in natural language understanding through inference.
We use our challenge task to investigate state-of-the-art NLI models for the presence of gender stereotypes using occupations.
Our findings suggest that three models trained on MNLI and SNLI datasets are significantly prone to gender-induced prediction errors.
Date: 2021-05-12T09:41:51Z
- Mitigating Gender Bias in Captioning Systems [56.25457065032423]
Most captioning models learn gender bias, leading to high gender prediction errors, especially for women.
We propose a new Guided Attention Image Captioning model (GAIC) which provides self-guidance on visual attention to encourage the model to capture correct gender visual evidence.
Date: 2020-06-15T12:16:19Z
- Do Neural Ranking Models Intensify Gender Bias? [13.37092521347171]
We first provide a bias measurement framework which includes two metrics to quantify the degree of the unbalanced presence of gender-related concepts in a given IR model's ranking list.
Applying these queries to the MS MARCO Passage retrieval collection, we then measure the gender bias of a BM25 model and several recent neural ranking models.
Results show that while all models are strongly biased toward males, the neural models, in particular those based on contextualized embedding models, significantly intensify gender bias.
Date: 2020-05-01T13:31:11Z
This list is automatically generated from the titles and abstracts of the papers in this site.