AI Gender Bias, Disparities, and Fairness: Does Training Data Matter?
- URL: http://arxiv.org/abs/2312.10833v2
- Date: Tue, 26 Dec 2023 00:30:38 GMT
- Title: AI Gender Bias, Disparities, and Fairness: Does Training Data Matter?
- Authors: Ehsan Latif, Xiaoming Zhai, and Lei Liu
- Abstract summary: This study delves into the pervasive issue of gender bias in artificial intelligence (AI), specifically within automatic scoring systems for student-written responses.
It analyzes more than 1000 human-graded student responses from male and female participants across six assessment items.
Results indicate that scoring accuracy for mixed-trained models shows an insignificant difference from either male- or female-trained models.
- Score: 3.509963616428399
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This study delves into the pervasive issue of gender bias in artificial
intelligence (AI), specifically within automatic scoring systems for
student-written responses. The primary objective is to investigate the presence
of gender biases, disparities, and fairness in generally targeted training
samples with mixed-gender datasets in AI scoring outcomes. Utilizing a
fine-tuned version of BERT and GPT-3.5, this research analyzes more than 1000
human-graded student responses from male and female participants across six
assessment items. The study employs three distinct techniques for bias
analysis: Scoring accuracy difference to evaluate bias, mean score gaps by
gender (MSG) to evaluate disparity, and Equalized Odds (EO) to evaluate
fairness. The results indicate that scoring accuracy for mixed-trained models
shows an insignificant difference from either male- or female-trained models,
suggesting no significant scoring bias. Consistently across both BERT and
GPT-3.5, mixed-trained models produced smaller MSGs than humans and
non-disparate predictions. In contrast, gender-specifically trained models
yielded larger MSGs than humans, indicating that unbalanced training data may
lead algorithmic models to enlarge gender disparities. The EO analysis
suggests that mixed-trained models generated fairer outcomes than
gender-specifically trained models.
Collectively, the findings suggest that gender-unbalanced data do not
necessarily generate scoring bias but can enlarge gender disparities and reduce
scoring fairness.
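The abstract names three bias-analysis techniques: scoring accuracy difference (bias), mean score gaps by gender (MSG, disparity), and Equalized Odds (EO, fairness). The sketch below is a minimal, hypothetical illustration of those three checks on binary scores; the function names, the binary-score simplification, and the group encoding are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def accuracy_difference(y_true_m, y_pred_m, y_true_f, y_pred_f):
    """Bias: gap in scoring accuracy between male and female responses."""
    acc_m = np.mean(np.asarray(y_true_m) == np.asarray(y_pred_m))
    acc_f = np.mean(np.asarray(y_true_f) == np.asarray(y_pred_f))
    return acc_m - acc_f

def mean_score_gap(scores_m, scores_f):
    """Disparity (MSG): difference between mean scores assigned to each gender."""
    return np.mean(scores_m) - np.mean(scores_f)

def equalized_odds_gaps(y_true, y_pred, group):
    """Fairness (EO): true/false positive rate gaps across groups (binary scores)."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    gaps = {}
    for label, name in [(1, "tpr_gap"), (0, "fpr_gap")]:
        rates = []
        for g in np.unique(group):
            mask = (group == g) & (y_true == label)
            rates.append(np.mean(y_pred[mask] == 1))
        gaps[name] = float(max(rates) - min(rates))
    return gaps

# Toy usage with simulated binary human and machine scores (illustrative only).
rng = np.random.default_rng(0)
human = rng.integers(0, 2, 200)    # human-assigned scores
machine = rng.integers(0, 2, 200)  # machine-assigned scores
gender = rng.integers(0, 2, 200)   # 0 = male, 1 = female
print(equalized_odds_gaps(human, machine, gender))
print(mean_score_gap(machine[gender == 0], machine[gender == 1]))
```

Under Equalized Odds, a fair scorer would show near-zero gaps in true- and false-positive rates across gender groups.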
Related papers
- GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models [73.23743278545321]
Large language models (LLMs) have exhibited remarkable capabilities in natural language generation, but have also been observed to magnify societal biases.
GenderCARE is a comprehensive framework that encompasses innovative Criteria, bias Assessment, Reduction techniques, and Evaluation metrics.
arXiv Detail & Related papers (2024-08-22T15:35:46Z) - Identifying and examining machine learning biases on Adult dataset [0.7856362837294112]
This research delves into the reduction of machine learning model bias through Ensemble Learning.
Our rigorous methodology comprehensively assesses bias across various categorical variables, ultimately revealing a pronounced gender attribute bias.
This study underscores ethical considerations and advocates the implementation of hybrid models for a data-driven society marked by inclusivity and impartiality.
arXiv Detail & Related papers (2023-10-13T19:41:47Z) - The Impact of Debiasing on the Performance of Language Models in
Downstream Tasks is Underestimated [70.23064111640132]
We compare the impact of debiasing on performance across multiple downstream tasks using a wide range of benchmark datasets.
Experiments show that the effects of debiasing are consistently underestimated across all tasks.
arXiv Detail & Related papers (2023-09-16T20:25:34Z) - Language Models Get a Gender Makeover: Mitigating Gender Bias with
Few-Shot Data Interventions [50.67412723291881]
Societal biases present in pre-trained large language models are a critical issue.
We propose data intervention strategies as a powerful yet simple technique to reduce gender bias in pre-trained models.
arXiv Detail & Related papers (2023-06-07T16:50:03Z) - Gender Biases in Automatic Evaluation Metrics for Image Captioning [87.15170977240643]
We conduct a systematic study of gender biases in model-based evaluation metrics for image captioning tasks.
We demonstrate the negative consequences of using these biased metrics, including the inability to differentiate between biased and unbiased generations.
We present a simple and effective way to mitigate the metric bias without hurting the correlations with human judgments.
arXiv Detail & Related papers (2023-05-24T04:27:40Z) - Exploring Gender Bias in Retrieval Models [2.594412743115663]
Mitigating gender bias in information retrieval is important to avoid propagating stereotypes.
We employ a dataset consisting of two components: (1) relevance of a document to a query and (2) "gender" of a document.
We show that pre-trained models for IR do not perform well in zero-shot retrieval tasks when full fine-tuning of a large pre-trained BERT encoder is performed.
We also illustrate that pre-trained models have gender biases that result in retrieved articles tending to be more often male than female.
arXiv Detail & Related papers (2022-08-02T21:12:05Z) - Improving Gender Fairness of Pre-Trained Language Models without
Catastrophic Forgetting [88.83117372793737]
Forgetting information in the original training data may damage the model's downstream performance by a large margin.
We propose GEnder Equality Prompt (GEEP) to improve gender fairness of pre-trained models with less forgetting.
arXiv Detail & Related papers (2021-10-11T15:52:16Z) - Fairness in Cardiac MR Image Analysis: An Investigation of Bias Due to
Data Imbalance in Deep Learning Based Segmentation [1.6386696247541932]
"Fairness" in AI refers to assessing algorithms for potential bias based on demographic characteristics such as race and gender.
Deep learning (DL) in cardiac MR segmentation has led to impressive results in recent years, but no work has yet investigated the fairness of such models.
We find statistically significant differences in Dice performance between different racial groups.
arXiv Detail & Related papers (2021-06-23T13:27:35Z) - Evaluating Gender Bias in Natural Language Inference [5.034017602990175]
We propose an evaluation methodology to measure gender bias in natural language understanding through inference.
We use our challenge task to investigate state-of-the-art NLI models on the presence of gender stereotypes using occupations.
Our findings suggest that three models trained on MNLI and SNLI datasets are significantly prone to gender-induced prediction errors.
arXiv Detail & Related papers (2021-05-12T09:41:51Z) - Mitigating Gender Bias in Captioning Systems [56.25457065032423]
Most captioning models learn gender bias, leading to high gender prediction errors, especially for women.
We propose a new Guided Attention Image Captioning model (GAIC) which provides self-guidance on visual attention to encourage the model to capture correct gender visual evidence.
arXiv Detail & Related papers (2020-06-15T12:16:19Z) - Do Neural Ranking Models Intensify Gender Bias? [13.37092521347171]
We first provide a bias measurement framework which includes two metrics to quantify the degree of the unbalanced presence of gender-related concepts in a given IR model's ranking list.
Applying these queries to the MS MARCO Passage retrieval collection, we then measure the gender bias of a BM25 model and several recent neural ranking models.
Results show that while all models are strongly biased toward men, the neural models, and in particular those based on contextualized embeddings, significantly intensify gender bias (a toy sketch of such a list-level measurement follows this entry).
arXiv Detail & Related papers (2020-05-01T13:31:11Z)