Identifying and Improving Disability Bias in GPT-Based Resume Screening
- URL: http://arxiv.org/abs/2402.01732v2
- Date: Wed, 22 May 2024 19:15:18 GMT
- Title: Identifying and Improving Disability Bias in GPT-Based Resume Screening
- Authors: Kate Glazko, Yusuf Mohammed, Ben Kosa, Venkatesh Potluri, Jennifer Mankoff,
- Abstract summary: We ask ChatGPT to rank a resume against the same resume enhanced with an additional leadership award, scholarship, panel presentation, and membership that are disability related.
We find that GPT-4 exhibits prejudice towards these enhanced CVs.
We show that this prejudice can be quantifiably reduced by training a custom GPTs on principles of DEI and disability justice.
- Score: 9.881826151448198
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As Generative AI rises in adoption, its use has expanded to include domains such as hiring and recruiting. However, without examining the potential of bias, this may negatively impact marginalized populations, including people with disabilities. To address this important concern, we present a resume audit study, in which we ask ChatGPT (specifically, GPT-4) to rank a resume against the same resume enhanced with an additional leadership award, scholarship, panel presentation, and membership that are disability related. We find that GPT-4 exhibits prejudice towards these enhanced CVs. Further, we show that this prejudice can be quantifiably reduced by training a custom GPTs on principles of DEI and disability justice. Our study also includes a unique qualitative analysis of the types of direct and indirect ableism GPT-4 uses to justify its biased decisions and suggest directions for additional bias mitigation work. Additionally, since these justifications are presumably drawn from training data containing real-world biased statements made by humans, our analysis suggests additional avenues for understanding and addressing human bias.
Related papers
- An Empirical Analysis on Large Language Models in Debate Evaluation [10.677407097411768]
We investigate the capabilities and inherent biases of advanced large language models (LLMs) such as GPT-3.5 and GPT-4 in the context of debate evaluation.
We uncover a consistent bias in both GPT-3.5 and GPT-4 towards the second candidate response presented.
We also uncover lexical biases in both GPT-3.5 and GPT-4, especially when label sets carry connotations such as numerical or sequential.
arXiv Detail & Related papers (2024-05-28T18:34:53Z) - Influence of Solution Efficiency and Valence of Instruction on Additive and Subtractive Solution Strategies in Humans and GPT-4 [0.0]
This study examined the problem-solving behavior of humans and OpenAl's GPT-4 large language model.
The experiments involved 588 participants from the U.S. and 680 iterations of the GPT-4 model.
arXiv Detail & Related papers (2024-04-25T15:53:00Z) - Take Care of Your Prompt Bias! Investigating and Mitigating Prompt Bias in Factual Knowledge Extraction [56.17020601803071]
Recent research shows that pre-trained language models (PLMs) suffer from "prompt bias" in factual knowledge extraction.
This paper aims to improve the reliability of existing benchmarks by thoroughly investigating and mitigating prompt bias.
arXiv Detail & Related papers (2024-03-15T02:04:35Z) - GPTBIAS: A Comprehensive Framework for Evaluating Bias in Large Language
Models [83.30078426829627]
Large language models (LLMs) have gained popularity and are being widely adopted by a large user community.
The existing evaluation methods have many constraints, and their results exhibit a limited degree of interpretability.
We propose a bias evaluation framework named GPTBIAS that leverages the high performance of LLMs to assess bias in models.
arXiv Detail & Related papers (2023-12-11T12:02:14Z) - Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs [67.51906565969227]
We study the unintended side-effects of persona assignment on the ability of LLMs to perform basic reasoning tasks.
Our study covers 24 reasoning datasets, 4 LLMs, and 19 diverse personas (e.g. an Asian person) spanning 5 socio-demographic groups.
arXiv Detail & Related papers (2023-11-08T18:52:17Z) - Instructed to Bias: Instruction-Tuned Language Models Exhibit Emergent Cognitive Bias [57.42417061979399]
Recent studies show that instruction tuning (IT) and reinforcement learning from human feedback (RLHF) improve the abilities of large language models (LMs) dramatically.
In this work, we investigate the effect of IT and RLHF on decision making and reasoning in LMs.
Our findings highlight the presence of these biases in various models from the GPT-3, Mistral, and T5 families.
arXiv Detail & Related papers (2023-08-01T01:39:25Z) - Comparative Analysis of GPT-4 and Human Graders in Evaluating Praise
Given to Students in Synthetic Dialogues [2.3361634876233817]
Large language models, such as the AI-chatbot ChatGPT, hold potential for offering constructive feedback to tutors in practical settings.
The accuracy of AI-generated feedback remains uncertain, with scant research investigating the ability of models like ChatGPT to deliver effective feedback.
arXiv Detail & Related papers (2023-07-05T04:14:01Z) - Can GPT-4 Support Analysis of Textual Data in Tasks Requiring Highly
Specialized Domain Expertise? [0.8924669503280334]
GPT-4, prompted with annotation guidelines, performs on par with well-trained law student annotators.
We demonstrated how to analyze GPT-4's predictions to identify and mitigate deficiencies in annotation guidelines.
arXiv Detail & Related papers (2023-06-24T08:48:24Z) - DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT
Models [92.6951708781736]
This work proposes a comprehensive trustworthiness evaluation for large language models with a focus on GPT-4 and GPT-3.5.
We find that GPT models can be easily misled to generate toxic and biased outputs and leak private information.
Our work illustrates a comprehensive trustworthiness evaluation of GPT models and sheds light on the trustworthiness gaps.
arXiv Detail & Related papers (2023-06-20T17:24:23Z) - Is GPT-4 a Good Data Analyst? [67.35956981748699]
We consider GPT-4 as a data analyst to perform end-to-end data analysis with databases from a wide range of domains.
We design several task-specific evaluation metrics to systematically compare the performance between several professional human data analysts and GPT-4.
Experimental results show that GPT-4 can achieve comparable performance to humans.
arXiv Detail & Related papers (2023-05-24T11:26:59Z) - Humans in Humans Out: On GPT Converging Toward Common Sense in both
Success and Failure [0.0]
GPT-3, GPT-3.5, and GPT-4 were trained on large quantities of human-generated text.
We show that GPT-3 showed evidence of ETR-predicted outputs for 59% of these examples.
Remarkably, the production of human-like fallacious judgments increased from 18% in GPT-3 to 33% in GPT-3.5 and 34% in GPT-4.
arXiv Detail & Related papers (2023-03-30T10:32:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.