The Silicon Ceiling: Auditing GPT's Race and Gender Biases in Hiring
- URL: http://arxiv.org/abs/2405.04412v3
- Date: Fri, 15 Nov 2024 16:53:18 GMT
- Title: The Silicon Ceiling: Auditing GPT's Race and Gender Biases in Hiring
- Authors: Lena Armstrong, Abbey Liu, Stephen MacNeil, Danaƫ Metaxa,
- Abstract summary: We conduct an AI audit of race and gender biases in one commonly-used large language model.
We find that the model reflects some biases based on stereotypes.
Women's resumes had occupations with less experience, while Asian and Hispanic resumes had immigrant markers.
- Score: 0.9499648210774584
- License:
- Abstract: Large language models (LLMs) are increasingly being introduced in workplace settings, with the goals of improving efficiency and fairness. However, concerns have arisen regarding these models' potential to reflect or exacerbate social biases and stereotypes. This study explores the potential impact of LLMs on hiring practices. To do so, we conduct an AI audit of race and gender biases in one commonly-used LLM, OpenAI's GPT-3.5, taking inspiration from the history of traditional offline resume audits. We conduct two studies using names with varied race and gender connotations: resume assessment (Study 1) and resume generation (Study 2). In Study 1, we ask GPT to score resumes with 32 different names (4 names for each combination of the 2 gender and 4 racial groups) and two anonymous options across 10 occupations and 3 evaluation tasks (overall rating, willingness to interview, and hireability). We find that the model reflects some biases based on stereotypes. In Study 2, we prompt GPT to create resumes (10 for each name) for fictitious job candidates. When generating resumes, GPT reveals underlying biases; women's resumes had occupations with less experience, while Asian and Hispanic resumes had immigrant markers, such as non-native English and non-U.S. education and work experiences. Our findings contribute to a growing body of literature on LLM biases, particularly in workplace contexts.
Related papers
- Gender, Race, and Intersectional Bias in Resume Screening via Language Model Retrieval [5.122502168590131]
We investigate the possibilities of using large language models (LLMs) in a resume screening setting via a document retrieval framework.
We then perform a resume audit study to determine whether a selection of Massive Text Embedding (MTE) models are biased in resume screening scenarios.
We find that the MTEs are biased, significantly favoring White-associated names in 85.1% of cases and female-associated names in only 11.1% of cases.
arXiv Detail & Related papers (2024-07-29T18:42:39Z) - Evaluation of Bias Towards Medical Professionals in Large Language Models [11.450991679521605]
GPT-4, Claude-3, and Mistral-Large showed significant gender and racial biases when evaluating medical professionals for residency selection.
Tests revealed strong preferences towards Hispanic females and Asian males in various specialties.
arXiv Detail & Related papers (2024-06-30T05:55:55Z) - "You Gotta be a Doctor, Lin": An Investigation of Name-Based Bias of Large Language Models in Employment Recommendations [29.183942575629214]
We utilize GPT-3.5-Turbo and Llama 3-70B-Instruct to simulate hiring decisions and salary recommendations for candidates with 320 first names that strongly signal their race and gender.
Our empirical results indicate a preference among these models for hiring candidates with White female-sounding names over other demographic groups across 40 occupations.
arXiv Detail & Related papers (2024-06-18T03:11:43Z) - White Men Lead, Black Women Help? Benchmarking Language Agency Social Biases in LLMs [58.27353205269664]
Social biases can manifest in language agency.
We introduce the novel Language Agency Bias Evaluation benchmark.
We unveil language agency social biases in 3 recent Large Language Model (LLM)-generated content.
arXiv Detail & Related papers (2024-04-16T12:27:54Z) - What's in a Name? Auditing Large Language Models for Race and Gender
Bias [49.28899492966893]
We employ an audit design to investigate biases in state-of-the-art large language models, including GPT-4.
We find that the advice systematically disadvantages names that are commonly associated with racial minorities and women.
arXiv Detail & Related papers (2024-02-21T18:25:25Z) - What Do Llamas Really Think? Revealing Preference Biases in Language
Model Representations [62.91799637259657]
Do large language models (LLMs) exhibit sociodemographic biases, even when they decline to respond?
We study this research question by probing contextualized embeddings and exploring whether this bias is encoded in its latent representations.
We propose a logistic Bradley-Terry probe which predicts word pair preferences of LLMs from the words' hidden vectors.
arXiv Detail & Related papers (2023-11-30T18:53:13Z) - "Kelly is a Warm Person, Joseph is a Role Model": Gender Biases in
LLM-Generated Reference Letters [97.11173801187816]
Large Language Models (LLMs) have recently emerged as an effective tool to assist individuals in writing various types of content.
This paper critically examines gender biases in LLM-generated reference letters.
arXiv Detail & Related papers (2023-10-13T16:12:57Z) - Are Emily and Greg Still More Employable than Lakisha and Jamal?
Investigating Algorithmic Hiring Bias in the Era of ChatGPT [24.496590819263865]
Large Language Models (LLMs) such as GPT-3.5, Bard, and Claude exhibit applicability across numerous tasks.
We evaluate LLMs on two tasks: (1) matching resumes to job categories; and (2) summarizing resumes with employment relevant information.
Overall, LLMs are robust across race and gender. They differ in their performance on pregnancy status and political affiliation.
arXiv Detail & Related papers (2023-10-08T12:08:48Z) - Towards Understanding Gender-Seniority Compound Bias in Natural Language
Generation [64.65911758042914]
We investigate how seniority impacts the degree of gender bias exhibited in pretrained neural generation models.
Our results show that GPT-2 amplifies bias by considering women as junior and men as senior more often than the ground truth in both domains.
These results suggest that NLP applications built using GPT-2 may harm women in professional capacities.
arXiv Detail & Related papers (2022-05-19T20:05:02Z) - How True is GPT-2? An Empirical Analysis of Intersectional Occupational
Biases [50.591267188664666]
Downstream applications are at risk of inheriting biases contained in natural language models.
We analyze the occupational biases of a popular generative language model, GPT-2.
For a given job, GPT-2 reflects the societal skew of gender and ethnicity in the US, and in some cases, pulls the distribution towards gender parity.
arXiv Detail & Related papers (2021-02-08T11:10:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.