Are Emily and Greg Still More Employable than Lakisha and Jamal?
Investigating Algorithmic Hiring Bias in the Era of ChatGPT
- URL: http://arxiv.org/abs/2310.05135v1
- Date: Sun, 8 Oct 2023 12:08:48 GMT
- Title: Are Emily and Greg Still More Employable than Lakisha and Jamal?
Investigating Algorithmic Hiring Bias in the Era of ChatGPT
- Authors: Akshaj Kumar Veldanda, Fabian Grob, Shailja Thakur, Hammond Pearce,
Benjamin Tan, Ramesh Karri, Siddharth Garg
- Abstract summary: Large Language Models (LLMs) such as GPT-3.5, Bard, and Claude exhibit applicability across numerous tasks.
We evaluate LLMs on two tasks: (1) matching resumes to job categories; and (2) summarizing resumes with employment relevant information.
Overall, LLMs are robust across race and gender. They differ in their performance on pregnancy status and political affiliation.
- Score: 24.496590819263865
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Large Language Models (LLMs) such as GPT-3.5, Bard, and Claude exhibit
applicability across numerous tasks. One domain of interest is their use in
algorithmic hiring, specifically in matching resumes with job categories. Yet,
this introduces issues of bias on protected attributes like gender, race and
maternity status. The seminal work of Bertrand & Mullainathan (2003) set the
gold-standard for identifying hiring bias via field experiments where the
response rate for identical resumes that differ only in protected attributes,
e.g., racially suggestive names such as Emily or Lakisha, is compared. We
replicate this experiment on state-of-the-art LLMs (GPT-3.5, Bard, Claude and
Llama) to evaluate bias (or lack thereof) on gender, race, maternity status,
pregnancy status, and political affiliation. We evaluate LLMs on two tasks: (1)
matching resumes to job categories; and (2) summarizing resumes with employment
relevant information. Overall, LLMs are robust across race and gender. They
differ in their performance on pregnancy status and political affiliation. We
use contrastive input decoding on open-source LLMs to uncover potential sources
of bias.
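
The audit described in the abstract follows the original correspondence-study design: hold the resume text fixed, vary only the name that signals a protected attribute, and compare the model's responses across groups. Below is a minimal sketch of task (1), resume-to-job-category matching; the prompt wording, name list, job categories, and the `query_llm()` helper are illustrative assumptions rather than the authors' exact setup.

```python
# Illustrative sketch of a Bertrand & Mullainathan-style audit of task (1),
# resume-to-job-category matching. The prompt wording, names, job categories,
# and query_llm() helper are assumptions for illustration, not the authors'
# exact protocol.
from collections import Counter

JOB_CATEGORIES = ["Information-Technology", "Teacher", "Construction", "Accountant"]

# Names chosen to signal race and gender, following the correspondence-study design.
NAMES = {
    "white_female": "Emily Walsh",
    "white_male": "Greg Baker",
    "black_female": "Lakisha Washington",
    "black_male": "Jamal Jones",
}

RESUME_BODY = (
    "Software engineer with 5 years of experience building web services in "
    "Python and Go. B.S. in Computer Science."
)  # identical for every name

def build_prompt(name: str) -> str:
    return (
        f"Resume of {name}:\n{RESUME_BODY}\n\n"
        "Which of the following job categories best matches this resume? "
        f"Answer with the category name only: {', '.join(JOB_CATEGORIES)}"
    )

def query_llm(prompt: str) -> str:
    """Hypothetical wrapper around whichever chat model is being audited."""
    raise NotImplementedError("plug in a GPT-3.5 / Bard / Claude / Llama client here")

def audit(n_trials: int = 50) -> dict:
    """Collect the category distribution the model assigns to identical resumes
    that differ only in the name at the top; a bias-free model should produce
    near-identical distributions across groups."""
    results = {}
    for group, name in NAMES.items():
        counts = Counter()
        for _ in range(n_trials):
            counts[query_llm(build_prompt(name)).strip()] += 1
        results[group] = counts
    return results
```

The same scaffolding extends to task (2) by swapping the matching prompt for a summarization prompt and comparing which employment-relevant details survive across name groups.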
Related papers
- Gender Bias in LLM-generated Interview Responses [1.6124402884077915]
This study audits interview responses generated by three LLMs across models, question types, and jobs.
Our findings reveal that gender bias is consistent and closely aligned with gender stereotypes and the gender dominance of jobs.
arXiv Detail & Related papers (2024-10-28T05:08:08Z)
Gender, Race, and Intersectional Bias in Resume Screening via Language Model Retrieval [5.122502168590131]
We investigate the possibilities of using large language models (LLMs) in a resume screening setting via a document retrieval framework.
We then perform a resume audit study to determine whether a selection of Massive Text Embedding (MTE) models are biased in resume screening scenarios.
We find that the MTEs are biased, significantly favoring White-associated names in 85.1% of cases and female-associated names in only 11.1% of cases.
arXiv Detail & Related papers (2024-07-29T18:42:39Z)
GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing [72.0343083866144]
This paper introduces the GenderBias-VL benchmark to evaluate occupation-related gender bias in Large Vision-Language Models.
Using our benchmark, we extensively evaluate 15 commonly used open-source LVLMs and state-of-the-art commercial APIs.
Our findings reveal widespread gender biases in existing LVLMs.
arXiv Detail & Related papers (2024-06-30T05:55:15Z)
"You Gotta be a Doctor, Lin": An Investigation of Name-Based Bias of Large Language Models in Employment Recommendations [29.183942575629214]
We utilize GPT-3.5-Turbo and Llama 3-70B-Instruct to simulate hiring decisions and salary recommendations for candidates with 320 first names that strongly signal their race and gender.
Our empirical results indicate a preference among these models for hiring candidates with White female-sounding names over other demographic groups across 40 occupations.
arXiv Detail & Related papers (2024-06-18T03:11:43Z)
JobFair: A Framework for Benchmarking Gender Hiring Bias in Large Language Models [12.12628747941818]
This paper presents a novel framework for benchmarking hierarchical gender hiring bias in Large Language Models (LLMs) for resume scoring.
We introduce a new construct grounded in labour economics, legal principles, and critiques of current bias benchmarks.
We analyze gender hiring biases in ten state-of-the-art LLMs.
arXiv Detail & Related papers (2024-06-17T09:15:57Z)
The Silicon Ceiling: Auditing GPT's Race and Gender Biases in Hiring [0.9499648210774584]
We conduct an algorithm audit of race and gender biases in one commonly-used large language model.
We find that the model reflects some biases based on stereotypes.
Women's resumes had occupations with less experience, while Asian and Hispanic resumes had immigrant markers.
arXiv Detail & Related papers (2024-05-07T15:39:45Z)
White Men Lead, Black Women Help? Benchmarking Language Agency Social Biases in LLMs [58.27353205269664]
Social biases can manifest in language agency.
We introduce the novel Language Agency Bias Evaluation benchmark.
We unveil language agency social biases in content generated by 3 recent Large Language Models (LLMs).
arXiv Detail & Related papers (2024-04-16T12:27:54Z)
Disclosure and Mitigation of Gender Bias in LLMs [64.79319733514266]
Large Language Models (LLMs) can generate biased responses.
We propose an indirect probing framework based on conditional generation.
We explore three distinct strategies to disclose explicit and implicit gender bias in LLMs.
arXiv Detail & Related papers (2024-02-17T04:48:55Z)
What Do Llamas Really Think? Revealing Preference Biases in Language Model Representations [62.91799637259657]
Do large language models (LLMs) exhibit sociodemographic biases, even when they decline to respond?
We study this research question by probing contextualized embeddings and exploring whether this bias is encoded in their latent representations.
We propose a logistic Bradley-Terry probe which predicts word-pair preferences of LLMs from the words' hidden vectors; a minimal sketch of such a probe appears after this list.
arXiv Detail & Related papers (2023-11-30T18:53:13Z)
Probing Explicit and Implicit Gender Bias through LLM Conditional Text Generation [64.79319733514266]
Large Language Models (LLMs) can generate biased and toxic responses.
We propose a conditional text generation mechanism without the need for predefined gender phrases and stereotypes.
arXiv Detail & Related papers (2023-11-01T05:31:46Z)
"Kelly is a Warm Person, Joseph is a Role Model": Gender Biases in LLM-Generated Reference Letters [97.11173801187816]
Large Language Models (LLMs) have recently emerged as an effective tool to assist individuals in writing various types of content.
This paper critically examines gender biases in LLM-generated reference letters.
arXiv Detail & Related papers (2023-10-13T16:12:57Z)
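
For the Bradley-Terry probe mentioned in the "What Do Llamas Really Think?" entry above, the core idea fits in a few lines: model the probability that the LLM prefers word a over word b as a logistic function of the difference between the two words' hidden vectors. The sketch below assumes word-level hidden vectors have already been extracted from the model; the training setup and toy data are illustrative, not that paper's exact configuration.

```python
# Minimal sketch of a logistic Bradley-Terry probe over LLM hidden states.
# Assumes word-level hidden vectors have already been extracted; the data
# layout and toy example are illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_bradley_terry_probe(h_a: np.ndarray, h_b: np.ndarray, prefs: np.ndarray):
    """h_a, h_b: (n_pairs, d) hidden vectors for the two words in each pair.
    prefs: (n_pairs,) binary labels, 1 if the LLM preferred word a over word b.

    Logistic Bradley-Terry: P(a preferred over b) = sigmoid(w . (h_a - h_b)),
    i.e. a logistic regression without intercept on the difference vectors.
    """
    probe = LogisticRegression(fit_intercept=False, max_iter=1000)
    probe.fit(h_a - h_b, prefs)
    return probe

def predict_preference(probe, h_a: np.ndarray, h_b: np.ndarray) -> np.ndarray:
    """Predicted probability that word a is preferred over word b in each pair."""
    return probe.predict_proba(h_a - h_b)[:, 1]

if __name__ == "__main__":
    # Toy data: random vectors stand in for real hidden states.
    rng = np.random.default_rng(0)
    n_pairs, dim = 200, 64
    w_true = rng.normal(size=dim)
    h_a, h_b = rng.normal(size=(n_pairs, dim)), rng.normal(size=(n_pairs, dim))
    prefs = (((h_a - h_b) @ w_true) > 0).astype(int)
    probe = fit_bradley_terry_probe(h_a, h_b, prefs)
    preds = (predict_preference(probe, h_a, h_b) > 0.5).astype(int)
    print(f"training accuracy on toy data: {(preds == prefs).mean():.2f}")
```

A probe of this form can then be applied to sociodemographic word pairs (for example, names) to test whether such preferences are linearly decodable from the representations.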