Who Does the Giant Number Pile Like Best: Analyzing Fairness in Hiring Contexts
- URL: http://arxiv.org/abs/2501.04316v1
- Date: Wed, 08 Jan 2025 07:28:10 GMT
- Title: Who Does the Giant Number Pile Like Best: Analyzing Fairness in Hiring Contexts
- Authors: Preethi Seshadri, Seraphina Goldfarb-Tarrant,
- Abstract summary: Race-based differences appear in approximately 10% of generated summaries, while gender-based differences occur in only 1%.
Retrieval models demonstrate comparable sensitivity to non-demographic changes, suggesting that fairness issues may stem from general brittleness issues.
- Score: 5.111540255111445
- License:
- Abstract: Large language models (LLMs) are increasingly being deployed in high-stakes applications like hiring, yet their potential for unfair decision-making and outcomes remains understudied, particularly in generative settings. In this work, we examine the fairness of LLM-based hiring systems through two real-world tasks: resume summarization and retrieval. By constructing a synthetic resume dataset and curating job postings, we investigate whether model behavior differs across demographic groups and is sensitive to demographic perturbations. Our findings reveal that race-based differences appear in approximately 10% of generated summaries, while gender-based differences occur in only 1%. In the retrieval setting, all evaluated models display non-uniform selection patterns across demographic groups and exhibit high sensitivity to both gender and race-based perturbations. Surprisingly, retrieval models demonstrate comparable sensitivity to non-demographic changes, suggesting that fairness issues may stem, in part, from general brittleness issues. Overall, our results indicate that LLM-based hiring systems, especially at the retrieval stage, can exhibit notable biases that lead to discriminatory outcomes in real-world contexts.
Related papers
- Actions Speak Louder than Words: Agent Decisions Reveal Implicit Biases in Language Models [10.565316815513235]
Large language models (LLMs) may still exhibit implicit biases when simulating human behavior.
We show that state-of-the-art LLMs exhibit significant sociodemographic disparities in nearly all simulations.
When comparing our findings to real-world disparities reported in empirical studies, we find that the biases we uncovered are directionally aligned but markedly amplified.
arXiv Detail & Related papers (2025-01-29T05:21:31Z) - How far can bias go? -- Tracing bias from pretraining data to alignment [54.51310112013655]
This study examines the correlation between gender-occupation bias in pre-training data and their manifestation in LLMs.
Our findings reveal that biases present in pre-training data are amplified in model outputs.
arXiv Detail & Related papers (2024-11-28T16:20:25Z) - The Root Shapes the Fruit: On the Persistence of Gender-Exclusive Harms in Aligned Language Models [58.130894823145205]
We center transgender, nonbinary, and other gender-diverse identities to investigate how alignment procedures interact with pre-existing gender-diverse bias.
Our findings reveal that DPO-aligned models are particularly sensitive to supervised finetuning.
We conclude with recommendations tailored to DPO and broader alignment practices.
arXiv Detail & Related papers (2024-11-06T06:50:50Z) - GenderBias-\emph{VL}: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing [72.0343083866144]
This paper introduces the GenderBias-emphVL benchmark to evaluate occupation-related gender bias in Large Vision-Language Models.
Using our benchmark, we extensively evaluate 15 commonly used open-source LVLMs and state-of-the-art commercial APIs.
Our findings reveal widespread gender biases in existing LVLMs.
arXiv Detail & Related papers (2024-06-30T05:55:15Z) - JobFair: A Framework for Benchmarking Gender Hiring Bias in Large Language Models [12.12628747941818]
This paper presents a novel framework for benchmarking hierarchical gender hiring bias in Large Language Models (LLMs) for resume scoring.
We introduce a new construct grounded in labour economics, legal principles, and critiques of current bias benchmarks.
We analyze gender hiring biases in ten state-of-the-art LLMs.
arXiv Detail & Related papers (2024-06-17T09:15:57Z) - Sociodemographic Prompting is Not Yet an Effective Approach for Simulating Subjective Judgments with LLMs [13.744746481528711]
Large Language Models (LLMs) are widely used to simulate human responses across diverse contexts.
We evaluate nine popular LLMs on their ability to understand demographic differences in two subjective judgment tasks: politeness and offensiveness.
We find that in zero-shot settings, most models' predictions for both tasks align more closely with labels from White participants than those from Asian or Black participants.
arXiv Detail & Related papers (2023-11-16T10:02:24Z) - Picking on the Same Person: Does Algorithmic Monoculture lead to Outcome
Homogenization? [90.35044668396591]
A recurring theme in machine learning is algorithmic monoculture: the same systems, or systems that share components, are deployed by multiple decision-makers.
We propose the component-sharing hypothesis: if decision-makers share components like training data or specific models, then they will produce more homogeneous outcomes.
We test this hypothesis on algorithmic fairness benchmarks, demonstrating that sharing training data reliably exacerbates homogenization.
We conclude with philosophical analyses of and societal challenges for outcome homogenization, with an eye towards implications for deployed machine learning systems.
arXiv Detail & Related papers (2022-11-25T09:33:11Z) - Deep Learning on a Healthy Data Diet: Finding Important Examples for
Fairness [15.210232622716129]
Data-driven predictive solutions predominant in commercial applications tend to suffer from biases and stereotypes.
Data augmentation reduces gender bias by adding counterfactual examples to the training dataset.
We show that some of the examples in the augmented dataset can be not important or even harmful for fairness.
arXiv Detail & Related papers (2022-11-20T22:42:30Z) - Data Representativeness in Accessibility Datasets: A Meta-Analysis [7.6597163467929805]
We review datasets sourced by people with disabilities and older adults.
We find that accessibility datasets represent diverse ages, but have gender and race representation gaps.
We hope our effort expands the space of possibility for greater inclusion of marginalized communities in AI-infused systems.
arXiv Detail & Related papers (2022-07-16T23:32:19Z) - Balancing Biases and Preserving Privacy on Balanced Faces in the Wild [50.915684171879036]
There are demographic biases present in current facial recognition (FR) models.
We introduce our Balanced Faces in the Wild dataset to measure these biases across different ethnic and gender subgroups.
We find that relying on a single score threshold to differentiate between genuine and imposters sample pairs leads to suboptimal results.
We propose a novel domain adaptation learning scheme that uses facial features extracted from state-of-the-art neural networks.
arXiv Detail & Related papers (2021-03-16T15:05:49Z) - Enhancing Facial Data Diversity with Style-based Face Aging [59.984134070735934]
In particular, face datasets are typically biased in terms of attributes such as gender, age, and race.
We propose a novel, generative style-based architecture for data augmentation that captures fine-grained aging patterns.
We show that the proposed method outperforms state-of-the-art algorithms for age transfer.
arXiv Detail & Related papers (2020-06-06T21:53:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.