Who Gets the Callback? Generative AI and Gender Bias
- URL: http://arxiv.org/abs/2504.21400v1
- Date: Wed, 30 Apr 2025 07:55:52 GMT
- Title: Who Gets the Callback? Generative AI and Gender Bias
- Authors: Sugat Chaturvedi, Rochana Chaturvedi
- Abstract summary: We find that large language models (LLMs) tend to favor men, especially for higher-wage roles. A comprehensive analysis of linguistic features in job ads reveals strong alignment of model recommendations with traditional gender stereotypes. Our findings highlight how AI-driven hiring may perpetuate biases in the labor market and have implications for fairness and diversity within firms.
- Score: 0.030693357740321777
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative artificial intelligence (AI), particularly large language models (LLMs), is being rapidly deployed in recruitment and for candidate shortlisting. We audit several mid-sized open-source LLMs for gender bias using a dataset of 332,044 real-world online job postings. For each posting, we prompt the model to recommend whether an equally qualified male or female candidate should receive an interview callback. We find that most models tend to favor men, especially for higher-wage roles. Mapping job descriptions to the Standard Occupational Classification system, we find lower callback rates for women in male-dominated occupations and higher rates in female-associated ones, indicating occupational segregation. A comprehensive analysis of linguistic features in job ads reveals strong alignment of model recommendations with traditional gender stereotypes. To examine the role of recruiter identity, we steer model behavior by infusing Big Five personality traits and simulating the perspectives of historical figures. We find that less agreeable personas reduce stereotyping, consistent with an agreeableness bias in LLMs. Our findings highlight how AI-driven hiring may perpetuate biases in the labor market and have implications for fairness and diversity within firms.
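The callback comparison described in the abstract can be illustrated with a minimal sketch. It assumes each model answer has already been parsed into a (group, choice) pair, where the group is an occupation label and the choice is "female" or "male". The function name `female_callback_rate`, the group labels, and the toy records are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def female_callback_rate(recommendations):
    """Return, per group, the share of postings where the female candidate was picked."""
    tallies = defaultdict(lambda: [0, 0])  # group -> [female picks, valid answers]
    for group, choice in recommendations:
        if choice not in ("female", "male"):
            continue  # ignore refusals and unparseable answers
        tallies[group][1] += 1
        if choice == "female":
            tallies[group][0] += 1
    return {group: picks / total for group, (picks, total) in tallies.items()}


# Toy, made-up records for illustration only; these are not results from the paper.
sample = [
    ("construction", "male"),
    ("construction", "male"),
    ("construction", "female"),
    ("construction", "male"),
    ("nursing", "female"),
    ("nursing", "female"),
    ("nursing", "male"),
    ("nursing", "female"),
    ("nursing", "refused"),
]

rates = female_callback_rate(sample)
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.2f}")
```

A rate below 0.5 for a group would mean the model picked the male candidate more often for postings in that group. Dropping refusals before computing the share is one possible design choice; the paper may handle them differently.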
Related papers
- The LLM Wears Prada: Analysing Gender Bias and Stereotypes through Online Shopping Data [8.26034886618475]
We investigate whether Large Language Models can predict an individual's gender based solely on online shopping histories. Using a dataset of historical online purchases from users in the United States, we evaluate the ability of six LLMs to classify gender. Results indicate that while models can infer gender with moderate accuracy, their decisions are often rooted in stereotypical associations between product categories and gender.
arXiv Detail & Related papers (2025-04-02T17:56:08Z) - Evaluating Gender Bias in Large Language Models [0.8636148452563583]
The study examines the extent to which Large Language Models (LLMs) exhibit gender bias in pronoun selection in occupational contexts.
The jobs considered include a range of occupations, from those with a significant male presence to those with a notable female concentration.
The results show a positive correlation between the models' pronoun choices and the gender distribution present in U.S. labor force data.
arXiv Detail & Related papers (2024-11-14T22:23:13Z) - Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs) [82.57490175399693]
We study gender bias in 22 popular image-to-text vision-language assistants (VLAs). Our results show that VLAs replicate human biases likely present in the data, such as real-world occupational imbalances. To eliminate the gender bias in these models, we find that fine-tuning-based debiasing methods achieve the best trade-off between debiasing and retaining performance.
arXiv Detail & Related papers (2024-10-25T05:59:44Z) - GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models [73.23743278545321]
Large language models (LLMs) have exhibited remarkable capabilities in natural language generation, but have also been observed to magnify societal biases.
GenderCARE is a comprehensive framework that encompasses innovative Criteria, bias Assessment, Reduction techniques, and Evaluation metrics.
arXiv Detail & Related papers (2024-08-22T15:35:46Z) - GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing [72.0343083866144]
This paper introduces the GenderBias-VL benchmark to evaluate occupation-related gender bias in Large Vision-Language Models.
Using our benchmark, we extensively evaluate 15 commonly used open-source LVLMs and state-of-the-art commercial APIs.
Our findings reveal widespread gender biases in existing LVLMs.
arXiv Detail & Related papers (2024-06-30T05:55:15Z) - Hire Me or Not? Examining Language Model's Behavior with Occupation Attributes [7.718858707298602]
Large language models (LLMs) have been widely integrated into production pipelines, like recruitment and recommendation systems. This paper investigates LLMs' behavior with respect to gender stereotypes, in the context of occupation decision making.
arXiv Detail & Related papers (2024-05-06T18:09:32Z) - Probing Explicit and Implicit Gender Bias through LLM Conditional Text Generation [64.79319733514266]
Large Language Models (LLMs) can generate biased and toxic responses.
We propose a conditional text generation mechanism without the need for predefined gender phrases and stereotypes.
arXiv Detail & Related papers (2023-11-01T05:31:46Z) - VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution [80.57383975987676]
VisoGender is a novel dataset for benchmarking gender bias in vision-language models.
We focus on occupation-related biases within a hegemonic system of binary gender, inspired by Winograd and Winogender schemas.
We benchmark several state-of-the-art vision-language models and find that they demonstrate bias in resolving binary gender in complex scenes.
arXiv Detail & Related papers (2023-06-21T17:59:51Z) - Gendered Language in Resumes and its Implications for Algorithmic Bias in Hiring [0.0]
We train a series of models to classify the gender of the applicant.
We investigate whether it is possible to obfuscate gender from resumes.
We find that there is a significant amount of gendered information in resumes even after obfuscation.
arXiv Detail & Related papers (2021-12-16T14:26:36Z) - How True is GPT-2? An Empirical Analysis of Intersectional Occupational Biases [50.591267188664666]
Downstream applications are at risk of inheriting biases contained in natural language models.
We analyze the occupational biases of a popular generative language model, GPT-2.
For a given job, GPT-2 reflects the societal skew of gender and ethnicity in the US, and in some cases, pulls the distribution towards gender parity.
arXiv Detail & Related papers (2021-02-08T11:10:27Z) - Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.