Small Changes, Big Impact: Demographic Bias in LLM-Based Hiring Through Subtle Sociocultural Markers in Anonymised Resumes
- URL: http://arxiv.org/abs/2603.05189v1
- Date: Thu, 05 Mar 2026 13:58:07 GMT
- Title: Small Changes, Big Impact: Demographic Bias in LLM-Based Hiring Through Subtle Sociocultural Markers in Anonymised Resumes
- Authors: Bryan Chen Zhengyu Tan, Shaun Khoo, Bich Ngoc Doan, Zhengyuan Liu, Nancy F. Chen, Roy Ka-Wei Lee
- Abstract summary: We introduce a generalisable stress-test framework for hiring fairness in Singapore. 100 job-aligned resumes are augmented into 4100 variants spanning four ethnicities and two genders. Our findings suggest that seemingly innocuous markers surviving anonymisation can materially skew automated hiring outcomes.
- Score: 44.408853022070105
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Large Language Models (LLMs) are increasingly deployed in resume screening pipelines. Although explicit PII (e.g., names) is commonly redacted, resumes typically retain subtle sociocultural markers (languages, co-curricular activities, volunteering, hobbies) that can act as demographic proxies. We introduce a generalisable stress-test framework for hiring fairness, instantiated in the Singapore context: 100 neutral job-aligned resumes are augmented into 4100 variants spanning four ethnicities and two genders, differing only in job-irrelevant markers. We evaluate 18 LLMs in two realistic settings: (i) Direct Comparison (1v1) and (ii) Score & Shortlist (top-scoring rate), each with and without rationale prompting. Even without explicit identifiers, models recover demographic attributes with high F1 and exhibit systematic disparities, with models favouring markers associated with Chinese and Caucasian males. Ablations show language markers suffice for ethnicity inference, whereas gender relies on hobbies and activities. Furthermore, prompting for explanations tends to amplify bias. Our findings suggest that seemingly innocuous markers surviving anonymisation can materially skew automated hiring outcomes.
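To make the protocol concrete, here is a minimal sketch of the perturbation-and-comparison loop described in the abstract, assuming illustrative marker strings and a hypothetical `query_llm` helper (not the authors' released code):

```python
from collections import Counter
from itertools import combinations

# Illustrative marker strings only; the paper's marker lists span
# languages, co-curricular activities, volunteering, and hobbies
# across four ethnicities and two genders.
MARKERS = {
    ("chinese", "male"): "Languages: English, Mandarin | Hobby: dragon boat racing",
    ("malay", "female"): "Languages: English, Malay | Volunteering: youth mentoring",
    ("indian", "male"): "Languages: English, Tamil | Activity: cricket club",
    ("caucasian", "female"): "Languages: English, French | Hobby: netball",
}

def query_llm(prompt: str) -> str:
    """Hypothetical stand-in for any chat-completion API call."""
    raise NotImplementedError("plug in a real LLM client here")

def make_variant(base_resume: str, group: tuple) -> str:
    """Append job-irrelevant sociocultural markers to an otherwise
    identical resume."""
    return f"{base_resume}\n{MARKERS[group]}"

def direct_comparison(job: str, base_resume: str) -> Counter:
    """1v1 setting: for each pair of demographic variants, ask the model
    which candidate to shortlist and tally wins per group."""
    wins = Counter()
    for g1, g2 in combinations(MARKERS, 2):
        prompt = (
            f"Job description:\n{job}\n\n"
            f"Candidate A:\n{make_variant(base_resume, g1)}\n\n"
            f"Candidate B:\n{make_variant(base_resume, g2)}\n\n"
            "Which candidate would you shortlist? Answer 'A' or 'B' only."
        )
        answer = query_llm(prompt).strip().upper()
        wins[g1 if answer.startswith("A") else g2] += 1
    return wins
```

In a real run one would also swap the A/B order of each pair to control for position bias, and the Score & Shortlist setting would score each variant independently rather than pairwise.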
Related papers
- Mitigating Bias with Words: Inducing Demographic Ambiguity in Face Recognition Templates by Text Encoding [18.08946802592489]
Face recognition systems are often prone to demographic biases, partially due to the entanglement of demographic-specific information with identity-relevant features in facial embeddings. We propose a novel strategy, Unified Text-Image Embedding (UTIE), which aims to induce demographic ambiguity in face embeddings. We show that UTIE consistently reduces bias metrics while maintaining, or even improving in several cases, the face verification accuracy.
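One plausible reading of the mechanism, sketched under the assumption of a shared CLIP-style embedding space (the published UTIE construction may differ): blend the face embedding with text embeddings of several demographic descriptors so that no single group's signal dominates.

```python
import numpy as np

def ambiguous_embedding(face_emb: np.ndarray,
                        demographic_text_embs: list,
                        alpha: float = 0.2) -> np.ndarray:
    """Blend a face embedding with the mean text embedding of several
    demographic descriptors (e.g. 'an Asian person', 'a Black person'),
    nudging the template toward demographic ambiguity. Illustrative
    sketch only, not the published UTIE algorithm."""
    text_mix = np.mean(demographic_text_embs, axis=0)
    text_mix = text_mix / np.linalg.norm(text_mix)
    blended = (1 - alpha) * face_emb + alpha * text_mix
    return blended / np.linalg.norm(blended)
```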
arXiv Detail & Related papers (2025-12-05T10:41:58Z)
- Person-Centric Annotations of LAION-400M: Auditing Bias and Its Transfer to Models [81.45743826739054]
A major barrier has been the lack of demographic annotations in web-scale datasets such as LAION-400M. We create person-centric annotations for the full dataset, including over 276 million bounding boxes, perceived gender and race/ethnicity labels, and automatically generated captions. Using them, we uncover demographic imbalances and harmful associations, such as the disproportionate linking of men and individuals perceived as Black or Middle Eastern with crime-related and negative content.
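A toy version of the kind of audit such annotations enable, assuming a hypothetical record format with `group` and `caption` fields:

```python
from collections import Counter

# Illustrative keyword list; the paper's notion of crime-related and
# negative content is derived from its generated captions at scale.
NEGATIVE_TERMS = {"crime", "arrest", "violence", "theft"}

def negative_content_rate(annotations: list) -> dict:
    """For each perceived group label, the fraction of images whose
    caption mentions a negative keyword. `annotations` is a hypothetical
    list of dicts like {'group': 'Black man', 'caption': '...'}."""
    totals, hits = Counter(), Counter()
    for ann in annotations:
        totals[ann["group"]] += 1
        if any(term in ann["caption"].lower() for term in NEGATIVE_TERMS):
            hits[ann["group"]] += 1
    return {g: hits[g] / totals[g] for g in totals}
```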
arXiv Detail & Related papers (2025-10-04T07:51:59Z)
- Robustly Improving LLM Fairness in Realistic Settings via Interpretability [0.16843915833103415]
Anti-bias prompts fail when realistic contextual details are introduced. We find that adding realistic context such as company names, culture descriptions from public careers pages, and selective hiring constraints induces significant racial and gender biases. Our internal bias mitigation identifies race- and gender-correlated directions and applies affine concept editing at inference time.
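The core mechanism can be sketched generically: estimate a demographic direction from hidden activations by difference of group means, then apply an affine edit at inference time (details of the paper's exact method may differ).

```python
import numpy as np

def concept_direction(acts_group_a: np.ndarray,
                      acts_group_b: np.ndarray) -> np.ndarray:
    """Unit difference-of-means direction between the hidden activations
    of two demographic groups (rows = examples, cols = hidden dims)."""
    v = acts_group_a.mean(axis=0) - acts_group_b.mean(axis=0)
    return v / np.linalg.norm(v)

def affine_concept_edit(h: np.ndarray, v: np.ndarray,
                        target: float = 0.0) -> np.ndarray:
    """Remove h's component along v and pin it to a fixed target value;
    with target=0 this reduces to projecting the concept out."""
    return h - (h @ v) * v + target * v
```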
arXiv Detail & Related papers (2025-06-12T17:34:38Z)
- Small Changes, Large Consequences: Analyzing the Allocational Fairness of LLMs in Hiring Contexts [19.20592062296075]
Large language models (LLMs) are increasingly being deployed in high-stakes applications like hiring. This work examines the allocational fairness of LLM-based hiring systems through two tasks that reflect actual HR usage.
arXiv Detail & Related papers (2025-01-08T07:28:10Z)
- GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models [73.23743278545321]
Large language models (LLMs) have exhibited remarkable capabilities in natural language generation, but have also been observed to magnify societal biases. GenderCARE is a comprehensive framework that encompasses innovative Criteria, bias Assessment, Reduction techniques, and Evaluation metrics.
arXiv Detail & Related papers (2024-08-22T15:35:46Z)
- GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing [72.0343083866144]
This paper introduces the GenderBias-VL benchmark to evaluate occupation-related gender bias in Large Vision-Language Models (LVLMs).
Using our benchmark, we extensively evaluate 15 commonly used open-source LVLMs and state-of-the-art commercial APIs.
Our findings reveal widespread gender biases in existing LVLMs.
arXiv Detail & Related papers (2024-06-30T05:55:15Z)
- JobFair: A Framework for Benchmarking Gender Hiring Bias in Large Language Models [12.12628747941818]
This paper presents a novel framework for benchmarking hierarchical gender hiring bias in Large Language Models (LLMs) for resume scoring.
We introduce a new construct grounded in labour economics, legal principles, and critiques of current bias benchmarks.
We analyze gender hiring biases in ten state-of-the-art LLMs.
arXiv Detail & Related papers (2024-06-17T09:15:57Z)
- White Men Lead, Black Women Help? Benchmarking and Mitigating Language Agency Social Biases in LLMs [58.27353205269664]
Social biases can manifest in language agency in Large Language Model (LLM)-generated content. We introduce the Language Agency Bias Evaluation (LABE) benchmark, which comprehensively evaluates biases in LLMs. Using LABE, we unveil language agency social biases in 3 recent LLMs: ChatGPT, Llama3, and Mistral.
arXiv Detail & Related papers (2024-04-16T12:27:54Z)
- Sociodemographic Prompting is Not Yet an Effective Approach for Simulating Subjective Judgments with LLMs [13.744746481528711]
Large Language Models (LLMs) are widely used to simulate human responses across diverse contexts. We evaluate nine popular LLMs on their ability to understand demographic differences in two subjective judgment tasks: politeness and offensiveness. We find that in zero-shot settings, most models' predictions for both tasks align more closely with labels from White participants than those from Asian or Black participants.
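The prompting setup under study can be as simple as prefixing a persona to a zero-shot judgment request; an illustrative template (not the paper's exact wording):

```python
def sociodemographic_prompt(message: str, persona: str) -> str:
    """Zero-shot persona prompt for a subjective judgment task.
    Illustrative template; the paper's exact phrasing differs."""
    return (
        f"Imagine you are a {persona} annotator.\n"
        "On a scale of 1 (not polite at all) to 5 (very polite), how "
        "polite is the following message? Answer with the number only.\n\n"
        f"Message: {message}"
    )
```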
arXiv Detail & Related papers (2023-11-16T10:02:24Z)
- How True is GPT-2? An Empirical Analysis of Intersectional Occupational Biases [50.591267188664666]
Downstream applications are at risk of inheriting biases contained in natural language models.
We analyze the occupational biases of a popular generative language model, GPT-2.
For a given job, GPT-2 reflects the societal skew of gender and ethnicity in the US, and in some cases, pulls the distribution towards gender parity.
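A crude version of such a probe, using the Hugging Face transformers pipeline to sample continuations and count gendered pronouns (the paper's actual prompts and classification are more careful):

```python
from collections import Counter
from transformers import pipeline  # pip install transformers

def pronoun_skew(job: str, n: int = 50) -> Counter:
    """Sample GPT-2 continuations for an occupation prompt and count
    gendered pronouns as a rough proxy for the generated gender mix."""
    generator = pipeline("text-generation", model="gpt2")
    outputs = generator(f"The {job} said that", num_return_sequences=n,
                        max_new_tokens=15, do_sample=True)
    counts = Counter()
    for out in outputs:
        text = f" {out['generated_text'].lower()} "
        counts["male"] += sum(text.count(f" {p} ") for p in ("he", "his", "him"))
        counts["female"] += sum(text.count(f" {p} ") for p in ("she", "her", "hers"))
    return counts
```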
arXiv Detail & Related papers (2021-02-08T11:10:27Z)