Evaluating Gender, Racial, and Age Biases in Large Language Models: A Comparative Analysis of Occupational and Crime Scenarios
- URL: http://arxiv.org/abs/2409.14583v2
- Date: Fri, 18 Oct 2024 05:41:03 GMT
- Title: Evaluating Gender, Racial, and Age Biases in Large Language Models: A Comparative Analysis of Occupational and Crime Scenarios
- Authors: Vishal Mirza, Rahul Kulkarni, Aakanksha Jadhav
- Abstract summary: This paper examines bias in Large Language Models (LLMs).
Findings reveal that LLMs often depict female characters more frequently than male ones in various occupations.
Efforts to reduce gender and racial bias often lead to outcomes that may over-index one sub-class.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent advancements in Large Language Models (LLMs) have been notable, yet widespread enterprise adoption remains limited due to various constraints. This paper examines bias in LLMs, a crucial issue affecting their usability, reliability, and fairness. Researchers are developing strategies to mitigate bias, including debiasing layers, specialized reference datasets like Winogender and Winobias, and reinforcement learning with human feedback (RLHF). These techniques have been integrated into the latest LLMs. Our study evaluates gender bias in occupational scenarios and gender, age, and racial bias in crime scenarios across four leading LLMs released in 2024: Gemini 1.5 Pro, Llama 3 70B, Claude 3 Opus, and GPT-4o. Findings reveal that LLMs often depict female characters more frequently than male ones in various occupations, showing a 37% deviation from US BLS data. In crime scenarios, deviations from US FBI data are 54% for gender, 28% for race, and 17% for age. We observe that efforts to reduce gender and racial bias often lead to outcomes that may over-index one sub-class, potentially exacerbating the issue. These results highlight the limitations of current bias mitigation techniques and underscore the need for more effective approaches.
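The headline figures (e.g., a 37% deviation from US BLS occupational data) express how far the demographic mix in model-generated characters drifts from official reference statistics. A minimal sketch of that kind of comparison is shown below; it is not the paper's actual pipeline, and the occupations, share values, and the mean_abs_deviation helper are hypothetical placeholders.

```python
# Illustrative sketch only (not the authors' code): compare the share of
# female characters an LLM generates per occupation against a reference
# share such as US BLS statistics, and report the mean absolute gap.
# Occupations, share values, and the helper name are hypothetical placeholders.

llm_female_share = {          # fraction of generated characters that are female
    "software engineer": 0.48,
    "nurse": 0.95,
    "construction worker": 0.22,
}

bls_female_share = {          # reference fraction of women in each occupation
    "software engineer": 0.20,
    "nurse": 0.88,
    "construction worker": 0.04,
}

def mean_abs_deviation(model: dict, reference: dict) -> float:
    """Average absolute gap between model and reference shares, in percentage points."""
    gaps = [abs(model[occ] - reference[occ]) for occ in reference]
    return 100 * sum(gaps) / len(gaps)

print(f"Mean deviation: {mean_abs_deviation(llm_female_share, bls_female_share):.1f} percentage points")
```

The same style of comparison would extend to the crime scenarios by swapping in FBI reference shares for gender, race, or age.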
Related papers
- Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs) [82.57490175399693]
We study gender bias in 22 popular image-to-text vision-language assistants (VLAs).
Our results show that VLAs replicate human biases likely present in the data, such as real-world occupational imbalances.
To eliminate the gender bias in these models, we find that finetuning-based debiasing methods achieve the best tradeoff between debiasing and retaining performance on downstream tasks.
arXiv Detail & Related papers (2024-10-25T05:59:44Z)
- Revealing Hidden Bias in AI: Lessons from Large Language Models [0.0]
This study examines biases in candidate interview reports generated by Claude 3.5 Sonnet, GPT-4o, Gemini 1.5, and Llama 3.1 405B.
We evaluate the effectiveness of LLM-based anonymization in reducing these biases.
arXiv Detail & Related papers (2024-10-22T11:58:54Z)
- GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models [73.23743278545321]
Large language models (LLMs) have exhibited remarkable capabilities in natural language generation, but have also been observed to magnify societal biases.
GenderCARE is a comprehensive framework that encompasses innovative Criteria, bias Assessment, Reduction techniques, and Evaluation metrics.
arXiv Detail & Related papers (2024-08-22T15:35:46Z)
- GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing [72.0343083866144]
This paper introduces the GenderBias-VL benchmark to evaluate occupation-related gender bias in Large Vision-Language Models.
Using our benchmark, we extensively evaluate 15 commonly used open-source LVLMs and state-of-the-art commercial APIs.
Our findings reveal widespread gender biases in existing LVLMs.
arXiv Detail & Related papers (2024-06-30T05:55:15Z)
- GenderAlign: An Alignment Dataset for Mitigating Gender Bias in Large Language Models [20.98831667981121]
Large Language Models (LLMs) are prone to generating content that exhibits gender biases.
The GenderAlign dataset comprises 8k single-turn dialogues, each paired with a "chosen" and a "rejected" response.
Compared to the "rejected" responses, the "chosen" responses demonstrate lower levels of gender bias and higher quality.
arXiv Detail & Related papers (2024-06-20T01:45:44Z)
- JobFair: A Framework for Benchmarking Gender Hiring Bias in Large Language Models [12.12628747941818]
This paper presents a novel framework for benchmarking hierarchical gender hiring bias in Large Language Models (LLMs) for resume scoring.
We introduce a new construct grounded in labour economics, legal principles, and critiques of current bias benchmarks.
We analyze gender hiring biases in ten state-of-the-art LLMs.
arXiv Detail & Related papers (2024-06-17T09:15:57Z)
- Disclosure and Mitigation of Gender Bias in LLMs [64.79319733514266]
Large Language Models (LLMs) can generate biased responses.
We propose an indirect probing framework based on conditional generation.
We explore three distinct strategies to disclose explicit and implicit gender bias in LLMs.
arXiv Detail & Related papers (2024-02-17T04:48:55Z)
- Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs [67.51906565969227]
We study the unintended side-effects of persona assignment on the ability of LLMs to perform basic reasoning tasks.
Our study covers 24 reasoning datasets, 4 LLMs, and 19 diverse personas (e.g. an Asian person) spanning 5 socio-demographic groups.
arXiv Detail & Related papers (2023-11-08T18:52:17Z)
- Probing Explicit and Implicit Gender Bias through LLM Conditional Text Generation [64.79319733514266]
Large Language Models (LLMs) can generate biased and toxic responses.
We propose a conditional text generation mechanism without the need for predefined gender phrases and stereotypes.
arXiv Detail & Related papers (2023-11-01T05:31:46Z)
- Gender bias and stereotypes in Large Language Models [0.6882042556551611]
This paper investigates Large Language Models' behavior with respect to gender stereotypes.
We use a simple paradigm to test the presence of gender bias, building on but differing from WinoBias.
Our contributions in this paper are as follows: (a) LLMs are 3-6 times more likely to choose an occupation that stereotypically aligns with a person's gender; (b) these choices align with people's perceptions better than with the ground truth as reflected in official job statistics; (d) LLMs ignore crucial ambiguities in sentence structure 95% of the time in our study items, but when explicitly prompted, they recognize the ambiguity.
arXiv Detail & Related papers (2023-08-28T22:32:05Z)
- Causally Testing Gender Bias in LLMs: A Case Study on Occupational Bias [33.99768156365231]
We introduce a causal formulation for bias measurement in generative language models.
We propose a benchmark called OccuGender, with a bias-measuring procedure to investigate occupational gender bias.
The results show that these models exhibit substantial occupational gender bias.
arXiv Detail & Related papers (2022-12-20T22:41:24Z)