Evaluating LLMs for Gender Disparities in Notable Persons
- URL: http://arxiv.org/abs/2403.09148v1
- Date: Thu, 14 Mar 2024 07:58:27 GMT
- Title: Evaluating LLMs for Gender Disparities in Notable Persons
- Authors: Lauren Rhue, Sofie Goethals, Arun Sundararajan
- Abstract summary: This study examines the use of Large Language Models (LLMs) for retrieving factual information.
It addresses concerns over their propensity to produce factually incorrect "hallucinated" responses or to decline to answer prompts altogether.
- Score: 0.40964539027092906
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This study examines the use of Large Language Models (LLMs) for retrieving factual information, addressing concerns over their propensity to produce factually incorrect "hallucinated" responses or to decline to answer prompts altogether. Specifically, it investigates the presence of gender-based biases in LLMs' responses to factual inquiries. This paper takes a multi-pronged approach to evaluating GPT models, assessing fairness across multiple dimensions: recall, hallucinations, and declinations. Our findings reveal discernible gender disparities in the responses generated by GPT-3.5. While advancements in GPT-4 have led to improvements in performance, they have not fully eradicated these gender disparities, notably in instances where responses are declined. The study further explores the origins of these disparities by examining the influence of gender associations in prompts and the homogeneity of the responses.
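The three fairness dimensions named in the abstract (recall, hallucinations, and declinations) can be read as simple per-gender outcome rates. The following is a minimal sketch under that assumption; the field names, outcome labels, and aggregation are hypothetical and not taken from the paper.

```python
from collections import defaultdict

def per_gender_rates(responses):
    """Aggregate hypothetical per-gender recall, hallucination, and declination rates.

    `responses` is assumed to be a list of dicts with fields:
      "gender"  -- gender associated with the notable person in the prompt
      "outcome" -- one of "correct", "hallucinated", "declined"
    """
    counts = defaultdict(lambda: {"correct": 0, "hallucinated": 0, "declined": 0})
    for r in responses:
        counts[r["gender"]][r["outcome"]] += 1

    rates = {}
    for gender, c in counts.items():
        total = sum(c.values())
        rates[gender] = {
            "recall": c["correct"] / total,              # factually correct answers
            "hallucination": c["hallucinated"] / total,  # factually incorrect answers
            "declination": c["declined"] / total,        # refusals to answer
        }
    return rates

# Toy example: a gap in the "declination" rate between groups would mirror
# the kind of disparity the abstract reports for declined responses.
toy = [
    {"gender": "female", "outcome": "declined"},
    {"gender": "female", "outcome": "correct"},
    {"gender": "male", "outcome": "correct"},
    {"gender": "male", "outcome": "hallucinated"},
]
print(per_gender_rates(toy))
```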
Related papers
- Who Gets Cited? Gender- and Majority-Bias in LLM-Driven Reference Selection [0.16317061277456998]
This study systematically investigates gender bias in large language models (LLMs). Our results reveal two forms of bias: a persistent preference for male-authored references and a majority-group bias that favors whichever gender is more prevalent in the candidate pool. Our findings indicate that LLMs can reinforce or exacerbate existing gender imbalances in scholarly recognition.
arXiv Detail & Related papers (2025-08-02T13:27:32Z) - Do LLMs have a Gender (Entropy) Bias? [3.2225437367979763]
We define and study entropy bias, a discrepancy in the amount of information generated by an LLM in response to real questions users have asked. Our analyses suggest that there is no significant bias in LLM responses for men and women at a category level. We suggest a simple debiasing approach that iteratively merges the responses for the two genders to produce a final result.
arXiv Detail & Related papers (2025-05-24T23:06:41Z) - A database to support the evaluation of gender biases in GPT-4o output [4.517392236571035]
A prominent ethical risk of Large Language Models (LLMs) is the generation of unfair language output.
We propose a novel approach to database construction to assess gender-related biases.
arXiv Detail & Related papers (2025-02-28T09:54:13Z) - Fostering Appropriate Reliance on Large Language Models: The Role of Explanations, Sources, and Inconsistencies [66.30619782227173]
Large language models (LLMs) can produce erroneous responses that sound fluent and convincing.
We identify several features of LLM responses that shape users' reliance.
We find that explanations increase reliance on both correct and incorrect responses.
We observe less reliance on incorrect responses when sources are provided or when explanations exhibit inconsistencies.
arXiv Detail & Related papers (2025-02-12T16:35:41Z) - Who Does the Giant Number Pile Like Best: Analyzing Fairness in Hiring Contexts [5.111540255111445]
Race-based differences appear in approximately 10% of generated summaries, while gender-based differences occur in only 1%.
Retrieval models demonstrate comparable sensitivity to non-demographic changes, suggesting that fairness issues may stem from general model brittleness.
arXiv Detail & Related papers (2025-01-08T07:28:10Z) - Everyone deserves their voice to be heard: Analyzing Predictive Gender Bias in ASR Models Applied to Dutch Speech Data [13.91630413828167]
This study focuses on identifying the performance disparities of Whisper models on Dutch speech data.
We analyzed word error rate, character error rate, and BERT-based semantic similarity across gender groups (a per-group error-rate sketch appears after this list).
arXiv Detail & Related papers (2024-11-14T13:29:09Z) - The Root Shapes the Fruit: On the Persistence of Gender-Exclusive Harms in Aligned Language Models [58.130894823145205]
We center transgender, nonbinary, and other gender-diverse identities to investigate how alignment procedures interact with pre-existing gender-diverse bias.
Our findings reveal that DPO-aligned models are particularly sensitive to supervised finetuning.
We conclude with recommendations tailored to DPO and broader alignment practices.
arXiv Detail & Related papers (2024-11-06T06:50:50Z) - Gender Bias in LLM-generated Interview Responses [1.6124402884077915]
This study evaluates three LLMs to conduct a multifaceted audit of LLM-generated interview responses across models, question types, and jobs.
Our findings reveal that gender bias is consistent and closely aligned with gender stereotypes and the gender dominance of the jobs.
arXiv Detail & Related papers (2024-10-28T05:08:08Z) - Exploring Social Desirability Response Bias in Large Language Models: Evidence from GPT-4 Simulations [4.172974580485295]
Large language models (LLMs) are employed to simulate human-like responses in social surveys.
It remains unclear if they develop biases like social desirability response (SDR) bias.
The study underscores potential avenues for using LLMs to investigate biases in both humans and LLMs themselves.
arXiv Detail & Related papers (2024-10-20T16:28:24Z) - ChatGPT vs Social Surveys: Probing the Objective and Subjective Human Society [7.281887764378982]
We used ChatGPT-3.5 to simulate the sampling process and generated six socioeconomic characteristics from the 2020 US population.
We analyzed responses to questions about income inequality and gender roles to explore GPT's subjective attitudes.
Our findings show some alignment in gender and age means with the actual 2020 US population, but we also found mismatches in the distributions of racial and educational groups.
arXiv Detail & Related papers (2024-09-04T10:33:37Z) - GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models [73.23743278545321]
Large language models (LLMs) have exhibited remarkable capabilities in natural language generation, but have also been observed to magnify societal biases.
GenderCARE is a comprehensive framework that encompasses innovative Criteria, bias Assessment, Reduction techniques, and Evaluation metrics.
arXiv Detail & Related papers (2024-08-22T15:35:46Z) - GenderBias-\emph{VL}: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing [72.0343083866144]
This paper introduces the GenderBias-VL benchmark to evaluate occupation-related gender bias in Large Vision-Language Models.
Using our benchmark, we extensively evaluate 15 commonly used open-source LVLMs and state-of-the-art commercial APIs.
Our findings reveal widespread gender biases in existing LVLMs.
arXiv Detail & Related papers (2024-06-30T05:55:15Z) - Disclosure and Mitigation of Gender Bias in LLMs [64.79319733514266]
Large Language Models (LLMs) can generate biased responses.
We propose an indirect probing framework based on conditional generation.
We explore three distinct strategies to disclose explicit and implicit gender bias in LLMs.
arXiv Detail & Related papers (2024-02-17T04:48:55Z) - Diverse, but Divisive: LLMs Can Exaggerate Gender Differences in Opinion
Related to Harms of Misinformation [8.066880413153187]
This paper examines whether a large language model (LLM) can reflect the views of various groups when assessing the harms of misinformation.
We present the TopicMisinfo dataset, containing 160 fact-checked claims from diverse topics.
We find that GPT-3.5-Turbo reflects empirically observed gender differences in opinion but amplifies the extent of these differences.
arXiv Detail & Related papers (2024-01-29T20:50:28Z) - Probing Explicit and Implicit Gender Bias through LLM Conditional Text
Generation [64.79319733514266]
Large Language Models (LLMs) can generate biased and toxic responses.
We propose a conditional text generation mechanism without the need for predefined gender phrases and stereotypes.
arXiv Detail & Related papers (2023-11-01T05:31:46Z) - Towards Understanding Gender-Seniority Compound Bias in Natural Language
Generation [64.65911758042914]
We investigate how seniority impacts the degree of gender bias exhibited in pretrained neural generation models.
Our results show that GPT-2 amplifies bias by considering women as junior and men as senior more often than the ground truth in both domains.
These results suggest that NLP applications built using GPT-2 may harm women in professional capacities.
arXiv Detail & Related papers (2022-05-19T20:05:02Z)