A Comprehensive Study of Implicit and Explicit Biases in Large Language Models
- URL: http://arxiv.org/abs/2511.14153v1
- Date: Tue, 18 Nov 2025 05:27:17 GMT
- Title: A Comprehensive Study of Implicit and Explicit Biases in Large Language Models
- Authors: Fatima Kazi, Alex Young, Yash Inani, Setareh Rafatirad
- Abstract summary: This study highlights the need to address biases in Large Language Models amid the growing use of generative AI. We studied bias-specific benchmarks such as StereoSet and CrowS-Pairs to evaluate the existence of various biases in multiple generative models such as BERT and GPT-3.5. Results indicated that fine-tuned models struggle with gender biases but excelled at identifying and avoiding racial biases.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) inherit explicit and implicit biases from their training datasets. Identifying and mitigating biases in LLMs is crucial to ensure fair outputs, as they can perpetuate harmful stereotypes and misinformation. This study highlights the need to address biases in LLMs amid the growing use of generative AI. We studied bias-specific benchmarks such as StereoSet and CrowS-Pairs to evaluate the existence of various biases in multiple generative models such as BERT and GPT-3.5. We proposed an automated Bias-Identification Framework to recognize various social biases in LLMs such as gender, race, profession, and religion. We adopted a two-pronged approach to detect explicit and implicit biases in text data. Results indicated that fine-tuned models struggle with gender biases but excelled at identifying and avoiding racial biases. Our findings illustrated that despite having some success, LLMs often over-relied on keywords. To illuminate the capability of the analyzed LLMs in detecting implicit biases, we employed Bag-of-Words analysis and unveiled indications of implicit stereotyping within the vocabulary. To bolster model performance, we applied an enhancement strategy involving fine-tuning models using prompting techniques and data augmentation of the bias benchmarks. The fine-tuned models exhibited promising adaptability during cross-dataset testing and significantly enhanced performance on implicit bias benchmarks, with performance gains of up to 20%.
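The Bag-of-Words analysis mentioned in the abstract can be illustrated with a minimal sketch: count which content words co-occur with each demographic group's marker terms and look for skewed vocabulary. This is an illustrative toy example on hypothetical sentences, not the authors' exact pipeline or the real benchmark data.

```python
from collections import Counter

# Hypothetical mini-corpus; the paper evaluates on StereoSet / CrowS-Pairs sentences.
sentences = [
    "the nurse said she was tired",
    "the engineer said he fixed the bug",
    "the nurse helped her patient",
    "the engineer debugged his code",
]

# Illustrative group-marker lexicon (an assumption, not from the paper).
GROUPS = {"she": "female", "her": "female", "he": "male", "his": "male"}

def bag_of_words_by_group(corpus):
    """Count which words co-occur with each group's marker terms."""
    counts = {"female": Counter(), "male": Counter()}
    for sent in corpus:
        tokens = sent.split()
        groups = {GROUPS[t] for t in tokens if t in GROUPS}
        content = [t for t in tokens if t not in GROUPS]
        for g in groups:
            counts[g].update(content)
    return counts

counts = bag_of_words_by_group(sentences)
# Words skewed toward one group hint at implicit associations,
# e.g. "nurse" appearing only in female-marked sentences.
print(counts["female"]["nurse"], counts["male"]["engineer"])  # → 2 2
```

In a real analysis the frequency vectors would be normalized and compared across groups; here the skew ("nurse" never co-occurs with male markers) is the kind of vocabulary-level signal of implicit stereotyping the abstract refers to.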
Related papers
- Addressing Stereotypes in Large Language Models: A Critical Examination and Mitigation [0.0]
Large Language Models (LLMs) have gained popularity in recent years with the advancement of Natural Language Processing (NLP). This study inspects and highlights the need to address biases in LLMs amid growing generative Artificial Intelligence (AI). We utilize bias-specific benchmarks such as StereoSet and CrowS-Pairs to evaluate the existence of various biases in many different generative models such as BERT, GPT-3.5, and ADA.
arXiv Detail & Related papers (2025-11-18T05:43:34Z) - Addressing Bias in LLMs: Strategies and Application to Fair AI-based Recruitment [49.81946749379338]
This work seeks to analyze the capacity of Transformer-based systems to learn demographic biases present in the data. We propose a privacy-enhancing framework to reduce gender information in the learning pipeline as a way to mitigate biased behaviors in the final tools.
arXiv Detail & Related papers (2025-06-13T15:29:43Z) - No LLM is Free From Bias: A Comprehensive Study of Bias Evaluation in Large Language Models [0.9620910657090186]
Large Language Models (LLMs) have increased the performance of different natural language understanding and generation tasks. We provide a unified evaluation of benchmarks using a set of representative small and medium-sized LLMs. We propose five prompting approaches to carry out the bias detection task across different aspects of bias. The results indicate that each of the selected LLMs suffers from one or another form of bias, with the Phi-3.5B model being the least biased.
arXiv Detail & Related papers (2025-03-15T03:58:14Z) - Actions Speak Louder than Words: Agent Decisions Reveal Implicit Biases in Language Models [10.565316815513235]
Large language models (LLMs) may still exhibit implicit biases when simulating human behavior. We show that state-of-the-art LLMs exhibit significant sociodemographic disparities in nearly all simulations. When comparing our findings to real-world disparities reported in empirical studies, we find that the biases we uncovered are directionally aligned but markedly amplified.
arXiv Detail & Related papers (2025-01-29T05:21:31Z) - Explicit vs. Implicit: Investigating Social Bias in Large Language Models through Self-Reflection [18.625071242029936]
Large Language Models (LLMs) have been shown to exhibit various biases and stereotypes in their generated content. This paper presents a systematic framework to investigate and compare explicit and implicit biases in LLMs.
arXiv Detail & Related papers (2025-01-04T14:08:52Z) - How far can bias go? -- Tracing bias from pretraining data to alignment [54.51310112013655]
This study examines the correlation between gender-occupation bias in pre-training data and its manifestation in LLMs. Our findings reveal that biases present in pre-training data are amplified in model outputs.
arXiv Detail & Related papers (2024-11-28T16:20:25Z) - Investigating Implicit Bias in Large Language Models: A Large-Scale Study of Over 50 LLMs [0.0]
Large Language Models (LLMs) are being adopted across a wide range of tasks.
Recent research indicates that LLMs can harbor implicit biases even when they pass explicit bias evaluations.
This study highlights that newer or larger language models do not automatically exhibit reduced bias.
arXiv Detail & Related papers (2024-10-13T03:43:18Z) - Identifying and Mitigating Social Bias Knowledge in Language Models [52.52955281662332]
We propose a novel debiasing approach, Fairness Stamp (FAST), which enables fine-grained calibration of individual social biases. FAST surpasses state-of-the-art baselines with superior debiasing performance. This highlights the potential of fine-grained debiasing strategies to achieve fairness in large language models.
arXiv Detail & Related papers (2024-08-07T17:14:58Z) - The African Woman is Rhythmic and Soulful: An Investigation of Implicit Biases in LLM Open-ended Text Generation [3.9945212716333063]
Implicit biases are significant because they influence the decisions made by Large Language Models (LLMs).
Traditionally, explicit bias tests or embedding-based methods are employed to detect bias, but these approaches can overlook more nuanced, implicit forms of bias.
We introduce two novel psychological-inspired methodologies to reveal and measure implicit biases through prompt-based and decision-making tasks.
arXiv Detail & Related papers (2024-07-01T13:21:33Z) - GPTBIAS: A Comprehensive Framework for Evaluating Bias in Large Language Models [83.30078426829627]
Large language models (LLMs) have gained popularity and are being widely adopted by a large user community.
The existing evaluation methods have many constraints, and their results exhibit a limited degree of interpretability.
We propose a bias evaluation framework named GPTBIAS that leverages the high performance of LLMs to assess bias in models.
arXiv Detail & Related papers (2023-12-11T12:02:14Z) - Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs).
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive taxonomies: two for bias evaluation and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z) - Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
arXiv Detail & Related papers (2023-01-31T20:09:33Z)
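The projection-based debiasing described in the last entry above can be sketched as removing a bias subspace from text embeddings via an orthogonal projection. This is an illustrative NumPy sketch on toy vectors, not the paper's calibrated projection matrix.

```python
import numpy as np

def debias_projection(bias_dirs):
    """Build P = I - B B^T, which projects out the span of the bias directions."""
    # Orthonormalize the bias directions (columns of B) via QR decomposition.
    B = np.linalg.qr(np.asarray(bias_dirs, dtype=float).T)[0]
    return np.eye(B.shape[0]) - B @ B.T

# Toy 3-d "text embedding" space with a single bias direction along axis 0
# (in practice this direction would be estimated from prompt pairs).
bias_direction = [[1.0, 0.0, 0.0]]
P = debias_projection(bias_direction)

emb = np.array([0.8, 0.3, 0.5])
debiased = P @ emb
# The component along the bias direction is removed; the rest is untouched.
print(debiased)  # → [0.  0.3 0.5]
```

Because P is idempotent (P @ P = P), re-applying it leaves an already-debiased embedding unchanged, which is why a single projection of the text embedding can suffice for downstream classifiers.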
This list is automatically generated from the titles and abstracts of the papers in this site.