Evaluating Social Bias in RAG Systems: When External Context Helps and Reasoning Hurts
- URL: http://arxiv.org/abs/2602.09442v1
- Date: Tue, 10 Feb 2026 06:27:56 GMT
- Title: Evaluating Social Bias in RAG Systems: When External Context Helps and Reasoning Hurts
- Authors: Shweta Parihar, Lu Cheng
- Abstract summary: Social biases inherent in large language models (LLMs) raise significant fairness concerns. This work focuses on evaluating and understanding the social bias implications of RAG.
- Score: 7.344577590113121
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Social biases inherent in large language models (LLMs) raise significant fairness concerns. Retrieval-Augmented Generation (RAG) architectures, which retrieve external knowledge sources to enhance the generative capabilities of LLMs, remain susceptible to the same bias-related challenges. This work focuses on evaluating and understanding the social bias implications of RAG. Through extensive experiments across various retrieval corpora, LLMs, and bias evaluation datasets, encompassing more than 13 different bias types, we surprisingly observe a reduction in bias in RAG. This suggests that the inclusion of external context can help counteract stereotype-driven predictions, potentially improving fairness by diversifying the contextual grounding of the model's outputs. To better understand this phenomenon, we then explore the model's reasoning process by integrating Chain-of-Thought (CoT) prompting into RAG while assessing the faithfulness of the model's CoT. Our experiments reveal that the model's bias inclinations shift between stereotype and anti-stereotype responses as more contextual information is incorporated from the retrieved documents. Interestingly, we find that while CoT enhances accuracy, contrary to the bias reduction observed with RAG, it increases overall bias across datasets, highlighting the need for bias-aware reasoning frameworks that can mitigate this trade-off.
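The abstract describes a bias probe in which retrieved passages are placed in the prompt, optionally with Chain-of-Thought, and the model's answers are tracked between stereotype and anti-stereotype options. The sketch below illustrates that evaluation loop on a BBQ-style multiple-choice item; the toy keyword retriever, the prompt wording, and the `call_llm` stub are illustrative assumptions for this listing, not the authors' released code.

```python
# Illustrative RAG + CoT bias probe (a sketch, not the paper's implementation).
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class BiasItem:
    question: str
    options: List[str]        # e.g. [stereotype answer, anti-stereotype answer, "Unknown"]
    stereotype_index: int     # index of the stereotype-consistent option

def retrieve(query: str, corpus: List[str], k: int = 3) -> List[str]:
    """Toy keyword-overlap retriever standing in for a real dense retriever."""
    overlap = lambda doc: len(set(doc.lower().split()) & set(query.lower().split()))
    return sorted(corpus, key=overlap, reverse=True)[:k]

def build_prompt(item: BiasItem, passages: List[str], use_cot: bool) -> str:
    """Assemble a retrieval-augmented multiple-choice prompt, optionally with CoT."""
    context = "\n".join(f"- {p}" for p in passages)
    choices = "\n".join(f"({i}) {o}" for i, o in enumerate(item.options))
    style = "Think step by step, then give the option number." if use_cot else "Give the option number only."
    return f"Context:\n{context}\n\nQuestion: {item.question}\n{choices}\n{style}"

def bias_score(answers: List[int], items: List[BiasItem]) -> float:
    """Fraction of non-'Unknown' answers that pick the stereotype-consistent option."""
    decided = [(a, it) for a, it in zip(answers, items) if it.options[a].lower() != "unknown"]
    return sum(a == it.stereotype_index for a, it in decided) / len(decided) if decided else 0.0

def evaluate(items: List[BiasItem], corpus: List[str],
             call_llm: Callable[[str], int], use_cot: bool = False) -> float:
    """Run the retrieval-augmented probe and return the stereotype-selection rate."""
    answers = [call_llm(build_prompt(it, retrieve(it.question, corpus), use_cot)) for it in items]
    return bias_score(answers, items)
```

Comparing `evaluate(..., use_cot=False)` against `use_cot=True`, and running with an empty `corpus` versus a populated one, mirrors the RAG-versus-CoT comparison the abstract reports (bias falling with retrieval but rising with CoT).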
Related papers
- Adaptive Generation of Bias-Eliciting Questions for LLMs [18.608477560948003]
Large language models (LLMs) are now widely deployed in user-facing applications, reaching hundreds of millions of users worldwide. We introduce a counterfactual bias evaluation framework that automatically generates realistic, open-ended questions over sensitive attributes such as sex, race, or religion. We also capture distinct response dimensions that are increasingly relevant in user interactions, such as asymmetric refusals and explicit acknowledgment of bias.
arXiv Detail & Related papers (2025-10-14T13:08:10Z) - IndiCASA: A Dataset and Bias Evaluation Framework in LLMs Using Contrastive Embedding Similarity in the Indian Context [10.90604216960609]
Large Language Models (LLMs) have gained significant traction across critical domains owing to their impressive contextual understanding and generative capabilities. We propose an evaluation framework based on an encoder trained using contrastive learning that captures fine-grained bias through embedding similarity. We also introduce a novel dataset, IndiCASA (IndiBias-based Contextually Aligned Stereotypes and Anti-stereotypes), comprising 2,575 human-validated sentences spanning five demographic axes: caste, gender, religion, disability, and socioeconomic status.
arXiv Detail & Related papers (2025-10-03T06:03:26Z) - Biases Propagate in Encoder-based Vision-Language Models: A Systematic Analysis From Intrinsic Measures to Zero-shot Retrieval Outcomes [14.331322509462419]
Social-group biases intrinsic to foundational encoder-based vision-language models (VLMs) manifest as biases in downstream tasks. We introduce a controlled framework to measure this propagation by correlating intrinsic measures of bias in the representational space with measures of bias in zero-shot text-to-image (TTI) and image-to-text (ITT) retrieval (a minimal correlation sketch appears after this list). Results show substantial correlations between intrinsic and extrinsic bias, with an average $\rho = 0.83 \pm 0.10$. Notably, we find that larger/better-performing models exhibit greater bias propagation, a finding that raises concerns.
arXiv Detail & Related papers (2025-06-06T20:01:32Z) - BiasConnect: Investigating Bias Interactions in Text-to-Image Models [73.76853483463836]
We introduce BiasConnect, a novel tool designed to analyze and quantify bias interactions in Text-to-Image models. Our method provides empirical estimates that indicate how other bias dimensions shift toward or away from an ideal distribution when a given bias is modified. We demonstrate the utility of BiasConnect for selecting optimal bias mitigation axes, comparing different TTI models on the dependencies they learn, and understanding the amplification of intersectional societal biases in TTI models.
arXiv Detail & Related papers (2025-03-12T19:01:41Z) - Fine-Grained Bias Detection in LLM: Enhancing detection mechanisms for nuanced biases [0.0]
This study presents a detection framework to identify nuanced biases in Large Language Models (LLMs). The approach integrates contextual analysis, interpretability via attention mechanisms, and counterfactual data augmentation to capture hidden biases. Results show improvements in detecting subtle biases compared to conventional methods.
arXiv Detail & Related papers (2025-03-08T04:43:01Z) - On the Fairness, Diversity and Reliability of Text-to-Image Generative Models [68.62012304574012]
Multimodal generative models have sparked critical discussions on their reliability, fairness, and potential for misuse. We propose an evaluation framework to assess model reliability by analyzing responses to global and local perturbations in the embedding space. Our method lays the groundwork for detecting unreliable, bias-injected models and tracing the provenance of embedded biases.
arXiv Detail & Related papers (2024-11-21T09:46:55Z) - Identifying and Mitigating Social Bias Knowledge in Language Models [52.52955281662332]
We propose a novel debiasing approach, Fairness Stamp (FAST), which enables fine-grained calibration of individual social biases. FAST surpasses state-of-the-art baselines with superior debiasing performance. This highlights the potential of fine-grained debiasing strategies to achieve fairness in large language models.
arXiv Detail & Related papers (2024-08-07T17:14:58Z) - Thinking Racial Bias in Fair Forgery Detection: Models, Datasets and Evaluations [63.52709761339949]
We first contribute a dedicated dataset called the Fair Forgery Detection (FairFD) dataset, where we prove the racial bias of public state-of-the-art (SOTA) methods. We design novel metrics, including the Approach Averaged Metric and the Utility Regularized Metric, which can avoid deceptive results. We also present an effective and robust post-processing technique, Bias Pruning with Fair Activations (BPFA), which improves fairness without requiring retraining or weight updates.
arXiv Detail & Related papers (2024-07-19T14:53:18Z) - Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs)
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive taxonomies, two for bias evaluation and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z) - General Greedy De-bias Learning [163.65789778416172]
We propose a General Greedy De-bias learning framework (GGD), which greedily trains the biased models and the base model, analogous to gradient descent in functional space.
GGD can learn a more robust base model under the settings of both task-specific biased models with prior knowledge and self-ensemble biased model without prior knowledge.
arXiv Detail & Related papers (2021-12-20T14:47:32Z)
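The intrinsic-versus-extrinsic correlation reported in the "Biases Propagate in Encoder-based Vision-Language Models" entry above reduces to a rank correlation between two sets of bias scores. A minimal sketch, assuming hypothetical placeholder scores rather than that paper's actual measurements:

```python
# Hypothetical per-model bias scores; real values would come from an intrinsic
# embedding-space bias metric and a zero-shot TTI/ITT retrieval bias metric.
from scipy.stats import spearmanr

intrinsic_bias = [0.12, 0.31, 0.45, 0.52, 0.70]
extrinsic_bias = [0.10, 0.28, 0.50, 0.49, 0.75]

rho, p_value = spearmanr(intrinsic_bias, extrinsic_bias)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```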