Related papers: Evaluating LLMs for Demographic-Targeted Social Bias Detection: A Comprehensive Benchmark Study

URL: http://arxiv.org/abs/2510.04641v1
Date: Mon, 06 Oct 2025 09:45:32 GMT
Title: Evaluating LLMs for Demographic-Targeted Social Bias Detection: A Comprehensive Benchmark Study
Authors: Ayan Majumdar, Feihao Chen, Jinghui Li, Xiaozhen Wang,
Abstract summary: Large-scale web-scraped text corpora used to train general-purpose AI models often contain harmful demographic-targeted social biases. We present a comprehensive evaluation framework aimed at English texts to assess the ability of LLMs to detect demographic-targeted social biases. We then conduct a systematic evaluation with models across scales and techniques, including prompting, in-context learning, and fine-tuning.
Score: 1.6682715542079583
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large-scale web-scraped text corpora used to train general-purpose AI models often contain harmful demographic-targeted social biases, creating a regulatory need for data auditing and for the development of scalable bias-detection methods. Although prior work has investigated biases in text datasets and related detection methods, these studies remain narrow in scope. They typically focus on a single content type (e.g., hate speech), cover limited demographic axes, overlook biases affecting multiple demographics simultaneously, and analyze a limited set of techniques. Consequently, practitioners lack a holistic understanding of the strengths and limitations of recent large language models (LLMs) for automated bias detection. In this study, we present a comprehensive evaluation framework aimed at English texts to assess the ability of LLMs to detect demographic-targeted social biases. To align with regulatory requirements, we frame bias detection as a multi-label task using a demographic-focused taxonomy. We then conduct a systematic evaluation with models across scales and techniques, including prompting, in-context learning, and fine-tuning. Using twelve datasets spanning diverse content types and demographics, our study demonstrates the promise of fine-tuned smaller models for scalable detection. However, our analyses also expose persistent gaps across demographic axes and multi-demographic targeted biases, underscoring the need for more effective and scalable auditing frameworks.
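Because the study frames bias detection as a multi-label task over a demographic taxonomy, a model's output for each text can be read as the set of demographic axes it targets, with the empty set meaning no demographic-targeted bias. The sketch below shows one way such predictions might be scored with standard per-axis, micro-averaged, and macro-averaged F1, plus a breakdown restricted to texts that target several demographics at once. It is an illustration under stated assumptions: the axis names, the helper functions (`multilabel_scores`, `multi_demographic_scores`), and the choice of metrics are hypothetical and are not the paper's taxonomy or evaluation code.

```python
from typing import Dict, List, Sequence, Set

# Hypothetical demographic axes for illustration; the paper's taxonomy may differ.
AXES: List[str] = ["gender", "race", "religion", "age", "disability", "sexual_orientation"]


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; returns 0.0 when tp, fp and fn are all zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_scores(gold: Sequence[Set[str]], pred: Sequence[Set[str]]) -> Dict[str, float]:
    """Per-axis F1 plus micro- and macro-averaged F1 over all texts.

    Each label is the set of targeted axes; an empty set means the text
    contains no demographic-targeted bias.
    """
    counts = {a: [0, 0, 0] for a in AXES}  # axis -> [tp, fp, fn]
    for g, p in zip(gold, pred):
        for a in AXES:
            if a in g and a in p:
                counts[a][0] += 1
            elif a in p:
                counts[a][1] += 1
            elif a in g:
                counts[a][2] += 1
    scores = {a: f1(*c) for a, c in counts.items()}
    # Micro-F1 pools true positives, false positives and false negatives across axes.
    tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
    scores["micro_f1"] = f1(tp, fp, fn)
    # Macro-F1 averages per-axis F1 over axes with at least one gold positive.
    present = [a for a in AXES if counts[a][0] + counts[a][2] > 0]
    scores["macro_f1"] = sum(scores[a] for a in present) / len(present) if present else 0.0
    return scores


def multi_demographic_scores(gold: Sequence[Set[str]], pred: Sequence[Set[str]]) -> Dict[str, float]:
    """The same metrics restricted to texts whose gold label targets two or more axes."""
    idx = [i for i, g in enumerate(gold) if len(g) >= 2]
    return multilabel_scores([gold[i] for i in idx], [pred[i] for i in idx])


# Toy gold labels and predictions for four texts.
gold = [{"gender"}, {"race", "religion"}, set(), {"age"}]
pred = [{"gender"}, {"race"}, {"gender"}, set()]
overall = multilabel_scores(gold, pred)       # micro_f1 == 4/7, macro_f1 == 5/12
multi = multi_demographic_scores(gold, pred)  # micro_f1 == 2/3, macro_f1 == 0.5
```

Reporting the multi-demographic subset separately keeps a weakness on texts that target several groups at once from being averaged away in a single pooled score, which is the kind of gap the abstract highlights.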

Related papers

Interpretable Debiasing of Vision-Language Models for Social Fairness
We introduce an interpretable, model-agnostic bias mitigation framework, DeBiasLens, that localizes social attribute neurons in vision-language models. We train sparse autoencoders (SAEs) on facial image or caption datasets without corresponding social attribute labels to uncover neurons that are highly responsive to specific demographics. Our research lays the groundwork for future auditing tools that prioritize social fairness in emerging real-world AI systems.
arXiv (2026-02-27T13:37:11Z)
A Comprehensive Study of Implicit and Explicit Biases in Large Language Models
This study highlights the need to address biases in large language models amid the growth of generative AI. We studied bias-specific benchmarks such as StereoSet and CrowS-Pairs to evaluate the presence of various biases in multiple models such as BERT and GPT-3.5. Results indicated that fine-tuned models struggled with gender biases but excelled at identifying and avoiding racial biases.
arXiv (2025-11-18T05:27:17Z)
Addressing Bias in LLMs: Strategies and Application to Fair AI-based Recruitment
This work seeks to analyze the capacity of Transformer-based systems to learn demographic biases present in the data. We propose a privacy-enhancing framework that reduces the gender information present in the learning pipeline as a way to mitigate biased behavior in the final tools.
arXiv (2025-06-13T15:29:43Z)
Fine-Grained Bias Detection in LLM: Enhancing detection mechanisms for nuanced biases
This study presents a detection framework to identify nuanced biases in Large Language Models (LLMs). The approach integrates contextual analysis, interpretability via attention mechanisms, and counterfactual data augmentation to capture hidden biases. Results show improvements in detecting subtle biases compared to conventional methods.
arXiv (2025-03-08T04:43:01Z)
Leveraging Prototypical Representations for Mitigating Social Bias without Demographic Information
DAFair is a novel approach to address social bias in language models. We leverage prototypical demographic texts and incorporate a regularization term during the fine-tuning process to mitigate bias.
arXiv (2024-03-14T15:58:36Z)
ROBBIE: Robust Bias Evaluation of Large Generative Language Models
Different prompt-based datasets can be used to measure social bias across multiple text domains and demographic axes. We compare 6 different prompt-based bias and toxicity metrics across 12 demographic axes and 5 families of generative LLMs. We conduct a comprehensive study of how well 3 bias/toxicity mitigation techniques perform across our suite of measurements.
arXiv (2023-11-29T23:03:04Z)
Bias and Fairness in Large Language Models: A Survey
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs). We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing. We then unify the literature by proposing three intuitive taxonomies: two for bias evaluation and one for mitigation.
arXiv (2023-09-02T00:32:55Z)
Metrics for Dataset Demographic Bias: A Case Study on Facial Expression Recognition
One of the most prominent types of demographic bias is statistical imbalance in the representation of demographic groups in datasets. We develop a taxonomy for the classification of these metrics, providing a practical guide for the selection of appropriate metrics. The paper provides valuable insights for researchers in AI and related fields to mitigate dataset bias and improve the fairness and accuracy of AI models.
arXiv (2023-03-28T11:04:18Z)
Fairness meets Cross-Domain Learning: a new perspective on Models and Metrics
We study the relationship between cross-domain learning (CD) and model fairness. We introduce a benchmark on face and medical images spanning several demographic groups as well as classification and localization tasks. Our study covers 14 CD approaches alongside three state-of-the-art fairness algorithms and shows how the former can outperform the latter.
arXiv (2023-03-25T09:34:05Z)
Balancing out Bias: Achieving Fairness Through Training Reweighting
Bias in natural language processing arises from models learning characteristics of the author such as gender and race. Existing methods for mitigating and measuring bias do not directly account for correlations between author demographics and linguistic variables. This paper introduces a very simple but highly effective method for countering bias using instance reweighting.
arXiv (2021-09-16T23:40:28Z)

This list is automatically generated from the titles and abstracts of the papers on this site.
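The "Balancing out Bias" entry in the list above describes countering bias through instance reweighting. As a minimal sketch of the general idea, the code below implements one well-known reweighting scheme, the reweighing of Kamiran and Calders, which gives each training example the weight w(y, g) = P(y)P(g) / P(y, g), so that the label y and the demographic group g become independent under the weighted distribution. This is a swapped-in illustration, not necessarily the exact weighting used in that paper, and the function name `reweigh` and the toy labels are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def reweigh(labels: Sequence[str], groups: Sequence[str]) -> List[float]:
    """Instance weights w(y, g) = P(y) * P(g) / P(y, g) (Kamiran-Calders reweighing).

    Under these weights, label and demographic group are independent in the
    weighted training set. When every (label, group) combination occurs,
    the weights sum to the number of examples.
    """
    if len(labels) != len(groups):
        raise ValueError("labels and groups must have the same length")
    n = len(labels)
    label_counts = Counter(labels)
    group_counts = Counter(groups)
    joint_counts = Counter(zip(labels, groups))
    return [
        (label_counts[y] / n) * (group_counts[g] / n) / (joint_counts[(y, g)] / n)
        for y, g in zip(labels, groups)
    ]


# Toy data: the "pos" label is over-represented in group "f".
labels = ["pos", "pos", "pos", "neg", "pos", "neg", "neg", "neg"]
groups = ["f", "f", "f", "f", "m", "m", "m", "m"]
weights = reweigh(labels, groups)
# Rare (label, group) cells are up-weighted to 2.0 and common ones down-weighted
# to 2/3, so each of the four cells carries the same total weight (2.0).
```

In training, each weight would typically scale that example's loss before averaging over the batch, which lets the method plug into an otherwise unchanged fine-tuning loop.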