Obscured but Not Erased: Evaluating Nationality Bias in LLMs via Name-Based Bias Benchmarks
- URL: http://arxiv.org/abs/2507.16989v1
- Date: Tue, 22 Jul 2025 19:54:49 GMT
- Title: Obscured but Not Erased: Evaluating Nationality Bias in LLMs via Name-Based Bias Benchmarks
- Authors: Giulio Pelosio, Devesh Batra, Noémie Bovey, Robert Hankache, Cristovao Iglesias, Greig Cowan, Raad Khraishi
- Abstract summary: Large Language Models (LLMs) can exhibit latent biases towards specific nationalities even when explicit demographic markers are not present. We introduce a novel name-based benchmarking approach to investigate the impact of substituting explicit nationality labels with culturally indicative names. Our experiments show that small models are less accurate and exhibit more bias compared to their larger counterparts.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) can exhibit latent biases towards specific nationalities even when explicit demographic markers are not present. In this work, we introduce a novel name-based benchmarking approach derived from the Bias Benchmark for QA (BBQ) dataset to investigate the impact of substituting explicit nationality labels with culturally indicative names, a scenario more reflective of real-world LLM applications. Our approach examines how this substitution affects both bias magnitude and accuracy across a spectrum of LLMs from industry leaders such as OpenAI, Google, and Anthropic. Our experiments show that small models are less accurate and exhibit more bias compared to their larger counterparts. For instance, on our name-based dataset and in the ambiguous context (where the correct choice is not revealed), Claude Haiku exhibited the worst stereotypical bias score of 9%, compared to only 3.5% for its larger counterpart, Claude Sonnet, which also outperformed it by 117.7% in accuracy. Additionally, we find that small models retain a larger portion of existing errors in these ambiguous contexts. For example, after substituting names for explicit nationality references, GPT-4o retains 68% of its error rate in the ambiguous context, versus 76% for GPT-4o-mini, with similar findings for other model providers. Our research highlights the stubborn resilience of biases in LLMs, underscoring their profound implications for the development and deployment of AI systems in diverse, global contexts.
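As a concrete illustration of the protocol described in the abstract, the sketch below shows how nationality labels in BBQ-style items could be swapped for culturally indicative names and how an ambiguous-context bias score could then be computed. This is not the authors' released code: the name lists, the item fields, the query_model stub, and the exact bias-score formula (following the BBQ convention of scaling a stereotype-alignment score by one minus accuracy in ambiguous contexts) are all illustrative assumptions.

```python
# Minimal sketch (not the paper's code) of name-based BBQ-style evaluation:
# replace explicit nationality labels with culturally indicative names,
# then score a model's answers in the ambiguous context.
import random
from typing import Callable

# Hypothetical mapping from nationality labels to culturally indicative names.
NAMES_BY_NATIONALITY = {
    "French": ["Jean Dupont", "Camille Moreau"],
    "Nigerian": ["Chinedu Okafor", "Ngozi Adeyemi"],
}

def substitute_names(item: dict) -> dict:
    """Swap each explicit nationality label in a BBQ-style item for a sampled name."""
    text = item["context"] + " " + item["question"]
    answers = list(item["answers"])
    for nationality, names in NAMES_BY_NATIONALITY.items():
        name = random.choice(names)
        # Simplified string substitution; a real pipeline would rewrite full phrases.
        text = text.replace(nationality, name)
        answers = [a.replace(nationality, name) for a in answers]
    return {**item, "prompt": text, "answers": answers}

def evaluate_ambiguous(items: list[dict], query_model: Callable[[str, list[str]], int]) -> dict:
    """Accuracy and a BBQ-style bias score on ambiguous, name-substituted items."""
    correct = biased = non_unknown = 0
    for raw in items:
        item = substitute_names(raw)
        pred = query_model(item["prompt"], item["answers"])       # index of chosen answer
        correct += pred == item["label"]                          # "Unknown" is correct when ambiguous
        if pred != item["unknown_idx"]:
            non_unknown += 1
            biased += pred == item["stereotyped_idx"]
    accuracy = correct / len(items)
    s_dis = 2 * biased / non_unknown - 1 if non_unknown else 0.0  # stereotype alignment in [-1, 1]
    return {"accuracy": accuracy, "bias_score_ambiguous": (1 - accuracy) * s_dis}
```

Under the same assumptions, the error-retention figures quoted above would correspond to the ratio of a model's ambiguous-context error rate on the name-substituted items to its error rate on the original items with explicit nationality labels.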
Related papers
- McBE: A Multi-task Chinese Bias Evaluation Benchmark for Large Language Models [26.202296897643382]
We present a Multi-task Chinese Bias Evaluation Benchmark (McBE) that includes 4,077 bias evaluation instances. This dataset provides extensive category coverage, content diversity, and measuring comprehensiveness. We conduct an in-depth analysis of results, offering novel insights into bias in large language models (LLMs).
arXiv Detail & Related papers (2025-07-02T19:04:56Z)
- Robustly Improving LLM Fairness in Realistic Settings via Interpretability [0.16843915833103415]
Anti-bias prompts fail when realistic contextual details are introduced. We find that adding realistic context such as company names, culture descriptions from public careers pages, and selective hiring constraints induces significant racial and gender biases. Our internal bias mitigation identifies race and gender-correlated directions and applies affine concept editing at inference time.
arXiv Detail & Related papers (2025-06-12T17:34:38Z)
- Surface Fairness, Deep Bias: A Comparative Study of Bias in Language Models [49.41113560646115]
We investigate various proxy measures of bias in large language models (LLMs). We find that evaluating models with pre-prompted personae on a multi-subject benchmark (MMLU) leads to negligible and mostly random differences in scores. With the recent trend towards LLM assistant memory and personalization, these problems resurface from a different angle.
arXiv Detail & Related papers (2025-06-12T08:47:40Z)
- Geopolitical biases in LLMs: what are the "good" and the "bad" countries according to contemporary language models [52.00270888041742]
We introduce a novel dataset with neutral event descriptions and contrasting viewpoints from different countries. Our findings show significant geopolitical biases, with models favoring specific national narratives. Simple debiasing prompts had a limited effect on reducing these biases.
arXiv Detail & Related papers (2025-06-07T10:45:17Z)
- LIBRA: Measuring Bias of Large Language Model from a Local Context [9.612845616659776]
Large Language Models (LLMs) have significantly advanced natural language processing applications. Yet their widespread use raises concerns regarding inherent biases that may reduce utility for, or cause harm to, particular social groups. This research addresses these concerns with a Local Integrated Bias Recognition and Assessment Framework (LIBRA) for measuring bias.
arXiv Detail & Related papers (2025-02-02T04:24:57Z)
- Revealing Hidden Bias in AI: Lessons from Large Language Models [0.0]
This study examines biases in candidate interview reports generated by Claude 3.5 Sonnet, GPT-4o, Gemini 1.5, and Llama 3.1 405B.
We evaluate the effectiveness of LLM-based anonymization in reducing these biases.
arXiv Detail & Related papers (2024-10-22T11:58:54Z)
- Bias Similarity Across Large Language Models [32.0365189539138]
Bias in Large Language Models remains a critical concern as these systems are increasingly deployed in high-stakes applications. We treat bias similarity as a form of functional similarity and evaluate 24 LLMs on over one million structured prompts spanning four bias dimensions. Our findings show that fairness is not strongly determined by model size, architecture, instruction tuning, or openness.
arXiv Detail & Related papers (2024-10-15T19:21:14Z)
- Investigating Implicit Bias in Large Language Models: A Large-Scale Study of Over 50 LLMs [0.0]
Large Language Models (LLMs) are being adopted across a wide range of tasks.
Recent research indicates that LLMs can harbor implicit biases even when they pass explicit bias evaluations.
This study highlights that newer or larger language models do not automatically exhibit reduced bias.
arXiv Detail & Related papers (2024-10-13T03:43:18Z)
- VLBiasBench: A Comprehensive Benchmark for Evaluating Bias in Large Vision-Language Model [72.13121434085116]
We introduce VLBiasBench, a benchmark to evaluate biases in Large Vision-Language Models (LVLMs). VLBiasBench features a dataset that covers nine distinct categories of social biases, including age, disability status, gender, nationality, physical appearance, race, religion, profession, and social economic status, as well as two intersectional bias categories: race x gender and race x social economic status. We conduct extensive evaluations on 15 open-source models as well as two advanced closed-source models, yielding new insights into the biases present in these models.
arXiv Detail & Related papers (2024-06-20T10:56:59Z)
- What's in a Name? Auditing Large Language Models for Race and Gender Bias [45.1187517058961]
We employ an audit design to investigate biases in state-of-the-art large language models, including GPT-4. We find that the models' advice systematically disadvantages names that are commonly associated with racial minorities and women.
arXiv Detail & Related papers (2024-02-21T18:25:25Z)
- GPTBIAS: A Comprehensive Framework for Evaluating Bias in Large Language Models [83.30078426829627]
Large language models (LLMs) have gained popularity and are being widely adopted by a large user community.
The existing evaluation methods have many constraints, and their results exhibit a limited degree of interpretability.
We propose a bias evaluation framework named GPTBIAS that leverages the high performance of LLMs to assess bias in models.
arXiv Detail & Related papers (2023-12-11T12:02:14Z)
- CBBQ: A Chinese Bias Benchmark Dataset Curated with Human-AI Collaboration for Large Language Models [52.25049362267279]
We present a Chinese Bias Benchmark dataset that consists of over 100K questions jointly constructed by human experts and generative language models.
The testing instances in the dataset are automatically derived from 3K+ high-quality templates manually authored with stringent quality control.
Extensive experiments demonstrate the effectiveness of the dataset in detecting model bias, with all 10 publicly available Chinese large language models exhibiting strong bias in certain categories.
arXiv Detail & Related papers (2023-06-28T14:14:44Z)
- Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases [55.45617404586874]
We propose a few-shot instruction-based method for prompting pre-trained language models (LMs) to detect social biases.
We show that large LMs can detect different types of fine-grained biases with similar and sometimes superior accuracy to fine-tuned models.
arXiv Detail & Related papers (2021-12-15T04:19:52Z)
- LOGAN: Local Group Bias Detection by Clustering [86.38331353310114]
We argue that evaluating bias at the corpus level is not enough for understanding how biases are embedded in a model.
We propose LOGAN, a new bias detection technique based on clustering.
Experiments on toxicity classification and object classification tasks show that LOGAN identifies bias in a local region.
arXiv Detail & Related papers (2020-10-06T16:42:51Z)