Us-vs-Them bias in Large Language Models
- URL: http://arxiv.org/abs/2512.13699v1
- Date: Wed, 03 Dec 2025 07:11:22 GMT
- Title: Us-vs-Them bias in Large Language Models
- Authors: Tabia Tanzin Prama, Julia Witte Zimmerman, Christopher M. Danforth, Peter Sheridan Dodds
- Abstract summary: We find consistent ingroup-positive and outgroup-negative associations across foundational large language models. For the personas examined, conservative personas exhibit greater outgroup hostility, whereas liberal personas display stronger ingroup solidarity.
- Score: 0.569978892646475
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This study investigates "us versus them" bias, as described by Social Identity Theory, in large language models (LLMs) under both default and persona-conditioned settings across multiple architectures (GPT-4.1, DeepSeek-3.1, Gemma-2.0, Grok-3.0, and LLaMA-3.1). Using sentiment dynamics, allotaxonometry, and embedding regression, we find consistent ingroup-positive and outgroup-negative associations across foundational LLMs. We find that adopting a persona systematically alters models' evaluative and affiliative language patterns. For the exemplar personas examined, conservative personas exhibit greater outgroup hostility, whereas liberal personas display stronger ingroup solidarity. Persona conditioning produces distinct clustering in embedding space and measurable semantic divergence, supporting the view that even abstract identity cues can shift models' linguistic behavior. Furthermore, outgroup-targeted prompts increased hostility bias by 1.19-21.76% across models. These findings suggest that LLMs learn not only factual associations about social groups but also internalize and reproduce distinct ways of being, including attitudes, worldviews, and cognitive styles that are activated when enacting personas. We interpret these results as evidence of a multi-scale coupling between local context (e.g., the persona prompt), localizable representations (what the model "knows"), and global cognitive tendencies (how it "thinks"), which are at least reflected in the training data. Finally, we demonstrate ION, an "us versus them" bias mitigation approach using fine-tuning and direct preference optimization (DPO), which reduces sentiment divergence by up to 69%, highlighting the potential for targeted mitigation strategies in future LLM development.
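The abstract's core measurement, sentiment divergence between ingroup- and outgroup-framed text, can be illustrated with a minimal sketch. The lexicon, scores, and example completions below are invented stand-ins for illustration only; the paper uses its own sentiment instruments and model-generated data.

```python
# Hypothetical sketch: ingroup/outgroup sentiment divergence.
# TOY_LEXICON is an invented valence lexicon, not the paper's.
TOY_LEXICON = {
    "great": 1.0, "loyal": 0.8, "kind": 0.7,
    "hostile": -0.9, "dishonest": -0.8, "bad": -0.6,
}

def sentence_sentiment(sentence):
    """Mean lexicon valence over the words present in the lexicon."""
    scores = [TOY_LEXICON[w] for w in sentence.lower().split() if w in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

def sentiment_divergence(ingroup_completions, outgroup_completions):
    """Mean ingroup sentiment minus mean outgroup sentiment.

    A positive value indicates an ingroup-favoring bias.
    """
    mean_in = sum(map(sentence_sentiment, ingroup_completions)) / len(ingroup_completions)
    mean_out = sum(map(sentence_sentiment, outgroup_completions)) / len(outgroup_completions)
    return mean_in - mean_out

# Invented example completions standing in for model outputs.
ingroup = ["we are loyal and kind", "we are great"]
outgroup = ["they are hostile", "they are dishonest and bad"]
print(sentiment_divergence(ingroup, outgroup))
```

In this framing, the paper's reported "up to 69%" reduction from ION would correspond to shrinking this divergence score toward zero after mitigation.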
Related papers
- Interpretable Debiasing of Vision-Language Models for Social Fairness [55.85977929985967]
We introduce an interpretable, model-agnostic bias mitigation framework, DeBiasLens, that localizes social attribute neurons in Vision-Language models. We train SAEs on facial image or caption datasets without corresponding social attribute labels to uncover neurons highly responsive to specific demographics. Our research lays the groundwork for future auditing tools, prioritizing social fairness in emerging real-world AI systems.
arXiv Detail & Related papers (2026-02-27T13:37:11Z) - Deep Binding of Language Model Virtual Personas: a Study on Approximating Political Partisan Misperceptions [13.929713456538932]
We propose a novel methodology for constructing virtual personas with synthetic user "backstories" generated as extended, multi-turn interview transcripts. We show that virtual personas conditioned on our backstories closely replicate human response distributions and produce effect sizes that closely match those observed in the original studies of in-group/out-group biases.
arXiv Detail & Related papers (2025-04-16T00:10:34Z) - Fairness Mediator: Neutralize Stereotype Associations to Mitigate Bias in Large Language Models [66.5536396328527]
LLMs inadvertently absorb spurious correlations from training data, leading to stereotype associations between biased concepts and specific social groups. We propose Fairness Mediator (FairMed), a bias mitigation framework that neutralizes stereotype associations. Our framework comprises two main components: a stereotype association prober and an adversarial debiasing neutralizer.
arXiv Detail & Related papers (2025-04-10T14:23:06Z) - Human Cognitive Benchmarks Reveal Foundational Visual Gaps in MLLMs [65.93003087656754]
VisFactor is a benchmark that digitizes 20 vision-centric subtests from a well-established cognitive psychology assessment. We evaluate 20 frontier Multimodal Large Language Models (MLLMs) from GPT, Gemini, Claude, LLaMA, Qwen, and SEED families. The best-performing model achieves a score of only 25.19 out of 100, with consistent failures on tasks such as mental rotation, spatial relation inference, and figure-ground discrimination.
arXiv Detail & Related papers (2025-02-23T04:21:32Z) - Large Language Models Reflect the Ideology of their Creators [71.65505524599888]
Large language models (LLMs) are trained on vast amounts of data to generate natural language. This paper shows that the ideological stance of an LLM appears to reflect the worldview of its creators.
arXiv Detail & Related papers (2024-10-24T04:02:30Z) - Persona Setting Pitfall: Persistent Outgroup Biases in Large Language Models Arising from Social Identity Adoption [10.35915254696156]
We show that outgroup bias manifests as strongly as ingroup favoritism.
Our findings highlight the potential to develop more equitable and balanced language models.
arXiv Detail & Related papers (2024-09-05T18:08:47Z) - Social Bias Probing: Fairness Benchmarking for Language Models [38.180696489079985]
This paper proposes a novel framework for probing language models for social biases by assessing disparate treatment.
We curate SoFa, a large-scale benchmark designed to address the limitations of existing fairness collections.
We show that biases within language models are more nuanced than acknowledged, indicating a broader scope of encoded biases than previously recognized.
arXiv Detail & Related papers (2023-11-15T16:35:59Z) - On the steerability of large language models toward data-driven personas [98.9138902560793]
Large language models (LLMs) are known to generate biased responses where the opinions of certain groups and populations are underrepresented.
Here, we present a novel approach to achieve controllable generation of specific viewpoints using LLMs.
arXiv Detail & Related papers (2023-11-08T19:01:13Z) - Generative Language Models Exhibit Social Identity Biases [17.307292780517653]
We investigate whether ingroup solidarity and outgroup hostility, fundamental social identity biases, are present in 56 large language models.
We find that almost all foundational language models and some instruction fine-tuned models exhibit clear ingroup-positive and outgroup-negative associations when prompted to complete sentences.
Our findings suggest that modern language models exhibit fundamental social identity biases to a similar degree as humans.
arXiv Detail & Related papers (2023-10-24T13:17:40Z) - Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs).
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive taxonomies: two for bias evaluation and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z)
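Several mitigation approaches summarized above (e.g., FairMed's neutralizer) rest on removing a learned association direction from a representation. A minimal sketch of the underlying projection idea, assuming the bias direction is already known; the vectors below are toy data, not any model's actual embeddings or any paper's exact method:

```python
# Hypothetical sketch: neutralize a known bias direction by
# projecting it out of a representation vector.
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project_out(vec, direction):
    """Remove the component of `vec` along `direction` (assumed nonzero)."""
    scale = dot(vec, direction) / dot(direction, direction)
    return [a - scale * b for a, b in zip(vec, direction)]

# Toy 3-d example: the bias direction is the first axis.
bias_dir = [1.0, 0.0, 0.0]
word_vec = [0.4, 0.3, 0.5]
debiased = project_out(word_vec, bias_dir)
# The debiased vector carries no component along the bias direction.
print(dot(debiased, bias_dir))
```

After projection, the representation is orthogonal to the bias direction, so a linear probe along that direction can no longer separate groups; the harder part in practice, which the papers above address, is estimating the direction itself.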
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.