When Do Language Models Endorse Limitations on Human Rights Principles?
- URL: http://arxiv.org/abs/2603.04217v1
- Date: Wed, 04 Mar 2026 16:01:53 GMT
- Title: When Do Language Models Endorse Limitations on Human Rights Principles?
- Authors: Keenan Samway, Nicole Miu Takagi, Rada Mihalcea, Bernhard Schölkopf, Ilias Chalkidis, Daniel Hershcovich, Zhijing Jin
- Abstract summary: We evaluate how Large Language Models (LLMs) navigate trade-offs involving the Universal Declaration of Human Rights (UDHR). Our analysis of eleven major LLMs reveals systematic biases where models accept limiting Economic, Social, and Cultural rights more often than Political and Civil rights.
- Score: 82.84306700922664
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As Large Language Models (LLMs) increasingly mediate global information access with the potential to shape public discourse, their alignment with universal human rights principles becomes important for ensuring that these rights are upheld in high-stakes AI-mediated interactions. In this paper, we evaluate how LLMs navigate trade-offs involving the Universal Declaration of Human Rights (UDHR), leveraging 1,152 synthetically generated scenarios across 24 rights articles and eight languages. Our analysis of eleven major LLMs reveals systematic biases where models: (1) accept limiting Economic, Social, and Cultural rights more often than Political and Civil rights, (2) demonstrate significant cross-linguistic variation, with elevated endorsement rates of rights-limiting actions in Chinese and Hindi compared to English or Romanian, (3) show substantial susceptibility to prompt-based steering, and (4) exhibit noticeable differences between Likert and open-ended responses, highlighting critical challenges in LLM preference assessment.
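The abstract describes aggregating model responses to rights-limiting scenarios by rights category and language. As a minimal illustrative sketch (not the authors' released code; the field names, Likert scale direction, and endorsement threshold below are assumptions made for illustration), endorsement rates could be computed along the lines of:

```python
# Hypothetical sketch: aggregating Likert-style endorsement of rights-limiting
# actions per (rights category, language). Field names and threshold are
# assumptions, not taken from the paper.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class ScenarioResult:
    article: int        # UDHR article number (the paper covers 24 articles)
    category: str       # e.g. "civil_political" or "economic_social_cultural"
    language: str       # e.g. "en", "zh", "hi", "ro"
    likert: int         # assumed scale: 1 (oppose limitation) .. 5 (endorse)

def endorsement_rates(results, threshold=4):
    """Fraction of scenarios where the model endorses the rights-limiting
    action (likert >= threshold), grouped by (category, language)."""
    counts = defaultdict(lambda: [0, 0])  # key -> [endorsed, total]
    for r in results:
        key = (r.category, r.language)
        counts[key][0] += int(r.likert >= threshold)
        counts[key][1] += 1
    return {k: endorsed / total for k, (endorsed, total) in counts.items()}

if __name__ == "__main__":
    # Toy data comparing ESC vs. civil/political rights across languages.
    demo = [
        ScenarioResult(23, "economic_social_cultural", "zh", 4),
        ScenarioResult(19, "civil_political", "en", 2),
        ScenarioResult(26, "economic_social_cultural", "hi", 5),
        ScenarioResult(19, "civil_political", "zh", 3),
    ]
    for key, rate in sorted(endorsement_rates(demo).items()):
        print(key, f"{rate:.2f}")
```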
Related papers
- Assessing Human Rights Risks in AI: A Framework for Model Evaluation [0.10195618602298682]
We contribute to the field of algorithmic auditing by presenting a framework to computationally assess human rights risk. We develop an approach to evaluating a model that supports grounded claims about the level of risk it poses to particular human rights. Because a human rights approach centers on real-world harms, it requires evaluating AI systems in the specific contexts in which they are deployed.
arXiv Detail & Related papers (2025-10-07T02:12:56Z) - Parallel Scaling Law: Unveiling Reasoning Generalization through A Cross-Linguistic Perspective [52.452449102961225]
This study proposes a novel cross-linguistic perspective to investigate reasoning generalization. Our findings reveal that cross-lingual transferability varies significantly across initial model, target language, and training paradigm. Our study challenges the assumption that LRM reasoning mirrors human cognition, providing critical insights for the development of more language-agnostic LRMs.
arXiv Detail & Related papers (2025-10-02T17:49:49Z) - On the Same Wavelength? Evaluating Pragmatic Reasoning in Language Models across Broad Concepts [69.69818198773244]
We study a range of LMs on both language comprehension and language production. We find that state-of-the-art LMs, but not smaller ones, achieve strong performance on language comprehension.
arXiv Detail & Related papers (2025-09-08T17:59:32Z) - Comparing human and LLM politeness strategies in free production [6.91274201589206]
Polite speech poses a fundamental alignment challenge for large language models (LLMs). We investigate whether LLMs employ a similarly context-sensitive repertoire by comparing human and LLM responses in both constrained and open-ended production tasks. We find that larger models successfully replicate key preferences from the computational pragmatics literature, and human evaluators surprisingly prefer LLM-generated responses in open-ended contexts.
arXiv Detail & Related papers (2025-06-11T04:44:46Z) - Do LLMs exhibit demographic parity in responses to queries about Human Rights? [4.186018120368565]
Hedging and non-affirmation are behaviours that express ambiguity or a lack of clear endorsement of specific statements. We design a novel prompt set on human rights in the context of different national or social identities, and develop metrics to capture hedging and non-affirmation behaviours. We find that all models exhibit some demographic disparities in how they attribute human rights between different identity groups.
arXiv Detail & Related papers (2025-02-26T15:19:35Z) - Large Language Models Reflect the Ideology of their Creators [71.65505524599888]
Large language models (LLMs) are trained on vast amounts of data to generate natural language. This paper shows that the ideological stance of an LLM appears to reflect the worldview of its creators.
arXiv Detail & Related papers (2024-10-24T04:02:30Z) - Language Model Alignment in Multilingual Trolley Problems [138.5684081822807]
Building on the Moral Machine experiment, we develop a cross-lingual corpus of moral dilemma vignettes in over 100 languages called MultiTP. Our analysis explores the alignment of 19 different LLMs with human judgments, capturing preferences across six moral dimensions. We discover significant variance in alignment across languages, challenging the assumption of uniform moral reasoning in AI systems.
arXiv Detail & Related papers (2024-07-02T14:02:53Z) - Evaluating Implicit Bias in Large Language Models by Attacking From a Psychometric Perspective [66.34066553400108]
We conduct a rigorous evaluation of large language models' implicit bias towards certain demographics. Inspired by psychometric principles, we propose three attack approaches, i.e., Disguise, Deception, and Teaching. Our methods can elicit LLMs' inner bias more effectively than competitive baselines.
arXiv Detail & Related papers (2024-06-20T06:42:08Z) - The Impossibility of Fair LLMs [17.812295963158714]
We analyze a variety of technical fairness frameworks and find inherent challenges in each that make the development of a fair language model intractable. We show that each framework either does not extend to the general-purpose AI context or is infeasible in practice. These inherent challenges would persist for general-purpose AI, including LLMs, even if empirical challenges, such as limited participatory input and limited measurement methods, were overcome.
arXiv Detail & Related papers (2024-05-28T04:36:15Z) - Exploring the Jungle of Bias: Political Bias Attribution in Language Models via Dependency Analysis [86.49858739347412]
Large Language Models (LLMs) have sparked intense debate regarding the prevalence of bias in these models and its mitigation.
We propose a prompt-based method for the extraction of confounding and mediating attributes which contribute to the decision process.
We find that the observed disparate treatment can at least in part be attributed to confounding and mediating attributes and model misalignment.
arXiv Detail & Related papers (2023-11-15T00:02:25Z)