On the Inevitability of Left-Leaning Political Bias in Aligned Language Models
- URL: http://arxiv.org/abs/2507.15328v1
- Date: Mon, 21 Jul 2025 07:37:28 GMT
- Title: On the Inevitability of Left-Leaning Political Bias in Aligned Language Models
- Authors: Thilo Hagendorff
- Abstract summary: There are concerns that large language models (LLMs) exhibit a left-wing political bias. I argue that intelligent systems that are trained to be harmless and honest must necessarily exhibit left-wing political bias.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The guiding principle of AI alignment is to train large language models (LLMs) to be harmless, helpful, and honest (HHH). At the same time, there are mounting concerns that LLMs exhibit a left-wing political bias. Yet the commitment to AI alignment cannot be reconciled with this critique. In this article, I argue that intelligent systems that are trained to be harmless and honest must necessarily exhibit left-wing political bias. Normative assumptions underlying alignment objectives inherently concur with progressive moral frameworks and left-wing principles, emphasizing harm avoidance, inclusivity, fairness, and empirical truthfulness. Conversely, right-wing ideologies often conflict with alignment guidelines. Yet research on political bias in LLMs consistently frames its findings about left-leaning tendencies as a risk, as problematic, or as concerning. In doing so, researchers are effectively arguing against AI alignment and tacitly fostering the violation of HHH principles.
Related papers
- "Amazing, They All Lean Left" -- Analyzing the Political Temperaments of Current LLMs [5.754220850145368]
We find strong and consistent prioritization of liberal-leaning values, particularly care and fairness, across most models. We argue that this "liberal tilt" is not a programming error but an emergent property of training on democratic rights-focused discourse. Rather than undermining democratic discourse, this pattern may offer a new lens through which to examine collective reasoning.
arXiv Detail & Related papers (2025-07-08T21:19:25Z)
- Democratic or Authoritarian? Probing a New Dimension of Political Biases in Large Language Models [72.89977583150748]
We propose a novel methodology to assess how Large Language Models align with broader geopolitical value systems. We find that LLMs generally favor democratic values and leaders, but exhibit increased favorability toward authoritarian figures when prompted in Mandarin.
arXiv Detail & Related papers (2025-06-15T07:52:07Z)
- Normative Conflicts and Shallow AI Alignment [0.0]
The progress of AI systems such as large language models (LLMs) raises increasingly pressing concerns about their safe deployment. I argue that this vulnerability reflects a fundamental limitation of existing alignment methods. I show how humans' ability to engage in deliberative reasoning enhances their resilience against similar adversarial tactics.
arXiv Detail & Related papers (2025-06-05T06:57:28Z)
- Are Language Models Consequentialist or Deontological Moral Reasoners? [69.85385952436044]
We focus on a large-scale analysis of the moral reasoning traces provided by large language models (LLMs). We introduce and test a taxonomy of moral rationales to systematically classify reasoning traces according to two main normative ethical theories: consequentialism and deontology.
arXiv Detail & Related papers (2025-05-27T17:51:18Z)
- Do Words Reflect Beliefs? Evaluating Belief Depth in Large Language Models [3.4280925987535786]
Large Language Models (LLMs) are increasingly shaping political discourse, yet their responses often display inconsistency when subjected to scrutiny. Do these responses reflect genuine internal beliefs or merely surface-level alignment with training data? We propose a novel framework for evaluating belief depth by analyzing (1) argumentative consistency and (2) uncertainty quantification. (A rough illustrative sketch of the uncertainty-quantification idea follows this entry.)
arXiv Detail & Related papers (2025-04-23T19:00:39Z)
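As a rough illustration of the uncertainty-quantification component mentioned in the entry above, the sketch below assumes a hypothetical setup in which an LLM is asked the same polarizing question several times and each answer is mapped to a coarse stance label by some external classifier; the normalized entropy of the resulting label distribution then serves as a crude proxy for belief (in)stability. This is a minimal sketch of the general idea only, not the paper's actual framework; the stance labels and sample data are invented.

```python
# Hedged sketch: a crude uncertainty proxy for "belief depth".
# NOT the paper's framework; labels and samples below are invented.
from collections import Counter
from math import log2

def stance_entropy(stance_labels):
    """Normalized Shannon entropy of repeated stance samples for one question.
    0.0 means the model always gives the same stance; 1.0 means maximal inconsistency."""
    counts = Counter(stance_labels)
    if len(counts) < 2:
        return 0.0
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    entropy = -sum(p * log2(p) for p in probs)
    return entropy / log2(len(counts))

# Toy example: eight resampled answers to one polarizing question,
# already mapped to coarse stance labels by a hypothetical external classifier.
samples = ["support", "support", "oppose", "support", "neutral",
           "support", "support", "oppose"]
print("stance entropy:", round(stance_entropy(samples), 3))
```

Under this assumption, low entropy across repeated (and counter-argued) prompts would be one weak signal of a deeper-held position, while high entropy would point to surface-level alignment.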
- Are LLMs (Really) Ideological? An IRT-based Analysis and Alignment Tool for Perceived Socio-Economic Bias in LLMs [0.0]
We introduce an Item Response Theory (IRT)-based framework to detect and quantify socioeconomic bias in large language models (LLMs). IRT accounts for item difficulty, improving ideological bias estimation. This empirically validated framework enhances AI alignment research and promotes fairer AI governance. (A minimal sketch of a standard IRT model follows this entry.)
arXiv Detail & Related papers (2025-03-17T13:20:09Z)
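The IRT-based entry above names the method but not its mechanics. As a minimal sketch, assuming the standard two-parameter logistic (2PL) item response model rather than the paper's actual implementation, the code below estimates a latent ideological position from a binary agreement pattern over a handful of hypothetical survey items; all item parameters and responses are made up for illustration.

```python
# Hedged sketch: a generic 2PL IRT model, not the paper's implementation.
import numpy as np

def p_agree(theta, a, b):
    """2PL model: probability that a respondent with latent trait theta
    endorses an item with discrimination a and difficulty b."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def estimate_theta(responses, a, b, grid=None):
    """Grid-search maximum-likelihood estimate of theta from 0/1 responses."""
    if grid is None:
        grid = np.linspace(-4.0, 4.0, 801)
    # Probability matrix: rows = candidate theta values, columns = items.
    P = p_agree(grid[:, None], a[None, :], b[None, :])
    # Bernoulli log-likelihood of the observed responses for each candidate theta.
    ll = (responses * np.log(P) + (1 - responses) * np.log(1.0 - P)).sum(axis=1)
    return grid[np.argmax(ll)]

# Toy example: five hypothetical socio-economic items answered by an LLM.
a = np.array([1.2, 0.8, 1.5, 1.0, 0.9])    # discrimination (assumed values)
b = np.array([-1.0, 0.0, 0.5, 1.5, -0.5])  # difficulty / item location (assumed values)
responses = np.array([1, 1, 0, 0, 1])      # binary agreement pattern (made up)

print("estimated latent position:", estimate_theta(responses, a, b))
```

In the 2PL model the difficulty parameter b shifts where on the latent scale an item is likely to be endorsed, which is the property the abstract refers to when it says IRT "accounts for item difficulty".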
- Societal Alignment Frameworks Can Improve LLM Alignment [50.97852062232431]
We argue that improving LLM alignment requires incorporating insights from societal alignment frameworks. We then investigate how uncertainty within societal alignment frameworks manifests in LLM alignment. We end our discussion by offering an alternative view on LLM alignment, framing the underspecified nature of its objectives as an opportunity.
arXiv Detail & Related papers (2025-02-27T13:26:07Z)
- Political Neutrality in AI Is Impossible -- But Here Is How to Approximate It [97.59456676216115]
We argue that true political neutrality is neither feasible nor universally desirable due to its subjective nature and the biases inherent in AI training data, algorithms, and user interactions. We use the term "approximation" of political neutrality to shift the focus from unattainable absolutes to achievable, practical proxies.
arXiv Detail & Related papers (2025-02-18T16:48:04Z)
- Whose Side Are You On? Investigating the Political Stance of Large Language Models [56.883423489203786]
We investigate the political orientation of Large Language Models (LLMs) across a spectrum of eight polarizing topics, spanning from abortion to LGBTQ issues.
The findings suggest that users should be mindful when crafting queries, and exercise caution in selecting neutral prompt language.
arXiv Detail & Related papers (2024-03-15T04:02:24Z)
- AI Alignment: A Comprehensive Survey [69.61425542486275]
AI alignment aims to make AI systems behave in line with human intentions and values. We identify four principles as the key objectives of AI alignment: Robustness, Interpretability, Controllability, and Ethicality. We decompose current alignment research into two key components: forward alignment and backward alignment.
arXiv Detail & Related papers (2023-10-30T15:52:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.