Do Words Reflect Beliefs? Evaluating Belief Depth in Large Language Models
- URL: http://arxiv.org/abs/2504.17052v1
- Date: Wed, 23 Apr 2025 19:00:39 GMT
- Title: Do Words Reflect Beliefs? Evaluating Belief Depth in Large Language Models
- Authors: Shariar Kabir, Kevin Esterling, Yue Dong
- Abstract summary: Large Language Models (LLMs) are increasingly shaping political discourse, yet their responses often display inconsistency when subjected to scrutiny. Do these responses reflect genuine internal beliefs or merely surface-level alignment with training data? We propose a novel framework for evaluating belief depth by analyzing (1) argumentative consistency and (2) uncertainty quantification.
- Score: 3.4280925987535786
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) are increasingly shaping political discourse, yet their responses often display inconsistency when subjected to scrutiny. While prior research has primarily categorized LLM outputs as left- or right-leaning to assess their political stances, a critical question remains: Do these responses reflect genuine internal beliefs or merely surface-level alignment with training data? To address this, we propose a novel framework for evaluating belief depth by analyzing (1) argumentative consistency and (2) uncertainty quantification. We evaluate 12 LLMs on 19 economic policies from the Political Compass Test, challenging their belief stability with both supportive and opposing arguments. Our analysis reveals that LLMs exhibit topic-specific belief stability rather than a uniform ideological stance. Notably, up to 95% of left-leaning models' responses and 89% of right-leaning models' responses remain consistent when challenged, enabling semantic entropy to achieve high accuracy (AUROC = 0.78) in distinguishing surface-level alignment from genuine belief. These findings call into question the assumption that LLMs maintain stable, human-like political ideologies, emphasizing the importance of conducting topic-specific reliability assessments for real-world applications.
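The framework's two quantitative signals can be illustrated with a short sketch. The code below is not the authors' released implementation: it treats each distinct stance string as its own semantic cluster (the paper would group paraphrases with a semantic-equivalence model before computing entropy), uses made-up stance samples and flip labels, and assumes scikit-learn is available for the AUROC computation.

```python
# Minimal sketch of the two signals described in the abstract: semantic
# entropy over sampled stances as an uncertainty measure, and AUROC for
# separating stable ("genuine") beliefs from responses that flip under
# counter-arguments. All data below is illustrative, not from the paper.
from collections import Counter
from math import log

from sklearn.metrics import roc_auc_score


def semantic_entropy(stances: list[str]) -> float:
    """Entropy over semantically equivalent answer clusters.

    Each distinct stance string stands in for one semantic cluster here;
    a faithful implementation would cluster paraphrases first.
    """
    counts = Counter(stances)
    n = len(stances)
    return -sum((c / n) * log(c / n) for c in counts.values())


# Hypothetical data: for each policy statement, several sampled model stances
# and whether the model later flipped when challenged with opposing arguments.
sampled_stances = [
    ["agree", "agree", "agree", "agree"],        # low entropy
    ["agree", "disagree", "agree", "neutral"],   # high entropy
    ["disagree", "disagree", "disagree", "agree"],
]
flipped_under_challenge = [0, 1, 1]  # 1 = surface-level alignment, 0 = stable belief

entropies = [semantic_entropy(s) for s in sampled_stances]

# Higher entropy should predict flipping; AUROC quantifies how well it does
# (the paper reports AUROC = 0.78 across 12 LLMs and 19 economic policies).
print("semantic entropies:", [round(h, 3) for h in entropies])
print("AUROC:", roc_auc_score(flipped_under_challenge, entropies))
```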
Related papers
- Better Aligned with Survey Respondents or Training Data? Unveiling Political Leanings of LLMs on U.S. Supreme Court Cases [24.622980403581018]
We empirically examine how the values and biases embedded in training corpora shape model outputs.
As a case study, we focus on probing the political leanings of LLMs in 32 U.S. Supreme Court cases.
arXiv Detail & Related papers (2025-02-25T15:16:17Z)
- Adversarial Alignment for LLMs Requires Simpler, Reproducible, and More Measurable Objectives [52.863024096759816]
Misaligned research objectives have hindered progress in adversarial robustness research over the past decade.
We argue that realigned objectives are necessary for meaningful progress in adversarial alignment.
arXiv Detail & Related papers (2025-02-17T15:28:40Z)
- Examining Alignment of Large Language Models through Representative Heuristics: The Case of Political Stereotypes [20.407518082067437]
This study examines the alignment of large language models (LLMs) with human values in the domain of politics.
We analyze the factors that contribute to LLMs' deviations from empirical positions on political issues.
We find that while LLMs can mimic certain political parties' positions, they often exaggerate these positions more than human survey respondents do.
arXiv Detail & Related papers (2025-01-24T07:24:23Z)
- Value Compass Leaderboard: A Platform for Fundamental and Validated Evaluation of LLMs Values [76.70893269183684]
As Large Language Models (LLMs) achieve remarkable breakthroughs, aligning their values with humans has become imperative. Existing evaluations focus narrowly on safety risks such as bias and toxicity. Existing benchmarks are prone to data contamination. The pluralistic nature of human values across individuals and cultures is largely ignored in measuring LLMs' value alignment.
arXiv Detail & Related papers (2025-01-13T05:53:56Z)
- Aligning Large Language Models for Faithful Integrity Against Opposing Argument [71.33552795870544]
Large Language Models (LLMs) have demonstrated impressive capabilities in complex reasoning tasks. They can be easily misled by unfaithful arguments during conversations, even when their original statements are correct. We propose a novel framework, named Alignment for Faithful Integrity with Confidence Estimation.
arXiv Detail & Related papers (2025-01-02T16:38:21Z)
- Belief in the Machine: Investigating Epistemological Blind Spots of Language Models [51.63547465454027]
Language models (LMs) are essential for reliable decision-making in fields like healthcare, law, and journalism.
This study systematically evaluates the capabilities of modern LMs, including GPT-4, Claude-3, and Llama-3, using a new dataset, KaBLE.
Our results reveal key limitations. First, while LMs achieve 86% accuracy on factual scenarios, their performance drops significantly with false scenarios.
Second, LMs struggle with recognizing and affirming personal beliefs, especially when those beliefs contradict factual data.
arXiv Detail & Related papers (2024-10-28T16:38:20Z)
- Whose Side Are You On? Investigating the Political Stance of Large Language Models [56.883423489203786]
We investigate the political orientation of Large Language Models (LLMs) across a spectrum of eight polarizing topics, spanning from abortion to LGBTQ issues.
The findings suggest that users should be mindful when crafting queries, and exercise caution in selecting neutral prompt language.
arXiv Detail & Related papers (2024-03-15T04:02:24Z)
- Beyond prompt brittleness: Evaluating the reliability and consistency of political worldviews in LLMs [13.036825846417006]
We propose a series of tests to assess the reliability and consistency of large language models' stances on political statements.
We study models ranging in size from 7B to 70B parameters and find that their reliability increases with parameter count.
Larger models show overall stronger alignment with left-leaning parties but differ among policy programs.
arXiv Detail & Related papers (2024-02-27T16:19:37Z)
- Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models [61.45529177682614]
We challenge the prevailing constrained evaluation paradigm for values and opinions in large language models.
We show that models give substantively different answers when not forced.
We distill these findings into recommendations and open challenges in evaluating values and opinions in LLMs.
arXiv Detail & Related papers (2024-02-26T18:00:49Z)
- Inducing Political Bias Allows Language Models Anticipate Partisan Reactions to Controversies [5.958974943807783]
This study addresses the challenge of understanding political bias in digitized discourse using Large Language Models (LLMs).
We present a comprehensive analytical framework, consisting of Partisan Bias Divergence Assessment and Partisan Class Tendency Prediction.
Our findings reveal the model's effectiveness in capturing emotional and moral nuances, albeit with some challenges in stance detection.
arXiv Detail & Related papers (2023-11-16T08:57:53Z)
- Exploring the Jungle of Bias: Political Bias Attribution in Language Models via Dependency Analysis [86.49858739347412]
Large Language Models (LLMs) have sparked intense debate regarding the prevalence of bias in these models and its mitigation.
We propose a prompt-based method for the extraction of confounding and mediating attributes which contribute to the decision process.
We find that the observed disparate treatment can at least in part be attributed to confounding and mediating attributes and model misalignment.
arXiv Detail & Related papers (2023-11-15T00:02:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.