Related papers: Beyond the Surface: Probing the Ideological Depth of Large Language Models

URL: http://arxiv.org/abs/2508.21448v1
Date: Fri, 29 Aug 2025 09:27:01 GMT
Title: Beyond the Surface: Probing the Ideological Depth of Large Language Models
Authors: Shariar Kabir, Kevin Esterling, Yue Dong
Abstract summary: This paper investigates the concept of "ideological depth" in Large Language Models (LLMs). We measure the "steerability" of two well-known open-source LLMs using instruction prompting and activation steering. Preliminary analysis reveals that models with lower steerability possess more distinct and abstract ideological features.
Score: 3.84754844062131
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Large Language Models (LLMs) have demonstrated pronounced ideological leanings, yet the stability and depth of these positions remain poorly understood. Surface-level responses can often be manipulated through simple prompt engineering, calling into question whether they reflect a coherent underlying ideology. This paper investigates the concept of "ideological depth" in LLMs, defined as the robustness and complexity of their internal political representations. We employ a dual approach: first, we measure the "steerability" of two well-known open-source LLMs using instruction prompting and activation steering. We find that while some models can easily switch between liberal and conservative viewpoints, others exhibit resistance or an increased rate of refusal, suggesting a more entrenched ideological structure. Second, we probe the internal mechanisms of these models using Sparse Autoencoders (SAEs). Preliminary analysis reveals that models with lower steerability possess more distinct and abstract ideological features. Our evaluations reveal that one model can contain 7.3x more political features than another model of similar size. This allows targeted ablation of a core political feature in an ideologically "deep" model, leading to consistent, logical shifts in its reasoning across related topics, whereas the same intervention in a "shallow" model results in an increase in refusal outputs. Our findings suggest that ideological depth is a quantifiable property of LLMs and that steerability serves as a valuable window into their latent political architecture.
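The abstract does not say how the steering vectors are built or how steered outputs are scored. As a minimal sketch, assuming a difference-of-means steering vector (a common construction in the activation-steering literature, not necessarily the authors' method) and stance labels already assigned to each steered output by some external classifier, the two measurements might look like this in plain Python. All function names and the `"refusal"` label are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equally sized activation vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def steering_vector(target_acts: Sequence[Sequence[float]],
                    source_acts: Sequence[Sequence[float]]) -> Vector:
    """Difference of means: points from the source stance toward the target stance."""
    target_mean = mean_vector(target_acts)
    source_mean = mean_vector(source_acts)
    return [t - s for t, s in zip(target_mean, source_mean)]


def steer(hidden: Sequence[float], direction: Sequence[float], alpha: float) -> Vector:
    """Add alpha * direction to a single hidden state (one layer, one token position)."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


def steerability(labels: Sequence[str], target: str) -> Dict[str, float]:
    """Share of steered outputs labelled with the target stance, and share labelled as refusals."""
    n = len(labels)
    return {
        "shift_rate": sum(label == target for label in labels) / n,
        "refusal_rate": sum(label == "refusal" for label in labels) / n,
    }
```

With toy 3-dimensional activations, `steering_vector([[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]], [[0.0, 1.0, 2.0], [0.0, 3.0, 2.0]])` returns `[2.0, -2.0, 0.0]`, and steering the hidden state `[0.5, 0.5, 0.5]` with that vector at `alpha=0.5` gives `[1.5, -0.5, 0.5]`. In this framing, a model that resists steering would show a low `shift_rate`, while one that answers the intervention by refusing would show a high `refusal_rate`.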
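The targeted ablation of a single SAE feature can be sketched for a standard ReLU sparse autoencoder, assuming the usual encoder f = ReLU(W_enc (x - b_dec) + b_enc) and a linear decoder. The paper's actual SAE architecture, layer choice, and ablation procedure are not described in the abstract, so everything below, including the function names, is illustrative. Adding the reconstruction error back is one common convention: it means only feature k's decoder direction, scaled by its activation, is subtracted from the hidden state.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = Sequence[Sequence[float]]  # one row per SAE feature


def relu(z: float) -> float:
    return z if z > 0.0 else 0.0


def encode(x: Sequence[float], w_enc: Matrix, b_enc: Sequence[float],
           b_dec: Sequence[float]) -> Vector:
    """Feature activations f = ReLU(W_enc (x - b_dec) + b_enc)."""
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    return [relu(sum(w * c for w, c in zip(row, centered)) + b)
            for row, b in zip(w_enc, b_enc)]


def decode(f: Sequence[float], w_dec: Matrix, b_dec: Sequence[float]) -> Vector:
    """Reconstruction x_hat = b_dec + sum_k f[k] * w_dec[k]; row k is feature k's decoder direction."""
    x_hat = list(b_dec)
    for fk, direction in zip(f, w_dec):
        for i, d in enumerate(direction):
            x_hat[i] += fk * d
    return x_hat


def ablate_feature(x: Sequence[float], k: int, w_enc: Matrix, b_enc: Sequence[float],
                   w_dec: Matrix, b_dec: Sequence[float]) -> Vector:
    """Zero feature k but keep the SAE reconstruction error, so only feature k's contribution is removed."""
    f = encode(x, w_enc, b_enc, b_dec)
    error = [xi - hi for xi, hi in zip(x, decode(f, w_dec, b_dec))]
    f_ablated = list(f)
    f_ablated[k] = 0.0
    return [hi + ei for hi, ei in zip(decode(f_ablated, w_dec, b_dec), error)]
```

For a toy one-feature SAE whose encoder and decoder rows are both `[1.0, 0.0, 0.0]` (zero biases), ablating feature 0 on `x = [2.0, 5.0, -1.0]` returns `[0.0, 5.0, -1.0]`: the feature's contribution is removed and the unexplained remainder of the hidden state is left untouched.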

Related papers

Top 10 Open Challenges Steering the Future of Diffusion Language Model and Its Variants [85.33837131101342]
We propose a strategic roadmap organized into four pillars: foundational infrastructure, algorithmic optimization, cognitive reasoning, and unified multimodal intelligence. We argue that this transition is essential for developing next-generation AI capable of complex structural reasoning, dynamic self-correction, and seamless multimodal integration.
arXiv Detail & Related papers (2026-01-20T14:58:23Z)
Beyond the Black Box: Identifiable Interpretation and Control in Generative Models via Causal Minimality [52.57416398859353]
We show that causal minimality can endow latent representations of diffusion vision and autoregressive language models with clear causal interpretation and robust, component-wise identifiable control. We introduce a novel theoretical framework for hierarchical selection models, where higher-level concepts emerge from the constrained composition of lower-level variables. These causally grounded concepts serve as levers for fine-grained model steering, paving the way for transparent, reliable systems.
arXiv Detail & Related papers (2025-12-11T14:59:14Z)
LTD-Bench: Evaluating Large Language Models by Letting Them Draw [57.237152905238084]
LTD-Bench is a breakthrough benchmark for large language models (LLMs). It transforms LLM evaluation from abstract scores to directly observable visual outputs by requiring models to generate drawings through dot matrices or executable code. LTD-Bench's visual outputs enable powerful diagnostic analysis, offering a potential approach to investigate model similarity.
arXiv Detail & Related papers (2025-11-04T08:11:23Z)
Don't Change My View: Ideological Bias Auditing in Large Language Models [0.0]
We adapt a previously proposed statistical method to the new context of ideological bias auditing. We analyze distributional shifts in model outputs across prompts that are thematically related to a chosen topic. This design makes the method particularly suitable for auditing proprietary black-box systems.
arXiv Detail & Related papers (2025-09-16T04:14:29Z)
POW: Political Overton Windows of Large Language Models [15.998401166180388]
Political bias in Large Language Models (LLMs) presents a growing concern for the responsible deployment of AI systems. Traditional audits attempt to locate a model's political position as a point estimate, masking the broader set of ideological boundaries that shape what a model is willing or unwilling to say. In this paper, we draw upon the concept of the Overton Window as a framework for mapping these boundaries.
arXiv Detail & Related papers (2025-09-08T17:57:54Z)
SoK: Large Language Model Copyright Auditing via Fingerprinting [69.14570598973195]
We introduce a unified framework and formal taxonomy that categorizes existing methods into white-box and black-box approaches. We propose LeaFBench, the first systematic benchmark for evaluating LLM fingerprinting under realistic deployment scenarios.
arXiv Detail & Related papers (2025-08-27T12:56:57Z)
Political Ideology Shifts in Large Language Models [6.062377561249039]
We investigate how adopting synthetic personas influences ideological expression in large language models (LLMs). Our analysis reveals four consistent patterns: (i) larger models display broader and more implicit ideological coverage; (ii) susceptibility to explicit ideological cues grows with scale; (iii) models respond more strongly to right-authoritarian than to left-libertarian priming; and (iv) thematic content in persona descriptions induces ideological shifts, which amplify with size.
arXiv Detail & Related papers (2025-08-22T00:16:38Z)
Democratic or Authoritarian? Probing a New Dimension of Political Biases in Large Language Models [72.89977583150748]
We propose a novel methodology to assess how Large Language Models align with broader geopolitical value systems. We find that LLMs generally favor democratic values and leaders, but exhibit increased favorability toward authoritarian figures when prompted in Mandarin.
arXiv Detail & Related papers (2025-06-15T07:52:07Z)
Probing the Subtle Ideological Manipulation of Large Language Models [0.3745329282477067]
Large Language Models (LLMs) have transformed natural language processing, but concerns have emerged about their susceptibility to ideological manipulation. We introduce a novel multi-task dataset designed to reflect diverse ideological positions through tasks such as ideological QA, statement ranking, manifesto cloze completion, and Congress bill comprehension. Our findings indicate that fine-tuning significantly enhances nuanced ideological alignment, while explicit prompts provide only minor refinements.
arXiv Detail & Related papers (2025-04-19T13:11:50Z)
The Geometry of Refusal in Large Language Models: Concept Cones and Representational Independence [57.57786477441956]
Prior work suggests that a single refusal direction in the model's activation space determines whether an LLM refuses a request. We propose a novel gradient-based approach to representation engineering and use it to identify refusal directions. We show that refusal mechanisms in LLMs are governed by complex spatial structures and identify functionally independent directions.
arXiv Detail & Related papers (2025-02-24T18:52:59Z)
Mapping and Influencing the Political Ideology of Large Language Models using Synthetic Personas [5.237116285113809]
We map the political distribution of persona-based prompted large language models using the Political Compass Test (PCT). Our experiments reveal that synthetic personas predominantly cluster in the left-libertarian quadrant, with models demonstrating varying degrees of responsiveness when prompted with explicit ideological descriptors. While all models demonstrate significant shifts towards right-authoritarian positions, they exhibit more limited shifts towards left-libertarian positions, suggesting an asymmetric response to ideological manipulation that may reflect inherent biases in model training.
arXiv Detail & Related papers (2024-12-19T13:36:18Z)
Large Language Models Reflect the Ideology of their Creators [71.65505524599888]
Large language models (LLMs) are trained on vast amounts of data to generate natural language. This paper shows that the ideological stance of an LLM appears to reflect the worldview of its creators.
arXiv Detail & Related papers (2024-10-24T04:02:30Z)
Exploring the Jungle of Bias: Political Bias Attribution in Language Models via Dependency Analysis [86.49858739347412]
Large Language Models (LLMs) have sparked intense debate regarding the prevalence of bias in these models and its mitigation. We propose a prompt-based method for the extraction of confounding and mediating attributes which contribute to the decision process. We find that the observed disparate treatment can at least in part be attributed to confounding and mediating attributes and model misalignment.
arXiv Detail & Related papers (2023-11-15T00:02:25Z)
Does Deep Learning Learn to Abstract? A Systematic Probing Framework [69.2366890742283]
Abstraction, the ability to induce abstract concepts from concrete instances and flexibly apply them beyond the learning context, is a desirable capability for deep learning models. We introduce a systematic probing framework to explore the abstraction capability of deep learning models from a transferability perspective.
arXiv Detail & Related papers (2023-02-23T12:50:02Z)

This list is automatically generated from the titles and abstracts of the papers on this site.