Related papers: Fine-Grained Interpretation of Political Opinions in Large Language Models

URL: http://arxiv.org/abs/2506.04774v1
Date: Thu, 05 Jun 2025 09:06:59 GMT
Title: Fine-Grained Interpretation of Political Opinions in Large Language Models
Authors: Jingyu Hu, Mengyue Yang, Mengnan Du, Weiru Liu
Abstract summary: Recent work indicates that there is a misalignment between LLMs' responses and their internal intentions. This motivates us to probe LLMs' internal mechanisms and help uncover their internal political states. We designed a four-dimensional political learning framework and constructed a corresponding dataset for fine-grained political concept vector learning.
Score: 19.21833592916603
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Studies of LLMs' political opinions mainly rely on evaluations of their open-ended responses. Recent work indicates that there is a misalignment between LLMs' responses and their internal intentions. This motivates us to probe LLMs' internal mechanisms and help uncover their internal political states. Additionally, we found that the analysis of LLMs' political opinions often relies on single-axis concepts, which can lead to concept confounds. In this work, we extend the single-axis view to multiple dimensions and apply interpretable representation engineering techniques for more transparent LLM political concept learning. Specifically, we designed a four-dimensional political learning framework and constructed a corresponding dataset for fine-grained political concept vector learning. These vectors can be used to detect and intervene in LLM internals. Experiments are conducted on eight open-source LLMs with three representation engineering techniques. Results show these vectors can disentangle political concept confounds. Detection tasks validate the semantic meaning of the vectors and show good generalization and robustness in OOD settings. Intervention experiments show these vectors can intervene in LLMs to generate responses with different political leanings.

Related papers

Concept Component Analysis: A Principled Approach for Concept Extraction in LLMs [51.378834857406325]
Mechanistic interpretability seeks to mitigate the issues through extracts from large language models. Sparse autoencoders (SAEs) have emerged as a popular approach for extracting interpretable and monosemantic concepts. We show that SAEs suffer from a fundamental theoretical ambiguity: the well-defined correspondence between LLM representations and human-interpretable concepts remains unclear.
arXiv Detail & Related papers (2026-01-28T09:27:05Z)
Investigating Political and Demographic Associations in Large Language Models Through Moral Foundations Theory [4.48417484433108]
Large Language Models (LLMs) have become increasingly incorporated into everyday life for many internet users. The importance of these roles raises questions about how and what responses LLMs make in difficult political and moral domains. Previous research has used the Moral Foundations Theory (MFT) to measure differences in human participants along political, national, and cultural lines.
arXiv Detail & Related papers (2025-10-14T19:36:36Z)
Steering Towards Fairness: Mitigating Political Bias in LLMs [16.594400974742523]
We employ a framework for probing and mitigating such biases in large language models (LLMs) through analysis of internal model representations. We introduce a comprehensive activation extraction pipeline capable of layer-wise analysis across multiple ideological axes. Our results show that decoder LLMs systematically encode representational bias across layers, which can be leveraged for effective steering vector-based mitigation.
arXiv Detail & Related papers (2025-08-12T11:09:03Z)
Arbiters of Ambivalence: Challenges of Using LLMs in No-Consensus Tasks [52.098988739649705]
This study examines the biases and limitations of LLMs in three roles: answer generator, judge, and debater. We develop a "no-consensus" benchmark by curating examples that encompass a variety of a priori ambivalent scenarios. Our results show that while LLMs can provide nuanced assessments when generating open-ended answers, they tend to take a stance on no-consensus topics when employed as judges or debaters.
arXiv Detail & Related papers (2025-05-28T01:31:54Z)
Linear Representations of Political Perspective Emerge in Large Language Models [2.2462222233189286]
Large language models (LLMs) have demonstrated the ability to generate text that realistically reflects a range of different subjective human perspectives. This paper studies how LLMs are seemingly able to reflect more liberal versus more conservative viewpoints among other political perspectives in American politics.
arXiv Detail & Related papers (2025-03-03T21:59:01Z)
Examining Alignment of Large Language Models through Representative Heuristics: The Case of Political Stereotypes [20.407518082067437]
This study examines the alignment of large language models (LLMs) with human values in the domain of politics. We analyze the factors that contribute to LLMs' deviations from empirical positions on political issues. We find that while LLMs can mimic certain political parties' positions, they often exaggerate these positions more than human survey respondents do.
arXiv Detail & Related papers (2025-01-24T07:24:23Z)
Political-LLM: Large Language Models in Political Science [159.95299889946637]
Large language models (LLMs) have been widely adopted in political science tasks. Political-LLM aims to advance the comprehensive understanding of integrating LLMs into computational political science.
arXiv Detail & Related papers (2024-12-09T08:47:50Z)
Large Language Models Reflect the Ideology of their Creators [71.65505524599888]
Large language models (LLMs) are trained on vast amounts of data to generate natural language. This paper shows that the ideological stance of an LLM appears to reflect the worldview of its creators.
arXiv Detail & Related papers (2024-10-24T04:02:30Z)
Whose Side Are You On? Investigating the Political Stance of Large Language Models [56.883423489203786]
We investigate the political orientation of Large Language Models (LLMs) across a spectrum of eight polarizing topics, spanning from abortion to LGBTQ issues. The findings suggest that users should be mindful when crafting queries, and exercise caution in selecting neutral prompt language.
arXiv Detail & Related papers (2024-03-15T04:02:24Z)
LLM Inference Unveiled: Survey and Roofline Model Insights [62.92811060490876]
Large Language Model (LLM) inference is rapidly evolving, presenting a unique blend of opportunities and challenges. Our survey stands out from traditional literature reviews by not only summarizing the current state of research but also by introducing a framework based on the roofline model. This framework identifies the bottlenecks when deploying LLMs on hardware devices and provides a clear understanding of practical problems.
arXiv Detail & Related papers (2024-02-26T07:33:05Z)
The Political Preferences of LLMs [0.0]
I administer 11 political orientation tests, designed to identify the political preferences of the test taker, to 24 state-of-the-art conversational LLMs. Most conversational LLMs generate responses that are diagnosed by most political test instruments as manifesting preferences for left-of-center viewpoints. I demonstrate that LLMs can be steered towards specific locations in the political spectrum through Supervised Fine-Tuning.
arXiv Detail & Related papers (2024-02-02T02:43:10Z)
Rethinking Interpretability in the Era of Large Language Models [76.1947554386879]
Large language models (LLMs) have demonstrated remarkable capabilities across a wide array of tasks. The capability to explain in natural language allows LLMs to expand the scale and complexity of patterns that can be given to a human. These new capabilities raise new challenges, such as hallucinated explanations and immense computational costs.
arXiv Detail & Related papers (2024-01-30T17:38:54Z)
Sparsity-Guided Holistic Explanation for LLMs with Interpretable Inference-Time Intervention [53.896974148579346]
Large Language Models (LLMs) have achieved unprecedented breakthroughs in various natural language processing domains. The enigmatic "black-box" nature of LLMs remains a significant challenge for interpretability, hampering transparent and accountable applications. We propose a novel methodology anchored in sparsity-guided techniques, aiming to provide a holistic interpretation of LLMs.
arXiv Detail & Related papers (2023-12-22T19:55:58Z)
Measurement in the Age of LLMs: An Application to Ideological Scaling [1.9413548770753526]
This paper explores the use of large language models (LLMs) to navigate the conceptual clutter inherent to social scientific measurement tasks. We rely on LLMs' remarkable linguistic fluency to elicit ideological scales of both legislators and text.
arXiv Detail & Related papers (2023-12-14T18:34:06Z)
Exploring the Jungle of Bias: Political Bias Attribution in Language Models via Dependency Analysis [86.49858739347412]
Large Language Models (LLMs) have sparked intense debate regarding the prevalence of bias in these models and its mitigation. We propose a prompt-based method for the extraction of confounding and mediating attributes which contribute to the decision process. We find that the observed disparate treatment can at least in part be attributed to confounding and mitigating attributes and model misalignment.
arXiv Detail & Related papers (2023-11-15T00:02:25Z)

This list is automatically generated from the titles and abstracts of the papers on this site.