GermanPartiesQA: Benchmarking Commercial Large Language Models for Political Bias and Sycophancy
- URL: http://arxiv.org/abs/2407.18008v1
- Date: Thu, 25 Jul 2024 13:04:25 GMT
- Title: GermanPartiesQA: Benchmarking Commercial Large Language Models for Political Bias and Sycophancy
- Authors: Jan Batzner, Volker Stocker, Stefan Schmid, Gjergji Kasneci,
- Abstract summary: We evaluate and compare the alignment of six LLMs by OpenAI, Anthropic, and Cohere with German party positions.
We conduct our prompt experiment for which we use the benchmark and sociodemographic data of leading German parliamentarians.
- Score: 20.06753067241866
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: LLMs are changing the way humans create and interact with content, potentially affecting citizens' political opinions and voting decisions. As LLMs increasingly shape our digital information ecosystems, auditing to evaluate biases, sycophancy, or steerability has emerged as an active field of research. In this paper, we evaluate and compare the alignment of six LLMs by OpenAI, Anthropic, and Cohere with German party positions and evaluate sycophancy based on a prompt experiment. We contribute to evaluating political bias and sycophancy in multi-party systems across major commercial LLMs. First, we develop the benchmark dataset GermanPartiesQA based on the Voting Advice Application Wahl-o-Mat covering 10 state and 1 national elections between 2021 and 2023. In our study, we find a left-green tendency across all examined LLMs. We then conduct our prompt experiment for which we use the benchmark and sociodemographic data of leading German parliamentarians to evaluate changes in LLMs responses. To differentiate between sycophancy and steerabilty, we use 'I am [politician X], ...' and 'You are [politician X], ...' prompts. Against our expectations, we do not observe notable differences between prompting 'I am' and 'You are'. While our findings underscore that LLM responses can be ideologically steered with political personas, they suggest that observed changes in LLM outputs could be better described as personalization to the given context rather than sycophancy.
Related papers
- Vox Populi, Vox AI? Using Language Models to Estimate German Public Opinion [45.84205238554709]
We generate a synthetic sample of personas matching the individual characteristics of the 2017 German Longitudinal Election Study respondents.
We ask the LLM GPT-3.5 to predict each respondent's vote choice and compare these predictions to the survey-based estimates.
We find that GPT-3.5 does not predict citizens' vote choice accurately, exhibiting a bias towards the Green and Left parties.
arXiv Detail & Related papers (2024-07-11T14:52:18Z) - Large Language Models' Detection of Political Orientation in Newspapers [0.0]
Various methods have been developed to better understand newspapers' positioning.
The advent of Large Language Models (LLM) hold disruptive potential to assist researchers and citizens alike.
We compare how four widely employed LLMs rate the positioning of newspapers, and compare if their answers align with one another.
Over a woldwide dataset, articles in newspapers are positioned strikingly differently by single LLMs, hinting to inconsistent training or excessive randomness in the algorithms.
arXiv Detail & Related papers (2024-05-23T06:18:03Z) - Assessing Political Bias in Large Language Models [0.624709220163167]
We evaluate the political bias of open-source Large Language Models (LLMs) concerning political issues within the European Union (EU) from a German voter's perspective.
We show that larger models, such as Llama3-70B, tend to align more closely with left-leaning political parties, while smaller models often remain neutral.
arXiv Detail & Related papers (2024-05-17T15:30:18Z) - Whose Side Are You On? Investigating the Political Stance of Large Language Models [56.883423489203786]
We investigate the political orientation of Large Language Models (LLMs) across a spectrum of eight polarizing topics.
Our investigation delves into the political alignment of LLMs across a spectrum of eight polarizing topics, spanning from abortion to LGBTQ issues.
The findings suggest that users should be mindful when crafting queries, and exercise caution in selecting neutral prompt language.
arXiv Detail & Related papers (2024-03-15T04:02:24Z) - Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models [61.45529177682614]
We challenge the prevailing constrained evaluation paradigm for values and opinions in large language models.
We show that models give substantively different answers when not forced.
We distill these findings into recommendations and open challenges in evaluating values and opinions in LLMs.
arXiv Detail & Related papers (2024-02-26T18:00:49Z) - Exploring Value Biases: How LLMs Deviate Towards the Ideal [57.99044181599786]
Large-Language-Models (LLMs) are deployed in a wide range of applications, and their response has an increasing social impact.
We show that value bias is strong in LLMs across different categories, similar to the results found in human studies.
arXiv Detail & Related papers (2024-02-16T18:28:43Z) - The Political Preferences of LLMs [0.0]
I administer 11 political orientation tests, designed to identify the political preferences of the test taker, to 24 state-of-the-art conversational LLMs.
Most conversational LLMs generate responses that are diagnosed by most political test instruments as manifesting preferences for left-of-center viewpoints.
I demonstrate that LLMs can be steered towards specific locations in the political spectrum through Supervised Fine-Tuning.
arXiv Detail & Related papers (2024-02-02T02:43:10Z) - LLM Voting: Human Choices and AI Collective Decision Making [0.0]
This paper investigates the voting behaviors of Large Language Models (LLMs), specifically GPT-4 and LLaMA-2, their biases, and how they align with human voting patterns.
We observed that the methods used for voting input and the presentation of choices influence LLM voting behavior.
We discovered that varying the persona can reduce some of these biases and enhance alignment with human choices.
arXiv Detail & Related papers (2024-01-31T14:52:02Z) - Whose Opinions Do Language Models Reflect? [88.35520051971538]
We investigate the opinions reflected by language models (LMs) by leveraging high-quality public opinion polls and their associated human responses.
We find substantial misalignment between the views reflected by current LMs and those of US demographic groups.
Our analysis confirms prior observations about the left-leaning tendencies of some human feedback-tuned LMs.
arXiv Detail & Related papers (2023-03-30T17:17:08Z) - ElitePLM: An Empirical Study on General Language Ability Evaluation of
Pretrained Language Models [78.08792285698853]
We present a large-scale empirical study on general language ability evaluation of pretrained language models (ElitePLM)
Our empirical results demonstrate that: (1) PLMs with varying training objectives and strategies are good at different ability tests; (2) fine-tuning PLMs in downstream tasks is usually sensitive to the data size and distribution; and (3) PLMs have excellent transferability between similar tasks.
arXiv Detail & Related papers (2022-05-03T14:18:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.